After completing high school, I seemed to run after the best colleges or universities in the country or abroad to complete my graduation in flying colours because that was necessary to land myself a good job or take the next step in my career. However, once I completed my graduation, I was unsure and did not know what to do next – whether to get placed with a big organization or continue higher studies.
While some decide on earning straightway, others tend to fulfill their dream of getting a management degree and completing their MBA from a reputed organization. The sky is the limit when you enroll in an MBA program full of belief, hope, desire, and enthusiasm. However, once the graduation time comes closer, we tend to be lost in our choices as to what to do next after acquiring an MBA degree.
Facts About MBA Graduates
At first, I would look into some statistics. Since 2010, the recent grad’s employer demand has reached its highest level as announced in its forward-looking Corporate Recruiters Survey by GMAC which is an organization that administers in the GMAT exam. In a survey which was conducted in the year, 2016 in the month of February and March stated that the recruiters prefer to hire recent graduates from the business schools which was about eighty-eight percent compared to the eighty percent of the companies that hired in the previous year.
In the current MBA market, jobs are still there and there is no reason to worry. However, you might not be satisfied with the job role or the company you are getting recruited for and hence you need to reskill yourself like I did for that particular role and be ready with the opportunity comes. You need to have patience and keeping improving your skills.
Have a Plan
All MBA graduates must have a plan, and hence not having a plan for an MBA graduate is not typical of. It’s a once in a lifetime experience that students get to pursue an MBA. Generally, in advance, students know what they would do after graduation. On the back of several years of work experience, and after careful consideration, the decision to do an MBA was taken. Most MBA graduates have a clear picture of their career and know what they want to do. It is not a wise strategy to decide what to do after obtaining an MBA degree.
I was fully aware of what I would want to achieve in life and was always looked after by the admission committee. Based on self-understanding, a clear thought out career strategy is what the admissions officers would want to see. Why you want to do an MBA and your goals after completing is what they are most interested in.
Based on the professional path of mine, the MBA program was considered by me. The right MBA programs could only be selected when in mind you have clear career goals. Whether an MBA meets your need or how your career could be boosted could only be known after that. On your own scale, each program value could be rated and the facts could be uncovered. For a newly minted MBA grad, it is very important to be realistic. Adam Heyler once said in his youtube channel that your CV could become credible and your network would be expanded if you have the MBA degree. But the lack of work experience would not certainly be made up the MBA degree. Time management is also an important factor which is taught by MBA.
The Post MBA Dilemma
The current job market possesses a tremendous challenge to every professional, even someone with a lucrative degree as MBA. Gone are those days when an MBA degree guarantees you a high paying job in a big firm. Nowadays, the wave of entrepreneurship has engulfed many and hence so many graduates are moving towards entrepreneurship and starting out their venture. However, it is not a piece of cake to be an entrepreneur in this competitive market and almost ninety percent of start-ups fail after its inception. I was not into entrepreneurship and choose to upgrade myself and follow my dream.
I pursued my MBA in Finance which is often a go-to choice for many students during their graduation certainly due to the prospects of working in major insurance or a banking company. I wanted to work as a Business Analyst, Risk Analyst, and so on. Thus it was pertinent for me to develop an analyst intuition and master the analytical tools such as SQL, Excel, Tableau, etc. If you are interested in working as a Decision Scientist or a Data Scientist, you need to upgrade your skills like me to more advanced skills like Machine Learning, Deep Learning and so on.
However, once I found the potential that data carries and the diverse nature of this field, I wanted to expand my horizons and work as a Data Science consultant in some big corporation and hence I started exploring other domains like Marketing, Human Resource and so on. MBA in Marketing is another such lucrative career with high post-graduation opportunity. Some of the work designation after completing MBA in marketing are – Research Manager or a Senior Analyst, Marketing Analyst, and so on. Data is the new oil and all marketing firms are using the unprecedented potential of data to market their product to the right customers and stay ahead in the race.
Being a Marketing Analyst, you would be responsible to gather data from various sources and thus having the skills of data collection or web scraping is very important. Additionally, I learned at least of data visualization tool like Excel, Tableau, Power BI and others to analyse the performance of different marketing camp gain which could be presented to the shareholders who would make the final business decision. Overall, it was about finding patterns in the data using various tools and ease the process of making decisions for the stakeholders.
MBA in Human Resource may not be as lucrative as the above two but certainly has its own share of value in terms of responsibility and decision making. Whether or not you have been employed as an HR after your graduation has no relation to the fact that you need to Master HR Analytics which I did that would help in dealing with employees.
As an HR professional, you would be engaged mostly in employee relations and thus it is necessary to understand the satisfaction level of each employee and deal with them separately. Onboarding a resource garners a huge amount of financial cost and hence predicting the attrition probability of an employee could avoid financial loss. Thus, data collection and machine learning are two of the important skills which I learned further along with my interpersonal skills.
Supply Chain Management has been in demand and I realized it is important to understand applications of Data Science in this regard because it could take me a long way in my career. The impact of supply chain dynamics could be analysed using the right analytical tools. Data could be collected and leveraged to identify the efficiency of the supply chain.
Additionally, the price fluctuations, commodities availability could also be analysed using Data. If you master Data Analytics like me, you could reduce the risk burden of an organization.
Healthcare management is another important field where students pursue an MBA which deals with practices related to the Healthcare industry. As Data Science had a vast application in the healthcare industry, I had to get my hands dirty and learn the nitty-gritty of analyzing a healthcare dataset. In the HealthCare careful usage of data could lead to ground-breaking achievements in the field of medical science. Applying analytics with relevant data could help in reducing medical cost and also channel the right medicine for a patient.
Deep Learning has made tremendous progress in the HealthCare industry and hence I took some time to understand the underlying working structure of neural networks. It could unearth hidden information from the patient’s data and help in prescribing the appropriate Medicare of the patient.
Though this was a generic overview of the skills I mastered for my own career aspirations after pursuing my MBA. In general, analytics is the need of the hour and every MBA graduate or each professional irrespective of the field they are in could certainly dive into this field without any prior relevant experience. In the beginning, I felt a bit overwhelmed by the vastness of the field but as I moved along I found it interesting and gradually get inclined towards the field.
Overall, along with the management skills, the technical expertise to deal with data and derive relevant information from it would land you a much higher role of a manager or a consultant in a firm which I eventually managed to achieve where you would be the decision maker for your team. Upskilling is very important in today’s world to stay relevant and keep in touch with the rapid advancement in technology.
Dimensionlesshas several blogs and training to get you started with Data Analytics and Data Science.
Now, in theory, it is possible to become a data scientist, without paying a dime. What we want to do in this article is to list out the best of the best options to learn what you need to know to become a data scientist. Many articles offer 4-5 courses under each heading. What I have done is to search through the Internet covering all free courses and choose the single best course for each topic.
These courses have been carefully curated and offer the best possible option if you’re learning for free. However – there’s a caveat. An interesting twist to this entire story. Interested? Read on! And please – make sure you complete the full article.
Topics For A Data Scientist Course
The basic topics that a data scientist needs to know are:
Machine Learning Theory and Applications
Statistics & Probability
Calculus Basics (short)
Machine Learning in Python
Machine Learning in R
So let’s get to it. Here is the list of the best possible options to learn every one of these topics, carefully selected and curated.
Machine Learning – Stanford University – Andrew Ng (audit option)
The world-famous course for machine learning with the highest rating of all the MOOCs in Coursera, from Andrew Ng, a giant in the ML field and now famous worldwide as an online instructor. Uses MATLAB/Octave. From the website:
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include:
(ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning)
(iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)
The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
This course is extremely effective and has many benefits. However, you will need high levels of self-discipline and self-motivation. Statistics show that90% of those who sign up for a MOOC without a classroom or group environment never complete the course.
Learn Python The Hard Way – Zed Shaw – Free Online Access
You may ask me, why do I want to learn the hard way? Shouldn’t we learn the smart way and not the hard way? Don’t worry. This ebook, online course, and web site is a highly popular way to learn Python. Ok, so it says the hard way. Well, the only way to learn how to code is to practice what you have learned. This course integrates practice with learning. Other Python books you have to take the initiative to practice.
Here, this book shows you what to practice, how to practice. There is only one con here – although this is the best self-driven method, most people will not complete all of it. The main reason is that there is no external instructor for supervision and a group environment to motivate you. However, if you want to learn Python by yourself, then this is the best way. But not the optimal one, as you will see at the end of this article since the cost of the book is 30$ USD (2100 INR approx).
Interactive R and Data Science Programming – SwiRl
Swirlstats is a wonderful tool to learn R and data science scripting in R interactively and intuitively by teaching you R commands from within the R console. This might seem like a very simple tool, but as you use it, you will notice its elegance in teaching you literally how to express yourselves in R and the finer nuances of the language and integration with the console and tidyverse. This is a powerful method of learning R and what is more, it is also a lot of fun!
KhanAcademy is a free non-profit organization on a mission – they want to provide a world-class education to you regardless of where you may be in the world. And they’re doing a fantastic job! This course has been covered in several very high profile blogs and Quora posts as the best online course for statistics – period. What is more, it is extremely high quality and suitable for beginners – and – free! This organization is doing wonderful work. More power to them!
Mathematics for Data Science
Now the basic mathematics for data science content includes linear algebra, single-variable, discrete mathematics, and multivariable calculus (selected topics) and basics of differential equations. Now you could take all of these topics separately in KhanAcademy and that is a good option for Linear Algebra and Multivariate Calculus (in addition to Statistics and Probability).
For Linear Algebra, the link of what you need to know given in a course in KhanAcademy is given below:
These courses are completely free and very accessible to beginners.
This topic deserves a section to itself because discrete mathematics is the foundation of all computer science. There are a variety of options available to learn discrete mathematics, from ebooks to MOOCs, but today, we’ll focus on the best possible option. MIT (Massachusetts Institute of Technology) is known as one of the best colleges in the world and they have an Open information initiative known as MIT OpenCourseWare (MIT OCW). These are actual videos of the lectures taken by the students at one of the best engineering colleges in the world. You will benefit a lot if you follow the lectures at this link, they give all the basic concepts as clearly as possible. It’s a bit technical because this is open mostly for students at an advanced level. The link is given below:
It is also technical and from MIT but might be a little more accessible than the earlier option.
SQL (see-quel) or Structured Query Language is a must-learn if you are a data scientist. You will be working with a lot of databases, and SQL is the language used to access and generate data from database systems like Oracle and Microsoft SQL Server. The best free course I could find online is undoubtedly the one below:
We have covered Python, R, Machine Learning using MATLAB, Data Science with R (SwiRl teaches data science as well), Statistics, Probability, Linear Algebra, and Basic Calculus. Now we just need to get a course for Data Science with Python, and we are done! Now I looked at many options but was not satisfied. So instead of a course, I have provided you with a link to the scikit-learn documentation. Why?
Because that’s as good as an online course by itself. If you read through the main sections, get the code (Ctrl-X, Ctrl-V) and execute it in an Anaconda environment, and then play around with it, experiment, and observe and read up on what every line does, you will already know who to solve standard textbook problems. I recommend the following order:
This book is free to learn online. Get the data files, get the script files, use RStudio, and just as with Python, play, enjoy, experiment, execute, and explore. A little hard work will have you up and running with R in no time! But make sure you try as many code examples as possible. The libraries you can focus on are:
dplyr (data manipulation)
tidyr (data preprocessing “tidying”)
ggplot2 (graphical package)
purrr (functional toolkit)
readr (reading rectangular data files easily)
stringr (string manipulation)
To make it short, simple, and sweet, since we have already covered SQL and this content is for beginners, I recommend the following course:
This is a course on Udemy rated 4.2/5 and completely free. You will learn everything you need to work with Tableau (the most commonly used corporate-level visualization tool). This is an extremely important part of your skill set. You can make all the greatest analyses, but if you don’t visualize them and do it well, management will never buy into your machine learning solution, and neither will anyone who doesn’t know the technical details of ML (which is a large set of people on this planet). Visualization is important. Please make sure to learn the basics (at least!) of Tableau.
Kaggle Micro-Courses (Add-Ons – Short Concise Tutorials)
Kaggle is a wonderful site to practice your data science skills, but recently, they have added a set of hands-on courses to learn data science practicals. And, if I do say, so myself, it’s brilliant. Very nicely presented, superb examples, clear and concise explanations. And of course, you will cover more than we discussed earlier. Please, if you read through all the courses discussed so far in this article, and if you do just the courses at Kaggle.com, you will have spent your time wisely (though not optimally – as we shall see).
Now, if you are reading this article, you might have a fundamental question. This is a blog of a company that offers courses in data science, deep learning, and cloud computing. Why would we want to list all our competitors and publish it on our site? Isn’t that negative publicity?
Quite the opposite.
This is the caveat we were talking about.
Our course is a better solution than every single option given above!
We have nothing to hide.
And we have an absolutely brilliant top-class product.
Every option given above is a separate course by itself.
And they all suffer from a very prickly problem – you need to have excellent levels of discipline and self-motivation to complete just one of the courses above – let alone all ten.
You also have no classroom environment, no guidance for doubts and questions, and you need to know the basics about programming.
Our product is the most cost-effective option in the market for learning data science, as well as the most effective methodology for everyone – every course is conducted live in a classroom environment from the comfort of your home. You can work at a standard job, spend two hours on the internet every day, do extra work and reading on weekends, and become a professional data scientist in 6 months time.
We also have personalized GitHub project portfolio creation, management, and faculty guidance. Not to mention individual attention for each student.
And IITians for faculty who also happen to have 9+ years of industry experience.
So when we say that our product is the best on the market, we really mean it. Because of the live session teaching of the classes, which no other option on the Internet today has.
Am I kidding? Absolutely not. And you can get started with Dimensionless Technologies Data Science with Python and R course for just 70-odd USD. Which is the most cost-effective option on the market!
And unlike all the 10 courses and resources detailed above, instead of doing 10 courses, you just need to do one single course, with the extracted meat of all that you need to know as a data scientist. And yes, we cover:
Statistics & Probability
Machine Learning in Python
Machine Learning in R
GitHub Personal Project Portfolio Creation
Live Remote Daily Sessions
Experts with Industrial Experience
A Classroom Environment (to keep you motivated)
Individual Attention to Every Student
I hope this information has you seriously interested. Please sign up for the course – you will not regret it.
And we even have a two-week trial for you to experience the course for yourself.
Choose wisely and optimally.
Unleash the data scientist within!
An excellent general article on emerging state-of-the-art technology, AI, and blockchain:
Data science is one of the hottest topics in the 21st century because we are generating data at a rate which is much higher than what we can actually process. A lot of business and tech firms are now leveraging key benefits by harnessing the benefits of data science. Due to this, data science right now is really booming.
In this blog, we will deep dive into the world of machine learning. We will walk you through machine learning basics and have a look at the process of building an ML model. We will also build a random forest model in python to ease out the understanding process.
What is Machine Learning?
Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.
There are many different types of machine learning algorithms, with hundreds published each day, and they’re typically grouped by either learning style (i.e. supervised learning, unsupervised learning, semi-supervised learning) or by similarity in form or function (i.e. classification, regression, decision tree, clustering, deep learning, etc.). Regardless of learning style or function, all combinations of machine learning algorithms consist of the following:
Representation (a set of classifiers or the language that a computer understands)
Evaluation (aka objective/scoring function)
Optimization (search method; often the highest-scoring classifier, for example; there are both off-the-shelf and custom optimization methods used)
Steps for Building ML Model
Here is a step-by-step example of how a hospital might use machine learning to improve both patient outcomes and ROI:
1. Define Project Objectives
The first step of the life cycle is to identify an opportunity to tangibly improve operations, increase customer satisfaction, or otherwise create value. In the medical industry, discharged patients sometimes develop conditions that necessitate their return to the hospital. In addition to being dangerous and troublesome for the patient, these readmissions mean the hospital will spend additional time and resources on treating patients for the second time.
2. Acquire and Explore Data
The next step is to collect and prepare all of the relevant data for use in machine learning. This means consulting medical domain experts to determine what data might be relevant in predicting readmission rates, gathering that data from historical patient records, and getting it into a format suitable for analysis, most likely into a flat file format such as a .csv.
3. Model Data
In order to gain insights from your data with machine learning, you have to determine your target variable, the factor of which you are trying to gain a deeper understanding. In this case, the hospital will choose “readmitted,” which is included as a feature in its historical dataset during data collection. Then, they will run machine learning algorithms on the dataset that build models that learn by example from the historical data. Finally, the hospital runs the trained models on data the model hasn’t been trained on to forecast whether new patients are likely to be readmitted, allowing it to make better patient care decisions.
4. Interpret and Communicate
One of the most difficult tasks of machine learning projects is explaining a model’s outcomes to those without any data science background, particularly in highly regulated industries such as healthcare. Traditionally, machine learning has been thought of as a “black box” because of how difficult it is to interpret insights and communicate their value to stakeholders and regulatory bodies alike. The more interpretable your model, the easier it will be to meet regulatory requirements and communicate its value to management and other key stakeholders.
5. Implement, Document, and Maintain
The final step is to implement, document, and maintain the data science project so the hospital can continue to leverage and improve upon its models. Model deployment often poses a problem because of the coding and data science experience it requires, and the time-to-implementation from the beginning of the cycle using traditional data science methods is prohibitively long.
A certain car manufacturing company X is looking to target its customers for their particular car model. Customers are identified by their age, salary, and Gender. The organisation wants to identify or predict which customers will affect the sales of their new car and actually purchase it.
We have a purchased column here which holds two values i.e 0 and 1. 0 indicates that the car has not been purchased by a certain individual. 1 indicates the sale of the car.
Importing the Required Libraries
You need to import all the required libraries first which will ease the model building parts for us. We are using keras to build our random forest model. We are using the matplotlib library to plot the charts and graphs and visualise results. In the end, we are also importing functions from the sklearn module which can help us in splitting our data into training and testing parts
# Importing the libraries
import numpy asnp
import matplotlib.pyplot asplt
import pandas aspd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
Loading the Dataset
In this step, you need to load your dataset in the memory. After that, we separate out the dependent and the independent variables for the training of our classifier. In most of the cases, you need to separate the dependent and he the independent variables
# Importing the dataset
Splitting the Dataset to Form Training and Test Data
In all the cases, you need to make some partitions in your data. A major chunk of your data acts as a training set and a smaller chunk acts as a test set. There are no clearly defined criteria on the proportion of the training and the test set. But most people follow 70–30 or 75–25 rule where a larger chunk is your training set. We train the data on the training set and test it on the test set. This process is known as validation. The prime idea behind this purpose is that one needs to gauge the performance of the model on the data which model has never seen before. In the real-world scenarios, the model will be predicting values on the unseen data. Furthermore, techniques like validation help us in avoiding overfitting or underfitting the model.
Overfitting refers to the case when our model has learnt all about the specific data on which it trained. It will work well on the training data but will have poor accuracy for any unseen data point. Overfitting is like your model is very specific to the data it has and has no generality. Similarly, underfitting is the case where your model is very general and is not able to predict well for your specific use-case. To achieve the best model accuracy, you need to strike a perfect balance between overfitting and under-fitting.
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
In this case, we are fitting our model with the training data. We are using the random forest model exposed by the sklearn package in python. Ultimately, we pass the dependent and independent features separately through which our model makes an internal mapping between them using mathematical coefficients.
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
In this part, we are passing unseen values to our model on which it is making predictions. We use a confusion matrix to derive metrics like accuracy, precision, and recall for our model. These metrics help us to understand the performance of the model.
# Predicting the Test set results
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
Visualising the Predictions
Additionally, we have made an attempt to visualise the predictions of our model using the below code.
Hence, in this Machine Learning Tutorial, we studied the basics of ML. Earlier machine learning was the theory that computers can learn without being programmed to perform specific tasks. But now, the researchers interested in artificial intelligence wanted to see if computers could learn from data. They learn from previous computations to produce reliable decisions and results. It’s a science that’s not new — but one that’s gaining fresh momentum.
Follow this link, if you are looking to learn more about data science online!
We discussed earlier in Part 1 of Blockchain Applications of Data Science on this blog how the world could be made to become much more profitable for not just a select set of the super-rich but also to the common man, to anyone who participates in creating a digitally trackable product. We discussed how large scale adoption of cryptocurrencies and blockchain technology worldwide could herald a change in the economic demography of the world that could last for generations to come. In this article, we discuss how AI and data science can be used to tackle one of the most pressing questions of the blockchain revolution – how to model the future price of the Bitcoin cryptocurrency for trading for massive profit.
But first, we take a short detour to explore another aspect of cryptocurrency that is not commonly talked about. Looking at the state of the world right now, it should be discussed more and I feel compelled to share this information with you before we skip to the juicy part about cryptocurrency price forecasting.
The Environmental Impact of Cryptocurrency Mining
Now, two fundamental assumptions. I assume you’ve read Part 1, which contained a link to a visual guide of how cryptocurrencies work. In case you missed the latter, here’s a link for you to check again.
The following articles speak about the impact of cryptocurrency mining on the environment. Read at least one partially at the very least so that you will understand as we progress with this article:
So cryptocurrency mining involves a huge wastage of computational resources, energy, and enough electrical power to run an entire country. This is mainly due to the model of the Proof-of-Work PoW mining system used by Bitcoin. For more, see the following article..
In PoW mining, miners compete against each other in a desperate race to see who can find the solution to a mathematical hashing problem the quickest. And in every race, only one miner is rewarded with the Bitcoin value.
In a significant step forward, Vitalin Buterik’s Ethereum cryptocurrency has shifted to Proof-of-Stake based (PoS) mining system. This makes the mining process significantly less energy intensive than PoW. Some claim the energy savings may be 99.9% more efficient than PoW. Whatever the statistics may be, a PoS based mining process is a big step forward and may completely change the way the environmentalists feel about cryptocurrencies.
So by shifting to PoS mining we can save a huge amount of energy. That is a caveat you need to remember and be aware about because Bitcoin uses PoW mining only. It would be a dream come true for an environmentalist if Bitcoin could shift to PoS mining. Let’s hope and pray that it happens.
Now back to our main topic.
Use AI and Data Science to Predict Future Prices of Cryptocurrency – Including the Burst of the Bitcoin Bubble
What is a blockchain? A distributed database that is decentralized and has no central point of control. As on Feb 2018, the Bitcoin blockchain on a full node was 160-odd GB in size. Now in April 2019, it is 210 GB in size. So this is the question I am going to pose to you. Would it be possible to use the data in the blockchain distributed database to identify patterns and statistical invariances to invest minimally with maximum possible profit? Can we forecast and build models to predict the prices of cryptocurrency in the future using AI and data science? The answer is a definite yes.
You may wonder if applying data science techniques and statistical analysis can actually produce information that can help in forecasting the future price of bitcoin. I came across a remarkable kernel on www.Kaggle.com (a website for data scientists to practice problems and compete with each other in competitions) by a user with the handle wayward artisan and the profile name Tania J. I thought it was worth sharing since this is a statistical analysis of the rise and the fall of the bitcoin bubble vividly illustrating how statistical methods helped this user to forecast the future price of bitcoin. The entire kernel is very large and interesting, please do visit it at the link given below. Just the start and the middle section of the kernel is given here because of space considerations and intellectual property considerations as well.
A Kaggle Kernel That Modelled the Bitcoin Bubble Burst Within Reasonable Error Limits
This following kernel uses cryptocurrency financial data scraped from www.coinmarketcap.com. It is a sobering example of how AI predictions actually predicted the collapse of the bitcoin bubble, prompting as many sellers to sell as they did. Coming across this kernel is one of the main motivations to write this article. I have omitted a lot of details, especially building the model and analyzing its accuracy. I just wanted to show that it was possible.
The dataset is available at the following link as a csv file in Microsoft Excel:
We focus on one of the middle sections with the first ARIMA model with SARIMAX (do look up Wikipedia and Google Search to learn about ARIMA and SARIMAX) which does the actual prediction at the time that the bitcoin bubble burst (only a subset of the code is shown). Visit the Kaggle kernel page on the link below this extract to get the entire code:
<data analysis and model analysis code section not shown here for brevity>
This code and the code earlier in the kernel (not shown for the sake of brevity) that built the model for accuracy gave the following predictions as output:
What do we learn? Surprisingly, the model captures the Bitcoin bubble burst with a remarkably accurate prediction (error levels ~ 10%)!
So, does AI and data science have anything to do with blockchain technology and cryptocurrency? The answer is a resounding, yes. Expect data science, statistical analysis, neural networks, and probability model distributions to play a heavy part when you want to forecast cryptocurrency prices.
For all the data science students out there, I am going to include one more screen from the same kernel on Kaggle (link):
The reason I want to show you this screen is that the terms and statistical lingo like kurtosis and heteroskedasticity are statistics concepts that you need to master in order to conduct forecasts like this, the main reason being to analyze the accuracy of the model you have constructed. The output window is given below:
Unless you’ve been living with your head under a rock for the last 4 years, you will definitely have heard of Bitcoin. You would also have heard about the technology behind Bitcoin, Blockchain. Now cryptocurrencies are banned in most cases in India and China, but the Americas and Europe still use cryptocurrencies extensively. And in my opinion, Asia stands to lose a lot if blockchain is not adopted extensively everywhere. Because make no mistake about it – blockchain technology will change the world as we know it. Forever.
Blockchain is the technology powering Bitcoin and other cryptocurrencies. To explain what blockchain is and what bitcoin is you can go through anyone of the articles below. Don’t worry these articles are carefully selected to be as interesting and fun to read as possible. (This also gives me space to add my own original ideas instead of copying or rewording existing articles – and I have plenty (of ideas)!
In fact, that last link is so amazingly simple visual and clear that I recommend everyone read it. Just so that we’re on the same page.
Cut to the chase. A little confession here. I was asked to do this article nearly 16 days ago. Now I have some experience with blockchain before since having gone through it extensively as a research topic for my own blog. Then a remarkable idea hit me. An idea for a startup that could (in theory) become a multi-billion dollar enterprise. I spent a few days refining it, even going so far as to see if I could start this company with this area myself, until reality set in – I lacked the experience and the business skills.
No sooner had this realization struck me and the excitement cooled a little, another idea to improve blockchain struck me, and I promise to sketch out that idea as well. I am doing this for two reasons:
I am staunch support of the FOSS (free open source software movement and would like to be credited with the idea, and I am starting a free to use, open source project on GitHub – working on it, currently moving towards an alpha release as of now.
I believe in the power of technology to remove economic inequality. Now you may say that technology has evolved to the point that 4-5 monolithic companies dominate the entire world. But I believe that technology when used ethically has the potential to create more opportunities than it removes.
Blockchain has two major problems – energy consumption and resource consumption. But there are techniques that can alleviate both of these problems. We’ll deal with that as well in Part 2.
Finally, the vaunted hype about security for blockchain and cryptocurrencies is ridiculous when you think about it. For the sake of brevity, I will address the main security issues with blockchain in a separate article on Medium – (not here, since it has no relation to data science).
Application – A Personal Blockchain For Every Person On The Planet
In points (I assume you’ve gone through the graphical explanation of blockchain at least – if not you can review it here):
The trouble with end products of all types that are produced today is that there are so many intermediaries between the producer and the consumer that the producers receives a pittance compared to the end final price. It would be nice if we could track a product everywhere that it is used.
This is also applicable for books, music, articles, poems, pictures, any digital content of any sort. Currently Amazon and YouTube monopolize content distribution, the latter with a complete disregard for copyright and media ownership and payment. Suppose we had a tracking system that viewed every view of a video, and rewarded the original producer for it?
To emphasize the previous point, let us consider the case of Lindsey Stirling. Lindsey Stirling is a famous contemporary violinist who dances while playing. Her 118 video uploads have earned her 2,575,305,706 views, 2.5 billion approx, and her earnings from YouTube ads last month was 100K a month. Her net worth as on 10th April 2019 is 12 million USD (12,000,000).
But suppose Lindsey Stirling distributed her videos at a price of 1 USD every view. Her net worth would be 2.6 billion USD at the very least! She would be a multi-billionaire had this platform existed. It doesn’t – yet. And because it doesn’t exist she is 2.49 billion USD poorer!
Now everyone who knows blockchain technology will now realize this idea, the concept, and how blockchain can be used to overcome this problem – and its power. Disruptive power!
The blockchain is a service that immutably assigns ownership.
The blockchain is also a database that stores every single transaction on a particular digitisable entity.
Finally, the Ethereum smart contract technology means that we can assign payments to go to every person on his own personal blockchain of all his digitisable goods.
This means we can build a world where producer pays a user-defined amount to every entity which created a particular digitisable product.
On this platform or website or marketplace, producers can adjust their prices and their payments and consumers can buy directly from them.
Everything can be tracked on the blockchain. Your own database of your own transactions can be used with smart contracts to pay the maximum possible fee to the most deserving person in the supply chain – fixed by each producer.
Hugely, Massively Disruptive
If you are interested or want to know more, you can leave a comment below with your email address. If you want to be a part of this new revolution and the new decentralised world – with all services provided free – please provide a comment below asking for my email ID with a statement of what and how you want to contribute to this endeavor. I promise to reply to every sincere query.
This is a fledgling project and a lot of work remains to be done. I will be writing articles and creating a team to work on this idea. Those of you who are interested please mail me at firstname.lastname@example.org.
This will be an open source project and all services have to be offered free of cost. How do you go about making a profit from this? You don’t! The only way this can be fair to all players in countries like India is if it is specially designed to be applicable to anyone.
So this article gave a small glimpse into a world without intermediaries, corporations, money-making middlemen, and running purely on smart contracts. This is applicable to AI and data science since this technology will not reach anywhere significant without extensive use of AI and data science.
The more data that is available, the more analysis can be performed on it. And unless we have analysts who are running monitoring fraud detection systems fulltime on such a system, we might as well never build it – because blockchain data integrity cannot be hacked, but cryptocurrencies are hackable and have been hacked extensively since the beginning of Bitcoin.
For Part 2 of this series on Blockchain Applications of Data Science, you can go to the link below: