Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2

Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2


We discussed earlier in Part 1 of Blockchain Applications of Data Science on this blog how the world could be made to become much more profitable for not just a select set of the super-rich but also to the common man, to anyone who participates in creating a digitally trackable product. We discussed how large scale adoption of cryptocurrencies and blockchain technology worldwide could herald a change in the economic demography of the world that could last for generations to come. In this article, we discuss how AI and data science can be used to tackle one of the most pressing questions of the blockchain revolution – how to model the future price of the Bitcoin cryptocurrency for trading for massive profit.

A Detour

But first, we take a short detour to explore another aspect of cryptocurrency that is not commonly talked about. Looking at the state of the world right now, it should be discussed more and I feel compelled to share this information with you before we skip to the juicy part about cryptocurrency price forecasting.

The Environmental Impact of Cryptocurrency Mining

Now, two fundamental assumptions. I assume you’ve read Part 1, which contained a link to a visual guide of how cryptocurrencies work. In case you missed the latter, here’s a link for you to check again.

The following articles speak about the impact of cryptocurrency mining on the environment. Read at least one partially at the very least so that you will understand as we progress with this article:

So cryptocurrency mining involves a huge wastage of computational resources, energy, and enough electrical power to run an entire country. This is mainly due to the model of the Proof-of-Work PoW mining system used by Bitcoin. For more, see the following article..

In PoW mining, miners compete against each other in a desperate race to see who can find the solution to a mathematical hashing problem the quickest. And in every race, only one miner is rewarded with the Bitcoin value.

image result for Ethereum goes Green
Ethereum goes Green! (From Pixabay)


In a significant step forward, Vitalin Buterik’s Ethereum cryptocurrency has shifted to Proof-of-Stake based (PoS) mining system. This makes the mining process significantly less energy intensive than PoW. Some claim the energy savings may be 99.9% more efficient than PoW. Whatever the statistics may be, a PoS based mining process is a big step forward and may completely change the way the environmentalists feel about cryptocurrencies.

So by shifting to PoS mining we can save a huge amount of energy. That is a caveat you need to remember and be aware about because Bitcoin uses PoW mining only. It would be a dream come true for an environmentalist if Bitcoin could shift to PoS mining. Let’s hope and pray that it happens.

Now back to our main topic.

Use AI and Data Science to Predict Future Prices of Cryptocurrency – Including the Burst of the Bitcoin Bubble

What is a blockchain? A distributed database that is decentralized and has no central point of control. As on Feb 2018, the Bitcoin blockchain on a full node was 160-odd GB in size. Now in April 2019, it is 210 GB in size. So this is the question I am going to pose to you. Would it be possible to use the data in the blockchain distributed database to identify patterns and statistical invariances to invest minimally with maximum possible profit? Can we forecast and build models to predict the prices of cryptocurrency in the future using AI and data science? The answer is a definite yes.

Practical Considerations

You may wonder if applying data science techniques and statistical analysis can actually produce information that can help in forecasting the future price of bitcoin. I came across a remarkable kernel on (a website for data scientists to practice problems and compete with each other in competitions) by a user with the handle wayward artisan and the profile name Tania J. I thought it was worth sharing since this is a statistical analysis of the rise and the fall of the bitcoin bubble vividly illustrating how statistical methods helped this user to forecast the future price of bitcoin. The entire kernel is very large and interesting, please do visit it at the link given below. Just the start and the middle section of the kernel is given here because of space considerations and intellectual property considerations as well.

image result for home for data science @ kaggle
Your home for data science.

A Kaggle Kernel That Modelled the Bitcoin Bubble Burst Within Reasonable Error Limits

This following kernel uses cryptocurrency financial data scraped from It is a sobering example of how AI predictions actually predicted the collapse of the bitcoin bubble, prompting as many sellers to sell as they did. Coming across this kernel is one of the main motivations to write this article. I have omitted a lot of details, especially building the model and analyzing its accuracy. I just wanted to show that it was possible.

For more details, visit the kernel on Kaggle at the link: (Please visit this page, all aspiring data scientists. And pay attention to every concept discussed and used. Use Google and Wikipedia and you will learn a lot.)

A subset of the code is given below (the first section):

<subsequent code not shown for brevity>

The dataset is available at the following link as a csv file in Microsoft Excel:


We focus on one of the middle sections with the first ARIMA model with SARIMAX (do look up Wikipedia and Google Search to learn about ARIMA and SARIMAX) which does the actual prediction at the time that the bitcoin bubble burst (only a subset of the code is shown). Visit the Kaggle kernel page on the link below this extract to get the entire code:

<data analysis and model analysis code section not shown here for brevity>

<more code, not shown>

Kaggle Code

This code and the code earlier in the kernel (not shown for the sake of brevity) that built the model for accuracy gave the following predictions as output:

Bitcoin price forecasting at the time of the burst of the Bitcoin bubble

What do we learn? Surprisingly, the model captures the Bitcoin bubble burst with a remarkably accurate prediction (error levels ~ 10%)!


So, does AI and data science have anything to do with blockchain technology and cryptocurrency? The answer is a resounding, yes. Expect data science, statistical analysis, neural networks, and probability model distributions to play a heavy part when you want to forecast cryptocurrency prices.

For all the data science students out there, I am going to include one more screen from the same kernel on Kaggle (link):

The reason I want to show you this screen is that the terms and statistical lingo like kurtosis and heteroskedasticity are statistics concepts that you need to master in order to conduct forecasts like this, the main reason being to analyze the accuracy of the model you have constructed. The output window is given below:

So yes, blockchain technology and cryptocurrencies have a lot of overlap with applications. But also remember, data science can be applied to any field where finance is a factor.

For more on blockchain and data science, see:

A Beginner’s Guide to Big Data and Blockchain

Enjoy data science!

Follow this link, if you are looking to learn data science online!

You can follow this link for our Big Data course, which is a step further into advanced data analysis and processing!

Additionally, if you are having an interest in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, read our Data Science Blogs

What is the Difference Between: Data Science, Data Mining and Machine Learning

data mining vs machine learning vs data science


The advancement in the analytical eco-space has reached new heights in the recent past. The emergence of new tools and techniques has certainly made life easier for an analytics professional to play around with the data. Moreover, the massive amounts of data that’s getting generated from diverse sources need huge computational power and storage system for analysis.

Three of the most commonly used terms in analytics are Data mining, Machine Learning, and Data Science which is a combination of both. In this blog post, we would look into each of these three buzzwords along with examples.

Data Mining:

By term ‘mining’ we refer to extracting some object by digging. Similarly, that analogy could be applied to data where information could be extracted by digging into it. Data mining is one of the most used terms these days. Unlike previously, our life is circulated entirely by big data and we have the tools and techniques to handle such voluminous diverse meaningful data.

In the data, there are a lot of patterns which people could discover once the data has been gathered from relevant sources. The hidden patterns could be extracted to provide valuable insights by combining multiple sources of data even if it is junk. This entire process is known as Data mining.

Now the data used for mining could be enterprise data which are restricted and secured and has privacy issues. It could also be an integration of multiple sources which includes financial data, third-party data, etc. The more the data available to us, the better it is as we need to find patterns and insights in sequential and non-sequential data.

The steps involved in data mining are –

  • Data Collection – This is one of the most important steps in Data mining as getting the correct data is always a challenge in any organization. To find patterns in the data, we need to ensure that the source of the data is accurate and as much as possible data is gathered.
  • Data Cleaning – A lot of the times the data we get is not clean enough to draw insights from it. There could be missing values, outliers, NULL in the data which needs to be handled either by deletion or by imputation based on its significance to the business.
  • Data Analysis – Once the data is gathered, and cleaned the next step is to analyze the data which in short known as Exploratory Data Analysis. Several techniques and methodologies are applied in this step to derive relevant insights from the data.
  • Data Interpretation – Only analyzing the data is worthless unless it is interpreted through the form of graphs or charts to the stakeholders or the business who would make conclusions based on the analysis.

Data mining has several usages in the real world. For example, if we take the logs data for login in a web application, we would see that the data is messy containing information like timestamp, activities of the user, time spent on the website, etc. However, if we clean the data, and then analyze it, we would find some relevant information from it such as the user’s regular habit, the peak time for most of the activities, and so on. All this information could help to increase the efficiency of the system.

Another example of data mining is in crime prevention. Though data mining has most usage in education and healthcare, it is also used by agencies in the crime department to spot patterns in the data. This data would consist of information about some of the criminal activities that have taken place. Hence, mining, and gathering information from the data would help the agencies to predict future crime events and prevent it from occurring. The agencies could mine the data and find out the place where the next crime could take place. They could also prevent cross-border calamity by understanding which vehicle to check, the age of the occupants, etc.

However, a few of the important points one should remember about Data Mining –

  • Data mining should not be considered as the first solution to any analysis task if other accurate solutions are applicable. It should be used when such solutions fail to provide value.
  • Sufficient amount of data should be present to draw insights from it.
  • The problem should be understood to be a Regression or a Classification one.

Machine Learning:

Previously, we learned about Data mining which is about gathering, cleaning, analyzing, and interpreting relevant insights from the data for the business to draw conclusions from it.

If Data mining is about describing a set of events, Machine Learning is about predicting the future events. It is the term coined to define a system which learns from past data to generalize and predict the future events from the unknown set of data.

Machine Learning could be divided into three categories –

  • Supervised Learning – In supervised learning, the target is labeled i.e., for every corresponding row there is an output value.
  • Unsupervised Learning – The data set is unlabelled in unsupervised learning i.e., one has to cluster the data into various groups based on the similarities in the pattern of the data points.
  • Reinforcement Learning – It is a special category of Machine Learning which is mostly used in self-driving cars. In reinforcement learning, the learner is rewarded for every correct move, and penalized for any incorrect move.

The field of Machine Learning is vast, and it requires a blend of statistics, programming, and most importantly data intuition to master it. Supervised and unsupervised learning are used to solve regression, classification, and clustering problems.

  • In regression problems, the target is numeric i.e., continuous or discrete in nature. A continuous value could be an integer, float, or a decimal, whereas a discrete value is a number or an integer.
  • In classification problems, the target is categorical i.e., binary, multinomial, or ordinal in nature.
  • In clustering problems, the dataset is grouped into different clusters based on the similar properties among the data in a particular group.

Machine Learning has a vast usage in various fields such as Banking, Insurance, Healthcare, Manufacturing, Oil and Gas, and so on. Professionals from various disciplines feel the need to predict future outcomes in order to work efficiently and prepare for the best by taking appropriate actions. Some of the real-life examples where Machine Learning has found its usage is –

  • Email Spam filtering – This is the first application of Machine Learning where an email is classified as ‘Spam’ or ‘Not Spam’ based on certain keywords in the mail. It is a binary classification supervised learning problem where the system is initially trained with a set of sample emails to learn the patterns which would help in filtering out irrelevant emails. Once the system has generalized well, it is passed through a validation set to check for its efficiency, and then through a test set to find its accuracy.
  • Credit Risk Analytics – Machine Learning has vast influence in the Banking, and Insurance domain with one of its usage being in predicting the delinquency of a loan by a borrower. Defaulting a credit loan is a prevalent issue in which the lender or the bank has lost millions by failing to identify the possibility of a borrower not repaying back the loans or meeting the contractual agreements. However, Machine Learning has been introduced by various banks which takes into several features of a borrower and builds a predictive model which helps in mitigating the risk involved in giving credit card loans to them.
  • Product Recommendations – Flipkart, and Amazon are of the two biggest e-commerce industry in the world where millions of users shop every day the products of their choice. However, there is a recommendation algorithm that works behind the scenes which simplify the life of the customer by displaying them the products they make like based on their previous shopping or search patterns. This is an example of unsupervised learning where a customer is grouped based on their shopping patterns.

Data Science: 

So far, we have learned about the two most common and important terms in Analytics i.e., Data mining and Machine Learning.

If Data mining deals with understanding and finding hidden insights in the data, then Machine Learning is about taking the cleaned data and predicting future outcomes. All of these together form the core of Data Science.

Data Science is a holistic study which involves both Descriptive and Predictive Analytics. A Data Scientist needs to understand and perform exploratory analysis as well as employ tools, and techniques to make predictions from the data.

A Data Scientist role is a mixture of the work done by a Data Analyst, a Machine Learning Engineer, a Deep Learning Engineer, or an AI researcher. Apart from that, a Data Scientist might also be required to build data pipelines which is the work of a Data Engineer. The skill set of a Data Scientist consists of Mathematics, Statistics, Programming, Machine Learning, Big Data, and communication.

Some of the applications of Data Science in the modern world are –

  • Virtual assistant – Amazon’s Alexa, and Apple’s Siri are two of the biggest achievements in the recent past where AI has been used to build human-like intelligent systems. A virtual assistant could perform most of the tasks that a human being could with proper instructions.
  • ChatBot – Another common usage of Data Science is the ChatBot development which is now being integrated into almost every corporation. A technique called Natural Language Processing is in the core of ChatBot development.
  • Identifying cancer cells – Deep Learning has made tremendous progress in the healthcare sector where it is used to identify the pattern in the cells to predict whether it is cancerous or not. Deep Learning uses neural networks which functions like the human brain.



Data mining, Machine Learning, and Data Science is a broad field and it would require quite a few things to learn to master all these skills.

Dimensionless has several resources to get started with.

To Learn Data Science, Get Data Science Training in Pune and Mumbai from Dimensionless Technologies.

To Learn more about data science, Click to read Data Science Blog.

Also Read:

Machine Learning for Transactional Analytics: Acquisition Cost Vs Lifetime Value

7 Technical Concept Every Data Science Beginner Should Know