A-Z of Analytics


source: PPM Express

As we move towards a data-driven world, we tend to realize how the power of analytics could unearth the most minute details of our lives. From drawing insights from data to making predictions of some unknown scenarios, both small and large industries are thriving under the power of big data analytics.

A-Z Glossary

There are various terms, keywords, concepts that are associated with Analytics. This field of study is broad, and hence, it could be overwhelming to know each one of it. This blog covers some of the critical concepts in analytics from A-Z, and explain the intuition behind that.

A: Artificial Intelligence – AI is the field of study which deals with the creation of intelligent machines that could behave like humans. Some of the widespread use cases where Artificial Intelligence has found its way are ChatBots, Speech Recognition, and so on.

There are two main types of Artificial Intelligence –Narrow AI, and Strong AI. A poker game is an example for the weak or the narrow AI where you feed all the instructions into the machines. It is trained to understand every scenario and incapable of performing something on their own.

On the other hand, a Strong AI thinks and acts like a human being. It is still far-fetched, and a lot of work is being done to achieve ground-breaking results.

B: Big Data – The term Big Data is quite popular and is being used frequently in the analytical ecosystem. The concept of big data came into being with the advent of the enormous amount of unstructured data. The data is getting generated from a multitude of sources which bears the properties of volume, veracity, value, and velocity. 

Traditional file storage systems are incapable of handling such volumes of data, and hence companies are looking into distributed computing to mine such data. Industries which makes full use of the big data are way ahead off their peers in the market.

C: Customer Analytics – Based on the customer’s behavior, relevant offers delivered to them. This process is known as Customer Analytics. Understanding the customer’s lifestyle and buying habits would ensure better prediction of their purchase behaviors, which would eventually lead to more sales for the company. 

The accurate analysis of customer behavior would increase customer loyalty. It could reduce campaign costs as well. The ROI would increase when the right message delivered to each segmented group.

D: Data Science – Data Science is a holistic term which involves a lot of processes which includes data extraction, data pre-processing, building predictive models, data visualization, and so on. Generally, in big companies, the role of a Data Scientist is well defined unlike in startups where you would need to look after all the aspects of an end-to-end project.

Data Science

source: Towards Data Science

To be a Data Scientist, you need to be fluent in Probability, and Statistics as well, which makes it a lucrative career. There are not many qualified Data Scientists out there, and hence mastering the relevant skills could put you in a pole position in the job market.

E: Excel –An old, and yet the most used after visualization tool in the market is Microsoft Excel. Excel is used in a variety of ways while presenting the data to the stakeholders. The graphs and charts lay down the proper demonstration of the work done, which makes it easier for the business to take relevant decisions.

Moreover, Excel has a rich set of utilities which could useful in analyzing structured data. Most companies still need personnel with the knowledge of MS Excel, and hence, you must master it.

F: Financial Analytics – Financial Data such as accounts, transactions, etc., are private and confidential to an individual. Banks refrain from sharing such sensitive data as it could breach privacy and lead to financial damage of a customer.

However, such data if used ethically could save losses for a bank by identifying potential fraudulent behaviors. It would also be used to predict the loan defaulting probability. Credit scoring is another such use case of financial analytics.

G: Google Analytics – For analyzing website traffic, Google provides a free tool known as Google Analytics. It is useful to track any marketing campaign which would give an idea about the behavior of customers. 

There are four levels via which the Google Analytics collects the data – User level which understands each user’s actions, Session level which monitors the individual visit, Page view level which gives information about each page views, and Event level which is about the number of button clicks, views of videos, and so on.

H: Hadoop –The framework most commonly used to store, and manipulate big data is known as Hadoop. As a result of high computing power, the data is processed fast in Hadoop.

Moreover, parallel computing in multiple clusters protects the loss of data and provides more flexibility. It is also cheaper, and could easily be scaled to handle more data.

I: Impala – Impala is a component of Hadoop which provides a SQL query engine for data processing. Written in Java, and C++, Impala is better than other SQL engines. Use SQL; the communication enabled between users and the HDFS, which is faster than Hive. Additionally, different formats of a file could also be read using Impala.

J: Journey Analytics – A sequential journey related to customer experience, which meets a specific business referred to as Journey Analytics. Over time, a customer’s interaction with the company compiled from its journey analytics.

K: K-means clustering – Clustering is a technique where you group a dataset into some small groups based on the similar properties shared among the members of the same group.

K-Means clustering is one such clustering algorithm where an unsupervised dataset split into k number of groups or clusters. K-Means clustering could be used to group a set of customers or products resembling similar properties.

L: Latent Dirichlet Allocation – LDA or Latent Dirichlet Allocation is a technique used over textual data in use cases such as topic modeling. Here, a set of topics imagined by the LDA representing a set of words. Then, it maps all the documents to the topics ensuring that those imaginary topics capture words in each text.

M: Machine LearningMachine Learning is a field of Data Science which deals with building predictive models to make better business decisions.

A machine or a computer is first trained with some set of historical data so that it finds patterns in it, and then predict the outcome on an unknown test set. There are several algorithms used in Machine Learning, one such being K-means clustering. 

Machine Learning

source: TechLeer

N: Neural Networks – Deep Learning is the branch of Machine Learning, which thrives on large complex volumes of data and is used to cases where traditional algorithms are incapable of producing excellent results

Under the hood, the architecture behind Deep Learning is Neural Networks, which is quite similar to the neurons in the human brain.

O: Operational Analytics –The analytics behind the business, which focuses on improving the present state of operations, referred to as Operational Analytics.

Various data aggregation and data mining tools used which provides a piece of transparent information about the business. People who are expert in this field would use operational software provided knowledge to perform targeted analysis.

 P: Pig –Apache Pig is a component of Hadoop which is used to analyze large datasets by parallelized computation. The language used is called Pig Latin.

Several tasks, such as Data Management could be served using Pig Latin. Data checking and filtering could be done efficiently and quickly with Pig.

 Q: Q-Learning –It is a model-free reinforcement learning algorithm which learns a policy by informing an agent the actions to be taken under specific certain circumstances. The problems handled with stochastic transitions and rewards, and it doesn’t require adaptations.

 R: Recurrent Neural Networks –RNN is a neural network where the input to the current step is the output from the previous step.

It used in cases such as text summarization was to predict the next word, the last words are needed to remember. The issue of the hidden layer was solved with the advent of RNN as it recalls sequence information.

 S: SQL –One of the essential skill in analytics is Structured Query Language or SQL. It is used in RDBMS to fetch data from tables using queries.

Most companies use SQL for their initial data validation and analysis. Some of the standard SQL operations used are joins, sub-queries, window functions, etc.

 T: Traffic Analytics –The study of analyzing a website’s source of traffic by looking into its clickstream data is known as traffic analytics. It could help in understanding whether direct, social, paid traffic, etc., are bringing in more users.

 U: Unsupervised Machine Learning –The type of machine learning which deals with unlabeled data is known as unsupervised machine learning.

Here, no labels provided for a corresponding set of features, and information is grouped based on the similarity in the properties shared by the members of each group. Some of the unsupervised algorithms are PCA, K-Means, and so on.

 V: Visualization –The analysis of data is useless if not presented in the forms of graphs and charts to the business. Hence, Data visualization is an integral part of any analytics project and also one of the key steps in data pre-processing and feature engineering.

 W: Word2vec –It is a neural network used for text processing which takes in a text as input and output are a set of feature vectors of the words.

Some of the applications of word2vec are in genes, social media graphs, likes, etc. In a vector space, similar words are grouped using word2vec without the need for human intervention.

 X: XGBoost –Boosting is a technique in machine learning by which a strong learner strengthens a weak learner in subsequent steps.

XGBoost is one such boosting algorithm which is robust to outliers, or NULL values. It is the go-to algorithm in Machine Learning competitions for its speed and accuracy.

 Y: Yarn –YARN is a component of Hadoop which lies between HDFS, and the processing engines. In individual cluster nodes, the processing operations monitored by YARN.

The dynamic allocation of resources is also handled by it, which improves application performance and resource utilization.

 Z: Z-test –A type of hypothesis testing used to determine whether to reject or accept the NULL hypothesis. By how many standard deviations, a data point is further away from the mean could be calculated using Z-test.


In this blog post, we covered some of the terms related to the analytics starting with each letter in the English.

If you are willing to learn more about Analytics, follow the blogs and courses of Dimensionless.

Follow this link, if you are looking to learn more about data science online!

Additionally, if you are having an interest in learning Data Science, Learn online Data Science Course to boost your career in Data Science.

Furthermore, if you want to read more about data science, you can read our blogs here

Top 5 Careers in Data Science You Need to Know About

Top 5 Careers in Data Science You Need to Know About


Reports suggest that around 2.5 quintillion bytes of data are generated every single day. As the online usage growth increases at a tremendous rate, there is a need for immediate Data Science professionals who can clean the data, obtain insights from it, visualize it, train model and eventually come up with solutions using Big data for the betterment of the world.

By 2020, experts predict that there will be more than 2.7 million data science and analytics jobs openings. Having a glimpse of the entire Data Science pipeline, it is definitely tiresome for a single human to perform and at the same time excel at all the levels. Hence, Data Science has a plethora of career options that require a spectrum set of skill sets.

Let us explore the top 5 data science career options in 2019 (In no particular order).


1. Data Scientist

Data Scientist is one of the ‘high demand’ job roles. The day to day responsibilities involves the examination of big data. As a result of the analysis of the big data, they also actively perform data cleaning and organize the big data. They are well aware of the machine learning algorithms and understand when to use the appropriate algorithm. During the due course of data analysis and the outcome of machine learning models, patterns are identified in order to solve the business statement.

The reason why this role is so crucial in any organisation is that the company tends to take business decisions with the help of the insights discovered by the Data Scientist to have an edge over the company’s competitors. It is to be noted that the Data Scientist role is inclined more towards the technical domain. As the role demands a wide range of skill set, Data Scientists are one among the highest paid jobs.


Core Skills of a Data Scientist

  1. Communication
  2. Business Awareness
  3. Database and querying
  4. Data warehousing solutions
  5. Data visualization
  6. Machine learning algorithms


2. Business Intelligence Developer

BI Developer is a job role inclined more towards the Non-Technical domain but has a fair share of Technical responsibilities as well (if required) as a part of their day to day responsibilities. BI developers are responsible for creating and implementing business policies as a result of the insights obtained from the Technical team.

Apart from being a policymaker involving the usage of dedicated (or custom) Business Intelligence analytics tools, they will also have a fair share of coding in order to explore the dataset, present the insights of the dataset in a non-verbal manner. They help in bridging the gap between the technical team that works with the deepest technical understanding and the clients that want the results in the most non-technical manner. They are expected to generate reports from the insights and make it ‘less technical’ for others in the organisation. It is noted that the BI Developers have a deep understanding of Business when compared to Data Scientist.


Core Skills of a Business Analytics Developer

  1. Business model analysis
  2. Data warehousing
  3. Design of business workflow
  4. Business Intelligence software integration


3. Machine Learning Engineer

Once the data is clean and ready for analysis, the machine learning engineers work on these big data to train a predictive model that predicts the target variable. These models are used to analyze the trends of the data in the future so that the organisation can take the right business decisions. As the dataset involved in a real-life scenario would involve a lot of dimensions, it is difficult for a human eye to interpret insights from it. This is one of the reasons for training machine learning algorithms as it easily deals with such complex dataset. These engineers carry out a number of tests and analyze the outcomes of the model.

The reason for conducting constant tests on the model using various samples is to test the accuracy of the developed model. Apart from the training models, they also perform exploratory data analysis sometimes in order to understand the dataset completely which will, in turn, help them in training better predictive models.


Core Skills of Machine Learning Engineers

  1. Machine Learning Algorithms
  2. Data Modelling and Evaluation
  3. Software Engineering


4. Data Engineer

The pipeline of any data-oriented company begins with the collection of big data from numerous sources. That’s where the data engineers operate in any given project. These engineers integrate data from various sources and optimize them according to the problem statement. The work usually involves writing queries on big data for easy and smooth accessibility. Their day to day responsibility is to provide a streamlined flow of big data from various distributed systems. Data engineering differs from the other data science careers as in, it is concentrated on the system and hardware that aids the company’s data analysis, rather than the analysis of data itself. They provide the organisation with efficient warehousing methods as well.


Core Skills of Data Engineer

  1. Database Knowledge
  2. Data Warehousing
  3. Machine Learning algorithm


5. Business Analyst

Business Analyst is one of the most essential roles in the Data Science field. These analysts are responsible for understanding the data and it’s related trend post the decision making about a particular product. They store a good amount of data about various domains of the organisation. These data are really important because if any product of the organisation fails, these analysts work on these big data to understand the reason behind the failure of the project. This type of analysis is vital for all the organisations as it makes them understand the loopholes in the company. The analysts not only backtrack the loophole and in turn provide solutions for the same making sure the organisation takes the right decision in the future. At times, the business analyst act as a bridge between the technical team and the rest of the working community.


Core skills of Business Analyst

  1. Business awareness
  2. Communication
  3. Process Modelling



The data science career options mentioned above are in no particular order. In my opinion, every career option in Data Science field works complimentary with one another. In any data-driven organization, regardless of the salary, every career role is important at the respective stages in a project.

Follow this link, if you are looking to learn data science online!

You can follow this link for our Big Data courseThis course will equip you with the exact skills required. 

Additionally, if you are having an interest in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, read our Data Science Blogs

What is Quantum Computing and How is it Useful in Artificial Intelligence?

What is Quantum Computing and How is it Useful in Artificial Intelligence?


After decades of a heavy slog with no promise of success, quantum computing is suddenly buzzing! Nearly two years ago, IBM made a quantum computer available to the world. The 5-quantum-bit (qubit) resource they now call the IBM Q experience. It was more like a toy for researchers than a way of getting any serious number crunching done. But 70,000 users worldwide have registered for it, and the qubit count in this resource has now quadrupled. With so many promises by quantum computing and data science being at the helm currently, are there any offerings by quantum computing for the AI? Let us explore that in this blog!


What is Quantum Computing?

A traditional computer works on bits of data that are binary, or Boolean, with only two possible values: 0 or 1. In contrast, a quantum bit has possible values of 1, 0 or a superposition of 1 and 0. According to scientists, qubits are more like physical atoms and molecular structures. However, many find it helpful to theorize a qubit as a binary data unit with superposition.

The use of qubits makes the practical quantum computer model quite difficult. Traditional hardware requires altering to read and use these unknown values. Another idea, known as entanglement, uses quantum theory to suggest that accurate values cannot be obtained in the ways that traditional computers read binary bits. It also has been suggested that a quantum computer is based on a non-deterministic model, where the computer has more than one possible outcome for any given case or situation. Each of these ideas provides a foundation for the theory of actual quantum computing, which is still problematic in today’s tech world.


Use of Quantum Computing

Let us look at some of the use cases of the quantum computing below. This will help you understand the scale of the application of quantum computing currently.

Use cases can be: 

1. Cryptography

The most common area people associate quantum computing with is advanced cryptography. The ordinary computers we use today make it infeasible to break the encryption that uses very large prime number factorization (300+ integers). With quantum computers, this decryption could become trivial, leading to much stronger protection of our digital lives and assets. Of course, we’ll also be able to break traditional encryption much faster.

2. Aviation

Quantum technology could enable much more complex computer modelling like aeronautical scenarios. Aiding in the routing and scheduling of aircraft has enormous commercial benefits for time and costs. Large companies like Airbus and Lockheed Martin are actively researching and investing in the space to take advantage of the computing power and the optimization potential of the technology.

3. Data Analytics

Quantum mechanics and quantum computing can help solve problems on a huge scale. A field of study called topological analysis where geometric shapes behave in specific ways describes computations that are simply impossible with today’s conventional computers due to the data set used.

NASA is looking at using quantum computing for analyzing the enormous amount of data they collect about the universe, as well as research better and safer methods of space travel.

4. Forecasting

Predicting and forecasting various scenarios rely on large and complex data sets. Traditional simulation of, for example, the weather is limited in the inputs that can be handled with classical computing. If you add too many factors, then the simulation takes longer than for the actual weather to evolve.

5. Pattern Matching

Finding patterns in data and using these to predict future patterns is highly valuable. Volkswagen is currently looking into how they can use quantum computing to inform drivers of traffic conditions 45 minutes in advance. Matching traffic patterns and predicting the behaviour of a system as complex as modern day traffic is so far not possible for today’s computers, but this is going to change with quantum computers.

6. Medical Research

There are literally billions of possibilities to how something could react across the human body and even more when you consider that this could be a drug administered to billions of people, each with slight differences in their makeup.

Today, it takes pharmaceutical companies up to 10+ years and often billions of dollars to discover a new drug and bring it to market. Improving the front end of the process with quantum computing can dramatically cut costs and time to market, repurpose pre-approved drugs more easily for new applications, and empower computational chemists to make new discoveries faster that could lead to cures for a range of diseases.

7. Self-Driving Cars

Car companies like Tesla and tech companies like Apple and Google are actively developing driverless cars. Not only will these improve the standard of living for most people, but also cut pollution, reduce congestion and bring about a bunch of other benefits.


AI and Quantum Computing

Quantum computing is not a replacement for AI but you can see it more like an enhancement. AI is a major task which we are trying to solve and quantum computing helps us in optimising the sub-tasks of it. Currently, we have a limited scope of quantum computing in AI as technology is still currently new. But on a broad level, quantum computing affects the following tasks in AI

1. Simulation
Simulation modelling is the process of creating and analyzing a digital prototype of a physical model to predict its performance in the real world. It is used to help designers and engineers understand whether, under what conditions, and in which ways a part could fail and what loads it can withstand. This modelling can also help to predict fluid flow and heat transfer patterns. It analyses the approximate working conditions by applying the simulation software.

2. Optimisation
An optimization problem is a problem of finding the best solution from all feasible solutions. Optimization problems can be divided into two categories depending on whether the variables are continuous or discrete. An optimization problem with discrete variables is known as a discrete optimization. In a discrete optimization problem, we are looking for an object such as an integer, permutation or graph from a countable set. Problems with continuous variables include constrained problems and multimodal problems.

3. Sampling
Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points to identify patterns and trends in the larger data set being examined. It enables data scientists, predictive modelers and other data analysts to work with a small, manageable amount of data about a statistical population to build and run analytical models more quickly, while still producing accurate findings.


Benefits of Quantum Computing in AI

1. Less time in training
The big advantage of quantum computing is that it allows an exponential increase in the number of dimensions it can process. While a classical perceptron can process an input of N dimensions, a quantum perceptron can process 2N dimensions.

2. Better Results
It turns out that quantum perceptron can easily classify the patterns in these simple images. We use the quantum model of perceptron as an elementary nonlinear classifier of simple patterns

3. Achieving parallelism
The earliest examples of a quantum algorithm are exponentially faster than any possible deterministic classical algorithm. Quantum computing allows solving the problem since it is capable of simultaneously evaluating f(0)and f(1). This possibility stems from ‘quantum parallelism’. The quantum parallelism allows computing 2n entries for a state consisting of n-qubits. That is: from a linear growth in the number of qubits, we can achieve exponential growth in computing space.



  • Sensitivity to interaction with the environment 
    Quantum computers are extremely sensitive to interaction with the surroundings since any interaction (or measurement) leads to a collapse of the state function. This phenomenon is called decoherence. It is extremely difficult to isolate a quantum system, especially an engineered one for a computation, without it getting entangled with the environment. The larger the number of qubits the harder is it to maintain the coherence.
  • Error-correction
    Quantum error correction (QEC) is used in quantum computing to protect quantum information from errors due to decoherence and other quantum noise. Quantum error correction is essential if one is to achieve fault-tolerant quantum computation that can deal not only with noise on stored quantum information, but also with faulty quantum gates, faulty quantum preparation, and faulty measurements. Copying quantum information is not possible due to the no-cloning theorem. This theorem seems to present an obstacle to formulating a theory of quantum error correction
  • Constraints on state preparation
    State preparation is the essential first step to be considered before the beginning of any quantum computation. In most schemes, the qubits need to be in a superposition state for the quantum computation to proceed correctly. We have a variety of problems due to the nature of superposition and entanglements, and state transition using local transformations is not realistic in a large system. Macrosystems that have been used as model quantum computing systems [14, 33,34] appear to implement not pure states but mixtures. Thus it appears that the NMR experiments do not validate the quantum algorithm.



Three decades after they were first proposed, quantum computers remain largely theoretical. Even so, there’s been some encouraging progress toward realizing a quantum machine. There’s no doubt that these are hugely important advances. and the signs are growing steadily more encouraging that quantum technology will eventually deliver a computing revolution. The potential of quantum computing in artificial intelligence will be evident soon, but still, we do not know how to translate that potential into reality. Undoubtedly, time will put things in place

Follow this link, if you are looking to learn data science online!

You can follow this link for our Big Data course!

Additionally, if you are having an interest in learning Data Science, click here to start Best Online Data Science Courses 

Furthermore, if you want to read more about data science, read our Data Science Blogs

How to Make Machine Learning Models for Beginners

Difference Between A Data Scientist and Statistician

Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2


Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2

Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2


We discussed earlier in Part 1 of Blockchain Applications of Data Science on this blog how the world could be made to become much more profitable for not just a select set of the super-rich but also to the common man, to anyone who participates in creating a digitally trackable product. We discussed how large scale adoption of cryptocurrencies and blockchain technology worldwide could herald a change in the economic demography of the world that could last for generations to come. In this article, we discuss how AI and data science can be used to tackle one of the most pressing questions of the blockchain revolution – how to model the future price of the Bitcoin cryptocurrency for trading for massive profit.

A Detour

But first, we take a short detour to explore another aspect of cryptocurrency that is not commonly talked about. Looking at the state of the world right now, it should be discussed more and I feel compelled to share this information with you before we skip to the juicy part about cryptocurrency price forecasting.

The Environmental Impact of Cryptocurrency Mining

Now, two fundamental assumptions. I assume you’ve read Part 1, which contained a link to a visual guide of how cryptocurrencies work. In case you missed the latter, here’s a link for you to check again.

The following articles speak about the impact of cryptocurrency mining on the environment. Read at least one partially at the very least so that you will understand as we progress with this article:


So cryptocurrency mining involves a huge wastage of computational resources, energy, and enough electrical power to run an entire country. This is mainly due to the model of the Proof-of-Work PoW mining system used by Bitcoin. For more, see the following article..


In PoW mining, miners compete against each other in a desperate race to see who can find the solution to a mathematical hashing problem the quickest. And in every race, only one miner is rewarded with the Bitcoin value.

image result for Ethereum goes Green
Ethereum goes Green! (From Pixabay)


In a significant step forward, Vitalin Buterik’s Ethereum cryptocurrency has shifted to Proof-of-Stake based (PoS) mining system. This makes the mining process significantly less energy intensive than PoW. Some claim the energy savings may be 99.9% more efficient than PoW. Whatever the statistics may be, a PoS based mining process is a big step forward and may completely change the way the environmentalists feel about cryptocurrencies.

So by shifting to PoS mining we can save a huge amount of energy. That is a caveat you need to remember and be aware about because Bitcoin uses PoW mining only. It would be a dream come true for an environmentalist if Bitcoin could shift to PoS mining. Let’s hope and pray that it happens.

Now back to our main topic.

Use AI and Data Science to Predict Future Prices of Cryptocurrency – Including the Burst of the Bitcoin Bubble

What is a blockchain? A distributed database that is decentralized and has no central point of control. As on Feb 2018, the Bitcoin blockchain on a full node was 160-odd GB in size. Now in April 2019, it is 210 GB in size. So this is the question I am going to pose to you. Would it be possible to use the data in the blockchain distributed database to identify patterns and statistical invariances to invest minimally with maximum possible profit? Can we forecast and build models to predict the prices of cryptocurrency in the future using AI and data science? The answer is a definite yes.

Practical Considerations

You may wonder if applying data science techniques and statistical analysis can actually produce information that can help in forecasting the future price of bitcoin. I came across a remarkable kernel on www.Kaggle.com (a website for data scientists to practice problems and compete with each other in competitions) by a user with the handle wayward artisan and the profile name Tania J. I thought it was worth sharing since this is a statistical analysis of the rise and the fall of the bitcoin bubble vividly illustrating how statistical methods helped this user to forecast the future price of bitcoin. The entire kernel is very large and interesting, please do visit it at the link given below. Just the start and the middle section of the kernel is given here because of space considerations and intellectual property considerations as well.

image result for home for data science @ kaggle
Your home for data science.

A Kaggle Kernel That Modelled the Bitcoin Bubble Burst Within Reasonable Error Limits

This following kernel uses cryptocurrency financial data scraped from www.coinmarketcap.com. It is a sobering example of how AI predictions actually predicted the collapse of the bitcoin bubble, prompting as many sellers to sell as they did. Coming across this kernel is one of the main motivations to write this article. I have omitted a lot of details, especially building the model and analyzing its accuracy. I just wanted to show that it was possible.

For more details, visit the kernel on Kaggle at the link:
https://www.kaggle.com/taniaj/cryptocurrency-price-forecasting (Please visit this page, all aspiring data scientists. And pay attention to every concept discussed and used. Use Google and Wikipedia and you will learn a lot.)

A subset of the code is given below (the first section):

<subsequent code not shown for brevity>

The dataset is available at the following link as a csv file in Microsoft Excel:


We focus on one of the middle sections with the first ARIMA model with SARIMAX (do look up Wikipedia and Google Search to learn about ARIMA and SARIMAX) which does the actual prediction at the time that the bitcoin bubble burst (only a subset of the code is shown). Visit the Kaggle kernel page on the link below this extract to get the entire code:

<data analysis and model analysis code section not shown here for brevity>

<more code, not shown>

Kaggle Code

This code and the code earlier in the kernel (not shown for the sake of brevity) that built the model for accuracy gave the following predictions as output:

Bitcoin price forecasting at the time of the burst of the Bitcoin bubble

What do we learn? Surprisingly, the model captures the Bitcoin bubble burst with a remarkably accurate prediction (error levels ~ 10%)!


So, does AI and data science have anything to do with blockchain technology and cryptocurrency? The answer is a resounding, yes. Expect data science, statistical analysis, neural networks, and probability model distributions to play a heavy part when you want to forecast cryptocurrency prices.

For all the data science students out there, I am going to include one more screen from the same kernel on Kaggle (link):

The reason I want to show you this screen is that the terms and statistical lingo like kurtosis and heteroskedasticity are statistics concepts that you need to master in order to conduct forecasts like this, the main reason being to analyze the accuracy of the model you have constructed. The output window is given below:

So yes, blockchain technology and cryptocurrencies have a lot of overlap with applications. But also remember, data science can be applied to any field where finance is a factor.

For more on blockchain and data science, see:

A Beginner’s Guide to Big Data and Blockchain

Enjoy data science!

Follow this link, if you are looking to learn data science online!

You can follow this link for our Big Data course, which is a step further into advanced data analysis and processing!

Additionally, if you are having an interest in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, read our Data Science Blogs

A Personal Digital Assets Manager – Blockchain Applications of Data Science Part 1

A Personal Digital Assets Manager – Blockchain Applications of Data Science Part 1

The Potential of Blockchain Technology


Unless you’ve been living with your head under a rock for the last 4 years, you will definitely have heard of Bitcoin. You would also have heard about the technology behind Bitcoin, Blockchain. Now cryptocurrencies are banned in most cases in India and China, but the Americas and Europe still use cryptocurrencies extensively. And in my opinion, Asia stands to lose a lot if blockchain is not adopted extensively everywhere. Because make no mistake about it – blockchain technology will change the world as we know it. Forever.

Blockchain is the technology powering Bitcoin and other cryptocurrencies. To explain what blockchain is and what bitcoin is you can go through anyone of the articles below. Don’t worry these articles are carefully selected to be as interesting and fun to read as possible. (This also gives me space to add my own original ideas instead of copying or rewording existing articles – and I have plenty (of ideas)!

References to Understand Blockchain

For technical readers:


For non-technical readers:

Whats blockchain in plain English- Quora

For the researchers:


For those of you with no time and who like visual explanations:

Graphics Reuters- Visual Explanations

In fact, that last link is so amazingly simple visual and clear that I recommend everyone read it. Just so that we’re on the same page.

Exciting Applications

Image result for bitcoin
From Pixabay


Cut to the chase. A little confession here. I was asked to do this article nearly 16 days ago. Now I have some experience with blockchain before since having gone through it extensively as a research topic for my own blog. Then a remarkable idea hit me. An idea for a startup that could (in theory) become a multi-billion dollar enterprise. I spent a few days refining it, even going so far as to see if I could start this company with this area myself, until reality set in – I lacked the experience and the business skills.

No sooner had this realization struck me and the excitement cooled a little, another idea to improve blockchain struck me, and I promise to sketch out that idea as well. I am doing this for two reasons:

  • I am staunch support of the FOSS (free open source software movement and would like to be credited with the idea, and I am starting a free to use, open source project on GitHub – working on it, currently moving towards an alpha release as of now.  
  • I believe in the power of technology to remove economic inequality. Now you may say that technology has evolved to the point that 4-5 monolithic companies dominate the entire world. But I believe that technology when used ethically has the potential to create more opportunities than it removes.
  • Blockchain has two major problems – energy consumption and resource consumption. But there are techniques that can alleviate both of these problems. We’ll deal with that as well in Part 2.
  • Finally, the vaunted hype about security for blockchain and cryptocurrencies is ridiculous when you think about it. For the sake of brevity, I will address the main security issues with blockchain in a separate article on Medium – (not here, since it has no relation to data science).

Application – A Personal Blockchain For Every Person On The Planet

In points (I assume you’ve gone through the graphical explanation of blockchain at least – if not you can review it here):

  • The trouble with end products of all types that are produced today is that there are so many intermediaries between the producer and the consumer that the producers receives a pittance compared to the end final price. It would be nice if we could track a product everywhere that it is used.
  • This is also applicable for books, music, articles, poems, pictures, any digital content of any sort. Currently Amazon and YouTube monopolize content distribution, the latter with a complete disregard for copyright and media ownership and payment. Suppose we had a tracking system that viewed every view of a video, and rewarded the original producer for it?
  • To emphasize the previous point, let us consider the case of Lindsey Stirling. Lindsey Stirling is a famous contemporary violinist who dances while playing. Her 118 video uploads have earned her 2,575,305,706 views, 2.5 billion approx, and her earnings from YouTube ads last month was 100K a month. Her net worth as on 10th April 2019 is 12 million USD (12,000,000).
  • But suppose Lindsey Stirling distributed her videos at a price of 1 USD every view. Her net worth would be 2.6 billion USD at the very least! She would be a multi-billionaire had this platform existed. It doesn’t – yet. And because it doesn’t exist she is 2.49 billion USD poorer!

Now everyone who knows blockchain technology will now realize this idea, the concept, and how blockchain can be used to overcome this problem – and its power. Disruptive power!

The Solution

The blockchain is a service that immutably assigns ownership.

The blockchain is also a database that stores every single transaction on a particular digitisable entity.

Finally, the Ethereum smart contract technology means that we can assign payments to go to every person on his own personal blockchain of all his digitisable goods.

This means we can build a world where producer pays a user-defined amount to every entity which created a particular digitisable product.

On this platform or website or marketplace, producers can adjust their prices and their payments and consumers can buy directly from them.

Everything can be tracked on the blockchain. Your own database of your own transactions can be used with smart contracts to pay the maximum possible fee to the most deserving person in the supply chain – fixed by each producer.

Hugely, Massively Disruptive

If you are interested or want to know more, you can leave a comment below with your email address. If you want to be a part of this new revolution and the new decentralised world – with all services provided free – please provide a comment below asking for my email ID with a statement of what and how you want to contribute to this endeavor. I promise to reply to every sincere query.

This is a fledgling project and a lot of work remains to be done. I will be writing articles and creating a team to work on this idea. Those of you who are interested please mail me at thomascherickal@gmail.com.

This will be an open source project and all services have to be offered free of cost. How do you go about making a profit from this? You don’t! The only way this can be fair to all players in countries like India is if it is specially designed to be applicable to anyone.

So this article gave a small glimpse into a world without intermediaries, corporations, money-making middlemen, and running purely on smart contracts. This is applicable to AI and data science since this technology will not reach anywhere significant without extensive use of AI and data science.

The more data that is available, the more analysis can be performed on it. And unless we have analysts who are running monitoring fraud detection systems fulltime on such a system, we might as well never build it – because blockchain data integrity cannot be hacked, but cryptocurrencies are hackable and have been hacked extensively since the beginning of Bitcoin.

For Part 2 of this series on Blockchain Applications of Data Science, you can go to the link below:


For more on AI and Blockchain, I suggest that you refer to the article below:

A Beginner’s Guide to Big Data and Blockchain

and you can go to the link below to access our course on Python and R.

Data Science using R & Python

As always, enjoy artificial intelligence! You are privileged to be at the forefront of humanity’s push into the new uncharted future. All the best!