Accurate Bitcoin Price Forecasting with Python – Blockchain Applications of Data Science Part 2

Recap

We discussed earlier, in Part 1 of Blockchain Applications of Data Science on this blog, how the world could become much more profitable not just for a select set of the super-rich but for the common man as well – for anyone who participates in creating a digitally trackable product. We discussed how large-scale adoption of cryptocurrencies and blockchain technology worldwide could herald a change in the economic demography of the world that could last for generations to come. In this article, we discuss how AI and data science can be used to tackle one of the most pressing questions of the blockchain revolution: how to model the future price of the Bitcoin cryptocurrency for highly profitable trading.

A Detour

But first, we take a short detour to explore another aspect of cryptocurrency that is not commonly talked about. Looking at the state of the world right now, it should be, and I feel compelled to share this information with you before we skip ahead to the juicy part about cryptocurrency price forecasting.

The Environmental Impact of Cryptocurrency Mining

Now, two fundamental assumptions. I assume you've read Part 1, which contained a link to a visual guide of how cryptocurrencies work. In case you missed it, here's the link for you to check again.

The following article speaks about the impact of cryptocurrency mining on the environment. Read at least part of it so that you will understand the rest of this article:

https://www.theguardian.com/technology/2018/jan/17/bitcoin-electricity-usage-huge-climate-cryptocurrency

So cryptocurrency mining involves a huge wastage of computational resources and energy – enough electrical power to run an entire country. This is mainly due to the Proof-of-Work (PoW) mining system used by Bitcoin. For more, see the following article:

https://www.investopedia.com/tech/whats-environmental-impact-cryptocurrency/

In PoW mining, miners compete against each other in a desperate race to see who can find the solution to a mathematical hashing problem the quickest. And in every race, only one miner is rewarded with the Bitcoin value.

Ethereum goes Green! (Image from Pixabay)

In a significant step forward, Vitalik Buterin's Ethereum cryptocurrency is shifting to a Proof-of-Stake (PoS) mining system. This makes the mining process significantly less energy-intensive than PoW; some claim that PoS may be as much as 99.9% more energy-efficient. Whatever the exact statistics may be, a PoS-based mining process is a big step forward and may completely change the way environmentalists feel about cryptocurrencies.

So by shifting to PoS mining, we can save a huge amount of energy. That is a caveat you need to remember and be aware of, because Bitcoin still uses PoW mining only. It would be a dream come true for environmentalists if Bitcoin could shift to PoS mining. Let's hope and pray that it happens.

Now back to our main topic.

Use AI and Data Science to Predict Future Prices of Cryptocurrency – Including the Burst of the Bitcoin Bubble

What is a blockchain? A distributed database that is decentralized and has no central point of control. As of February 2018, the Bitcoin blockchain on a full node was 160-odd GB in size. Now, in April 2019, it is 210 GB in size. So this is the question I am going to pose to you: would it be possible to use the data in this distributed database to identify patterns and statistical invariances, and to invest minimally for the maximum possible profit? Can we build models to forecast the prices of cryptocurrency in the future using AI and data science? The answer is a definite yes.

Practical Considerations

You may wonder whether applying data science techniques and statistical analysis can actually produce information that helps in forecasting the future price of Bitcoin. I came across a remarkable kernel on www.kaggle.com (a website for data scientists to practice problems and compete with each other in competitions) by a user with the handle wayward artisan and the profile name Tania J. I thought it was worth sharing, since it is a statistical analysis of the rise and fall of the Bitcoin bubble that vividly illustrates how statistical methods helped this user forecast the future price of Bitcoin. The entire kernel is very large and interesting; please do visit it at the link given below. Only the start and the middle section of the kernel are given here, because of space considerations and intellectual property considerations as well.

Kaggle: your home for data science.

A Kaggle Kernel That Modelled the Bitcoin Bubble Burst Within Reasonable Error Limits

The following kernel uses cryptocurrency financial data scraped from www.coinmarketcap.com. It is a sobering example of how statistical predictions anticipated the collapse of the Bitcoin bubble. Coming across this kernel was one of the main motivations for writing this article. I have omitted a lot of details, especially the building of the model and the analysis of its accuracy. I just wanted to show that it was possible.

For more details, visit the kernel on Kaggle at the link:
https://www.kaggle.com/taniaj/cryptocurrency-price-forecasting (Please visit this page, all aspiring data scientists, and pay attention to every concept discussed and used. Use Google and Wikipedia and you will learn a lot.)

A subset of the code is given below (the first section):

<subsequent code not shown for brevity>
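(The kernel's own opening code is omitted above. Purely as a hedged illustration of what such a first section typically looks like, here is a minimal sketch; the file name bitcoin_price.csv and the column names Date and Close are assumptions for illustration, not the kernel's actual identifiers.)

import pandas as pd
import matplotlib.pyplot as plt

# Load daily Bitcoin prices scraped from coinmarketcap.com
# (hypothetical file and column names, for illustration only)
df = pd.read_csv("bitcoin_price.csv", parse_dates=["Date"], index_col="Date")
df = df.sort_index()

# Plot the closing price to eyeball the bubble and its burst
df["Close"].plot(title="Bitcoin closing price (USD)")
plt.ylabel("Price (USD)")
plt.show()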

The dataset is available at the following link as a CSV (comma-separated values) file, which opens in Microsoft Excel:

We focus on one of the middle sections, where the first ARIMA model, fitted with SARIMAX, does the actual prediction at the time that the Bitcoin bubble burst (do look up ARIMA and SARIMAX on Wikipedia and Google to learn more about them; only a subset of the code is shown). Visit the Kaggle kernel page at the link below this extract to get the entire code:

<data analysis and model analysis code section not shown here for brevity>

<more code, not shown>

(From the Kaggle kernel code)

This code, together with the code earlier in the kernel (not shown for the sake of brevity) that built and validated the model, gave the following predictions as output:

Bitcoin price forecasting at the time of the burst of the Bitcoin bubble

What do we learn? Surprisingly, the model captures the Bitcoin bubble burst with remarkably accurate predictions (error levels of roughly 10%)!
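If you want to experiment with something similar yourself, here is a minimal, hedged sketch of how a SARIMAX forecast is typically produced with statsmodels. It continues the illustrative data-loading snippet from earlier, and the model orders are illustrative defaults, not the kernel's tuned parameters.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Continuing the illustrative sketch above: daily closing-price series
btc_close = df["Close"].asfreq("D")

# Illustrative (p, d, q) and weekly seasonal orders; the kernel's own
# tuned parameters are on the Kaggle page
model = SARIMAX(btc_close, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))
results = model.fit(disp=False)

# Forecast 30 days ahead, with confidence intervals
forecast = results.get_forecast(steps=30)
print(forecast.predicted_mean)
print(forecast.conf_int())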

Conclusion

So, do AI and data science have anything to do with blockchain technology and cryptocurrency? The answer is a resounding yes. Expect data science, statistical analysis, neural networks, and probability distributions to play a heavy part when you want to forecast cryptocurrency prices.

For all the data science students out there, I am going to include one more screen from the same kernel on Kaggle (link):

The reason I want to show you this screen is that terms and statistical lingo like kurtosis and heteroskedasticity are concepts that you need to master in order to conduct forecasts like this – the main reason being to analyze the accuracy of the model you have constructed. The output window is given below:
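(The kernel's actual output window is on the Kaggle page. As a hedged illustration only, continuing the SARIMAX sketch above, such diagnostics can be computed in Python roughly as follows.)

import numpy as np
import statsmodels.api as sm
from scipy.stats import kurtosis
from statsmodels.stats.diagnostic import het_breuschpagan

# Kurtosis of the model residuals: heavy tails inflate forecast errors
resid = results.resid.dropna()
print("Excess kurtosis of residuals:", kurtosis(resid))

# Breusch-Pagan test for heteroskedasticity: is the residual variance
# constant over time? A small p-value suggests it is not.
exog = sm.add_constant(np.arange(len(resid)))
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, exog)
print("Breusch-Pagan p-value:", lm_pvalue)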

So yes, blockchain technology and cryptocurrencies have a lot of overlap with data science applications. But also remember: data science can be applied to any field where finance is a factor.

For more on blockchain and data science, see:

A Beginner’s Guide to Big Data and Blockchain

Enjoy data science!

Follow this link, if you are looking to learn data science online!

You can follow this link for our Big Data course, which is a step further into advanced data analysis and processing!

Additionally, if you are interested in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, read our Data Science Blogs

A Personal Digital Assets Manager – Blockchain Applications of Data Science Part 1

The Potential of Blockchain Technology

Unless you've been living with your head under a rock for the last 4 years, you will definitely have heard of Bitcoin. You will also have heard about the technology behind Bitcoin: blockchain. Cryptocurrencies are now banned in most cases in India and China, but the Americas and Europe still use them extensively. In my opinion, Asia stands to lose a lot if blockchain is not adopted extensively everywhere, because make no mistake about it – blockchain technology will change the world as we know it. Forever.

Blockchain is the technology powering Bitcoin and other cryptocurrencies. To understand what blockchain and Bitcoin are, you can go through any one of the articles below. Don't worry, these articles are carefully selected to be as interesting and fun to read as possible. (This also gives me space to add my own original ideas instead of copying or rewording existing articles – and I have plenty of ideas!)

References to Understand Blockchain

For technical readers:

Wiki-Blockchain

For non-technical readers:

Whats blockchain in plain English- Quora

For the researchers:

NCBI-Article

For those of you with no time and who like visual explanations:

Graphics Reuters- Visual Explanations

In fact, that last link is so amazingly simple, visual, and clear that I recommend everyone read it – just so that we're on the same page.

Exciting Applications

(Bitcoin image from Pixabay)

Cut to the chase. A little confession here: I was asked to do this article nearly 16 days ago. I have some prior experience with blockchain, having gone through it extensively as a research topic for my own blog. Then a remarkable idea hit me – an idea for a startup that could (in theory) become a multi-billion-dollar enterprise. I spent a few days refining it, even going so far as to see whether I could start a company in this area myself, until reality set in: I lacked the experience and the business skills.

No sooner had this realization struck me, and the excitement cooled a little, than another idea to improve blockchain struck me, and I promise to sketch out that idea as well. I am doing this for the following reasons:

  • I am a staunch supporter of the FOSS (free and open source software) movement and would like to be credited with the idea. I am starting a free-to-use, open-source project on GitHub – I am working on it and currently moving towards an alpha release.
  • I believe in the power of technology to remove economic inequality. Now, you may say that technology has evolved to the point where 4-5 monolithic companies dominate the entire world. But I believe that technology, when used ethically, has the potential to create more opportunities than it removes.
  • Blockchain has two major problems: energy consumption and resource consumption. But there are techniques that can alleviate both of these problems. We deal with that as well in Part 2.
  • Finally, the vaunted hype about security for blockchain and cryptocurrencies is ridiculous when you think about it. For the sake of brevity, I will address the main security issues with blockchain in a separate article on Medium (not here, since it has no relation to data science).

Application – A Personal Blockchain For Every Person On The Planet

In points (I assume you've gone through the graphical explanation of blockchain at least – if not, you can review it here):

  • The trouble with end products of all types produced today is that there are so many intermediaries between the producer and the consumer that the producer receives a pittance compared to the final price. It would be nice if we could track a product everywhere that it is used.
  • This is also applicable to books, music, articles, poems, pictures – digital content of any sort. Currently Amazon and YouTube monopolize content distribution, the latter with a complete disregard for copyright, media ownership, and payment. Suppose we had a tracking system that logged every view of a video and rewarded the original producer for it?
  • To emphasize the previous point, consider the case of Lindsey Stirling, a famous contemporary violinist who dances while playing. Her 118 video uploads have earned her 2,575,305,706 views (roughly 2.5 billion), and her earnings from YouTube ads were around 100K USD last month. Her net worth as of 10th April 2019 is 12 million USD.
  • But suppose Lindsey Stirling distributed her videos at a price of 1 USD per view. Her net worth would be roughly 2.6 billion USD at the very least! She would be a multi-billionaire had this platform existed. It doesn't – yet. And because it doesn't exist, she is about 2.49 billion USD poorer!

Everyone who knows blockchain technology will now realize the idea, the concept, and how blockchain can be used to overcome this problem – and its power. Disruptive power!

The Solution

The blockchain is a service that immutably assigns ownership.

The blockchain is also a database that stores every single transaction on a particular digitisable entity.

Finally, Ethereum's smart contract technology means that we can assign payments to flow automatically to every person, through a personal blockchain of all their digitisable goods.

This means we can build a world where a user-defined payment goes to every entity that helped create a particular digitisable product.

On this platform or website or marketplace, producers can adjust their prices and their payments, and consumers can buy directly from them.

Everything can be tracked on the blockchain. Your own database of your own transactions can be used with smart contracts to pay the maximum possible share to the most deserving person in the supply chain – as fixed by each producer.
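To make the idea concrete, here is a deliberately tiny, hedged Python sketch of the core data structure: a hash-chained personal ledger that records each sale of a digitisable product and totals the producer's direct earnings. A real implementation would live on Ethereum as a smart contract; this toy only illustrates the "immutable ownership plus tracked transactions" idea, and all names and figures in it are made up.

import hashlib
import json
import time

class PersonalLedger:
    """A toy hash-chained ledger for one producer's digitisable products."""

    def __init__(self, producer):
        self.producer = producer
        self.chain = [self._block({"event": "genesis"}, prev_hash="0" * 64)]

    def _block(self, data, prev_hash):
        block = {"time": time.time(), "data": data, "prev_hash": prev_hash}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()
        return block

    def record_sale(self, product, buyer, price_usd):
        # Each sale is chained to the previous block, making history tamper-evident
        data = {"product": product, "buyer": buyer, "price_usd": price_usd}
        self.chain.append(self._block(data, prev_hash=self.chain[-1]["hash"]))

    def total_earnings(self):
        return sum(b["data"].get("price_usd", 0) for b in self.chain)

ledger = PersonalLedger("example_violinist")
ledger.record_sale("concert_video_001", "viewer_001", 1.0)
print(ledger.total_earnings())  # 1.0 USD, paid directly to the producer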

Hugely, Massively Disruptive

If you are interested or want to know more, you can leave a comment below with your email address. If you want to be a part of this new revolution and the new decentralised world – with all services provided free – please leave a comment below asking for my email ID, with a statement of what you want to contribute to this endeavor and how. I promise to reply to every sincere query.

This is a fledgling project and a lot of work remains to be done. I will be writing articles and creating a team to work on this idea. Those of you who are interested please mail me at thomascherickal@gmail.com.

This will be an open-source project, and all services have to be offered free of cost. How do you go about making a profit from this? You don't! The only way this can be fair to all players, in countries like India especially, is if it is specially designed to be usable by anyone, free of charge.

So this article gave a small glimpse into a world without intermediaries, corporations, or money-making middlemen – a world running purely on smart contracts. This is applicable to AI and data science since such a technology will not reach anywhere significant without extensive use of AI and data science.

The more data that is available, the more analysis can be performed on it. And unless we have analysts running fraud-detection and monitoring systems full-time on such a platform, we might as well never build it – because while blockchain data integrity cannot be hacked, cryptocurrencies themselves are hackable and have been hacked extensively since the beginning of Bitcoin.

For Part 2 of this series on Blockchain Applications of Data Science, you can go to the link below:

https://dimensionless.in/how-a-kaggle-jupyter-notebook-with-a-python-kernel-correctly-predicted-the-burst-of-the-bitcoin-bubble-with-reasonable-accuracy

For more on AI and Blockchain, I suggest that you refer to the article below:

A Beginner’s Guide to Big Data and Blockchain

and you can go to the link below to access our course on Python and R.

Data Science using R & Python

As always, enjoy artificial intelligence! You are privileged to be at the forefront of humanity's push into the new uncharted future. All the best!

The Upcoming Revolution in Predictive Analytics (And Data Science)

The Next Generation of Data Science

Quite literally, I am stunned.

I have just completed my survey of data (from articles, blogs, white papers, university websites, curated tech websites, and research papers all available online) about predictive analytics.

And I have a reason to believe that we are standing on the brink of a revolution that will transform everything we know about data science and predictive analytics.

But before we go there, you need to know: why the hype about predictive analytics? What is predictive analytics?

Let’s cover that first.

Importance of Predictive Analytics


According to Wikipedia:

Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. The enhancement of predictive web analytics calculates statistical probabilities of future events online. Predictive analytics statistical techniques include data modeling, machine learning, AI, deep learning algorithms and data mining.

Predictive analytics is why every business wants data scientists. Analytics is not just about answering questions; it is also about finding the right questions to answer. The applications of this field are many – nearly every human endeavor appears in the following excerpt from Wikipedia listing the applications of predictive analytics:

From Wikipedia:

Predictive analytics is used in actuarial science, marketing, financial services, insurance, telecommunications, retail, travel, mobility, healthcare, child protection, pharmaceuticals, capacity planning, social networking, and a multitude of other fields, ranging from the military to online shopping websites, the Internet of Things (IoT), and advertising.

In a very real sense, predictive analytics means applying data science models to given scenarios to forecast, or to generate a score for, the likelihood of an event occurring. The data generated today is so voluminous that experts estimate that less than 1% is actually used for analysis, optimization, and prediction. In the case of Big Data, that estimate falls to 0.01% or less.

Common Example Use-Cases of Predictive Analytics

Components of Predictive Analytics

A skilled data scientist can utilize these prediction scores to optimize and improve the profit margin of a business by a massive amount. For example:

  • If you buy a children's book on the Amazon website, the site identifies that you have an interest in that author and that genre, and shows you more books similar to the one you just browsed or purchased.
  • YouTube has a very similar algorithm behind its video suggestions when you view a particular video. The site (or rather, the analytics algorithms running on it) identifies more videos that you would enjoy watching, based upon what you are watching now. In ML, this is called a recommender system (a toy sketch of one is given just after this list).
  • Netflix is another famous example, where recommender systems play a massive role in the "shows you may like" section, and the recommendations are well known for their accuracy in most cases.
  • Google AdWords (the text ads at the top of every Google search) is another example of a machine learning algorithm whose usage can be classified under predictive analytics.
  • Department stores often optimize product placement so that common groups are easy to find. For example, the fresh fruits and vegetables would be close to the health food supplements and diet-control foods that weight-watchers commonly use. Coffee/tea/milk and biscuits/rusks make another possible grouping. You might think this is trivial, but department stores have recorded up to a 20% increase in sales when such optimal grouping and placement was performed – again, through a form of analytics.
  • Bank loans and home loans are often approved using the credit scores of a customer. How is that score calculated? Through an expert system of rules, classification, and extrapolation of existing patterns – you guessed it – using predictive analytics.
  • Allocating budgets in a company to maximize the total profit in the upcoming year is predictive analytics. This is simple at a startup, but imagine the situation in a company like Google, with thousands of departments and employees, all clamoring for funding. Predictive analytics is the way to go in this case as well.
  • IoT (Internet of Things) smart devices are one of the most promising applications of predictive analytics. It will not be too long before the sensor data from aircraft parts is run through predictive analytics to tell operators that a part has a high likelihood of failure. Ditto for cars, refrigerators, military equipment, military infrastructure and aircraft – anything that uses IoT (which is nearly every embedded processing device available in the 21st century).
  • Fraud detection, malware detection, hacker intrusion detection, cryptocurrency hacking, and cryptocurrency theft are all ideal use cases for predictive analytics. In this case, the ML system detects anomalous behavior on an interface used by hackers and cybercriminals to identify when a theft or a fraud is taking place, has taken place, or will take place in the future. Obviously, this is a dream come true for law enforcement agencies.
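Since recommender systems came up several times above, here is a hedged, toy sketch of the simplest flavor: item-based collaborative filtering with cosine similarity. The ratings matrix is made-up illustrative data, and the real systems at YouTube or Netflix are vastly more sophisticated.

import numpy as np

# Toy user-item ratings matrix (rows: users, columns: items); 0 = not seen
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Item-item cosine similarity
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)

def recommend(user_idx, top_n=2):
    # Score unseen items by their similarity to the items this user liked
    scores = ratings[user_idx] @ sim
    scores[ratings[user_idx] > 0] = -np.inf  # don't re-recommend seen items
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))  # items user 0 is predicted to enjoy next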

So now you know what predictive analytics is and what it can do. Now let’s come to the revolutionary new technology.

Meet Endor – The ‘Social Physics’ Phenomenon

 

End-to-End Predictive Analytics Product – for non-tech users!

In a remarkable first, a research team at MIT, USA has created a new science called social physics, or sociophysics. Much about this field is deliberately kept highly confidential because of its massive disruptive power as far as data science, and especially predictive analytics, is concerned. The only requirement of this science is that the system being modeled has to be a human-interaction-based environment. To keep the discussion simple, we shall explain the entire system in points.

  • All systems in which human beings are involved follow scientific laws.
  • These laws have been identified, verified experimentally and derived scientifically.
  • By laws we mean equations, such as (just an example) Newton's second law: F = ma (force equals mass times acceleration).
  • These equations establish laws of invariance that are the same regardless of which human-interaction system is being modeled.
  • Hence the term social physics: like Maxwell's laws of electromagnetism or Newton's theory of gravitation, these laws are a new discovery that is universal as long as the agents interacting in the system are humans.
  • The invariance and universality of these laws have two important consequences:
    1. The need for large amounts of data disappears. Because of the laws, many of the predictive capacities of the model can be obtained with a minimal amount of data. Hence small companies now have the power to use analytics that was mostly limited to the FAMGA (Facebook, Amazon, Microsoft, Google, Apple) set of companies, since they were the only ones with the money to maintain Big Data warehouses and data lakes.
    2. There is no need for data cleaning. Since the model being used is canonical, it is independent of data problems like outliers, missing data, nonsense data, unavailable data, and data corruption. This is due to the orthogonality of the model (a Knowledge Sphere) being constructed and the data available.
  • Performance is superior to deep learning models built with Google TensorFlow, PyTorch, or scikit-learn in Python, R, or Julia. The model has consistently outscored such models in Kaggle competitions, without any data pre-processing or data preparation and cleansing!
  • Data being orthogonal to interpretation and manipulation means that encrypted data can be used as-is. There is no need to decrypt encrypted data to perform a data science task or experiment. This is significant because the independence of the model functioning even for encrypted data opens the door to blockchain technology and blockchain data to be used in standard data science tasks. Furthermore, this allows hashing techniques to be used to hide confidential data and perform the data mining task without any knowledge of what the data indicates.

Are You Serious?


That's a valid question, given these claims! And that is why I recommend that everyone who has even the slightest interest in data science visit and completely read and explore the following links:

  1. https://www.endor.com
  2. https://www.endor.com/white-paper
  3. http://socialphysics.media.mit.edu/
  4. https://en.wikipedia.org/wiki/Social_physics

Now when I say completely read, I mean completely read. Visit every section and read every bit of text that is available on the sites above. You will soon understand why this is such a revolutionary idea.

  1. https://ssir.org/book_reviews/entry/going_with_the_idea_flow#
  2. https://www.datanami.com/2014/05/21/social-physics-harnesses-big-data-predict-human-behavior/

These links above are articles about the social physics book and about the science of sociophysics in general.

For more details, please visit the following articles on Medium. These further document Endor.coin, a cryptocurrency built around the idea of sharing data with the public and getting paid for the system's usage of your data. Preferably read all of them; if busy, at least read article no. 1.

  1. https://medium.com/endor/ama-session-with-prof-alex-sandy-pentland
  2. https://medium.com/endor/endor-token-distribution
  3. https://medium.com/endor/https-medium-com-endor-paradigm-shift-ai-predictive-analytics
  4. https://medium.com/endor/unleash-the-power-of-your-data

Operation of the Endor System

Upon every data set, the first action performed by the Endor Analytics Platform is clustering, also popularly known as automatic classification. Endor constructs what is known as a Knowledge Sphere, a canonical representation of the data set, which can be constructed with even 10% of the data volume needed for the same project when deep learning is used.

Creation of the Knowledge Sphere takes 1-4 hours for a billion-record dataset (which is pretty standard these days).
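Endor's actual algorithm is proprietary and not public. Purely as a generic illustration of what automatic clustering of behavioral data means, here is a minimal scikit-learn sketch on synthetic user-activity features; it is emphatically not Endor's method, and every number in it is invented.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-user behavioral features: trades per day, average trade size
features = np.vstack([
    rng.normal([2, 50], [1, 10], size=(500, 2)),   # casual users
    rng.normal([30, 400], [5, 50], size=(50, 2)),  # "whale"-like users
])

# Standardize, then cluster users into behavioral groups
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # sizes of the two behavioral clusters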

Now, an explanation of the mathematics behind social physics is beyond our scope, but I will include the change in the data science process when the Endor platform was compared to a deep learning system built to solve the same problem the traditional way (with an expert data scientist on a six-figure salary).

An edited excerpt from the Endor white paper (linked above) follows, from Appendix A: Social Physics Explained, Section 3.1, pages 28-34 (some material not included):

Prediction Demonstration using the Endor System:

Data:
The data used in this example originated from a retail financial investment platform and contained the entire investment transactions of members of an investment community. The data was anonymized and made public for research purposes at MIT (the data can be shared upon request).

Summary of the dataset:
- 7 days of data
- 3,719,023 rows
- 178,266 unique users

Automatic Clusters Extraction:
Upon first analysis of the data, the Endor system detects and extracts “behavioral clusters” – groups of users whose data dynamics violate the mathematical invariances of social physics. These clusters are based on all the columns of the data, but are limited to the last 7 days, as this is the data that was provided to the system as input.

Behavioural Clusters Summary

Number of clusters: 268,218
Cluster sizes: 62 (mean), 15 (median), 52,508 (max), 5 (min)
Clusters per user: 164 (mean), 118 (median), 703 (max), 2 (min)
Users in clusters: 102,770 out of the 178,266 users
Records per user: 6 (median), 33 (mean); applies only to users in clusters

Prediction Queries
The following prediction queries were defined:
1. New users to become “whales”: users who joined in the last 2 weeks who will generate at least $500 in commission in the next 90 days.
2. Reducing activity: users who were active in the last week who will reduce activity by 50% in the next 30 days (but will not churn, and will still continue trading).
3. Churn in “whales”: currently active “whales” (as defined by their activity during the last 90 days), who were active in the past week, who will become inactive for the next 30 days.
4. Will trade in Apple shares for the first time: users who had never invested in Apple shares, and who will buy them for the first time in the coming 30 days.
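To make query 1 concrete, here is a hedged pandas sketch of how such a label might be constructed from a transaction table. The column names user_id, date, commission, and join_date are assumptions for illustration; the white paper does not publish its schema.

import pandas as pd

def label_new_whales(tx, as_of, min_commission=500.0):
    """tx: one row per transaction, with assumed columns
    user_id, date, commission, join_date."""
    as_of = pd.Timestamp(as_of)
    # Users who joined in the last 2 weeks
    recent = tx["join_date"] >= as_of - pd.Timedelta(weeks=2)
    new_users = tx.loc[recent, "user_id"].unique()
    # Commission they generate in the following 90 days
    window = tx[(tx["date"] > as_of) & (tx["date"] <= as_of + pd.Timedelta(days=90))]
    future = window.groupby("user_id")["commission"].sum()
    # True for users who cross the $500 "whale" threshold
    return {u: bool(future.get(u, 0.0) >= min_commission) for u in new_users}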

 

Knowledge Sphere Manifestation of Queries
It is again important to note that the definition of the search queries is completely orthogonal to the extraction of behavioral clusters and the generation of the Knowledge Sphere, which was done independently of the query definitions.

Therefore, it is interesting to analyze the manifestation of the queries in the clusters detected by the system: Do the clusters contain information that is relevant to the definition of the queries, despite the fact that:

1. The clusters were extracted in a fully automatic way, using no semantic information about the data, and

2. The queries were defined after the clusters were extracted, and did not affect this process.

This analysis is done by measuring the number of clusters that contain a very high concentration of “samples”; in other words, by looking for clusters that contain “many more examples than statistically expected”.

A high number of such clusters (provided that it is significantly higher than the number obtained when randomly sampling the same population) proves the ability of this process to extract valuable, relevant semantic insights in a fully automatic way.

Comparison to Google TensorFlow

In this section, a comparison between the prediction process of the Endor system and Google's TensorFlow is presented. It is important to note that TensorFlow, like any other deep learning library, faces some difficulties when dealing with data similar to the one under discussion:

1. An extremely uneven distribution of the number of records per user requires some canonization of the data, which in turn requires:
2. Some manual work, done by an individual who has at least some understanding of data science.
3. Some understanding of the semantics of the data, which requires an investment of time, as well as access to the owner or provider of the data.
4. A single-class classification, using an extremely uneven distribution of positive vs. negative samples, tends to lead to overfitting of the results and requires some non-trivial maneuvering.

This again necessitates the involvement of an expert in deep learning (unlike the Endor system, which can be used by business, product, or marketing experts, with no prerequisites in machine learning or data science).

Traditional Methods

An expert in deep learning spent 2 weeks crafting a solution based on TensorFlow, having sufficient expertise to be able to handle the data. The solution that was created used the following auxiliary techniques:

1. Trimming the data sequence to 200 records per customer, and padding the streams of users who have fewer than 200 records with neutral records.
2. Creating 200 training sets, each having 1,000 customers (50% known positive labels, 50% unknown), and then using these training sets to train the model.
3. Using sequence classification (an RNN with 128 LSTM units) with 2 output neurons (positive, negative), with the overall result being the difference between the scores of the two.
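Based on the description above, here is a hedged sketch of what such a TensorFlow/Keras baseline could look like: sequences padded to 200 records, an RNN layer with 128 LSTM units, and 2 output neurons whose score difference is the final result. The feature dimensionality, random data, and training settings are assumptions for illustration, not the white paper's actual code.

import numpy as np
import tensorflow as tf

SEQ_LEN, N_FEATURES = 200, 16  # 200 records per customer; feature count assumed

model = tf.keras.Sequential([
    # Padded "neutral" records are all-zero rows and get masked out
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(SEQ_LEN, N_FEATURES)),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(2),  # positive / negative score neurons
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Toy training set: 1,000 customers, 50% labeled positive
# (mirroring the training-set recipe described above)
X = np.random.rand(1000, SEQ_LEN, N_FEATURES).astype("float32")
y = np.array([1] * 500 + [0] * 500)
model.fit(X, y, epochs=1, batch_size=32)

# Final score per customer: difference between the two output neurons
logits = model.predict(X[:5])
print(logits[:, 1] - logits[:, 0])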

Observations (all statistics are available in the white paper – and they are stunning):

1. Endor outperforms TensorFlow in 3 out of 4 queries, and achieves the same accuracy in the 4th.
2. The superiority of Endor is increasingly evident as the task becomes “more difficult” – focusing on the top-100 rather than the top-500.
3. There is a clear distinction between “less dynamic” queries (becoming a whale, churn, reducing activity – for which static signals should likely be easier to detect) and the “who will trade in Apple for the first time” query, which is (a) more dynamic and (b) has a very low baseline, such that for the latter, Endor is 10x more accurate!
4. As previously mentioned, the TensorFlow results illustrated here employ 2 weeks of manual improvements done by a deep learning expert, whereas the Endor results are 100% automatic, and the entire prediction process in Endor took 4 hours.

Clearly, the path going forward for predictive analytics and data science is Endor, Endor, and Endor again!

Predictions for the Future

Personally, one thing has me sold: the robustness of the Endor system in handling noise and missing data. This has been the biggest bane of the data scientist in most companies (when data engineers are not available) – 90% of a professional data scientist's time would go into data cleaning and data preprocessing, since our ML models were acutely sensitive to noise. This is the first solution that eliminates this 'grunt'-level work from data science completely.

The second prediction: the Endor system works upon principles of human interaction dynamics. My intuition tells me that data collected at random has its own dynamical systems, which appear clearly to experts in complexity theory. I am completely certain that, just as this tool built a prediction platform on the dynamical laws of human society, data collected in general has its own laws of invariance. And the first person to identify these laws and build another Endor-style platform on them will be at the top of the data science pyramid – the alpha unicorn.

Final prediction: democratizing data science means that data scientists will no longer be required to have six-figure salaries. The success of the Endor platform means that anyone can perform advanced data science without resorting to TensorFlow, Python, R, Anaconda, etc. This platform could completely disrupt the entire data science technology sector. The first people to master it, and to build upon it to formalize the rules of invariance for general data dynamics, will for sure make a killing.

It is an exciting time to be a data science researcher!

Data Science is a broad field, and it takes quite a lot of learning to master all of these skills.

Dimensionless has several resources to get started with.

To Learn Data Science, Get Data Science Training in Pune and Mumbai from Dimensionless Technologies.

To learn more about analytics, be sure to have a look at the following articles on this blog:

Machine Learning for Transactional Analytics

and

Text Analytics and its applications

Enjoy data science!

Data Science: What to Expect in 2019

Introduction

2019 looks to be the year of using smarter technology in a smarter way. Three key trends — artificial intelligence systems becoming a serious component in enterprise tools, custom hardware breaking out for special use-cases, and a rethink on data science and its utility — will all combine into a common theme.

In recent years, we’ve seen all manner of jaw-dropping technology, but the emphasis has been very much on what these gadgets and systems can do and how they do it, with much less attention paid to why.

In this blog, we will explore different areas of data science and set out our expectations for them in 2019. These areas include machine learning, AR/VR systems, edge computing, and more. Let us go through them one by one.

Machine Learning/Deep Learning

Businesses are using machine learning to improve all sorts of outcomes, from optimizing operational workflows and increasing customer satisfaction to discovering new competitive differentiators. But now all the hype around AI is settling; machine learning is not just a cool term anymore. Furthermore, organisations are looking for more ways of identifying options, for example in the form of agent modelling. Beyond this, wider adoption of these algorithms now looks very feasible, and it will be seen in both new and old industries.

Healthcare companies are already big users of AI, and this trend will continue. According to Accenture, the AI healthcare market might hit $6.6 billion by 2021, and clinical health AI applications can create $150 billion in annual savings for the U.S. healthcare economy by 2026.

In retail, global spending on AI will grow to $7.3 billion a year by 2022, up from $2 billion in 2018, according to Juniper Research. This is because companies will invest heavily in AI tools that will help them differentiate and improve the services they offer customers.

In cybersecurity, the adoption of AI has brought a boom in startups, which raised $3.65 billion in equity funding in the last five years. Cyber AI can help security experts sort through millions of incidents to identify aberrations, risks, and signals of future threats.

And there is even an opportunity brewing in industries facing labour shortages, such as transportation. At the end of 2017, there was a shortage of 51,000 truck drivers (up from a shortage of 36,000 the previous year). And the ATA reports that the trucking industry will need to hire 900,000 more drivers in the next 10 years to keep up with demand. AI-driven autonomous vehicles could help relieve the need for more drivers in the future.

Programming Language

The practice of data science requires the use of analytics tools, technologies and programming languages to help data professionals extract insights and value from data. A recent survey of nearly 24,000 data professionals by Kaggle suggests that Python, SQL and R are the most popular programming languages. The most popular, by far, was Python (83%). Additionally, 3 out of 4 data professionals recommended that aspiring data scientists learn Python first.

Survey results show that 3 out of 4 data professionals would recommend Python as the first programming language for aspiring data scientists to learn. The remaining programming languages are recommended at significantly lower rates (R by 12% of respondents; SQL by 5%). Python will boom further in 2019, but the R community, too, has come up with a lot of recent advancements; with new packages and improvements, R is expected to come closer to Python in terms of usage.

Blockchain and Big Data

In recent years, the blockchain has been at the heart of computer technologies. It is a cryptographically secure, distributed database technology for storing and transmitting information. The main advantage of the blockchain is that it is decentralized: no single party controls the data entering the chain or its integrity. Instead, these checks run through various computers on the network, which all hold the same information. Faulty data on one computer cannot enter the chain, because it will not match the equivalent data held by the other machines. To put it simply, as long as the network exists, the information remains in the same state.

Big Data analytics will be essential for tracking transactions and enabling businesses that use the blockchain to make better decisions. That is why new data intelligence services are emerging to help financial institutions, governments, and other businesses discover who they interact with within the blockchain and uncover hidden patterns.

Augmented-Reality/Virtual Reality

The broader the canvas of visualization, the better the understanding. That is exactly what happens when one visualizes big data through Augmented Reality (AR) and Virtual Reality (VR). A combination of AR and VR could open a world of possibilities for better utilizing the data at hand. VR and AR can practically improve the way we perceive data, and could actually be the solution for making use of the large volumes of unused data.

By presenting data in 3D, the user will be able to decipher the major takeaways from the data better and faster, with easier understanding. Much recent research shows that VR and AR have a high sensory impact, which promotes faster learning and understanding.

This immersive way of representing data enables analysts to handle big data more efficiently. It makes analysis and interpretation more of an experience and a realisation than traditional analysis does. Instead of seeing numbers and figures, the person will be able to see beyond them, into the facts, happenings, and reasons – which could revolutionize businesses.

Edge Computing

Computing infrastructure is an ever-changing landscape of technology advancements. Current changes affect the way companies deploy smart manufacturing systems to make the most of those advancements.

The rise of edge computing capabilities, coupled with traditional industrial control system (ICS) architectures, provides increasing levels of flexibility. In addition, time-synchronized applications and analytics augment the need for larger Big Data operations in the cloud, regardless of whether that cloud is on-premise or public.

Edge computing is still in the early stages of adoption, but one thing is clear: edge devices are the subject of large-scale investments from cloud suppliers seeking to offload bandwidth, and there are latency issues due to an explosion of IoT data in both industrial and commercial applications.

Edge adoption will likely increase wherever users have questions about the cloud's fit for their specific use case. Cloud-level interfaces and apps will migrate to the edge. Industrial application hosting and analytics will become common at the edge, using virtual servers and simplified, operational-technology-friendly hardware and software.

The Rise of Semi-Automated Tools for Data Science

There has been a rise in self-service BI tools such as Tableau, Qlik Sense, Power BI, and Domo. With these, managers can obtain current business information in graphical form on demand. Although IT may need to do a certain amount of setup at the outset, and when adding a data source, most of the data cleaning work and analysis can be done by analysts, and the analyses can update automatically from the latest data any time they are opened.

Managers can then interact with the analyses graphically to identify issues that need to be addressed. In a BI-generated dashboard or “story” about sales numbers, that might mean drilling down to find underperforming stores, salespeople, and products, or discovering trends in year-over-year same-store comparisons. These discoveries might in turn guide decisions about future stocking levels, product sales and promotions, and even the building of additional stores in under-served areas.

Upgrade in Job Roles

In recent times, there have been a lot of advancements in the data science industry. With these advancements, businesses are in better shape to extract much more value out of their data. With this increase in expectations, there is a shift in the roles of both data scientists and business analysts. Data scientists are moving from a purely statistical focus to more of a research focus, while business analysts are filling the gap left by data scientists and taking up parts of their role.

We can see it as an upgrade to both job roles. Business analysts still hold the business angle firmly, but are also handling the statistical and technical parts of things. Business analysts are now more into predictive analytics; they are at a stage where they can use off-the-shelf algorithms for predictions in their business domains. BAs are no longer limited to reporting and the business mindset, but are moving into prescriptive analytics too, handling model building, data warehousing, and statistical analysis.

Summary

How the question of data science's utility is answered will be fascinating to watch. It could be that the data science field has to completely overhaul what it can offer, overcoming seemingly off-limits barriers. Alternatively, it could be that businesses discover their expectations can't be met and have to adjust to this reality in a productive manner, rather than getting bogged down in frustration.

In conclusion, 2019 promises to be a year in which smart systems make further inroads into our personal and professional lives. More importantly, I expect our professional lives to get more sophisticated, with a variety of agents and systems helping us get more out of our time in the office!

Follow this link, if you are looking to learn more about data science online!

You can follow this link for our Big Data course!

Additionally, if you are interested in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, you can read our blogs here

Also, the following are some blogs you may like to read

Big Data and Blockchain

What is Predictive Model Performance Evaluation

AI and intelligent applications