A Personal Digital Assets Manager – Blockchain Applications of Data Science Part 1

The Potential of Blockchain Technology

 

Unless you’ve been living under a rock for the last four years, you will definitely have heard of Bitcoin. You will also have heard about the technology behind Bitcoin: blockchain. Cryptocurrencies are now banned in most cases in India and China, but the Americas and Europe still use cryptocurrencies extensively. And in my opinion, Asia stands to lose a lot if blockchain is not adopted extensively everywhere – because make no mistake about it, blockchain technology will change the world as we know it. Forever.

Blockchain is the technology powering Bitcoin and other cryptocurrencies. To learn what blockchain is and what Bitcoin is, you can go through any one of the articles below. Don’t worry – these articles are carefully selected to be as interesting and fun to read as possible. (This also gives me space to add my own original ideas instead of copying or rewording existing articles – and I have plenty of ideas!)

References to Understand Blockchain

For technical readers:

Wiki-Blockchain

For non-technical readers:

What’s blockchain in plain English? – Quora

For the researchers:

NCBI-Article

For those of you with no time and who like visual explanations:

Graphics Reuters- Visual Explanations

In fact, that last link is so amazingly simple, visual, and clear that I recommend everyone read it – just so that we’re on the same page.

Exciting Applications

(Image: Bitcoin – from Pixabay)

 

Cut to the chase. A little confession here: I was asked to write this article nearly 16 days ago. I already had some experience with blockchain, having gone through it extensively as a research topic for my own blog. Then a remarkable idea hit me – an idea for a startup that could (in theory) become a multi-billion-dollar enterprise. I spent a few days refining it, even going so far as to see whether I could start a company in this area myself, until reality set in: I lacked the experience and the business skills.

No sooner had this realization struck me and the excitement cooled a little than another idea to improve blockchain struck me, and I promise to sketch out that idea as well. I am doing this for several reasons:

  • I am a staunch supporter of the FOSS (free and open source software) movement and would like to be credited with the idea. I am starting a free-to-use, open source project on GitHub – I am working on it and currently moving towards an alpha release.
  • I believe in the power of technology to remove economic inequality. Now you may say that technology has evolved to the point that 4-5 monolithic companies dominate the entire world. But I believe that technology when used ethically has the potential to create more opportunities than it removes.
  • Blockchain has two major problems – energy consumption and resource consumption. But there are techniques that can alleviate both of these problems. We’ll deal with that as well in Part 2.
  • Finally, the vaunted hype about security for blockchain and cryptocurrencies is ridiculous when you think about it. For the sake of brevity, I will address the main security issues with blockchain in a separate article on Medium – (not here, since it has no relation to data science).

Application – A Personal Blockchain For Every Person On The Planet

In points (I assume you’ve gone through the graphical explanation of blockchain at least – if not you can review it here):

  • The trouble with end products of all types produced today is that there are so many intermediaries between the producer and the consumer that the producer receives a pittance compared to the final price. It would be nice if we could track a product everywhere it is used.
  • This also applies to books, music, articles, poems, pictures – digital content of any sort. Currently Amazon and YouTube monopolize content distribution, the latter with a complete disregard for copyright, media ownership, and payment. Suppose we had a tracking system that recorded every view of a video and rewarded the original producer for it?
  • To emphasize the previous point, let us consider the case of Lindsey Stirling, a famous contemporary violinist who dances while playing. Her 118 video uploads have earned her 2,575,305,706 views – roughly 2.5 billion – and her earnings from YouTube ads were around 100K USD a month. Her net worth as of 10th April 2019 is about 12 million USD.
  • But suppose Lindsey Stirling distributed her videos at a price of 1 USD per view. Her net worth would be 2.6 billion USD at the very least! She would be a multi-billionaire had this platform existed. It doesn’t – yet. And because it doesn’t exist, she is roughly 2.49 billion USD poorer!

Everyone who knows blockchain technology will now see the idea, the concept, and how blockchain can be used to overcome this problem – and its power. Disruptive power!

The Solution

The blockchain is a service that immutably assigns ownership.

The blockchain is also a database that stores every single transaction on a particular digitisable entity.

Finally, Ethereum’s smart contract technology means that we can route payments to every person through their own personal blockchain of all their digitisable goods.

This means we can build a world where a user-defined amount is paid to every entity that helped create a particular digitisable product.

On this platform or website or marketplace, producers can adjust their prices and their payments and consumers can buy directly from them.

Everything can be tracked on the blockchain. Your own database of your own transactions can be used with smart contracts to pay the maximum possible fee to the most deserving person in the supply chain – fixed by each producer.
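To make the idea a little more concrete, here is a deliberately simplified toy sketch in Python of a personal ledger with producer-defined revenue splits. It is only an illustration of the concept described above – names such as PersonalLedger and record_view are invented, and this is not a real blockchain or smart contract API.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class PersonalLedger:
        """Toy model of a creator's personal ledger of digitisable assets."""
        owner: str
        # Producer-defined revenue split per asset, e.g. {"creator": 0.9, "editor": 0.1}
        splits: Dict[str, Dict[str, float]] = field(default_factory=dict)
        transactions: List[dict] = field(default_factory=list)

        def register_asset(self, asset_id: str, split: Dict[str, float]) -> None:
            assert abs(sum(split.values()) - 1.0) < 1e-9, "split must sum to 1"
            self.splits[asset_id] = split

        def record_view(self, asset_id: str, price: float) -> Dict[str, float]:
            """Record one paid view and return the payout owed to each contributor."""
            payout = {who: share * price for who, share in self.splits[asset_id].items()}
            self.transactions.append({"asset": asset_id, "price": price, "payout": payout})
            return payout

    ledger = PersonalLedger(owner="creator")
    ledger.register_asset("video-001", {"creator": 0.95, "video_editor": 0.05})
    print(ledger.record_view("video-001", price=1.0))  # {'creator': 0.95, 'video_editor': 0.05}

On a real platform, the ledger would live on-chain and the payout step would be executed by a smart contract rather than by a Python dictionary.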

Hugely, Massively Disruptive

If you are interested or want to know more, you can leave a comment below with your email address. If you want to be a part of this new revolution and the new decentralised world – with all services provided free – please provide a comment below asking for my email ID with a statement of what and how you want to contribute to this endeavor. I promise to reply to every sincere query.

This is a fledgling project and a lot of work remains to be done. I will be writing articles and creating a team to work on this idea. Those of you who are interested please mail me at thomascherickal@gmail.com.

This will be an open source project and all services have to be offered free of cost. How do you go about making a profit from this? You don’t! The only way this can be fair to all players in countries like India is if it is specially designed to be applicable to anyone.

So this article gave a small glimpse into a world without intermediaries, corporations, or money-making middlemen – a world running purely on smart contracts. This is relevant to AI and data science because this technology will not get anywhere significant without extensive use of AI and data science.

The more data that is available, the more analysis can be performed on it. And unless we have analysts running monitoring and fraud-detection systems full-time on such a platform, we might as well never build it – because while blockchain data integrity cannot be hacked, cryptocurrencies are hackable and have been hacked extensively since the beginning of Bitcoin.

For Part 2 of this series on Blockchain Applications of Data Science, you can go to the link below:

https://dimensionless.in/how-a-kaggle-jupyter-notebook-with-a-python-kernel-correctly-predicted-the-burst-of-the-bitcoin-bubble-with-reasonable-accuracy

For more on AI and Blockchain, I suggest that you refer to the article below:

A Beginner’s Guide to Big Data and Blockchain

and you can go to the link below to access our course on Python and R.

Data Science using R & Python

As always, enjoy artificial intelligence! You are privileged to be at the forefront of humanity’s push into the new uncharted future. All the best!
The Upcoming Revolution in Predictive Analytics (And Data Science)

The Next Generation of Data Science

Quite literally, I am stunned.

I have just completed my survey of data (from articles, blogs, white papers, university websites, curated tech websites, and research papers all available online) about predictive analytics.

And I have a reason to believe that we are standing on the brink of a revolution that will transform everything we know about data science and predictive analytics.

But before we go there, you need to know: why the hype about predictive analytics? What is predictive analytics?

Let’s cover that first.

 Importance of Predictive Analytics

(Image: a black Samsung tablet computer – by PhotoMix Ltd)

 

According to Wikipedia:

Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. The enhancement of predictive web analytics calculates statistical probabilities of future events online. Predictive analytics statistical techniques include data modeling, machine learning, AI, deep learning algorithms and data mining.

Predictive analytics is why every business wants data scientists. Analytics is not just about answering questions; it is also about finding the right questions to answer. The applications of this field are many – nearly every human endeavor appears in the following excerpt from Wikipedia listing the applications of predictive analytics:

From Wikipedia:

Predictive analytics is used in actuarial science, marketing, financial services, insurance, telecommunications, retail, travel, mobility, healthcare, child protection, pharmaceuticals, capacity planning, social networking, and a multitude of numerous other fields ranging from the military to online shopping websites, Internet of Things (IoT), and advertising.

In a very real sense, predictive analytics means applying data science models to given scenarios that forecast or generate a score of the likelihood of an event occurring. The data generated today is so voluminous that experts estimate that less than 1% is actually used for analysis, optimization, and prediction. In the case of Big Data, that estimate falls to 0.01% or less.

Common Example Use-Cases of Predictive Analytics

 

Components of Predictive Analytics

 

A skilled data scientist can utilize the prediction scores to optimize and improve the profit margin of a business or a company by a massive amount. For example:

  • If you buy a book for children on the Amazon website, the website identifies that you have an interest in that author and that genre and shows you more books similar to the one you just browsed or purchased.
  • YouTube also has a very similar algorithm behind its video suggestions when you view a particular video. The site (or rather, the analytics algorithms running on it) identifies more videos that you would enjoy watching based upon what you are watching now. In ML, this is called a recommender system (a minimal sketch appears right after this list).
  • Netflix is another famous example where recommender systems play a massive role in the suggestions in the ‘shows you may like’ section, and the recommendations are well-known for their accuracy in most cases.
  • Google AdWords (the text ads displayed at the top of every Google Search) is another example of a machine learning algorithm whose usage can be classified under predictive analytics.
  • Departmental stores often optimize products so that common groups are easy to find. For example, the fresh fruits and vegetables would be close to the health foods supplements and diet control foods that weight-watchers commonly use. Coffee/tea/milk and biscuits/rusks make another possible grouping. You might think this is trivial, but department stores have recorded up to 20% increase in sales when such optimal grouping and placement was performed – again, through a form of analytics.
  • Bank loans and home loans are often approved based on the credit score of a customer. How is that calculated? An expert system of rules, classification, and extrapolation of existing patterns – you guessed it – predictive analytics.
  • Allocating budgets in a company to maximize the total profit in the upcoming year is predictive analytics. This is simple at a startup, but imagine the situation in a company like Google, with thousands of departments and employees, all clamoring for funding. Predictive Analytics is the way to go in this case as well.
  • IoT (Internet of Things) smart devices are one of the most promising applications of predictive analytics. It will not be too long before the sensor data from aircraft parts is used by predictive analytics to tell operators that a part has a high likelihood of failure. Ditto for cars, refrigerators, military equipment, military infrastructure and aircraft – anything that uses IoT (which is nearly every embedded processing device available in the 21st century).
  • Fraud detection, malware detection, hacker intrusion detection, cryptocurrency hacking, and cryptocurrency theft are all ideal use cases for predictive analytics. In this case, the ML system detects anomalous behavior on an interface used by the hackers and cybercriminals to identify when a theft or a fraud is taking place, has taken place, or will take place in the future. Obviously, this is a dream come true for law enforcement agencies.
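As promised above, here is a minimal, self-contained sketch of an item-based recommender. This is not how YouTube, Netflix or Google actually implement their systems; it only illustrates the core idea of scoring unseen items by their similarity to items a user already liked. The toy ratings matrix and all names are invented for illustration.

    import numpy as np

    # Toy user-item ratings matrix: rows = users, columns = items, 0 = not rated.
    ratings = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [0, 1, 5, 4],
        [1, 0, 4, 5],
    ], dtype=float)

    def cosine_sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    def recommend(user_idx, k=2):
        """Score unrated items by their similarity to items the user rated highly."""
        user = ratings[user_idx]
        scores = {}
        for item in range(ratings.shape[1]):
            if user[item] == 0:  # only recommend items the user has not rated yet
                scores[item] = sum(cosine_sim(ratings[:, item], ratings[:, other]) * user[other]
                                   for other in range(ratings.shape[1]) if user[other] > 0)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(recommend(0))  # items most similar to what user 0 already rated highly

Production recommender systems add matrix factorization, implicit feedback, and a great deal of engineering on top of this basic similarity idea.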

So now you know what predictive analytics is and what it can do. Now let’s come to the revolutionary new technology.

Meet Endor – The ‘Social Physics’ Phenomenon

 

End-to-End Predictive Analytics Product – for non-tech users!

 

In a remarkable first, a research team at MIT, USA has created a new science called social physics, or sociophysics. Much about this field is deliberately kept highly confidential because of its massive disruptive power as far as data science, and especially predictive analytics, is concerned. The only requirement of this science is that the system being modeled must be a human-interaction-based environment. To keep the discussion simple, we shall explain the entire system in points.

  • All systems in which human beings are involved follow scientific laws.
  • These laws have been identified, verified experimentally and derived scientifically.
  • By laws we mean equations, such as (just an example) Newton’s second law: F = ma (force equals mass times acceleration).
  • These equations establish laws of invariance – that are the same regardless of which human-interaction system is being modeled.
  • Hence the term social physics – like Maxwell’s laws of electromagnetism or Newton’s theory of gravitation, these laws are a new discovery that is universal as long as the agents interacting in the system are humans.
  • The invariance and universality of these laws have two important consequences:
    1. The need for large amounts of data disappears – because of the laws, many of the predictive capacities of the model can be obtained with a minimal amount of data. Hence small companies now have the power to use analytics that was previously restricted to the FAMGA (Facebook, Amazon, Microsoft, Google, Apple) set of companies, since they were the only ones with the money to maintain Big Data warehouses and data lakes.
    2. There is no need for data cleaning. Since the model being used is canonical, it is independent of data problems like outliers, missing data, nonsense data, unavailable data, and data corruption. This is due to the orthogonality of the model (a Knowledge Sphere) being constructed and the data available.
  • Performance is superior to deep learning frameworks such as Google TensorFlow and PyTorch, to scikit-learn, and to workflows built in Python, R, or Julia. The model has consistently outscored these approaches in Kaggle competitions, without any data pre-processing, preparation, or cleansing!
  • Data being orthogonal to interpretation and manipulation means that encrypted data can be used as-is. There is no need to decrypt encrypted data to perform a data science task or experiment. This is significant because the model functioning independently even on encrypted data opens the door to blockchain technology and blockchain data being used in standard data science tasks. Furthermore, this allows hashing techniques to be used to hide confidential data and perform the data-mining task without any knowledge of what the data indicates (a tiny generic illustration of such hashing follows this list).
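As a tiny, generic illustration of that last point (this is not Endor-specific): a one-way hash can replace confidential identifiers while keeping them usable for grouping, joining and counting. The function name pseudonymize and the salt value below are my own choices.

    import hashlib

    def pseudonymize(user_id: str, salt: str = "project-salt") -> str:
        """Replace a raw identifier with a stable one-way hash.

        The hash hides the identity, but it is deterministic, so grouping,
        joining and counting by user still work on the hashed column.
        """
        return hashlib.sha256((salt + user_id).encode()).hexdigest()[:16]

    records = [("alice", 120.0), ("bob", 75.5), ("alice", 42.0)]
    hashed = [(pseudonymize(user), amount) for user, amount in records]
    print(hashed)  # the same user maps to the same opaque token in every record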

Are You Serious?

That’s a valid question, given these claims! And that is why I recommend that everyone with even the slightest interest in data science visit and completely read and explore the following links:

  1. https://www.endor.com
  2. https://www.endor.com/white-paper
  3. http://socialphysics.media.mit.edu/
  4. https://en.wikipedia.org/wiki/Social_physics

Now when I say completely read, I mean completely read. Visit every section and read every bit of text available on the sites above. You will soon understand why this is such a revolutionary idea.

  1. https://ssir.org/book_reviews/entry/going_with_the_idea_flow#
  2. https://www.datanami.com/2014/05/21/social-physics-harnesses-big-data-predict-human-behavior/

These links above are articles about the social physics book and about the science of sociophysics in general.

For more details, please visit the following articles on Medium. These further document Endor.coin, a cryptocurrency built around the idea of sharing data with the public and getting paid for using the system and for the usage of your data. Preferably read all of them; if busy, at least read Article No. 1.

  1. https://medium.com/endor/ama-session-with-prof-alex-sandy-pentland
  2. https://medium.com/endor/endor-token-distribution
  3. https://medium.com/endor/https-medium-com-endor-paradigm-shift-ai-predictive-analytics
  4. https://medium.com/endor/unleash-the-power-of-your-data

Operation of the Endor System

Upon every data set, the first action performed by the Endor Analytics Platform is clustering, also popularly known as automatic classification. Endor constructs what is known as a Knowledge Sphere, a canonical representation of the data set which can be constructed even with 10% of the data volume needed for the same project when deep learning was used.

Creation of the Knowledge Sphere takes 1-4 hours for a billion records dataset (which is pretty standard these days).
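Endor’s behavioural clustering algorithm is proprietary, so as a generic illustration of what “automatic clustering” of records means, here is an ordinary scikit-learn k-means example on synthetic data. This is emphatically not Endor’s method; it only shows the kind of unsupervised grouping being described.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic records with four numeric features and five hidden groups.
    X, _ = make_blobs(n_samples=1_000, centers=5, n_features=4, random_state=42)

    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)          # a cluster id assigned to every record
    print(labels[:10], kmeans.cluster_centers_.shape)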

Now an explanation of the mathematics behind social physics is beyond our scope, but I will include the change in the data science process when the Endor platform was compared to a deep learning system built to solve the same problem the traditional way (with a 6-figure salary expert data scientist).

An edited excerpt from Link here

From Appendix A: Social Physics Explained, Section 3.1, pages 28-34 (some material not included):

Prediction Demonstration using the Endor System:

Data:
The data that was used in this example originated from a retail financial investment platform and contained the entire investment transactions of members of an investment community. The data was anonymized and made public for research purposes at MIT (the data can be shared upon request).

 

Summary of the dataset:
– 7 days of data
– 3,719,023 rows
– 178,266 unique users

 

Automatic Clusters Extraction:
Upon first analysis of the data, the Endor system detects and extracts “behavioral clusters” – groups of users whose data dynamics violate the mathematical invariances of the Social Physics. These clusters are based on all the columns of the data, but are limited to the last 7 days – as this is the data that was provided to the system as input.

 

Behavioural Clusters Summary

Number of clusters: 268,218
Cluster sizes: 62 (mean), 15 (median), 52,508 (max), 5 (min)
Clusters per user: 164 (mean), 118 (median), 703 (max), 2 (min)
Users in clusters: 102,770 out of the 178,266 users
Records per user: 33 (mean), 6 (median) – applies only to users in clusters

 

Prediction Queries
The following prediction queries were defined:
1. New users to become “whales”: users who joined in the last 2 weeks that will generate at least $500 in commission in the next 90 days.
2. Reducing activity: users who were active in the last week that will reduce activity by 50% in the next 30 days (but will not churn, and will still continue trading).
3. Churn in “whales”: currently active “whales” (as defined by their activity during the last 90 days), who were active in the past week, to become inactive for the next 30 days.
4. Will trade in Apple shares for the first time: users who had never invested in Apple shares, and would buy them for the first time in the coming 30 days.

 

Knowledge Sphere Manifestation of Queries
It is again important to note that the definition of the search queries is completely orthogonal to the extraction of behavioral clusters and the generation of the Knowledge Sphere, which was done independently of the queries definition.

Therefore, it is interesting to analyze the manifestation of the queries in the clusters detected by the system: Do the clusters contain information that is relevant to the definition of the queries, despite the fact that:

1. The clusters were extracted in a fully automatic way, using no semantic information about the data, and –

2. The queries were defined after the clusters were extracted, and did not affect this process.

This analysis is done by measuring the number of clusters that contain a very high concentration of “samples”; in other words, by looking for clusters that contain “many more examples than statistically expected”.

A high number of such clusters (provided that it is significantly higher than the amount received when randomly sampling the same population) proves the ability of this process to extract valuable relevant semantic insights in a fully automatic way.

 

Comparison to Google TensorFlow

In this section a comparison between the prediction process of the Endor system and Google’s TensorFlow is presented. It is important to note that TensorFlow, like any other deep learning library, faces some difficulties when dealing with data similar to the one under discussion:

1. An extremely uneven distribution of the number of records per user requires some canonization of the data, which in turn requires:

2. Some manual work, done by an individual who has at least some understanding of data science.

3. Some understanding of the semantics of the data, which requires an investment of time, as well as access to the owner or provider of the data.

4. A single-class classification, using an extremely uneven distribution of positive vs. negative samples, tends to lead to overfitting of the results and requires some non-trivial maneuvering.

This again necessitates the involvement of an expert in deep learning (unlike the Endor system, which can be used by business, product or marketing experts, with no prerequisites in machine learning or data science).

 

Traditional Methods

An expert in deep learning spent 2 weeks crafting a solution based on TensorFlow, with sufficient expertise to be able to handle the data. The solution that was created used the following auxiliary techniques:

1. Trimming the data sequence to 200 records per customer, and padding the streams for users who have fewer than 200 records with neutral records.

2. Creating 200 training sets, each having 1,000 customers (50% known positive labels, 50% unknown), and then using these training sets to train the model.

3. Using sequence classification (an RNN with 128 LSTMs) with 2 output neurons (positive, negative), with the overall result being the difference between the scores of the two.
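To help picture that baseline, here is a rough tf.keras sketch of such an architecture. It is my own reconstruction from the description above, not the code used in the white paper; the sequence length of 200 matches the text, while the feature count, optimizer and training details are assumptions.

    import numpy as np
    import tensorflow as tf

    SEQ_LEN, N_FEATURES = 200, 8        # 200 padded records per customer; feature count assumed

    model = tf.keras.Sequential([
        tf.keras.layers.Masking(mask_value=0.0, input_shape=(SEQ_LEN, N_FEATURES)),
        tf.keras.layers.LSTM(128),                       # the "RNN with 128 LSTMs"
        tf.keras.layers.Dense(2, activation="softmax"),  # positive / negative output neurons
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # One toy training set of 1,000 customers (labels: 1 = known positive, 0 = unknown).
    X = np.random.rand(1_000, SEQ_LEN, N_FEATURES).astype("float32")
    y = np.random.randint(0, 2, size=1_000)
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)

    scores = model.predict(X[:5], verbose=0)
    ranking_score = scores[:, 1] - scores[:, 0]          # difference between the two output scores
    print(ranking_score)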

Observations (all statistics available in the white paper – and it’s stunning)

1. Endor outperforms TensorFlow in 3 out of 4 queries, and achieves the same accuracy in the 4th.

2. The superiority of Endor becomes increasingly evident as the task becomes “more difficult” – focusing on the top-100 rather than the top-500.

3. There is a clear distinction between the “less dynamic queries” (becoming a whale, churn, reducing activity – for which static signals should likely be easier to detect) and the “Who will trade in Apple for the first time” query, which is (a) more dynamic, and (b) has a very low baseline, such that for the latter, Endor is 10x more accurate!

4. As previously mentioned, the TensorFlow results illustrated here employ 2 weeks of manual improvements done by a deep learning expert, whereas the Endor results are 100% automatic and the entire prediction process in Endor took 4 hours.

Clearly, the path going forward for predictive analytics and data science is Endor, Endor, and Endor again!

Predictions for the Future

Personally, one thing has me sold – the robustness of the Endor system to handle noise and missing data. Earlier, this was the biggest bane of the data scientist in most companies (when data engineers are not available). 90% of the time of a professional data scientist would go into data cleaning and data preprocessing since our ML models were acutely sensitive to noise. This is the first solution that has eliminated this ‘grunt’ level work from data science completely.

The second prediction: the Endor system works upon principles of human interaction dynamics. My intuition tells me that data collected at random has its own dynamical systems that appear clearly to experts in complexity theory. I am completely certain that just as this tool developed a prediction tool with human society dynamical laws, data collected in general has its own laws of invariance. And the first person to identify these laws and build another Endor-style platform on them will be at the top of the data science pyramid – the alpha unicorn.

Final prediction – democratizing data science means that now data scientists are not required to have six-figure salaries. The success of the Endor platform means that anyone can perform advanced data science without resorting to TensorFlow, Python, R, Anaconda, etc. This platform will completely disrupt the entire data science technological sector. The first people to master it and build upon it to formalize the rules of invariance in the case of general data dynamics will for sure make a killing.

It is an exciting time to be a data science researcher!

Data science is a broad field, and mastering all these skills requires learning quite a few things.

Dimensionless has several resources to get started with.

To Learn Data Science, Get Data Science Training in Pune and Mumbai from Dimensionless Technologies.

To learn more about analytics, be sure to have a look at the following articles on this blog:

Machine Learning for Transactional Analytics

and

Text Analytics and its applications

Enjoy data science!

The Rise of Edge Computing

Introduction

Computing infrastructure is an ever-changing landscape of technology advancements. Current changes affect the way companies deploy smart manufacturing systems to make the most of advancements.

The rise of edge computing capabilities coupled with traditional industrial control system (ICS) architectures provides increasing levels of flexibility. In addition, time-synchronized applications and analytics augment, or in some cases minimize, the need for larger Big Data operations in the cloud, regardless of cloud premise.

In this blog, we will start with the definition of edge computing. After that, we will discuss the need for edge computing and its applications. Also, we will try to understand the scope of edge computing in the future.

What is Edge computing

Consolidation and the centralized nature of cloud computing have proven cost-effective and flexible, but the rise of the IIoT and mobile computing has put a strain on networking bandwidth. Ultimately, not all smart devices need to use cloud computing to operate. In some cases, architects can — and should — avoid the back and forth. Edge computing could prove more efficient in some areas where cloud computing operates.

Furthermore, edge computing permits data processing closer to its origin (i.e., motors, pumps, generators or other sensors), reducing the need to transfer that data back and forth to the cloud.

Additionally, think of edge computing in manufacturing as a network of micro data centers capable of hosting, storage, computing and analysis on a localized basis, while pushing aggregate data to a centralized plant or enterprise data center, or even the cloud (private or public, on-premise or off) for further analysis, deeper learning, or to feed an artificial intelligence (AI) engine hosted elsewhere.

According to Microsoft, in edge computing, compute resources are “placed closer to information-generation sources to reduce network latency and bandwidth usage generally associated with cloud computing.” This helps to ensure continuity of services and operations even if cloud connections aren’t steady.

Also, this moving of compute and storage to the “edge” of the network, away from the data centre and closer to the user, cuts down the amount of time it takes to exchange messages compared with traditional centralized cloud computing. Moreover, according to research by IEEE, it can help to balance network traffic, extend the life of IoT devices and, ultimately, reduce “response times for real-time IoT applications.”

Terms in Edge Computing

Like most technology areas, edge computing has its own lexicon. Here are brief definitions of some of the more commonly used terms

  • Edge devices: Any device that produces or collects data can be an edge device – sensors, industrial machines, and so on.
  • Edge: What counts as the edge depends on the use case. In a telecommunications field, perhaps the edge is a cell phone, or maybe it’s a cell tower. Furthermore, in an automotive scenario, the edge of the network could be a car. Also, in manufacturing, it could be a machine on a shop floor. Additionally, in enterprise IT, the edge could be a laptop.
  • Edge gateway: A gateway is a buffer between where edge computing processing is done and the broader fog network. The gateway is the window into the larger environment beyond the edge of the network.
  • Fat client: Software that can do some data processing in edge devices. This is opposite to a thin client, which would merely transfer data.
  • Edge computing equipment: Edge computing uses a range of existing and new equipment. We can outfit many devices, sensors and machines to work in an edge computing environment by simply making them Internet-accessible. Cisco and other hardware vendors have a line of rugged network equipment that has hardened exteriors meant to be used in field environments. A range of compute servers and even storage-based hardware systems like Amazon Web Service’s Snowball have usage in edge computing deployments.
  • Mobile edge computing: This refers to the build-out of edge computing systems in telecommunications systems, particularly 5G scenarios.

Why Rise in Edge Computing

1. Latency in decision making

Businesses are getting a huge boost from computerised systems, especially as they evolve into the cloud era. But bringing that same level of technology across different sites has proven to be not so straightforward for many companies, particularly as the sites started generating more data. The main concern is latency – the time it takes for data to move between points. As with the NYSE, a little distance goes a long way in the computer world, so it stands to reason that delays in sending the data needed to reach decisions will translate into delays for the business.

2. Decentralisation and scaling

To some, it may seem counterintuitive to move away from the centre. Wasn’t centralisation the whole point of cloud systems? But the cloud isn’t about pooling everything in the middle. It’s about scale and making it easier to access the services that the business uses every day. Also, the transfer gap problem between sites and data centres predates the cloud era. Yet cloud can exacerbate it. The only way to overcome this transfer gap is to move some of the data centres to where the data is.

3. Process Optimisation

With edge computing, data centres can execute rules that are time-sensitive (like “stop the car” in the case of driverless vehicles), and then stream data to the cloud in batches when bandwidth needs aren’t as high. Furthermore, the cloud can then take the time to analyze data from the edge, and send back recommended rule changes – like “decelerate slowly when the car senses human activity within 50 feet.”

4. Cost

Cost is also a driving factor for edge computing. The bulk of the telemetry data from sensors and actuators is likely not relevant for the IoT application: the fact that a temperature sensor reports a 20ºC reading every second might not be interesting until the sensor reports a 40ºC reading. Edge computing allows for the filtering and processing of data before sending it to the cloud. This reduces the network cost of data transmission, and it also reduces the cloud storage and processing cost of data that is not relevant to the application.
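A minimal sketch of that filtering logic, assuming nothing beyond plain Python (the 40ºC threshold simply mirrors the example above):

    from typing import Iterable, List

    THRESHOLD_C = 40.0          # only readings at or above this are worth sending upstream

    def filter_readings(readings: Iterable[float]) -> List[float]:
        """Runs on the edge device: keep only the anomalous temperature readings."""
        return [r for r in readings if r >= THRESHOLD_C]

    # One reading per second for a minute; only the spike gets transmitted to the cloud.
    minute_of_readings = [20.1] * 58 + [41.7, 43.2]
    to_upload = filter_readings(minute_of_readings)
    print(f"sending {len(to_upload)} of {len(minute_of_readings)} readings to the cloud")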

5. Resourcefulness

Storing and processing data on the edge and only sending out to the cloud what will be used and useful saves bandwidth and server space.

Where all we are using it

1. Grid Edge Control and Analytics

Grid edge computing solutions are helping utilities monitor and analyse the additional renewable power-generating resources integrated into their grids, in real time. This is something legacy SCADA systems are unable to offer.

From residential rooftop solar to solar farms, commercial solar, electric vehicles and wind farms, smart meters are generating a ton of data that helps utilities view the amount of energy available and required, allowing their demand response to become more efficient, avoid peaks and reduce costs. This data is first processed in grid edge controllers that perform local computation and analysis, and only send the necessary actionable information over a wireless network to the utility.

2. Oil and Gas Remote Monitoring

Safety monitoring within critical infrastructure such as oil and gas utilities is of utmost importance. For this reason, many cutting-edge IoT monitoring devices are being deployed in order to safeguard against disaster. Edge computing allows data to be analysed, processed, and then delivered to end-users in real time, allowing control centres to access data as it occurs in order to foresee and prevent malfunctions or incidents before they happen. This is really important: when dealing with critical infrastructure such as oil and gas or other energy services, any failure within a particular system has the potential to be catastrophic and should always warrant the highest levels of precaution.

3. Internet of Things

A smart window firm monitors windows for errors, weather information, maintenance needs and performance. This generates a massive stream of data as each device is regularly reporting information. Edge services filter this information and report a summary back to a centralized service that is running from the firm’s primary data centres. By summarizing information before reporting it, global bandwidth consumption is reduced by 99%.

4. E-Commerce

An e-commerce company delivers images and static web content from a content delivery network. They also perform processing at edge data centres to quickly calculate product recommendations for customers.

5. Markets

A hedge fund pays an expensive premium for servers that are in close proximity to various stock exchanges to achieve extremely low latency trading. Trading algorithms are deployed on these machines. These servers are expensive and resource constrained. As such, they connect back to a cloud service for processing support.

6. Games

A game platform executes certain real-time elements of the game experience on edge servers near the user. The edges connect to a cloud backend for support processing. The backend is run from three regions that need not be close to the end-user.

Predictions for Edge Computing in Future

According to IDC, by 2020 the IT spend on edge infrastructure will reach up to 18% of the total spend on IoT infrastructure. That spend is driven by the deployment of converged IT and OT systems, which reduces the time to value of data collected from connected devices, IDC adds. It’s what we explained and illustrated in a nutshell.

According to a November 1, 2017 announcement regarding research on the edge computing market across hardware, platforms, solutions and applications (smart city, augmented reality, analytics, etc.), the global edge computing market is expected to reach USD 6.72 billion by 2022 at a compound annual growth rate of a whopping 35.4 per cent.

The major trends responsible for the growth of the market in North America are all too familiar: a growing number of devices and dependency on IoT devices, the need for faster processing, the increase in cloud adoption, and the increase in pressure on networks.

In an October 2018 blog post, Gartner’s Rob van der Meulen said that currently, around 10% of enterprise-generated data is created and processed outside a traditional centralized data centre or cloud. By 2022, Gartner predicts this figure will reach 50 per cent.

Summary

Edge is still in early-stage adoption, but one thing is clear: edge devices are the subject of large-scale investments from cloud suppliers looking to offload bandwidth, and latency issues are mounting due to an explosion of Internet of Things (IoT) data in both industrial and commercial applications.

Edge soon will likely increase in adoption where users have questions about how or if the cloud applies for the specific use case. Cloud-level interfaces and apps will migrate to the edge. Industrial application hosting and analytics will become common at the edge, using virtual servers and simplified operational technology-friendly hardware and software.

Benefits in network simplification, security and bandwidth accompany the IT simplification.

Follow this link, if you are looking to learn more about data science online!

You can follow this link for our Big Data course!

Additionally, if you are interested in learning Data Science, click here to start

Furthermore, if you want to read more about data science, you can read our blogs here

Also, the following are some blogs you may like to read

MATLAB for Data Science

Top 5 Ways to Evaluate Data Science Competency

Can you learn Data Science and Machine Learning without Maths?

 

Deploying ML models using flask and flasgger

You have finally trained your first Machine Learning model. Congratulations! What’s next though? How would you unleash the power of your model outside of your laptop? In this tutorial, we help you take the first step towards deploying models. We will use flask to create apps and flasgger to create a beautiful UI.

Pre-requisite:

  • Anaconda distribution of Python

 

Libraries required:

  1. If you do not have Anaconda installed, you can download it from here:
    https://www.anaconda.com/distribution/
  2. Libraries to install:
    -numpy
    -pandas
    -sklearn
    -flask
    -flasgger (version = 0.8.1)

Github repository for this tutorial – https://github.com/DhruvilKarani/MLapp_using_flask_and_flasgger

You can install libraries using Anaconda Prompt (use the search option on Windows) by typing pip install <name of the library>. For flasgger, use pip install flasgger==0.8.1

Of course, don’t include the <>

Building an ML model

Before we deploy any model, let’s first build one. For now, let’s build a simple model on a simple dataset so that we can spend more time on the deployment part. We use the Iris dataset from sklearn’s datasets. The required imports are given below.
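The imports appear as a screenshot in the original post; a plausible reconstruction is below. The choice of RandomForestClassifier is my assumption – any scikit-learn classifier will do for this walkthrough.

    import pickle

    import numpy as np
    import pandas as pd
    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier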

The Iris dataset looks something like this –

Image Credits: Analyticskhoj

 

The variable to be predicted, i.e., Species, has three categories – Setosa, Versicolour and Virginica. Now, we build our model in just 6 lines of code.
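The original shows these lines as an image; a sketch that matches the description (with the classifier choice assumed, as above):

    # Load the Iris data and train a classifier -- roughly the "6 lines" described above.
    iris = datasets.load_iris()
    X = iris.data                      # sepal/petal length and width
    y = iris.target                    # 0 = Setosa, 1 = Versicolour, 2 = Virginica
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))             # quick sanity check on the training data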

The model we build is saved as a pickle file. A pickle file stores a Python object in its serialized binary form, so the next time we want to use this model, we don’t have to train it again – we can merely load the pickle file.
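The pickling step might look like this; the path is a placeholder, so point it at any folder you like, keeping forward slashes:

    # Save the trained model; note the forward slashes in the path.
    with open('C:/Users/<your-folder>/model.pkl', 'wb') as model_pkl:
        pickle.dump(clf, model_pkl)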

The above command saves the model as a pickle file under the name model_pkl in the path specified (in this case – C:/Users….model.pkl). Also, make sure you have / and not \. You might also want to check once if the file is present in the folder. Once you have made sure the file exists, the next step is to use flask and flasgger to make a fantastic UI. Make a new Python script and import the following modules and read the pickle file.
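Reconstructing from the description, the new script could start like this (same placeholder path as before):

    import pickle

    import numpy as np
    import pandas as pd
    from flask import Flask, request
    from flasgger import Swagger

    # Load the trained model back from the pickle file created earlier.
    with open('C:/Users/<your-folder>/model.pkl', 'rb') as model_pkl:
        model = pickle.load(model_pkl)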

Next, we create a Flask object and name it app. The argument to this Flask object is the special keyword __name__. To create an easy UI for this app, we use the Swagger module in the Flasgger library.
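In code, that is simply:

    app = Flask(__name__)      # the Flask object, named app
    swagger = Swagger(app)     # Flasgger's Swagger wraps the app to auto-generate the UI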

Now, we create two apps – One which accepts individual values for all 4 input values and the other which accepts a CSV file as inputs. Let’s create the first app
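Below is a sketch of the first app, reconstructed from the walkthrough that follows; the parameter names S_length, S_width, P_length and P_width come from the text, and the docstring uses flasgger’s YAML-in-docstring convention.

    @app.route('/predict', methods=['GET'])
    def predict_iris():
        """Predict the Iris species from four measurements.
        ---
        parameters:
          - name: S_length
            in: query
            type: number
            required: true
          - name: S_width
            in: query
            type: number
            required: true
          - name: P_length
            in: query
            type: number
            required: true
          - name: P_width
            in: query
            type: number
            required: true
        responses:
          200:
            description: The predicted Iris class
        """
        s_length = float(request.args.get('S_length'))
        s_width = float(request.args.get('S_width'))
        p_length = float(request.args.get('P_length'))
        p_width = float(request.args.get('P_width'))
        # Pass the four values to the model as a 2-D numpy array.
        prediction = model.predict(np.array([[s_length, s_width, p_length, p_width]]))
        return str(prediction[0])   # always return a string, never a numeric value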

The first line @app.route(‘/predict’) specifies the part of the URL which runs this particular app. If you do not understand this as of now, don’t worry. Things get more evident as we use the app. The next thing we do is create a function, named predict_iris. Under this function, we have a long docstring. Swagger uses this string for creating a UI. It says that the app requires 4 parameters namely S_length, S_width, P_length, P_width. All of these input values are of the query type. Next, the app uses the GET method to accept these values which means that we need to enter the numbers by ourselves. Then we pass these values to our model in a 2 D numpy array and return the predictions. Two things here –

  • Predictions[0] returns the element in the prediction numpy array
  • We always output a string, never a numeric value to avoid errors.

 

Now we build the second app, the one that accepts a file. However, before the app, we create a file that has all four variable values for which we predict the output. In the Python console, type the following
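For instance (again a reconstruction – the original shows a screenshot):

    from sklearn import datasets
    import pandas as pd

    iris = datasets.load_iris()
    df = pd.DataFrame(iris.data)   # four unnamed columns, matching header=None later on
    print(df.sample(10))           # copy a few of these rows into a CSV file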

Select any 2-4 rows at random, copy them and save them in a CSV file. We use this file to test our second app. The file would look something like this
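And a sketch of the second app, again reconstructed from the description below; the form-field name input_file is my own choice.

    @app.route('/predict_file', methods=['POST'])
    def predict_iris_file():
        """Predict the Iris species for every row of an uploaded CSV file.
        ---
        parameters:
          - name: input_file
            in: formData
            type: file
            required: true
        responses:
          200:
            description: The predicted Iris classes
        """
        # header=None because the test CSV was saved without column names.
        df_test = pd.read_csv(request.files.get('input_file'), header=None)
        prediction = model.predict(df_test)
        return str(list(prediction))   # return the predictions as one string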

Notice the changes here. The @app.route decorator has ‘/predict_file’ as one of its arguments. The docstring under our new function predict_iris_file tells Swagger to treat the file as an input. Next, we read the CSV using read_csv and make sure the header is set to None if you haven’t set the column names while making the CSV. Finally, we use the model to make the predictions and return them as a string.

Finally, we run the app using
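i.e., the usual Flask entry point:

    if __name__ == '__main__':
        app.run(debug=True)    # serves on http://127.0.0.1:5000 by default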

In the console, the output shows a local URL.

Copy that URL and paste it into your browser, then add /apidocs to it and hit Enter – for example, http://127.0.0.1:5000/apidocs. Something like this opens up –

Click on default and then on /predict. You’ll find something like this –

Above is a UI for your first app. Go ahead, insert the four values and click on ‘Try it out!’.  Under the response body, you find the predicted class. For our second app, upload the file by clicking on choose file.

When you try it out, you get a string of predicted classes.

Congratulations! You have just created a nice UI for your ML model. Feel free to play around and try out new things.

AWS Big Data Prep Course – Why you Need it

What is AWS?

In very simple words, Amazon Web Services is a subsidiary of Amazon that provides on-demand cloud computing platforms to individuals, companies and governments, on a paid subscription basis. The technology allows subscribers to have at their disposal a virtual cluster of computers, available all the time, through the Internet.

Let us take a shot at a more technical description. Amazon Web Services (AWS) is a secure cloud services platform, offering computing power, database storage, content delivery and other functionality to help businesses scale and grow. Millions of customers are currently leveraging AWS cloud products and solutions to build sophisticated applications with increased flexibility, scalability and reliability.

Capabilities?

  • Websites & Website Hosting: Amazon Web Services offers cloud web hosting solutions that provide businesses, non-profits, and governmental organizations with low-cost ways to deliver their websites and web applications. Whether you’re looking for a marketing, rich-media, or e-commerce website, AWS offers a wide range of website hosting options and will help you select the one that is right for you.
  • Backup & Recovery: AWS offers the most storage services, data-transfer methods, and networking options to build solutions that protect your data with unmatched durability and security
  • Data Archive: Amazon Web Services offers a complete set of cloud storage services for archiving. You can choose Amazon Glacier for affordable, non-time-sensitive cloud storage, or Amazon Simple Storage Service (S3) for faster storage, depending on your needs. With AWS Storage Gateway and our solution provider ecosystem, you can build a comprehensive storage solution.
  • DevOps: AWS provides a set of flexible services designed to enable companies to more rapidly and reliably build and deliver products using AWS and DevOps practices. These services simplify provisioning and managing infrastructure, deploying application code, automating software release processes, and monitoring your application and infrastructure performance.
  • Big Data: AWS delivers an integrated suite of services that provide everything needed to quickly and easily build and manage a data lake for analytics. AWS-powered data lakes can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot. AWS gives customers the widest array of analytics and machine learning services, for easy access to all relevant data, without compromising on security or governance.

Why learn AWS?

DevOps Automation

You don’t want your data scientists spending time on DevOps tasks like creating AMIs, defining Security Groups, and creating EC2 instances. Data science workloads benefit from large machines for exploratory analysis in tools like Jupyter or RStudio, as well as elastic scalability to support bursty demand from teams, or parallel execution of data science experiments, which are often computationally intensive.

Cost controls, resource monitoring, and reporting

Data science workloads often benefit from high-end hardware, which can be expensive. When data scientists have more access to scalable compute, how do you mitigate the risk of runaway costs, enforce limits, and attribute across multiple groups or teams?

Environment management

Data scientists need agility to experiment with new open source tools and packages, which are evolving faster than ever before. System administrators must ensure stability and safety of environments. How can you balance these two points in tension?

GPUs

Neural networks and other effective data science techniques benefit from GPU acceleration, but configuring and utilizing GPUs remains easier said than done. How can you provide efficient access to GPUs for your data scientists without miring them in DevOps configuration tasks?

Security

AWS offers world-class security in their environment — but you must still make choices about how you configure security for your applications running on AWS. These choices can make a significant difference in mitigating risk as your data scientists transfer logic (source code) and data sets that may represent your most valuable intellectual property.

Our AWS Course

1. AWS Introduction

This section covers the basic concepts and terms which are AWS-specific. It lays out the basic setting: learners pick up all the AWS-specific terminology and are prepared for the deep dive.

2. VPC Subnet

A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. It is logically isolated from other virtual networks in the AWS Cloud. You can launch your AWS resources, such as Amazon EC2 instances, into your VPC.

3. Route

A route table contains a set of rules, called routes, that are used to determine where network traffic is directed. Each subnet in your VPC must be associated with a route table; the table controls the routing for the subnet. A subnet can only be associated with one route table at a time, but you can associate multiple subnets with the same route table.

4. EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.

5. IAM

AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. You use IAM to control who is authenticated (signed in) and authorized (has permissions) to use resources.

6. S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
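As a small taste of what working with S3 looks like from Python via the boto3 SDK (the bucket and file names below are placeholders):

    import boto3

    s3 = boto3.client('s3')   # uses the AWS credentials configured on your machine

    s3.upload_file('report.csv', 'my-example-bucket', 'reports/report.csv')   # local -> S3
    s3.download_file('my-example-bucket', 'reports/report.csv', 'copy.csv')   # S3 -> local

    for bucket in s3.list_buckets()['Buckets']:   # list the buckets in the account
        print(bucket['Name'])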

7. Lambda

AWS Lambda is a ‘compute’ service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales automatically, from a few requests per day to thousands per second.
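In Python, a Lambda function is just a handler with the standard (event, context) signature; a minimal example (the event field name is illustrative):

    import json

    def lambda_handler(event, context):
        """Minimal AWS Lambda handler: echo back the name passed in the event."""
        name = event.get('name', 'world')
        return {
            'statusCode': 200,
            'body': json.dumps({'message': f'Hello, {name}!'})
        }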

8. SNS

Amazon Simple Notification Service (SNS) is a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications. Amazon SNS provides topics for high-throughput, push-based, many-to-many messaging.

9. SQS

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. SQS eliminates the complexity and overhead associated with managing and operating message-oriented middleware and empowers developers to focus on differentiating work.

10. RDS

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.

11. Dynamo DB

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-region, multi-master database with built-in security, backup and restores, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second.

13. Cloud Formation

AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts. This file serves as the single source of truth for your cloud environment.

14. Projects

No learning can happen without doing a project. This is our mantra at Dimensionless Technologies. We have different projects planned for our learners which will help them implement all their learnings during the course.

Why Dimensionless as your learning partner?

Dimensionless Technologies provides instructor-led LIVE online training with hands-on work on different problems. We do not provide classroom training, but we deliver more than what classroom training could provide you with.

Are you sceptical of online training, or do you feel that the online mode is not the best platform to learn? Let us clear your insecurities about online training!

  1. Live and Interactive sessions
    We conduct classes through live sessions and not pre-recorded videos. The interactivity level is similar to classroom training and you get it in the comfort of your home.

 

  2. Highly Experienced Faculty
    We have highly experienced faculty (IITians) to help you grasp complex concepts and kick-start your career success journey.

 

  3. Up-to-Date Course Content
    Our course content is up to date and covers all the latest technologies and tools. The course is well equipped for learners to gain the knowledge required to solve real-world problems through their data analytical skills.

 

  4. Availability of Software and Computing Resources
    Any laptop with 2GB RAM and Windows 7 or above is perfectly fine for this course. All the software used in this course is freely downloadable from the Internet. The trainers help you set it up on your systems. We also provide access to our cloud-based online lab where these are already installed.

 

  5. Industry-Based Projects
    During the training, you will be solving multiple case studies from different domains. Once the LIVE training is done, you will start implementing your learnings on real-time datasets. You can work on data from various domains like Retail, Manufacturing, Supply Chain, Operations, Telecom, Oil and Gas and many more.

 

  6. Course Completion Certificate
    Yes, we will be issuing a course completion certificate to all individuals who successfully complete the training.

 

  7. Placement Assistance
    We provide you with real-time industry requirements on a daily basis through our connections in the industry. These requirements generally come through referral channels, hence the probability of getting through increases manifold.

Conclusion

Dimensionless Technologies has the right courses for you if you are aiming to kick-start your career in the field of data science. Not only do we cover all the important concepts and technologies, but we also focus on their implementation and usage in real-world business problems. Follow the link to register yourself for a free demo of the courses!

You can follow this link for our AWS course

Additionally, if you are interested in learning Data Science, click here to get started

Furthermore, if you want to read more about data science, you can read our blogs here

Also, the following are some suggested blogs you may like to read

Introduction to AWS Big Data

Top 5 Advantages of AWS Big Data Speciality

Introduction to Agent-Based Modelling