Now, in theory, it is possible to become a data scientist, without paying a dime. What we want to do in this article is to list out the best of the best options to learn what you need to know to become a data scientist. Many articles offer 4-5 courses under each heading. What I have done is to search through the Internet covering all free courses and choose the single best course for each topic.
These courses have been carefully curated and offer the best possible option if you’re learning for free. However – there’s a caveat. An interesting twist to this entire story. Interested? Read on! And please – make sure you complete the full article.
Topics For A Data Scientist Course
The basic topics that a data scientist needs to know are:
Machine Learning Theory and Applications
Statistics & Probability
Calculus Basics (short)
Machine Learning in Python
Machine Learning in R
So let’s get to it. Here is the list of the best possible options to learn every one of these topics, carefully selected and curated.
Machine Learning – Stanford University – Andrew Ng (audit option)
The world-famous course for machine learning with the highest rating of all the MOOCs in Coursera, from Andrew Ng, a giant in the ML field and now famous worldwide as an online instructor. Uses MATLAB/Octave. From the website:
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include:
(ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning)
(iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)
The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
This course is extremely effective and has many benefits. However, you will need high levels of self-discipline and self-motivation. Statistics show that90% of those who sign up for a MOOC without a classroom or group environment never complete the course.
Learn Python The Hard Way – Zed Shaw – Free Online Access
You may ask me, why do I want to learn the hard way? Shouldn’t we learn the smart way and not the hard way? Don’t worry. This ebook, online course, and web site is a highly popular way to learn Python. Ok, so it says the hard way. Well, the only way to learn how to code is to practice what you have learned. This course integrates practice with learning. Other Python books you have to take the initiative to practice.
Here, this book shows you what to practice, how to practice. There is only one con here – although this is the best self-driven method, most people will not complete all of it. The main reason is that there is no external instructor for supervision and a group environment to motivate you. However, if you want to learn Python by yourself, then this is the best way. But not the optimal one, as you will see at the end of this article since the cost of the book is 30$ USD (2100 INR approx).
Interactive R and Data Science Programming – SwiRl
Swirlstats is a wonderful tool to learn R and data science scripting in R interactively and intuitively by teaching you R commands from within the R console. This might seem like a very simple tool, but as you use it, you will notice its elegance in teaching you literally how to express yourselves in R and the finer nuances of the language and integration with the console and tidyverse. This is a powerful method of learning R and what is more, it is also a lot of fun!
KhanAcademy is a free non-profit organization on a mission – they want to provide a world-class education to you regardless of where you may be in the world. And they’re doing a fantastic job! This course has been covered in several very high profile blogs and Quora posts as the best online course for statistics – period. What is more, it is extremely high quality and suitable for beginners – and – free! This organization is doing wonderful work. More power to them!
Mathematics for Data Science
Now the basic mathematics for data science content includes linear algebra, single-variable, discrete mathematics, and multivariable calculus (selected topics) and basics of differential equations. Now you could take all of these topics separately in KhanAcademy and that is a good option for Linear Algebra and Multivariate Calculus (in addition to Statistics and Probability).
For Linear Algebra, the link of what you need to know given in a course in KhanAcademy is given below:
These courses are completely free and very accessible to beginners.
This topic deserves a section to itself because discrete mathematics is the foundation of all computer science. There are a variety of options available to learn discrete mathematics, from ebooks to MOOCs, but today, we’ll focus on the best possible option. MIT (Massachusetts Institute of Technology) is known as one of the best colleges in the world and they have an Open information initiative known as MIT OpenCourseWare (MIT OCW). These are actual videos of the lectures taken by the students at one of the best engineering colleges in the world. You will benefit a lot if you follow the lectures at this link, they give all the basic concepts as clearly as possible. It’s a bit technical because this is open mostly for students at an advanced level. The link is given below:
It is also technical and from MIT but might be a little more accessible than the earlier option.
SQL (see-quel) or Structured Query Language is a must-learn if you are a data scientist. You will be working with a lot of databases, and SQL is the language used to access and generate data from database systems like Oracle and Microsoft SQL Server. The best free course I could find online is undoubtedly the one below:
We have covered Python, R, Machine Learning using MATLAB, Data Science with R (SwiRl teaches data science as well), Statistics, Probability, Linear Algebra, and Basic Calculus. Now we just need to get a course for Data Science with Python, and we are done! Now I looked at many options but was not satisfied. So instead of a course, I have provided you with a link to the scikit-learn documentation. Why?
Because that’s as good as an online course by itself. If you read through the main sections, get the code (Ctrl-X, Ctrl-V) and execute it in an Anaconda environment, and then play around with it, experiment, and observe and read up on what every line does, you will already know who to solve standard textbook problems. I recommend the following order:
This book is free to learn online. Get the data files, get the script files, use RStudio, and just as with Python, play, enjoy, experiment, execute, and explore. A little hard work will have you up and running with R in no time! But make sure you try as many code examples as possible. The libraries you can focus on are:
dplyr (data manipulation)
tidyr (data preprocessing “tidying”)
ggplot2 (graphical package)
purrr (functional toolkit)
readr (reading rectangular data files easily)
stringr (string manipulation)
To make it short, simple, and sweet, since we have already covered SQL and this content is for beginners, I recommend the following course:
This is a course on Udemy rated 4.2/5 and completely free. You will learn everything you need to work with Tableau (the most commonly used corporate-level visualization tool). This is an extremely important part of your skill set. You can make all the greatest analyses, but if you don’t visualize them and do it well, management will never buy into your machine learning solution, and neither will anyone who doesn’t know the technical details of ML (which is a large set of people on this planet). Visualization is important. Please make sure to learn the basics (at least!) of Tableau.
Kaggle Micro-Courses (Add-Ons – Short Concise Tutorials)
Kaggle is a wonderful site to practice your data science skills, but recently, they have added a set of hands-on courses to learn data science practicals. And, if I do say, so myself, it’s brilliant. Very nicely presented, superb examples, clear and concise explanations. And of course, you will cover more than we discussed earlier. Please, if you read through all the courses discussed so far in this article, and if you do just the courses at Kaggle.com, you will have spent your time wisely (though not optimally – as we shall see).
Now, if you are reading this article, you might have a fundamental question. This is a blog of a company that offers courses in data science, deep learning, and cloud computing. Why would we want to list all our competitors and publish it on our site? Isn’t that negative publicity?
Quite the opposite.
This is the caveat we were talking about.
Our course is a better solution than every single option given above!
We have nothing to hide.
And we have an absolutely brilliant top-class product.
Every option given above is a separate course by itself.
And they all suffer from a very prickly problem – you need to have excellent levels of discipline and self-motivation to complete just one of the courses above – let alone all ten.
You also have no classroom environment, no guidance for doubts and questions, and you need to know the basics about programming.
Our product is the most cost-effective option in the market for learning data science, as well as the most effective methodology for everyone – every course is conducted live in a classroom environment from the comfort of your home. You can work at a standard job, spend two hours on the internet every day, do extra work and reading on weekends, and become a professional data scientist in 6 months time.
We also have personalized GitHub project portfolio creation, management, and faculty guidance. Not to mention individual attention for each student.
And IITians for faculty who also happen to have 9+ years of industry experience.
So when we say that our product is the best on the market, we really mean it. Because of the live session teaching of the classes, which no other option on the Internet today has.
Am I kidding? Absolutely not. And you can get started with Dimensionless Technologies Data Science with Python and R course for just 70-odd USD. Which is the most cost-effective option on the market!
And unlike all the 10 courses and resources detailed above, instead of doing 10 courses, you just need to do one single course, with the extracted meat of all that you need to know as a data scientist. And yes, we cover:
Statistics & Probability
Machine Learning in Python
Machine Learning in R
GitHub Personal Project Portfolio Creation
Live Remote Daily Sessions
Experts with Industrial Experience
A Classroom Environment (to keep you motivated)
Individual Attention to Every Student
I hope this information has you seriously interested. Please sign up for the course – you will not regret it.
And we even have a two-week trial for you to experience the course for yourself.
Choose wisely and optimally.
Unleash the data scientist within!
An excellent general article on emerging state-of-the-art technology, AI, and blockchain:
So you want to learn data science but you don’t know where to start? Or you are a beginner and you want to learn the basic concepts? Welcome to your new career and your new life! You will discover a lot of things on your journey to becoming a data scientist and being part of a new revolution. I am a firm believer that you can learn data science and become a data scientist regardless of your age, your background, your current knowledge level, your gender, and your current position in life. I believe – from experience – that anyone can learn anything at any stage in their lives. What is required is just determination, persistence, and a tireless commitment to hard work. Nothing else matters as far as learning new things – or learning data science – is concerned. Your commitment, persistence, and your investment in your available daily time is enough.
I hope you understood my statement. Anyone can learn data science if you have the right motivation. In fact, I believe anyone can learn anything at any stage in their lives, if they invest enough time, effort and hard work into it, along with your current occupation. From my experience, I strongly recommend that you continue your day job and work on data science as a side hustle, because of the hard work that will be involved. Your commitment is more important than your current life situation. Carrying on a full-time job and working on data science part-time is the best way to go if you want to learn in the best possible manner.
Technical Concepts of Data Science
So what are the important concepts of data science that you should know as a beginner? They are, in order of sequential learning, the following:
Statistics & Probability
Data Preparation and Data ETL*
Machine Learning with Python and R
Data Visualization and Summary
*Extraction, Transformation, and Loading
Now if you were to look at the above list an go to a library, you would, most likely, come back with 9-10 books at an average of 1000 pages each. Even if you could speed-read, 10,000 pages is a lot to get through. I could list the best books for each topic in this post, but even the most seasoned reader would balk at 10,000 pages. And who reads books these days? So what I am going to give you is a distilled extract on each of those topics. Keep in mind, however, that every topic given above could be a series of blog posts in its own right, and these 80-word paragraphs are just a tiny taste of each topic and there is an ocean of depth involved in every topic. You might ask if that is the case, how can everybody be a possible candidate for data scientist role? Two words: Persistence and Motivation. With the right amount of these two characteristics, anyone can be anything they want to be.
1) Python Programming:
Python is one of the most popular programming languages in the world. It is the ABC of data science because Python is the language every beginner starts with on data science. It is universally used for any purposes since it is so amazingly versatile. Python can be used for web applications and websites with Django, microservices with Flask, general programming projects with the standard library from PyPI, GUIs with PyQt5 or Tkinter, Interoperability with Jython (Java), Cython (C) and nearly other programming language are available today.
Of course, Python is the also first language used for data science with the standard stack of scikit-learn (machine learning), pandas (data manipulation), matplotlib and seaborn (visualization) and numpy (vectorized computation). Nowadays, the most common technology used is the Anaconda distribution, available from www.anaconda.com. Current version 2018.12 or Anaconda Distribution 5. To learn more about Python, I strongly recommend the following books: Head First Python and the Python Cookbook.
2) R Programming
R is The Best Language for statistical needs since it is a language designed by statisticians, for statisticians. If you know statistics and mathematics well, you will enjoy programming in R. The language gives you the best support available for every probability distribution, statistics functions, mathematical functions, plotting, visualization, interoperability, and even machine learning and AI. In fact, everything that you can do in Python can be done in R. R is the second most popular language for data science in the world, second only to Python. R has a rich ecosystem for every data science requirement and is the favorite language of academicians and researchers in the academic domain.
Learning Python is not enough to be a professional data scientist. You need to know R as well. A good book to start with is R For Data Science, available at Amazon at a very reasonable price. Some of the most popular packages in R that you need to know are ggplot2, ThreeJS, DT (tables), network3D, and leaflet for visualization, dplyr and tidyr for data manipulation, shiny and R Markdown for reporting, parallel, Rcpp and data.table for high performance computing and caret, glmnet, and randomForest for machine learning.
3) Statistics and Probability
This is the bread and butter of every data scientist. The best programming skills in the world will be useless without knowledge of statistics. You need to master statistics, especially practical knowledge as used in a scientific experimental analysis. There is a lot to cover. Any subtopic given below can be a blog-post in its own right. Some of the more important areas that a data scientist needs to master are:
Succinctly, linear algebra is about vectors, matrices and the operations that can be performed on vectors and matrices. This is a fundamental area for data science since every operation we do as a data scientist has a linear algebra background, or, as data scientists, we usually work with collections of vectors or matrices. So we have the following topics in Linear Algebra, all of which are covered in the following world-famous book, Linear Algebra and its Applications by Gilbert Strang, an MIT professor. You can also go to the popular MIT OpenCourseWare page, Linear Algebra (MIT OCW). These two resources cover everything you need to know. Some of the most fundamental concepts that you can also Google or bring up on Wikipedia are:
5) Data Preparation and Data ETL (Extraction, Transformation, and Loading)
By IAmMrRob on Pixabay
Yes – welcome to one of the more infamous sides of data science! If data science has a dark side, this is it. Know for sure that unless your company has some dedicated data engineers who do all the data munging and data wrangling for you, 90% of your time on the job will be spent on working with raw data. Real world data has major problems. Usually, it’s unstructured, in the wrong formats, poorly organized, contains many missing values, contains many invalid values, and contains types that are not suitable for data mining.
Dealing with this problem takes up a lot of the time of a data scientist. And your data scientist’s analysis has the potential to go massively wrong when there is invalid and missing data. Practically speaking, unless you are unusually blessed, you will have to manage your own data, and that means conducting your own ETL (Extraction, Transformation, and Loading). ETL is a data mining and data warehousing term that means loading data from an external data store or data mart into a form suitable for data mining and in a state suitable for data analysis (which usually involves a lot of data preprocessing). Finally, you often have to load data that is too big for your working memory – a problem referred to as external loading. During your data wrangling phase, be sure to look into the following components:
Automating the Data ETL Pipeline
Automation of Data Validation and Verification
Usually, expert data scientists try to automate this process as much as possible, since a human being would be wearied by this task very fast and is remarkably prone to errors, which will not happen in the case of a Python or an R script doing the same operations. Be sure to try to automate every stage in your data processing pipeline.
6) Machine Learning with Python and R
An expert machine learning scientist has to be proficient in the following areas at the very least:
Data Science Topics Listing – Thomas
Now if you are just starting out in Machine Learning (ML), Python, and R, you will gain a sense of how huge the field is and the entire set of lists above might seem more like advanced Greek instead of Plain Jane English. But not to worry; there are ways to streamline your learning and to consume as little time as possible in learning or becoming able to learn nearly every single topic given above. After you learn the basics of Python and R, you need to go on to start building machine learning models. From experience, I suggest you break up your time into 50% of Python and 50% of R and spend as much time as possible spending time without switching your languages or working between languages. What do I mean? Spend maximum time learning one programming language at one time. That will prevent syntax errors and conceptual errors and language confusion problems.
Now, on the job, in real life, it is much more likely that you will work in a team and be responsible for only one part of the work. However, if your working in a startup or learning initially, you will end up doing every phase of the work yourself. Be sure to give yourself time to process information and to spend sufficient time for your brain to rest and get a handle on the topics you are trying to learn. For more info, do check out the Learning How to Learn MOOC on Coursera, which is the best way to learn mathematical or scientific topics without ending up with burn out. In fact, I would recommend this approach to every programmer out there trying to learn a programming language, or anything considered difficult, like Quantum Mechanics and Quantum Computation or String Theory, or even Microsoft F# or Microsoft C# for a non-Java programmer.
Common tools that you have with which you can produce powerful visualizations include:
Google Data Studio
Microsoft Power BI Desktop
Some involve coding, some are drag-and-drop, some are difficult for beginners, some have no coding at all. All of these tools will help you with data visualization. But one of the most overlooked but critical practical functions of a data scientist has been included under this heading: summarisation.
Summarisation means the practical result of your data science workflow. What does the result of your analysis mean for the operation of the business or the research problem that you are currently working on? How do you convert your result to the maximum improvement for your business? Can you measure the impact this result will have on the profit of your enterprise? If so, how? Being able to come out of a data science workflow with this result is one of the most important capacities of a data scientist. And most of the time, efficient summarisation = excellent knowledge of statistics. Please know for sure that statistics is the start and the end of every data science workflow. And you cannot afford to be ignorant about it. Refer to the section on statistics or google the term for extra sources of information.
How Can I Learn Everything Above In the Shortest Possible Time?
You might wonder – How can I learn everything given above? Is there a course ora pathway to learn every single concept described in this article at one shot? It turns out – there is. There is a dream course for a data scientist that contains nearly everything talked about in this article.
Want to Become a Data Scientist? Welcome to Dimensionless Technologies! It just so happens that the course: Data Science using Python and R, a ten-week course that includes ML, Python and R programming, Statistics, Github Account Project Guidance, and Job Placement, offers nearly every component spoken about above, and more besides. You don’t know to buy the books or do any of the courses other than this to learn the topics in this article. Everything is covered by this single course, tailormade to convert you to a data scientist within the shortest possible time. For more, I’d like to refer you to the following link:
Does this seem too good to be true? Perhaps, because this is a paid course. With a scholarship concession, you could end up paying around INR 40,000 for this ten-week course, two weeks of which you can register for 5,000 and pay the remainder after two weeks trial period to see if this course really suits you. If it doesn’t, you can always drop out after two weeks and be poorer by just 5k. But in most cases, this course has been found to carry genuine worth. And nothing worthwhile was achieved without some payment, right?
In case you want to learn more about data science, please check out the following articles:
Python and R have been around for well over 20 years. Python was developed in 1991 by Guido van Rossum, and R in 1995 by Ross Ihaka and Robert Gentleman. Both Python and R have seen steady growth year after year in the last two decades. Will that trend continue, or are we coming to an end of an era of the Python-R dominance in the data science segment? Let’s find out!
Python in the last decade has grown from strength to strength. In 2013, Python overtook R as the most popular language used fordata science, according to the Stack Overflow developer survey (Link).
Will Python’s Dominance Continue?
We believe, yes, definitely. Two words – data science.
Data science is such a hot and happening field right now, and the data scientist job is hyped as the ‘sexiest job of the 21st century‘, according to Forbes. Python is by far the most popular language for data science. The only close competitor is R, which Python overtook in the KDNuggets data science survey of 2016 . As shown in the link, in 2018, Python held 65.6% of the data science market, and R was actually below RapidMiner, at 48.5%. From the graphs, it is easy to see that Python is eating away at R’s share in the market. But why?
In 2018, we say a huge push towards advancement across all verticals in the industry due to deep learning. And what is the most famous tool for deep learning? TensorFlow and Keras – both Python-based frameworks! While we have Keras and TensorFlow interfaces in R and RStudio now, Python was the initial choice and is still the native library – kerasR and tensorflow in RStudio being interfaces to the Python packages. Also, a real-life implementation of a deep learning project contains more than the deep learning model preparation and data analysis.
There is the data preprocessing, data cleaning, data wrangling, data preparation, outlier detection and missing data values management section which is infamous for taking up 99% of the time of a data scientist, with actual deep learning model work taking just 1% or less of their on-duty time! And what language is used for this commonly? For general purpose programming, Python is the goto language in most cases. I’m not saying that R doesn’t have data preprocessing packages. I’m saying that standard data science operations like web scraping are easier in Python than in R.And hence Python will be the language used in most cases, except in the statistics and the university or academic fields.
Our prediction for Python – growth – even to 70% of the data science market as more and more research-level projects like AutoML keep using Python as a first language of choice.
What About R?
In 2016, the use of R for data science in the industry was 55%, and Python stood at 51%. Python increased by 33% and R decreased by 25% in 2 years. Will that trend continue and will R continue on its downward spiral? I believe perhaps in figures, but not in practice. Here’s why.
Data science is at its heart, the field of the statistician. Unless you have a strong background in statistics, you will be unable to process the results of your experiments, especially in concepts like p-values, tests of significance, confidence intervals, and analysis of experiments. And R is the statistician’s language.Statistics and mathematics students will always find working in R remarkably easy and simple, which explains its popularity in academia. R programming lends itself to statistics. Python lends itself to model building and decent execution performance (R can be 4x slower). R, however, excels in statistical analysis. So what is the point that I am trying to express?
Simple – Python and R are complementary. They are best used together. You will find that knowledge of both Python and R will suit you best for most projects. You need to learn both. You can find this trend expressed in every article that speaks about becoming a data science unicorn – knowledge of both Python and R is required as a norm.
Yes, R is having a downturn in popularity. However, due to the complementary nature of the tools, I believe that R will have a part to play in the data scientist’s toolbox, even if it does dip a bit in growth in the years to come. Very simply, R is too convenient for a statistician to be neglected by the industry completely. It will continue to have its place in the toolbox. And yes; deep learning is now practical in R with support for Keras and AutoML as well as of right now.
Dimensionless Technologies is the market leader as far as training in AI, cloud, deep learning and data science in Python and R is concerned. Of course, you don’t have to spend 40k for a data science certification, you could always go for its industry equivalent – 100-120 lakhs for a US university’s Ph.D. research doctorate! What Dimensionless Technologies has as an advantage over its closest rival – (Coursera’s John Hopkins University’s Data Science Specialization) – is:
Live Video Training
The videos that you get on Coursera, edX, Dataquest, MIT OCW (MIT OpenCourseWare), Udacity, Udemy, and many other MOOCs have a fundamental flaw – they are NOT live! If you have a doubt in a video lecture, you only have the comments as a communication tool to the lectures. And when over 1,000 students are taking your class, it is next to impossible to respond to every comment. You will not and cannot get personalized attention for your doubts and clarifications. This makes it difficult for many, especially Indian students who may not be used to foreign accents to have a smooth learning curve in the popular MOOCs available today.
Try Before You Buy Fully
Dimensionless Technologies offers 20 hours of the course for Rs 5000, with the remaining 35k (10k of 45k waived if you qualify for the scholarship) payable after 2 weeks / 20 hours of taking the course on a trial basis. You get to evaluate the course for 20 hours before deciding whether you want to go through the entire syllabus with the highly experienced instructors who are strictly IIT alumni.
Instructors with 10 years Plus Industry Experience
In Coursera or edX, it is more common for Ph.D. professors than industry experienced professionals to teach the course. If you are good with American accents and next to zero instructor support, you will be able to learn a little bit about the scholastic side of your field. However, if you want to prepare for a 100K USD per year US data scientist job, you would be better off learning from professionals with industry experience. I am Not criticizing the Coursera instructors here, most have industry experience as well in the USA. However, if you want connections and contacts in the data science industry in India and the US, you might be a bit lost in the vast numbers of student who take those courses. Industry experience in instructors is rare in a MOOC and critically important to your landing a job.
Personalized Attention and Job Placement Guarantee
Dimensionless has a batch size of strictly not more than 25 per batch. This means that unlike other MOOCs with hundreds or thousands of students, every student in a class will get individual attention and training. This is the essence of what makes this company the market leader in this space. No other course provider has this restriction, which makes it certain that when you pay the money, you are 100% certain of completing your course, unlike all the other MOOCs out there. You are also given training for creating a data science portfolio, and how to prepare for data science interviews when you start applying to companies. The best part of this entire process is the 100% job placement guarantee.
If this has got your attention, and you are highly interested in data science, I encourage you to go to the following link to see more about the Data Science Using Python and R course, a strong foundation for a data science career:
Europe has more than 307 million people on Facebook
There are five new Facebook profiles created every second!
More than 300 million photos get uploaded per day
Every minute there are 510,000 comments posted and 293,000 statuses updated (on Facebook)
And all this data was gathered 21st May, last year!
Photo by rawpixel on Unsplash
So I decided to do a more up to date survey. The data below was from an article written on 25th Jan 2019, given at the following link:
By 2020, the accumulated volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes or 44 trillion GB.
Originally, data scientists maintained that the volume of data would double every two years thus reaching the 40 ZB point by 2020. That number was later bumped to 44ZB when the impact of IoT was brought into consideration.
The rate at which data is created is increased exponentially. For instance, 40,000 search queries are performed per second (on Google alone), which makes it 3.46 million searches per day and 1.2 trillion every year.
Freshers in Analytics get paid more than then any other field, they can be paid up-to 6-7 Lakhs per annum (LPA) minus any experience, 3-7 years experienced professional can expect around 10-11 LPA and anyone with more than 7-10 years can expect, 20-30 LPA.
Opportunities in tier 2 cities can be higher, but the pay-scale of Tier 1 cities is much higher.
E-commerce is the most rewarding career with great pay-scale especially for Fresher’s, offering close to 7-8 LPA, while Analytics service provider offers the lowest packages, 6 LPA.
It is advised to combine your skills to attract better packages, skills such as SAS, R Python, or any open source tools, offers around 13 LPA.
Machine Learning is the new entrant in analytics field, attracting better packages when compared to the skills of big data, however for a significant leverage, acquiring the skill sets of both Big Data and Machine Learning will fetch you a starting salary of around 13 LPA.
Combination of knowledge and skills makes you unique in the job market and hence attracts high pay packages.
Picking up the top five tools of big data analytics, like R, Python, SAS, Tableau, Spark along with popular Machine Learning Algorithms, NoSQL Databases, Data Visualization, will make you irresistible for any talent hunter, where you can demand a high pay package.
As a professional, you can upscale your salary by upskilling in the analytics field.
So there is no doubt about the demand or the need for data scientists in the 21st century.
Now we have done a survey for India. but what about the USA?
The following data is an excerpt from an article by IBM< which tells the story much better than I ever could:
Jobs requiring machine learning skills are paying an average of $114,000.
Advertised data scientist jobs pay an average of $105,000 and advertised data engineering jobs pay an average of $117,000.59% of all Data Science and Analytics (DSA) job demand is in Finance and Insurance, Professional Services, and IT.
Annual demand for the fast-growing new roles of data scientist, data developers, and data engineers will reach nearly 700,000 openings by 2020.
By 2020, the number of jobs for all US data professionals will increase by 364,000 openings to 2,720,000 according to IBM.
Data Science and Analytics (DSA) jobs remain open an average of 45 days, five days longer than the market average.
And yet still more! Look below:
By 2020 the number of Data Science and Analytics job listings is projected to grow by nearly 364,000 listings to approximately 2,720,000 The following is the summary of the study that highlights how in-demand data science and analytics skill sets are today and are projected to be through 2020.
There were 2,350,000 DSA job listings in 2015
By 2020, DSA jobs are projected to grow by 15%
Demand for Data scientists and data engineers is projectedto grow byneary40%
DSA jobs advertise average salaries of 80,265 USD$
81% of DSA jobs require workers with 3-5 years of experience or more.
Machine learning, big data, and data science skills are the most challenging to recruit for and potentially can create the greatest disruption to ongoing product development and go-to-market strategies if not filled.
So where does Dimensionless Technologies, with courses in Python, R, Deep Learning, NLP, Big Data, Analytics, and AWS coming soon, stand in the middle of all the demand?
The answer: right in the epicentre of the data science earthquake that is no hitting our IT sector harder than ever.The main reason I say this is because of the salaries increasing like your tummy after you finish your fifth Domino’s Dominator Cheese and Pepperoni Pizza in a row everyday for seven days! Have a look at the salaries for data science:
Do you know which city in India pays highest salaries to data scientist?
Mumbai pays the highest salary in India around 12.19L p.a.
Report of Data Analytics Salary of the Top Companies in India
Accenture’s Data Analytics Salary in India: 90% gets a salary of about Rs 980,000 per year
Tata Consultancy Services Limited Data Analytics Salary in India: 90% of the employees get a salary of about Rs 550,000 per year. A bonus of Rs 20,000 is paid to the employees.
EY (Ernst & Young) Data Analytics Salary in India: 75% of the employees get a salary of Rs 620,000 and 90% of the employees get a salary of Rs 770,000.
HCL Technologies Ltd. Data Analytics Salary in India: 90% of the people are paid Rs 940,000 per year approximately.
In the USA
To convert into INR, in the US, the salaries of a data scientist stack up as follows:
Lowest: 86,000 USD = 6,020,000 INR per year (60 lakh per year)
Average: 117,00 USD = 8,190,000 INR per year (81 lakh per year)
Highest: 157,000 USD = 10,990,000 INR per year(109 lakh per year or approximately one crore)
at the exchange rate of 70 INR = 1 USD.
By now you should be able to understand why everyone is running after data science degrees and data science certifications everywhere.
The only other industry that offers similar salaries is cloud computing.
A Personal View
On my own personal behalf, I often wondered – why does everyone talk about following your passion and not just about the money. The literature everywhere advertises“Follow your heart and it will lead you to the land of your dreams”. But then I realized – passion is more than your dreams. A dream, if it does not serve others in some way, is of no inspirational value. That is when I found the fundamental role – focus on others achieving their hearts desires, and you will automatically discover your passion. I have many interests, and I found my happiness doing research in advanced data science and quantum computing and dynamical systems, focusing on experiments that combine all three of them together as a single unified theory. I found that that was my dream. But, however, I have a family and I need to serve them. I need to earn.
Thus I relegated my dreams of research to a part-time level and focused fully on earning for my extended family, and serving them as best as I can. Maybe you will come to your own epiphany moment yourself reading this article. What do you want to do with your life? Personally, I wish to improve the lives of those around me, especially the poor and the malnourished. That feeds my heart. Hence my career decision – invest wisely in the choices that I make to garner maximum benefit for those around me. And work on my research papers in the free time that I get.
So my hope for you today is: having read this article, understand the rich potential that lies before you if you can complete your journey as a data scientist. The only reason that I am not going into data science myself is that I am 34 years old and no longer in the prime of my life to follow this American dream. Hence I found my niche in my interest in research. And further, I realized that a fundamental ‘quantum leap’ would be made if my efforts were to succeed. But as for you, the reader of this article, you may be inspired or your world-view expanded by reading this article and the data contained within. My advice to you is: follow your heart. It knows you best and will not betray you into any false location. Data science is the future for the world. make no mistake about that. And – from whatever inspiration you have received go forward boldly and take action. Take one day at a time. Don’t look at the final goal. Take one day at a time. If you can do that, you will definitely achieve your goals.
The salary at the top, per year. From glassdoor.com. Try not to drool. 🙂
Finding Your Passion
Many times when you’re sure you’ve discovered your passion and you run into a difficult topic, that leaves you stuck, you are prone to the famous impostor syndrome. “Maybe this is too much for me. Maybe this is too difficult for me. Maybe this is not my passion. Otherwise, it wouldn’t be this hard for me.” My dear friend, this will hit you. At one point or the other. At such moments, what I do, based upon lessons from the following course, which I highly recommend to every human being on the planet, is: Take a break. Do something different that completely removes the mind from your current work. Be completely immersed in something else. Or take a nap. Or – best of all – go for a run or a cycle. Exercise. Workout. This gives your brain cells rest and allows them to process the data in the background. When you come back to your topic, fresh, completely free of worry and tension, completely recharged, you will have an insight into the problem for you that completely solves it. Guaranteed. For more information, I highly suggest the following two resources:
or the most popular MOOC of all time, based on the same topic: Coursera
Learning How to Learn – Coursera and IEEE
This should be your action every time you feel stuck. I have completely finished this MOOC and the book and it has given me the confidence to tackle any subject in the world, including quantum mechanics, topology, string theory, and supersymmetry theory. I strongly recommend this resource (from experience).
So Dimensionless Technologies (link given above) is your entry point to all things data science. Before you go to TensorFlow, Hadoop, Keras, Hive, Pig, MapReduce, BigQuery, BigTable, you need to know the following topics first:
All the best. Your passion is not just a feeling. It is a choice you make the day in and a day out whether you like it or not. That is the definition of character – to do what must be done even if you don’t feel like it. Internalize this advice, and there will be no limits to how high you can go.All the best!
Now unless you’ve been a hermit or a monk living in total isolation, you will have heard of Amazon Web Services and AWS Big Data. It’s a sign of an emerging global market and the entire world becoming smaller and smaller every day. Why? The current estimate for the cloud computing market in 2020, according to Forbes (a new prediction, highly reliable), is a staggering 411 Billion USD$! Visit the following link to read more and see the statistics for yourself:
To know more, refer to Wikipedia for the following terms by clicking on them, which mark, in order the evolution of cloud computing (I will also provide the basic information to keep this article as self-contained as possible):
This was the beginning of the revolution called cloud computing. Companies and industries across verticals understood that they could let experts manage their software development, deployment, and management for them, leaving them free to focus on their key principle – adding value to their business sector. This was mostly confined to the application level. Follow the heading link for more information, if required.
PaaS began when companies started to understand that they could outsource both software management and operating systems and maintenance of these platforms to other companies that specialized in taking care of them. Basically, this was SaaS taken to the next level of virtualization, on the Internet. Amazon was the pioneer, offering SaaS and PaaS services worldwide from the year 2006. Again the heading link gives information in depth.
After a few years in 2011, the big giants like Microsoft, Google, and a variety of other big names began to realize that this was an industry starting to boom beyond all expectations, as more and more industries spread to the Internet for worldwide visibility. However, Amazon was the market leader by a big margin, since it had a five-year head start on the other tech giants. This led to unprecedented disruption across verticals, as more and more companies transferred their IT requirements to IaaS providers like Amazon, leading to (in some cases) savings of well over 25% and per-employee cost coming down by 30%.
After all, why should companies set up their own servers, data warehouse centres, development centres, maintenance divisions, security divisions, and software and hardware monitoring systems if there are companies that have the world’s best experts in every one of these sectors and fields that will do the job for you at less than 1% of the cost the company would incur if they had to hire staff, train them, monitor them, buy their own hardware, hire staff for that as well – the list goes on-and-on. If you are already a tech giant like, say Oracle, you have everything set up for you already. But suppose you are a startup trying to save every penny – and there and tens of thousands of such startups right now – why do that when you have professionals to do it for you?
There is a story behind how AWS got started in 2006 – I’m giving you a link, so as to not make this article too long:
OK. So now you may be thinking, so this is cloud computing and AWS – but what does it have to do with Big Data Speciality, especially for startups? Let’s answer that question right now.
A startup today has a herculean task ahead of them.
Not only do they have to get noticed in the big booming startup industry, they also have to scale well if their product goes viral and receives a million hits in a day and provide security for their data in case a competitor hires hackers from the Dark Web to take down their site, and also follow up everything they do on social media with a division in their company managing only social media, and maintain all their hardware and software in case of outages. If you are a startup counting every penny you make, how much easier is it for you to outsource all your computing needs (except social media) to an IaaS firm like AWS.
You will be ready for anything that can happen, and nothing will take down your website or service other than your own self. Oh, not to mention saving around 1 million USD$ in cost over the year!If you count nothing but your own social media statistics, every company that goes viral has to manage Big Data! And if your startup disrupts an industry, again, you will be flooded with GET requests, site accesses, purchases, CRM, scaling problems, avoiding downtime, and practically everything a major tech company has to deal with!
Bro, save your grey hairs, and outsource all your IT needs (except social media – that you personally need to do) to Amazon with AWS!
And the Big Data Speciality?
Having laid the groundwork, let’s get to the meat of our article. The AWS certified Big Data Speciality website mentions the following details:
The AWS Certified Big Data – Specialty exam validates technical skills and experience in designing and implementing AWS services to derive value from data. The examination is for individuals who perform complex Big Data analyses and validates an individual’s ability to:
Implement core AWS Big Data services according to basic architecture best practices
Design and maintain Big Data
Leverage tools to automate data analysis
So, what is an AWS Big Data Speciality certified expert? Nothing more than an internationally recognized certification that says that you, as a data scientist can work professionally in AWS and Big Data’s requirements in Data Science.
Please note: the eligibility criteria for an AWS Big Data Speciality Certification is:
Minimum five years hands-on experience in a data analytics field
Background in defining and architecting AWS Big Data services with the ability to explain how they fit in the data life cycle of collection, ingestion, storage, processing, and visualization
Experience in designing a scalable and cost-effective architecture to process data
To put it in layman’s terms, if you, the data scientist, were Priyanka Chopra, getting the AWS Big Data Speciality certification passed successfully is the equivalent of going to Hollywood and working in the USA starring in Quantico!
Suddenly, a whole new world is open at your feet!
But don’t get too excited: unless you already have five years experience with Big Data, there’s a long way to go. But work hard, take one step at a time, don’t look at the goal far ahead but focus on every single day, one day, one task at a time, and in the end you will reach your destination. Persistence, discipline and determination matters. As simple as that.
Five Advantages of an AWS Big Data Speciality
1. Massive Increase in Income as a Certified AWS Big Data Speciality Professional (a long term 5 years plus goal)
Everyone who’s anyone in data science knows that a data scientist in the US earns an average of 100,000 USD$ every year. But what is the average salary of an AWS Big Data Speciality Certified professional? Hold on to your hat’s folks; it’s 160,000 $USD starting salary. And with just two years of additional experience, that salary can cross 250,000 USD$ every year if you are a superstar at your work. Depending upon your performance on the job! Do you still need a push to get into AWS? The following table shows the average starting salaries for specialists in the following Amazon products: (from www.dezyre.com)
Top Paying AWS Skills According to Indeed.com
Elastic MapReduce (EMR)
Key Management Service
2. Wide Ecosystem of Tools, Libraries, and Amazon Products
Amazon Web Services, compared to other Cloud IaaS services, has by far the widest ecosystem of products and tools. As a Big Data specialist, you are free to choose your career path. Do you want to get into AI? Do you have an interest in ES3 (storage system) or HIgh-Performance Serverless computing (AWS Lambda). You get to choose, along with the company you work for. I don’t know about you, but I’m just writing this article and I’mseriouslyexcited!
3. Maximum Demand Among All Cloud Computing jobs
If you manage to clear the certification in AWS, then guess what – AWS certified professionals have by far the maximum market demand! Simply because more than half of all Cloud Computing IaaS companies use AWS. The demand for AWS certifications is the maximum right now. To mention some figures: in 2019, 350,000 professionals will be required for AWS jobs. 60% of cloud computing jobs ask for AWS skills (naturally, considering that it has half the market share).
4. Worldwide Demand In Every Country that Has IT
It’s not just in the US that demand is peaking. There are jobs available in England, France, Australia, Canada, India, China, EU – practically every nation that wants to get into IT will welcome you with open arms if you are an AWS certified professional. And look no further than this site. AWS training will be offered soon, here: on Dimensionless.in. Within the next six months at the latest!
5. Affordable Pricing and Free One Year Tier to Learn AWS
Amazon has always been able to command the lowest prices because of its dominance in the market share. AWS offers you a free 1 year of paid services on its cloud IaaS platform. Completely free for one year. AWS training materials are also less expensive compared to other offerings. The following features are offered free for one single year under Amazon’s AWS free tier system:
The following is a web-scrape of their free-tier offering:
AWS Free Tier One Year Resources Available
There were initially seven pages in the Word document that I scraped from www.aws.com/free. To really have a look, go to the website on the previous link and see for yourself on the following link (much more details in much higher resolution). Please visit this last mentioned link. That alone will show you why AWS is sitting pretty on top of the cloud – literally.
Right now, AWS rules the roost in cloud computing. But there is competition from Microsoft, Google, and IBM. Microsoft Azure has a lot of glitches which costs a lot to fix. Google Cloud Platform is cheaper but has very high technical support charges. A dark horse here: IBM Cloud. Their product has a lot of offerings and a lot of potential. Third only to Google and AWS. If you are working and want to go abroad or have a thirst for achievement, go for AWS. Totally. Finally, good news, all Dimensionless current students and alumni, the languages that AWS is built on has 100% support for Python! (It also supports, Go, Ruby, Java, Node.js, and many more – but Python has 100% support).
Keep coming to this website – expect to see AWS courses here in the near future!
Ok. So there’s been a lot of coverage by various websites, data science gurus, and AI experts about what 2019 holds in store for us. Everywhere you look, we have new fads and concepts for the new year. This article is going to be rather different. We are going to highlight the dark horses – the trends that no one has thought about but will completely disrupt the working IT environment (for both good and bad – depends upon which side of the disruption you are on), in a significant manner. So, in order to give you a taste of what’s coming up, let’s go through the top four (plus 1 (bonus) = five) top trends of 2019 for data science:
Cyber Data Science Crime
(Bonus) Quantum Computation & Data Science
1. AutoML (& AutoKeras)
This single innovation is going to change the way machine learning works in the real world. Earlier, deep learning and even advanced machine learning was an aristocratic property of PhD holders and other research scientists. AutoML has changed that entire domain – especially now that AutoKeras is out.
AutoML automates machine learning. It chooses the best architecture by analyzing the data – through a technology called Neural Architecture Search (NAS), tries out various models and gives you the best possible hyperparameters for your scenario automatically! Now, this was priced at the ridiculous price of 76$ USD per hour, but we now have a free open source competitor, AutoKeras.
AutoKeras is an open source free alternative to AutoML developed at University of Texas A & M DATA lab and the open source community. This project should make a lot of deep learning accessible to everyone on the planet who can code even a little. To give you an example, this is the code used to train an arbitrary image classifier with deep learning:
Note:Of course, the entire training and testing process will take more than a day to complete at the very least, but less if you have some high-throughput GPUs or Google’s TPUs (Tensor Processing Units – custom hardware for data science computation) or plenty of money to spend on the cloud infrastructure computation resources of AutoML.
2. Interoperability (ONNX)
For those of you are new as to what interoperability means to neural networks – we now have several Deep Learning Neural Network Software Libraries competing with each other for market dominance. The most highly rated ones are:
Torch & PyTorch
However, converting an artificial neural network written in CNTK (Microsoft Cognitive Tool Kit) to Caffe is a laborious task. Why can’t we simply have one single standard so that discoveries in AI can be shared with the public and with the open source community?
To solve this problem, the following standard has been proposed:
Open Neural Network Exchange Format
ONNX is a standard in which deep learning networks can be represented as a directed acyclic computation graph which is compatible with every deep learning framework available today (almost). Watch out for this release since if proper transparency is enforced, we could see decades worth of research happen in this single year!
3. Cyber Data Science Crime
There are allegations that the entire US elections conducted last year was a socially engineered manipulation of the US democratic process using data science and data mining techniques. The most incriminated product was Facebook, on which fake news were spread by Russian agents and the Russian intelligence forces, leading to an external agency deciding who the US president would be instead of the US people themselves. And yes, one of the major tools used was data science! So, this is not so much a new trend as an already existing phenomenon, one that needs to be recognized and dealt with effectively.
While this is a controversial trend or topic, it needs to be addressed. If the world’s most technologically advanced nation can be manipulated by its sworn enemies to elect a leader (Trump) that no one really wanted, then how much more easily can nations like India or the UK be manipulated as well?
This has already begun in a small way in India with BJP social media departments putting up pictures of clearly identifiable cities (there was one of Dubai) as cities in Gujarat on WhatsApp. This trend will not change any time soon. There needs to be a way to filter the truth from the lies. The threat comes not from information but from misinformation.
Are you interested in the elections? Then get involved in Facebook. What happened in the USA in 2018 could easily happen in India in 2019. The very fabric of democracy could break apart. As data scientists, we need to be aware of every issue in our field. And we owe it to the public – and to ourselves – to be honest and hold ourselves to the highest levels of integrity.
We could do more than thirty blog posts on this topic alone – but we digress.
4. Cloud AI-as-a-Service
To understand Cloud AI-as-a-Service, we need to know that maintaining an AI in-house analytics solution is overkill as far as most companies are concerned. It is so much easier to outsource the construction, deployment and maintenance costs of an AI system to a company that provides it online at a far lesser cost than what the effort and difficulty would be otherwise in maintaining and updating an in-built version that has to be managed by a separate department with some very hard-to-find and esoteric skills. There are so many start-ups in this area alone over the last year (over 100) that to list all of them would be a difficult task. Of course, as usual, all the big names are extremely involved.
This is a trend that has already manifested, and will only continue to grow in popularity. There are already a number of major players in this AI as a Service offering including but not limited to Google, IBM, Amazon, Nvidia, Oracle, and many, many more. In this upcoming year, companies without AI will fall and fail spectacularly. Hence the importance of keeping AI open to the public for all as cheaply as possible. What will be the end result? Only time will tell.
5. Quantum Computing and AI (Bonus topic)
Quantum Computing is very much an active research topic as of now. And the country with the greatest advances is not the US, but China. Even the EU has invested over 1 Billion Euros in its quest to build a feasible quantum computer. A little bit of info about quantum computing, in 5 crisp points:
It has the potential to become the greatest quantum leap since the invention of the computer itself. (pun intended)
However, practical hardware difficulties have kept quantum computers constructed so far as laboratory experiments alone.
If a quantum computer that can manipulate 100-200 qubits (quantum bits) is built, every single encryption algorithm used today will be broken quite easily.
The difficulty in keeping single atoms in isolated states consistently (decoherence) makes current research more of an academic choice.
Experts say a fully functional quantum computer could be 5-15 years away. But, also, it would herald a new era in the history of mankind.
In fact, the greatest example of a quantum computer today is the human brain itself. If we develop quantum computing to practical levels, we could also gain the information to create a truly intelligent computer.
Cognition in real world AI that is self-aware. How awesome is that?
So there you have it. The five most interesting and important trends that could become common in the year 2019 (although the jury will differ on the topic of the quantum computer – it could work this year or ten years from now – but it’s immensely exciting).
What are your thoughts? Do you find any of these topics worth further investigation? Feel free to comment and share!
For additional information, I strongly recommend the articles given below:
Alternate views or thoughts? Share your feelings below!