Must-Have Resources to Become a Data Scientist

There is a multitude of data science resources out there, all of which claim to be the “best possible introductory to advanced material and courseware on the subject of data science”. I’ve made mistakes in choosing which data science references to buy, keep and use. Here I’ll share the ones that experience has shown me to be the most effective for each topic. I worked through every item on this list one at a time, so the list is born out of experience. It contains the following items:

eBooks (or Books – your choice):

  1. Developing Analytic Talent: Becoming a Data Scientist by Vincent Granville
  2. Introduction to Machine Learning with Python by Müller & Guido
  3. R for Data Science by Wickham & Grolemund
  4. Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron
  5. Statistics by Freedman, Pisani, & Purves

Websites:

  1. www.stackoverflow.com (for doubts and errors during coding)
  2. www.kaggle.com (for Data Science Competitions and worldwide rankings)

Online Courses:

  1. Dimensionless.in
    1. Data Science with Python and R: https://dimensionless.in/data-science-using-r-python/ 
    2. Big Data Analytics and NLP: https://dimensionless.in/big-data-analytics-nlp/
    3. Deep Learning: https://dimensionless.in/deep-learning/

Books for Data Science

1. Developing Analytic Talent: Becoming a Data Scientist by Vincent Granville
There are not many books that I would recommend for a professional data scientist. This one, however, is written by an authority with 15 years of experience in the data science field, working on some seriously large-scale projects for some of the best companies in the world. And it shows. This single book covers many of the methods you need to master to work as a professional data scientist. It does not stop at theory: every chapter includes multiple case studies drawn from industry experience. Vincent Granville is recognized worldwide as one of the best-known figures in data science. The level is fairly advanced, so it is not recommended for beginners; you need to know the basics before starting on it. For advanced-intermediate to professional data scientists who want to learn how to work professionally in the field, this is the perfect book.

2. Introduction to Machine Learning with Python: A Guide for Data Scientists (Müller & Guido)

This is a book for beginners; only a basic knowledge of NumPy, pandas and matplotlib is required. It is perhaps the most effective way to learn the scikit-learn library. Andreas Müller, one of the authors, is a core developer of the open-source scikit-learn project and knows the library inside out. The explanations are simple. The time spent working through the exercises and source code in this book will be highly beneficial if you want to master scikit-learn and its associated libraries.
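The following is a minimal sketch of the fit, predict and score pattern that every scikit-learn estimator follows. It is our own example, not code taken from the book. It assumes scikit-learn is installed and uses the iris dataset that ships with the library. The one-nearest-neighbour classifier is simply an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the small iris dataset bundled with scikit-learn
iris = load_iris()

# Hold out 25% of the samples (the default) for testing; fix the seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Every scikit-learn estimator is created, then fitted on training data...
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# ...and then used to predict and to score on unseen data
predictions = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping `KNeighborsClassifier` for another estimator leaves the rest of the code unchanged. That uniform interface is a large part of what makes the library quick to learn.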

3. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Wickham & Grolemund

This is another beginner-friendly book. It teaches all the basics of R clearly and concisely to readers who already have basic programming skills. R is a language designed for statistical computing and data analysis. It is an excellent complement to your toolset if you already know Python and are preparing for a career as a data scientist. The IDE used throughout is RStudio, and both authors have worked on the RStudio team, with Hadley Wickham serving as its Chief Scientist. This book gets you up and running in R quickly and effectively.

4. Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron

This book has received massive acclaim from the data science community for the breadth of knowledge it provides, and it is one of the best books on this topic to date. Its TensorFlow coverage is excellent, and it teaches practical methodologies for carrying a data science project through from start to finish. Its TensorFlow material (with some Keras) is the simplest and easiest to understand of all the TensorFlow tutorials I have personally found, both on the Web and in the few available ebooks. If you want to work in Deep Learning but don’t know how to get started, this book is for you (it covers Deep Learning as well)!

5. Statistics, 4th Ed. by Freedman, Pisani, and Purves

There is a wide selection of Statistics books for data scientists, but this one is highly recommended, for many simple reasons:

  1. It does not use equations but real-life examples
  2. All the material is based on real-world scenarios
  3. Inferential Statistics coverage is excellent
  4. The design of experiments is the entry point to the book
  5. Tests of significance and p-values are explained clearly through examples, without formulas
  6. The material is presented in such a way that the reader becomes excellent at off-the-cuff estimation.
  7. Most graphics in the book can be drawn easily by hand.
  8. Excellent set of exercises and solutions with real-world applications.
  9. Plain English words, real-life stories, understanding concepts through applications and simple explanations.
  10. Very few formulae.
  11. Perfect for someone who is approaching the subject for the first time.

And the icing on the cake is that it is a very enjoyable read! 🙂  
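The book explains significance tests through examples rather than formulas. In the same spirit, a p-value can be estimated by simulation instead of by formula. You shuffle the group labels many times and count how often chance alone produces a difference as large as the one you observed. The sketch below is a permutation test. That technique is our choice for illustration, not one taken from the book. The function name `permutation_p_value` is hypothetical, and the scores are made-up numbers.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Pools both groups, repeatedly shuffles them into two new groups of the
    original sizes, and counts how often the shuffled difference in means is
    at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical scores for two groups: invented numbers, for illustration only
treatment = [12, 15, 14, 16, 13, 17]
control = [10, 11, 9, 12, 10, 11]

p = permutation_p_value(treatment, control)
print(f"Estimated p-value: {p:.4f}")
```

A small p-value means that random relabelling rarely reproduces a gap this large. That is exactly the intuition the book builds without equations.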

Websites for Data Science

1. www.stackoverflow.com

Once upon a time, a developer stuck on a programming problem would have to go through several textbooks, each over 500 pages long, to find the answer. Not any more! StackOverflow is a question-and-answer platform covering nearly every programming language and library in use. That includes Python, R, scikit-learn, TensorFlow, Keras, pandas, numpy, scipy, Theano, PyTorch, matplotlib and dozens more. It shows the power of crowd-sourcing. It is much easier to find the answer to a problem from 50,000 people than from the four or five teachers you would otherwise be learning from. You can simply copy the error message from your IDE or interpreter and paste it into StackOverflow’s search. More often than not, you will find fully worked-out, clearly explained solutions to your problem.

The way I see it, StackOverflow was a defining moment in programming. Once upon a time, debugging was a challenge. Now, for the vast majority of common bugs and errors, StackOverflow has your answer, explained in clear English, with corrected source code. What more could you want? These days, anyone can become a developer in any language, thanks largely to this single site. The concept has become so popular that it now spans an entire network of crowd-sourced Q&A sites, such as math.stackexchange.com and the rest of the stackexchange.com family. The network has well over a hundred sites, each covering its own field, be it Mathematics, English, or even Christianity!
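When you paste an error into StackOverflow’s search, the most useful part is usually the last line of the traceback. That line holds the exception type and message, whereas the other lines mostly contain your own file paths. The sketch below pulls that line out of a Python exception with the standard `traceback` module and turns it into a search link. The helper names `error_summary` and `stackoverflow_search_url` are hypothetical. The `?q=` search URL format is an assumption based on StackOverflow’s public search page.

```python
import traceback
from urllib.parse import quote_plus


def error_summary(exc: BaseException) -> str:
    """Return the one-line 'ExceptionType: message' part of a traceback."""
    lines = traceback.format_exception_only(type(exc), exc)
    return lines[-1].strip()


def stackoverflow_search_url(query: str) -> str:
    # Assumption: StackOverflow's search page accepts the query in a ?q= parameter
    return "https://stackoverflow.com/search?q=" + quote_plus(query)


try:
    1 / 0  # deliberately trigger an error to demonstrate the helpers
except ZeroDivisionError as exc:
    summary = error_summary(exc)
    print(summary)  # ZeroDivisionError: division by zero
    print(stackoverflow_search_url(summary))
```

Searching for the exception line alone, rather than the whole traceback, tends to match other people’s questions about the same error.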

2. www.kaggle.com

These days, an impressive Kaggle profile can be enough to land you a job interview at the very least. Kaggle is a site that has been hosting data science competitions for many years. The competition is immense and intense, but the tutorials and articles on the site are equally powerful and instructive. If you want to be a data scientist, not having a decent Kaggle profile is inexcusable, because it acts as a showcase of your data science skills to the entire world. Even if you don’t rank very high at first, consistency and practice will get you there more often than not.

There is another side to it as well: a course designed purely for winning data science competitions. It is called How to Win a Data Science Competition: Learn from Top Kagglers, and it is available on Coursera at this link: https://www.coursera.org/learn/competitive-data-science

However, this course is not for beginners. It is only for those who already have a strong background in machine learning and its libraries, along with practical programming experience.

Courses in Data Science

Dimensionless.in

Dimensionless.in is an elite data science training company that imparts industry-level experience and knowledge to those with a real thirst to learn. Training starts from the basics and builds a strong foundation, so anyone with discipline and persistence can learn data science and become a data scientist. All the faculty in the courses are IIT alumni. Training is personalized to cater to the needs of each student, with only 20 to 30 students in a single batch.

Going by popular wisdom, this level of detail and attention is not available at any of the major data science training centers and course curators. Most organizations focus on marketing and volume (quantity). Dimensionless Technologies, by contrast, focuses on quality rather than quantity. The following three major courses are offered:

  1. Data Science with Python and R: https://dimensionless.in/data-science-using-r-python/ 
  2. Big Data Analytics and NLP: https://dimensionless.in/big-data-analytics-nlp/
  3. Deep Learning: https://dimensionless.in/deep-learning/

Each link gives an in-depth explanation of that course. Do check them out and see the potential that is here to be tapped, for your benefit.

And Finally…

AI & ML (Artificial Intelligence & Machine Learning) is the future of nearly every industry. The question on every CEO’s and CIO’s mind will be this: why set up a staffed division in our company for any role? An automated machine needs only a high one-time investment, with operating costs that are negligible compared to paying salaries to 100 staff members (you can do the math). It can also do the same job more reliably, efficiently, consistently and accurately than human employees. That burning question is racing through nearly every industry in the world right now. Thousands of jobs will be automated. The biggest demand in all sectors will be for machine learning experts who also have strong domain knowledge of their company’s field (hospitals, for example).

Don’t be scared of the coming changes. Change is humanity’s best friend; without change, life would be boring. Changes are not just challenges. They are also opportunities for much higher-paying and much less laborious jobs than the ones you hold currently. And if you happen to be a student reading this article, you now know which industry you should focus on, completely. All the best, and remember to enjoy the process of learning. Regardless of your age, this is the best time to be alive, ever, because domain knowledge is more widely available today than at any time in the past. Be enthusiastic. Be positive. Be disciplined. Be focused. And make the right choices at the right times. And no, it’s never too late when you have quality trainers ready to mentor you. May the thrill of learning a completely new concept with truly enlightened insight never leave you. Once again, all the best.
