Mistakes to Avoid While Learning Data Science

Introduction

The Harvard Business Review called the data scientist ‘the sexiest job of the 21st century’. As problem solvers and analysts, data scientists are the professionals identifying patterns, noticing trends and making new discoveries, often working with real-time data, machine learning, and AI.

Data scientists are in high demand: forecasts from IBM suggested that demand for data scientists would grow by 28 percent by 2020, and that the number of roles for all data professionals in the US alone would reach 2.7 million. At the same time, powerful software programmes have given us access to deeper analytics than ever before. This analysis of data generated by people, places and things is a goldmine of insight.

With this increase in demand, many people are rushing into data science. Most of them are newcomers to the field, and many make the same basic mistakes as they progress in their careers.

10 Common Mistakes to Avoid to Master Data Science

In this blog, we will look at some of the most common mistakes that we as data scientists make, along with a possible solution for each.

1. Spending a lot of time learning concepts without any practical application
All work and no play makes Jack a dull boy! You may have heard this in your childhood, and trust me, it holds true in data science too. Learning too much theory without applying it does more harm than good. Theory is developed with ideal conditions in mind, and those conditions rarely hold in real-world problems. For example, you learn to apply a specific algorithm to solve a problem. The algorithm takes some parameters as input, and all of them are available while you are learning the theory. In real-world situations, however, half of those inputs may be missing, and applying the algorithm to the problem at hand becomes a challenge. A good data scientist is one who knows how to handle real-world data and constraints and generate usable insights from them, rather than one who has a lot of knowledge but no experience in applying it. I am not saying that going over a lot of theory is bad; what I am saying is that collecting a lot of theory in your mind and never applying it anywhere is the problem.

Solution –
Your learning process should be a mix of theory and practice. Whenever you learn something new, find a dataset and apply it there. Take part in competitions on websites like Kaggle: you will not only learn more but also gain experience in implementing different concepts.

2. Directly jumping to Machine Learning and (fancy) algorithms
Let us first clear up a misconception: machine learning is not everything data science has to offer. Data science is about solving a given problem. It is a process that starts with understanding the problem and collecting data for it, and ends with delivering insights and solutions. Within this process, machine learning is a small portion (borrowed from computer science) that helps in making predictions or wiser judgments with the data at hand. Many people jump directly to machine learning or give it too much importance. That is fine if you want to be a machine learning engineer, but not if you want to be a data scientist: statistics, domain knowledge and communication skills are part of the job too.

Solution –
The solution here is pretty simple. First, if you are really interested in machine learning, focus on the math behind it too: make sure you have a good grasp of linear algebra and calculus before diving deep into machine learning. Second, pay attention to the other aspects of data science, and focus on understanding the problem more than on applying a fancy algorithm to it.

3. Considering model accuracy to be supreme
Accuracy isn’t always what the business is after. Sure, a model that predicts employee retention probability with 95% accuracy is good, but if you can’t explain how the model got there, which features led it there, and what your thinking was when building it, your client may well reject it. Accuracy sure is important, but interpretability often matters more. This may be one reason deep neural networks are used less often in production: they are not highly interpretable.

Solution –
There is a trade-off between the accuracy and the interpretability of a model. Try to understand how much accuracy the business problem actually needs, and whether the client is more interested in results or in understanding the problem and the factors behind it.

4. More attention to tools rather than the problem at hand
In data science, the tools matter less than the solution to the problem. Tools exist to make life easier and to let you perform tasks quickly, so they should not get most of your attention. For example, do not try to fit machine learning in where it is not needed. Having solid knowledge of tools and libraries is excellent, but it will only take you so far. Combining that knowledge with the business problem posed by the domain is where a true data scientist steps in. You should be aware of at least the basic challenges in the industry you are interested in (or are applying to).
Solution –
Search for datasets from a specific industry and work on them; this will strengthen your resume. Also focus on building domain knowledge of the problem you are trying to solve.

5. Trying to learn everything at once
This is one of the most common mistakes data scientists make. Being a jack of all trades and master of none may give your knowledge a lot of breadth, but you will always lack the required depth. You will be able to start on a solution to a problem, but you will rarely be able to see it through to the end properly. One cannot learn everything in one go.
Solution-
Find an area within data science that interests you deeply and build depth there first; after that, you can work on increasing your breadth.

6. Jumping to conclusions without proper validation
I have seen data scientists jumping straight to conclusions without validating results they are getting from their analysis or model predictions.
Solution-
Perform hypothesis testing: validate or invalidate each insight you have generated by running statistical tests for its significance.
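
As one possible illustration of such a check, here is a minimal sketch of a two-sided permutation test for a difference in means, using only the Python standard library. The function name permutation_test, the sample numbers, and the scenario (order values under two page layouts) are all made up for this example; a real analysis may call for a different test depending on the data and the question.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: order values under an old and a new page layout.
old_layout = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.9, 21.7]
new_layout = [23.0, 24.8, 22.9, 25.1, 23.7, 24.2, 22.5, 25.6]

observed_diff, p_value = permutation_test(new_layout, old_layout)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; a large one means the "insight" may just be noise and should not be reported as a finding yet.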

7. Negligence towards data cleansing, EDA and visualizations
Many data scientists skim over data cleaning, EDA and visualization and move straight to modeling. Understanding the data first and making it usable for modeling is paramount, so these topics deserve a lot of attention if you want to become a successful data scientist.
Solution-
Take datasets from different sources and try to find insights in them. Try to build a story around each dataset with the help of graphs and numbers extracted from it. This practice will help you understand data better.
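
To make this concrete, here is a small sketch of a first cleaning and profiling pass, written with only the Python standard library so it runs anywhere. The inline dataset, the column names, and the helper functions to_float and profile_numeric_column are invented for illustration; in practice a library such as pandas would usually do this work.

```python
import csv
import io
from statistics import mean, median

# Hypothetical raw data with typical problems: blank cells, a value that is
# not a number, and inconsistent capitalisation in a text column.
RAW_CSV = """age,income,city
34,52000,Pune
,61000,Delhi
29,n/a,pune
41,58000,Mumbai
38,,Delhi
"""


def to_float(value):
    """Return the value as a float, or None if it is missing or not numeric."""
    try:
        return float(value.strip())
    except ValueError:
        return None


def profile_numeric_column(rows, column):
    """Count missing entries and summarise the usable values in one column."""
    parsed = [to_float(row[column]) for row in rows]
    usable = [v for v in parsed if v is not None]
    return {
        "missing": len(parsed) - len(usable),
        "min": min(usable),
        "max": max(usable),
        "mean": mean(usable),
        "median": median(usable),
    }


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

for column in ("age", "income"):
    print(column, profile_numeric_column(rows, column))

# Normalise the text column before counting distinct categories.
cities = sorted({row["city"].strip().title() for row in rows})
print("cities:", cities)
```

Even this tiny pass surfaces questions worth answering before any modeling: how many values are missing, whether "n/a" should count as missing, and whether "Pune" and "pune" are the same category.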

8. Thinking that communication skill is not required
Communication skills are one of the most underrated and least talked about things a data scientist absolutely MUST possess. I am yet to come across a course that places solid emphasis on them. You can learn all the latest techniques, master multiple tools and make the best graphs, but if you cannot explain your analysis to your client, you will fail as a data scientist. And it is not just clients: you will also be working with team members who are not well versed in data science, such as IT, HR, finance and operations. You can be sure that an interviewer will be monitoring this aspect throughout.
Solution-
One of the things I find most helpful is explaining data science terms to a non-technical person. It helps me gauge how well I have articulated the problem. If you’re working in a small to medium-sized company, find a person in the marketing or sales department and do this exercise with them. It will help you immensely in the long term.

9. Giving too much importance to coding skills
If you are a data scientist, you will have to code, but this is not the hardest part of the job. People tend to think that data science is all about coding and that they should put most of their attention into coding skills. No doubt coding skills are required, but one need not master them completely.
Solution-
Focus more on creative ways of solving a problem than on your coding skills. There are many software tools that can perform these tasks for data scientists. Again, coding is a necessary skill, but mastering it is not mandatory.

10. Insufficient research on the problem at hand
Many problems never reach a convincing solution simply because the initial research was too thin or the domain knowledge was insufficient. People tend to jump into a problem directly, without acquiring enough domain knowledge or doing proper initial research on what the problem is and how to approach it.
Solution-
Conduct extensive initial research and try to get a complete picture of the industry domain of the problem you are dealing with. Talk to people from that domain and understand how the process flows in that line of business.

Conclusion

These mistakes are not easy to avoid when you are starting fresh in a data science career. I have made many of the mistakes mentioned above myself. It takes experience to understand why all these points make sense. As we grow in experience, we learn, we get better, and we emerge as champions!
