Introduction
Data science is the study of identifying, representing, and extracting meaningful information from data. Data analysts collect that data from different sources and use it for business purposes.
With an enormous amount of data generated every minute, extracting useful insights is a must for businesses; it helps them stand out from the crowd. Given the huge demand for data scientists, many professionals are taking their first steps in data science, and because so many of them are inexperienced, a lot of basic mistakes are committed by young data analysts.
In this blog, we will look at some of the common mistakes young professionals make in data analysis so that you don’t end up making the same ones.
Additionally, if you are interested in learning data science, then you can get an amazing course from us here
Common Mistakes
1. Correlation vs. causation
The underlying principle in statistics and data science is that correlation is not causation: just because two things appear to be related to each other doesn’t mean that one causes the other. This is probably the most common mistake with time series. Fawcett cites the example of a stock market index and the unrelated time series “number of times Jennifer Lawrence was mentioned in the media”. The lines look amusingly similar, and such charts usually carry a statement like “Correlation = 0.86”. Recall that a correlation coefficient lies between +1 (a perfect linear relationship) and -1 (a perfect inverse linear relationship), with zero meaning no linear relationship. A value of 0.86 is high, showing that the statistical relationship between the two time series is strong, yet there is no causal link between them. You can reproduce this effect with purely synthetic data, as sketched below.
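Here is a minimal sketch using synthetic data (not Fawcett’s actual series): two completely independent random walks can still show a correlation coefficient far from zero.

```python
# Two unrelated "time series": cumulative sums of independent random noise.
import numpy as np

rng = np.random.default_rng(seed=7)

stock_index = np.cumsum(rng.normal(size=500))      # stand-in for a market index
media_mentions = np.cumsum(rng.normal(size=500))   # stand-in for mention counts

# Pearson correlation coefficient between the two series.
r = np.corrcoef(stock_index, media_mentions)[0, 1]
print(f"Correlation = {r:.2f}")  # often far from 0, even though the series are independent
```

The two series were generated independently, so any apparent relationship is pure coincidence of the random walks, which is exactly why a high correlation on its own proves nothing about causation.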
2. Not Looking Beyond Numbers
Some data analysts and marketers assess only the numbers they get, without putting them into context. Quantitative data is not powerful unless it is understood. Whoever performs the analysis should ask “why”, not just “what”. Falling under the spell of big numbers is a mistake that many analysts commit.
3. Not defining the problem well
This can be regarded as the root of the most fundamental problems in data science. Most of the issues that arise in data science stem from the fact that the problem to be solved is itself not correctly defined. If you can’t define the problem well enough, reaching its solution will be a mere dream. One should research the problem thoroughly and analyse all its components, such as stakeholders, action plans, etc.
4. Focussing on the wrong metric
When you’re just getting started, it can be tempting to focus on small wins. While they are definitely important and a great morale booster, make sure they are not distracting you from the metrics you should be more focused on (like sales, customer satisfaction, etc.).
5. Not cleaning and normalising data before analysis
Always assume, at first, that the data you are working with is inaccurate. Once you get familiar with it, you will start to “feel” when something is not quite right. Take a first glance using pivot tables or quick analytical tools to look for duplicate records or inconsistent spelling, and clean up your data first. Not normalising the data is another concern that can hinder your analysis: in most cases, normalising eliminates the units of measurement, enabling you to compare data from different places more easily. A rough example of both steps is sketched below.
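A rough sketch with pandas; the column names (“city”, “sales”) are hypothetical and only illustrate the kind of checks you might run on your own data.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "delhi ", "Mumbai", "Mumbai"],
    "sales": [120.0, 120.0, 450.0, 300.0],
})

# Look for exact duplicate records.
print(df[df.duplicated()])

# Inconsistent spelling/casing often hides near-duplicates.
df["city"] = df["city"].str.strip().str.title()
print(df["city"].value_counts())

# Min-max normalisation removes the units, making columns comparable.
df["sales_norm"] = (df["sales"] - df["sales"].min()) / (df["sales"].max() - df["sales"].min())
print(df)
```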
6. Improper outlier treatment
Outliers can affect any statistical analysis, so analysts should investigate, correct, or delete them as appropriate. For auditable work, the decision on how to treat any outliers should be documented; sometimes a loss of information is a valid trade-off in return for enhanced comprehension. Many people forget to treat outliers at all, which skews the results and greatly affects the analysis. In other cases, you may focus too much on the outliers and devote a large amount of time to events that hold little significance for your analysis. A common rule of thumb for flagging outliers is sketched below.
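A simple IQR-based check, sketched with pandas; the 1.5 × IQR threshold is a convention, not a universal rule, and whatever you decide to do with the flagged rows should be documented.

```python
import pandas as pd

values = pd.Series([12, 14, 13, 15, 14, 13, 98, 12, 15, 14])

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # here only the value 98 is flagged for investigation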
7. Wrong graph selection for visualisations
Let us take the case of pie charts. Pie charts are for conveying a story about the parts-to-whole aspect of a set of data, that is, how big part A is in relation to parts B, C, and so on. The problem with pie charts is that they force us to compare areas (or angles), which is pretty hard. Selecting the right kind of graph for the right context comes with experience; the sketch below shows the same data both ways.
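A small matplotlib sketch plotting the same parts-to-whole data as a pie chart and as a bar chart; the category names and shares here are made up for illustration.

```python
import matplotlib.pyplot as plt

labels = ["Product A", "Product B", "Product C", "Product D"]
shares = [28, 25, 24, 23]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(9, 4))

# Nearly equal slices are hard to rank by eye.
ax_pie.pie(shares, labels=labels)
ax_pie.set_title("Pie: which slice is biggest?")

# The same values as bars: lengths are easy to compare.
ax_bar.bar(labels, shares)
ax_bar.set_title("Bar: differences are obvious")

plt.tight_layout()
plt.show()
```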
8. Focussing more on the accuracy of the model rather than context
One should not focus so much on a model’s accuracy that they start overfitting it to a particular case. Analysts build machine learning models in order to apply them to general scenarios; an overfitted model works only in situations that are nearly identical to the training situation and will fail badly in any environment that differs from it. The sketch below illustrates the gap this creates between training and test accuracy.
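A hedged illustration with scikit-learn on a synthetic dataset: an unconstrained decision tree can score almost perfectly on its training data yet do noticeably worse on data it has never seen.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy classification data.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No depth limit: the tree is free to memorise the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

print("Train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("Test accuracy: ", model.score(X_test, y_test))    # typically noticeably lower
```

The gap between the two scores is the overfitting: impressive accuracy “in context” of the training data, poor generalisation outside it.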
9. Ignoring seasonality in data
Holidays, summer months, and other times of the year can mess up your data. Even a 3-month trend can often be explained by the busy tax season or back-to-school time. Make sure you are accounting for any seasonality in your data, even down to days of the week or times of the day! One quick check is sketched below.
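A minimal sketch of checking for weekly seasonality with pandas; the sales numbers are synthetic and only illustrate grouping by day of week.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)

# Synthetic daily sales with a deliberate weekend bump plus noise.
sales = 100 + 30 * dates.dayofweek.isin([5, 6]).astype(int) + rng.normal(0, 5, len(dates))
daily = pd.Series(sales, index=dates)

# Average sales per day of week makes the weekly pattern visible.
print(daily.groupby(daily.index.day_name()).mean().round(1))
```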
10. No focus on the statistical significance of results while making decisions
Information from statistical significance testing is necessary but not always sufficient: statistical significance tells you nothing about the impact of the significant result on the business. An effect size index evaluates this much better, as the sketch below illustrates.
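A hedged sketch with SciPy on synthetic data: with large samples, a tiny difference can be “statistically significant” while the effect size (Cohen’s d, computed by hand here) remains negligible for the business.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=10.0, size=100_000)
variant = rng.normal(loc=100.3, scale=10.0, size=100_000)  # tiny real difference

# Two-sample t-test: huge n makes even this tiny difference "significant".
t_stat, p_value = stats.ttest_ind(variant, control)

# Cohen's d: difference in means scaled by the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + variant.var(ddof=1)) / 2)
cohens_d = (variant.mean() - control.mean()) / pooled_sd

print(f"p-value   = {p_value:.2e}")   # very small -> statistically significant
print(f"Cohen's d = {cohens_d:.3f}")  # around 0.03 -> practically negligible
```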
Why do these common mistakes happen?
1. Inadequate domain and technical knowledge
Insufficient knowledge about the business behind the problem at hand, or a lack of the technical knowledge required to solve it, is a major cause of these common mistakes. A proper business viewpoint, clear goals, and the necessary technical knowledge should be prerequisites before professionals start hands-on work.
2. Time crunch
When little time is available for the final analysis, analysts tend to hurry. They miss small details because they never get to follow a proper checklist, and these common mistakes are the result.
3. Inexperience in data analysis
Data science is a huge subject, and it is an uphill task for any fresher to know all of it. Many of the common mistakes in data science happen simply because most young professionals are not yet aware of its finer aspects.
Conclusion
Data analysis is both a science and an art. You need to be both analytical and creative, and your hard work will truly pay off. There is nothing more satisfying than tackling a data analysis problem and fixing it after numerous attempts. When you get it right, the benefits for you and the company will make a big difference in terms of traffic, leads, sales, and costs saved.
Additionally, if you are interested in learning Data Science, click here to get started
Furthermore, if you want to read more about data science, you can read our blogs here
Also, the following are some suggested blogs you may like to read