A career in data science is hyped as the hottest job of the 21st century, but how do you become a data scientist? How should you, as an aspiring data scientist, or a student who aims at a data science job, prepare? What are the skills you need? What must you do? Fret not – this article will answer all your questions and give you links with which you can jump-start a new career in data science!
Data science is a cross-disciplinary field: a data scientist has to be competent in several distinct areas rather than just one. A data scientist must have a strong foundation in the following subjects:
- Computer Science
- Statistical Research (solid foundation required)
- Linear Algebra
- Data Processing (data analyst expertise)
- Machine Learning
- Software Engineering
- Python Programming
- R Programming
- Business Domain Knowledge
The following diagram shows a little bit of the subjects you will need to master to become a high-quality data scientist:
Unless you have deliberately focused your studies on these areas, you are likely to be missing one or more of the topics given above. Or you may know two or three really well but not be solid in the rest. For example, you could be a computer science student who knows mathematics but not statistics at the depth that statistical research requires. Or you could be a statistician with only a little foundation in programming.
But there are ways to get past that crucial job interview. The five things you must do are:
- Learn Python and R from quality trainers with years of industry experience
- Build a portfolio of data science projects on GitHub
- Join Kaggle and participate in data science competitions
- Practice Interview Questions
- Do basic Online Reputation Management to improve your online presence.
1. Learn Python and R from the best trainers available
There is no substitute for industry experience. If your instructor is not an enthusiastic amateur (as is the case with many courses available online) but someone with 5+ years of experience working in the data science industry, you are learning from the best possible trainers. It is one thing to learn Python and R. It is a completely different thing to master them. If you want to do well in the industry, mastery is required, not just basic ability. Make sure your faculty members have verified industry experience, because that experience is what will count in finally landing you a job at a top-notch data science company. You will generally learn more from experts with industry experience than from academics who hold a Ph.D. in the subject but have never worked in the field.
2. Build a GitHub Portfolio of Data Science Projects
Having an online portfolio in GitHub is critical!
All the best training in the field will take you nowhere if you don’t code what you learn and apply the lessons to real-life datasets and scenarios. You need to do data science projects. Try to make your projects as attractive as possible. As much as you can, your GitHub project portfolio should be built with these guidelines in mind:
- Use libraries, languages, and tools that your target companies work with.
- Use datasets like the ones your target companies work with, and always use real-world data (no academic datasets like the ones supplied with scikit-learn; use Kaggle to get practice datasets). The best datasets are constructed programmatically through APIs from Twitter, Facebook, Wikipedia, and similar real-world sources.
- Choose problems that have market value. Don’t choose an academic project, but solve a real-world industry problem.
- Extra marks for creativity and originality in the problem definitions and the questions answered by the portfolio projects.
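As a rough illustration of building a dataset programmatically, here is a minimal sketch that turns a Wikipedia API response into CSV rows. It assumes the public MediaWiki query endpoint with the `extracts` property. The helper names (`build_query_url`, `fetch_json`, `pages_to_rows`, `rows_to_csv`), the User-Agent string, and the sample payload are all made up for this example. The sample is hand-written so the sketch runs without network access.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public MediaWiki Action API of English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build a query URL asking for the plain-text intro of each page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download and decode one JSON response (needs network access)."""
    # Placeholder User-Agent; replace it with something that identifies you.
    request = urllib.request.Request(url, headers={"User-Agent": "portfolio-demo/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def pages_to_rows(payload):
    """Flatten the nested 'query' -> 'pages' mapping into flat dicts."""
    rows = []
    for page in payload.get("query", {}).get("pages", {}).values():
        extract = page.get("extract", "")
        rows.append({
            "title": page.get("title", ""),
            "n_words": len(extract.split()),
            "extract": extract,
        })
    return sorted(rows, key=lambda row: row["title"])


def rows_to_csv(rows):
    """Serialise the rows as CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "n_words", "extract"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample shaped like an API response (hypothetical content).
SAMPLE_PAYLOAD = {
    "query": {
        "pages": {
            "1": {"pageid": 1, "title": "Python", "extract": "Python is a language."},
            "2": {"pageid": 2, "title": "Kaggle", "extract": "Kaggle hosts competitions."},
        }
    }
}

rows = pages_to_rows(SAMPLE_PAYLOAD)
print(rows_to_csv(rows))
```

To collect live data, you would call `pages_to_rows(fetch_json(build_query_url(["Kaggle", "Python"])))` instead of using the sample. Keeping the download step separate from the flattening step makes the flattening logic easy to test in a portfolio project.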
3. Join Kaggle or TopCoder and participate in Competitions
Kaggle.com is your training arena.
If you are into data science, become a Kaggler immediately! Or, if your taste leans more towards development, join TopCoder (they also have data science tracks). Kaggle is widely touted as the home of data science, and for good reason: it has been hosting data science competitions for many years and is home to many of the best-known ones. One of the simplest ways to get a call from a reputed company is to rank as high as possible on Kaggle. What is more, you will be able to compare your performance with that of the top competitors in the industry.
4. Practice Interview Questions
There are plenty of sites online with excellent collections of questions used in data science interviews. Now, no one expects you to memorize 200 interview questions, but interviewers do expect you to be able to solve basic data science and algorithm questions in code (Python preferably) or in pseudocode. You also need to know basic concepts such as cross-validation, the curse of dimensionality, and overfitting, including how you deal with overfitting in real-world scenarios. You should also be able to explain the internal details of most data science algorithms, for example, AdaBoost. Knowledge of linear algebra, statistics, and some basic multivariable calculus will also give you that extra edge over the competition.
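As an example of the kind of question you might be asked to code by hand, here is a minimal sketch of k-fold cross-validation in plain Python. The model (a one-feature least-squares line), the synthetic data, and all function names are assumptions chosen for illustration. In practice you would probably reach for a library such as scikit-learn.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            mean_squared_error(
                model,
                [xs[i] for i in held_out],
                [ys[i] for i in held_out],
            )
        )
    return scores


# Synthetic data: y = 2x + 1 plus small Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

scores = cross_validate(xs, ys, k=5)
print("per-fold MSE:", [round(s, 3) for s in scores])
print("mean MSE:", round(sum(scores) / len(scores), 3))
```

The point to be able to explain in an interview is that each score comes from data the model never saw during fitting. A held-out error much larger than the training error is the usual symptom of overfitting.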
5. Manage your Online Search Reputation
This may not seem connected with data science, but it is a fundamental component of any job search. What is the first thing a prospective employer does when given a candidate's name? That’s right: they Google it. What comes up when you Google your name? Does your online profile hold up under scrutiny? That is:
- Is your name when searched on Google free of red flags like negative reports of any type (offensive material, controversies)?
- Does the search engine entry for your name represent your profile with accuracy?
- Are your public Facebook, Twitter and Google profiles free of any automatic red flags (e.g. intimate pictures)?
- Does the Google visibility of your name depict your skill levels correctly?
If the answer to any of these questions is no, you may need to adjust your online profile. You can do this through blog posts and informed, mature comments online, or even by creating a blog of your own and presenting yourself to the world in a positive manner. This is critical for any job applicant in today's connected, digital world.
You are a Product to be Marketed!
You are trying to sell yourself and your credibility online to people who have never seen you or even heard your name. Your Internet profile will make the crucial difference here and help you stand out from the competition. The same care applies to how you train: many training sites offer courses by amateurs or people with less than 2 years of industry experience. Don’t settle for a course just because it is cheap. On the Internet, you usually get only what you pay for. This is your future career in the subject area of your dreams, and a little initial investment will go a long way in the long run.
Additionally, it will help to gain the employers’ perspective as well. You can refer to this Hiring Guide by TopTal for further reading.
Always keep learning. ML and AI are fields that move forward at an incredible pace. Subscribing to RSS feeds and websites that keep you updated with the latest developments in the field is something you absolutely have to do. Nothing shows your commitment to excellence as much as keeping up with the latest state-of-the-art research, and you can do it quite easily with reader applications like Feedly and Inoreader. Learning might be something you do in college, but mastery is something you aim towards for your entire lifetime. Never give up. All the best for your job search, which stands a much better chance of success if you follow the steps in this blog post. Finally, pay special attention to your portfolio of data science projects on GitHub to make sure you stand out from the competition.