Top 5 Data Visualization Tools for 2019

Importance of Data Visualization

All the best datasets, Artificial Intelligence, Machine Learning, and Business Intelligence tools are useless without effective visualization capabilities. In the end, data science is all about presentation. Whether you are a chief data scientist at Google or an all-in-one 'many-hats' data scientist at a start-up, you still have to show the results of your algorithm to a management executive for approval. We have all heard the adage, "a picture is worth a thousand words". I would rephrase that for data science as "an effective infographic is worth an infinite amount of data", because even if you present the most amazing algorithms and statistics in the universe to your management, they will be unable to comprehend them. But present even a simple infographic, and everyone in the boardroom, from the CEO to your personnel manager, will understand what your findings mean for your business enterprise.

Tools for Visualization

Because of the fundamental truth stated above, there are a ton of data visualization tools out there to meet the needs of every data scientist on the planet. The variety is wide: from premium, power-user oriented products, to offerings from giants like Microsoft and Google, to free libraries for developers like Plot.ly (available across multiple languages) and Bokeh for Python developers, to Datawrapper for non-technical users. I have picked five tools that vary widely but are all very effective and worth learning in depth. So let's get started!
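To make this concrete, here is a minimal sketch of the kind of chart a Python developer might build with Plotly (the Plot.ly library mentioned above); the quarterly figures are made up purely for illustration:

```python
# A minimal Plotly sketch; the data is invented solely for this example.
import plotly.express as px

data = {"quarter": ["Q1", "Q2", "Q3", "Q4"], "revenue": [120, 150, 170, 210]}

# px.bar turns column-like data into an interactive bar chart.
fig = px.bar(data, x="quarter", y="revenue", title="Quarterly Revenue (sample data)")
fig.show()  # opens the interactive chart in a notebook or browser
```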

 


1. Tableau (https://public.tableau.com/)

Tableau is the market leader for visualization as far as data science is concerned, and the statistics speak for themselves: over 32,000 companies around the world use Tableau, and it is by far the most popular choice among top companies like Facebook and Amazon. What is more, once you learn Tableau, you will know visualization well enough to handle every other tool on the market. It is the most popular, the most powerful, and yet surprisingly intuitive to use. If you want to learn one single visualization tool for data science, this is it.

2. Qlikview (https://www.qlik.com/us)


QlikView is another commercial solution like Tableau, yet it is so powerful that I could not help but include it in this article. It is suited more to power users and well-experienced data scientists. While not as intuitive as Tableau, it boasts powerful features for large-scale deployments, and it is a very popular choice at many companies all over the world.

3. Microsoft Power BI (https://powerbi.microsoft.com/)


Unlike the first two tools, Microsoft Power BI (Business Intelligence) is completely free to download and use, even the full-featured Desktop version, contrary to popular perceptions of Microsoft's business ethos. It integrates beautifully with Microsoft products, so if you use Microsoft Azure as your cloud computing solution or rely on other Microsoft tools, this could be the solution that fits you best (although Tableau remains the tool used most by software companies).

4. Google Data Studio (https://datastudio.google.com)


This tool is strictly cloud-based, and its biggest USP is its tight integration with Google's web ecosystem. In fact, being cloud-based rather than desktop-based is an advantage here: a desktop copy would have to be continually resynchronized, whereas the cloud solution always works with the latest data, refreshed every time you load the page. Nearly every tool you need is at your fingertips, and this is a natural way to learn the Google-based approach to managing your website or company. And did I mention that, like Microsoft Power BI, it is completely free of cost? But again, Tableau is still the preferred solution for mainstream software companies.

5. Datawrapper (https://app.datawrapper.de/)


This is by far the most user-friendly visualization tool for data science available on the Internet today. While I was initially skeptical, this tool really can be used by completely non-technical users, and the version I used was free for up to a massive 10,000 chart views. So if you want to create a visualization but lack technical skills in coding or Python, this may be your best way to get started. In case you are feeling skeptical (as I was), visit the website above and watch the instruction video (100 seconds, less than 2 minutes). If you are a beginner to data visualization, this is where to go first.

6. Information is Beautiful (https://www.informationisbeautiful.net/)

This is an article on visualization and on communicating concepts and analysis through graphics, so it would not be complete without this free gallery of samples at www.informationisbeautiful.net. What do we plan to communicate but information? Information is processed data: data scientists take data as input but produce information as output. This website opened my eyes to how effectively data can be presented. While it is not something you would use for an industrial report, do visit the site for inspiration on making your data visualizations more visually appealing. If you have business-transforming data, it deserves the best presentation available. This is a post about five data visualization tools, but consider this sixth one a bonus for inspiration, and for all the times you wished your dashboard or charts could be more effective graphically.

Conclusion

While there is a ton of information out there, choose tools that cater to your domain. If you are a large-scale enterprise, Tableau could be your best option. If you are a student or want a high-quality free solution, go for Datawrapper. QlikView can be used by companies that want to save on their budget and have plenty of experienced professionals (although this is also a use case for Tableau). For convenience, you cannot go wrong with Microsoft Power BI if your company uses the Microsoft ecosystem, or Google Data Studio if you are integrated into the Google ecosystem instead. Finally, if you are a student of data visualization or just want to improve your data presentation, please visit informationisbeautiful.net. Trust me, it will be an eye-opener.

Finally, Tableau is what you need to learn to be a true data science professional, especially in FAMGA (Facebook, Apple, Microsoft, Google, and Amazon).

Also, remember to enjoy your work. That adds a fun element to your current job and guards against burnout and other such problems. This is, in the end, artistry, even if you are into coding. All the best!

For more on Data Visualization, I strongly recommend the articles below:

https://dimensionless.in/what-is-a-box-plot/

Data Visualization – Hans Rosling

 

 

Will Julia Replace Python and R for Data Science?

Introduction to Julia

For those of you who don't know, Julia is a multi-paradigm (fully imperative, partially functional, and partially object-oriented) programming language designed for scientific and technical (read: numerical) computing. It offers significant performance gains over Python (when Python is used without optimization and without vectorized computing via Cython and NumPy). Time to develop is reduced by a factor of roughly 2x on average, and performance gains typically range from 10x to 30x over Python (R is even slower, so we don't include it; R was not built for speed). Industry reports in 2016 indicated that Julia was a language with high potential, and possibly a chance of becoming the best option for data science if it received advocacy and adoption by the community. Well, two years on, Julia 1.0 was released in August 2018, and the language now has the advocacy of the programming community and the adoption of a number of companies (see https://www.juliacomputing.com) as the preferred language for many domains, including data science.

 


While it shares many features with Python (and R) and various other programming languages like Go and Ruby, there are some primary factors that set Julia apart from the rest of the competition:

Advantages of Julia

1. Performance

Basically, Julia is a compiled language, whereas Python and R are interpreted, which means that Julia code is compiled down to executable code that runs directly on the processor. A compiler can perform optimizations on its output that are not available to an interpreter. You could argue that Python can also reach Julia-like levels of performance through C extensions and tools like Cython, if you know that tooling well. But the real answer to the performance question is that Julia gives you C-like speed without hand-crafted optimization and profiling techniques. If such performance is available without optimization in the first place, why go for Python? Julia could well be the solution to all your performance problems.
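To see the gap from the Python side, here is a small benchmark sketch, purely illustrative: a pure-Python loop pays interpreter overhead on every iteration, while NumPy's vectorized sum runs in compiled code. This is the gap Julia aims to close without requiring you to vectorize or rewrite anything; exact timings will vary by machine.

```python
# Illustrative only: interpreted loop vs. compiled (vectorized) code in Python.
import time
import numpy as np

values = np.random.rand(1_000_000)

start = time.perf_counter()
total = 0.0
for v in values:            # interpreted: every iteration pays interpreter overhead
    total += v
loop_time = time.perf_counter() - start

start = time.perf_counter()
total_np = values.sum()     # vectorized: the loop runs inside compiled C code
numpy_time = time.perf_counter() - start

print(f"pure-Python loop: {loop_time:.4f}s, NumPy sum: {numpy_time:.4f}s")
```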

Having said that, Julia is still a teenager as far as maturity as a programming language is concerned. Certain operations, like I/O, cause performance to drop rather sharply if not properly managed. From experience, I suggest that if you plan to implement projects in Julia, you go for Jupyter Notebooks running Julia 1.0 as a kernel. The Juno IDE on the Atom editor is the recommended environment, but the lack of a debugger is a massive drawback if you want to use Julia code in production in a Visual Studio-like environment. Jupyter lets you develop your code one feature at a time, with easy access to any variable and array values that you need to 'watch'. The Julia ecosystem has an enthusiastic and broad community developing it, but since the update to 1.0 happened less than six months ago, not all packages have been migrated completely from version 0.7, so you could run into the odd package versioning problem while installing the packages your project needs.


2. GPU Support

This is directly related to performance. GPU support is handled transparently by some packages, like TensorFlow.jl and MXNet.jl. For developers in 2018 who already use extremely powerful GPUs for applications with CUDA or for cryptocurrency mining (to take two common examples), this is a massive bonus, and performance boosts can go anywhere from 100x to even 1000x for certain operations and configurations. If you need specific GPU capabilities, Julia has you covered with vendor-specific libraries like cuBLAS.jl and cuDNN.jl, and with CUDAnative.jl for hand-crafted GPU programming. Low-level CUDA programming is also possible with CUDAdrv.jl and the runtime library CUDArt.jl. Clearly, Julia has evolved well beyond standard GPU support and capabilities, and even as you read this article, an active community is at work extending and polishing Julia's GPU support.
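For comparison from the Python world (an analogue for illustration, not Julia code), TensorFlow in Python handles device placement just as transparently: the same script runs on CPU or GPU without changes.

```python
# A hedged Python-side analogue of transparent GPU handling, using TensorFlow.
import tensorflow as tf

# Lists any GPUs TensorFlow can see; an empty list means CPU-only execution.
print(tf.config.list_physical_devices("GPU"))

# Operations are placed on a GPU automatically when one is available;
# the identical code runs unchanged on CPU-only machines.
a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
c = tf.matmul(a, b)
print(c.device)  # ends in "GPU:0" if a GPU handled the multiplication
```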

3. Smooth Learning Curve and Extensive Built-in Functionality

Julia is remarkably simple to learn and enjoyable to use. If you are a Python programmer, you will feel completely at home using Julia. In Julia, everything is an expression, and there is basic functional programming support: higher-order functions are supported through simple assignment (=) and the function operator -> (lambda in Python, => in C#). Multidimensional matrix support is excellent; functions wrapping the BLAS and LAPACK routines are included in the LinearAlgebra package of the standard library. In matrix literals, elements within a row are separated by spaces and rows by the semicolon (;). Matrix transpose, adjoint, factorization, matrix division, identity, slicing, solving systems of linear equations, triangularization, and many more operations are a single function call away, and even jagged multidimensional arrays are supported. missing, nothing, any, all, isdefined, isa, oftype, hash, isequal, and even undef (to represent uninitialized values) are just a few of the many keywords and functions available to increase the readability of your code, with the Base module underpinning every Julia program. Other notable features include built-in support for arbitrary-precision arithmetic, complex numbers, dictionaries, sets, BitArrays, and pipelines. I have just scratched the surface here: the extensive built-in functionality is one of Julia's best features, with many more well-documented modules easily available.
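Since the paragraph above draws the parallel to Python's lambda, here is the Python counterpart of the higher-order-function idiom, a tiny sketch for readers who know Python but not Julia:

```python
# Python counterpart of Julia's `square = x -> x^2` and higher-order functions.
square = lambda x: x * x

def apply_twice(f, x):
    """A higher-order function: takes a function f and applies it twice."""
    return f(f(x))

print(apply_twice(square, 3))        # 81
print(list(map(square, [1, 2, 3])))  # [1, 4, 9]
```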


4. Multiple Dispatch

This is a central core feature of the Julia programming language. Multiple dispatch, or multimethod functionality, means that the method a function call executes is chosen dynamically at run-time based on the runtime types of all the arguments passed in, not just statically on the declared signature (as in function overloading) or on the first argument alone (as in single-dispatch object orientation). The concept is related to the Strategy and Template Method design patterns in object-oriented terminology: the behaviour of the program can vary with the runtime types of the objects passed into a function. This is one of the killer features of Julia as a language. It is possible to pass any object to a function and have its behaviour vary with even minute differences in that object's type, and much of the standard library is built around this concept. Some reports indicate that the boilerplate code required in a complex system can be reduced by a factor of three or more (!) by this language design choice. It simplifies development in highly complex software systems considerably and cleanly, without unpredictable or erratic behaviour. It is, therefore, no surprise that Julia is receiving the acclaim it richly deserves.
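Python's standard library has a limited single-argument cousin of this idea, functools.singledispatch. It is not Julia's full multiple-dispatch mechanism (it selects on the first argument's type only), but it gives a feel for dispatch-on-type:

```python
# Single dispatch in Python: the implementation is chosen by the runtime type
# of the first argument. Julia generalizes this to *all* arguments.
from functools import singledispatch

@singledispatch
def describe(obj):
    return f"generic object: {obj!r}"

@describe.register
def _(obj: int):
    return f"an integer with value {obj}"

@describe.register
def _(obj: list):
    return f"a list of {len(obj)} elements"

print(describe(42))       # dispatched to the int implementation
print(describe([1, 2]))   # dispatched to the list implementation
print(describe("hello"))  # falls back to the generic implementation
```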

5. Distributed and Parallel Computing Support

Julia supports parallel and distributed computing across multiple topologies transparently. There is support for coroutines (tasks), similar to those in the Go programming language, which cooperate to run work in parallel on multicore architectures. There is extensive support for threads, and the synchronization primitives have been carefully designed to maximize performance and minimize the risk of race conditions. Through simple macros and annotations such as @parallel, any programmer can write code that executes in parallel, in the spirit of OpenMP-style compiler directives. From the beginning of its existence, Julia was designed to support high-performance computing with multiple worker processes executing simultaneously. This makes parallel computing much easier than in (say) Python, whose GIL (Global Interpreter Lock) has always been a serious performance limitation for threads.
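For contrast, here is the kind of workaround the GIL forces on Python for CPU-bound work, a minimal sketch using the standard multiprocessing module: parallelism comes from separate worker processes rather than threads within one process.

```python
# Sidestepping the GIL: CPU-bound parallelism in Python via worker processes.
from multiprocessing import Pool

def cpu_bound(n):
    """A deliberately heavy computation that keeps one core busy."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:                  # four worker processes
        results = pool.map(cpu_bound, [10**6] * 4)   # tasks run in parallel
    print(results)
```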

6. Interoperation with other programming languages (C, Java, Python, etc)

Julia can call C, Go, Java, MATLAB, R, and Python code using native wrapper functions; in fact, nearly every commonly used programming language today has interoperability support with Julia, which makes working with other languages far easier. One notable difference to remember is that array indexes in Julia begin at 1 (like R), whereas in many other languages (C, C++, Java, Python, C#, and more) they begin at 0. This interoperability makes life as a Julia developer much simpler and easier than if you were working in Python or C++/Java alone, since calling other libraries and packages is part of a data scientist's, indeed every developer's, daily routine. Open-source packages are at an advantage here, since they can be modified as required for your problem domain. Interoperability with all the major programming languages is one of Julia's strongest selling points.


Current Weaknesses

Julia is still developing despite the 1.0 release. Performance, its big selling point, degrades badly when global variables are used, and the first run of a program is always slow compared to subsequent executions because of compilation overhead. The core developers of the Julia language need to keep working on the package management system and, critically, on migrating all the third-party libraries to version 1.0. Since that will happen over time with active community participation (somewhat like the Python 2.7 to 3.6 migration, but with much less difficulty), it is an issue that will resolve itself as time goes by. There also needs to be more consistency in performance on older devices: legacy systems without quality GPUs can struggle to run Julia on CPU processing power alone. (GPUs are not required, but they are highly recommended for a developer to be optimally productive, especially with deep learning libraries like Knet, the Julia library for neural networks and deep learning.) The inclusion of support for Unicode 11.0 can be a surprise for those not accustomed to it, and string manipulation can be a headache. And because Julia still needs to mature a little more as a language, be sure to use a profiler to identify and handle possible performance bottlenecks in your code.

Conclusion

So you might ask yourself a question: if we are a company running courses in R and Python, why would we publish an article that advocates another language for data science? Simple. If you are new to data science and programming, learning Python and R is the best way to begin exploring the current industry immediately (rather like learning C before C++), and Python and R are not going anywhere anytime soon. Since there is both a massive codebase and a ton of existing frameworks and production code running on Python, and to a lesser extent on R, the demand for data scientists skilled in Python will extend far into the future. So yes, it is fully worth your time to learn both Python and R. Furthermore, Julia has yet to be adopted industry-wide (although many Julia evangelists promote it actively; see, for example, www.stochasticlifestyle.com). For at least ten more years, expect Python to be the main player as far as data science is concerned.

Furthermore, programming languages never fully die out. COBOL, designed in 1959 by a committee that drew heavily on Grace Hopper's earlier work, is still the go-to language for mainframe programming sixty years later. Considering that, expect the ubiquitous Python and R to be market forces for at least two more decades. Even if more than 70% of the data science community turned to Julia as its first choice, the existing codebase in Python and R would not disappear any time soon. Julia also needs more maturity as a language (as already mentioned): some functions run slower than Python where the implementation has not been optimized and well tested, especially on older devices. Hence, Julia's maturity as a programming language could still be a year or two away. So, reading this article, realize that Python and R are very much prerequisite skills for becoming a professional data scientist right now. But do explore Julia a little in your free time. All the best!

For further information on choosing data science as a career, I highly recommend the following articles:

10 Data Science Skills to Land your Dream Job in 2019

How to Find Mentors for Data Science?

How to get a Data Science Job in 6 Months

Cheers!

How to Land a Job As a Data Scientist in 2019

It’s the buzz!

Everyone is talking about data science as the dream job they want!

Yes, the "USD 100K annual package" is a big draw.

Furthermore, the self-help and self-improvement literature of the last decade keeps returning to one key theme: do what you enjoy and care about, in short, a job you love, since that is where you have the greatest possibility of shining the brightest.

Hence many students, and many adventurous, challenge-hunting individuals from other professions and other (sometimes related) roles, are seeking jobs that involve problem-solving. Data science is one answer, since it offers both the chance to improve a company's net worth and profits through analytics on the data it already has, and the chance to solve problems that are challenging and interesting. It is especially attractive for the math nerds and computer geeks with experience in problem-solving and a passionate thirst for their next big challenge.

So what can you do to land yourself in this dream role?

 

Fundamentals of Data Science

Data science comprises several roles. Some involve data wrangling, some involve heavy coding expertise, and all of them involve expert communication and presentation skills. If you focus on just one of these three aspects, you are already putting yourself at a disadvantage. What you need is to follow your own passion and then integrate it into your profession. That way you earn well while still doing work you love, even to the point of going above and beyond all the expectations your employer has of you. So if you are reading this article, I assume you are either a student intrigued by data science or a working professional looking for a more lucrative profession. In either case, you need to understand what the industry is looking for.


a) Coding Expertise

If you want to land a job in the IT or data science fields, understand that you will have to deal with code, and usually code already written by other people or another company. So being intimate with programming, and being ready to spend hours and hours of your life sitting before a computer writing code, is something you have to get used to. The younger you start, the better: children pick up coding faster than any other age group, so there is a very real case for getting your kids to code as young as possible and seeing whether they take to it. And it is not just coding: the best candidates also know software engineering basics and source control tools and platforms (like Git and GitHub), and have already started their coding careers by contributing to open-source projects.

If you are a student wondering what all the hype is about, I suggest you visit a site that teaches programming, preferably in Python, and start developing your own projects and apps. Yes, apps. The IT world is now mobile, and anyone who cannot build a mobile app for their product will be left in the dust as far as the highest level of earning is concerned. Even deep learning frameworks that were once purely academic have migrated to the mobile and app ecosystem, which was unthinkable a mere five years ago. If you already know the basics of programming, then learn source control (Git) and how to build software for open-source projects, and contribute to those projects while you are still a student. Do that, and you will become an individual that companies go hunting for before you even complete your schooling or college education, instead of the other way around!

Mentoring

If you are a student or a professional interested in this domain but don't know where to start, the best thing to do is to find a mentor. You can define a mentor or a coach as someone who has achieved what you aim to achieve in your life. You learn from their experience, their networking capabilities, and their hard-won resilience: how to keep up your ambition and motivation when you feel least motivated. If you want to learn data science, what better way than to learn from someone who has already done it? And once you show promise, you will gain a lot of traction, especially on the networking side of job placement. For more on mentoring, I highly recommend that you study the following article:

https://dimensionless.in/how-to-find-mentors-for-data-science/ 

b) Cogent Communication (Writing and Speaking skills)

Even if you have the world's best programming expertise, ACM awards, a Mathematics Olympiad background, you name it; even if you are the best data scientist available in the industry today for your domain, you will go nowhere without communication skills. Communication is more than speaking, reading, and typing English: it is the way you present yourself to others in the digital world. That is why blogging, content creation, and focused interaction with your target industry, say on StackOverflow.com, are so important. A blog really resonates with those from whom you seek a job. It shows that you have genuine, original knowledge about your industry, and if your blog receives critical acclaim through incoming links from the industry, expect a job interview offer in your email before long. In many countries, but especially in India, the market is flooded with graduates, postgraduates, and PhDs who have top marks on paper but no marketable skills matching what their target jobs demand.


Right now, it is difficult to tell the difference between a 100th-percentile data scientist and a 30th-percentile one just by looking at the documents submitted to a company. A blog testifies that you know your field authoritatively, and the comments it draws show that you have gained attention from industry leaders. A highly rated StackOverflow answer, or even a mention on technology sites like GitHub, indicates that you are an expert in your field. Communication is so critical that I recommend you make the best use of every chance you get to speak in public. This is the window the world has on you: make yourself heard, be original, be creative. The best data scientist in the world will go nowhere unless he or she knows how to communicate effectively. In the industry, this capacity is known as soft skills, and it can be your single biggest advantage over the competition. If you are planning to join a training course for your dream job, make sure the syllabus covers it!

c) Social Networking and Building Industry Connections through LinkedIn

Many sources of information don't focus on this issue, but it is an absolute must: your next job could be waiting for you on LinkedIn through a connection. Studies show that less than 1% of resume submissions lead to a final job offer and lucrative placement, while at least 30% of internal referrals from within a company get placed into the job of their dreams. Networking is important, so important that if you know the job you are after, you should reach out and research: understand the company's problems and try to address some of its key issues. The more focused you are, the more likely you are to get placed in the company you are aiming for. But always have a plan B, a fallback, so that if you do not get placed, you will know what to do. This is especially important today, with the competition being so intense.

The Facebook of the Workplace

One place where you can be noticed is through industry connections on social networks; you might miss this even if you hold an M.S. from a college in the US. LinkedIn profiles, the Facebook of the technology world, are especially important today. More and more, in an environment saturated with high-quality talent, who you know can sometimes be even more important than what you know. Connecting with professionals in the industry you plan to work in is critical, and it can happen through meetups, conferences, technology symposiums, and even paid courses. Courses whose instructors have industry connections are worth their weight in gold, even platinum, since students who show outstanding promise will be introduced to industry leaders early. If you have a decent GitHub profile but don't know where to go next, one way forward is a course taught by industry-experienced experts. These are the people most likely to be able to land you a job in such a competitive environment, because the market for data scientists, in fact for IT professionals in general, is highly saturated, even in locations like the US.

Conclusion

We have not covered every topic relevant to this issue; there is much more to discuss. You need to know statistics, sometimes even at PhD level, especially inferential statistics, Bayes' theorem, probability, and the analysis of experiments. You should know linear algebra in depth. Indeed, there is a lot to cover, but the best place to learn can be courses tailored to produce data scientists. Some firms have really gone the extra mile to convert industry knowledge and key results in each subtopic into noteworthy training courses specially designed for data science students. In the end, no college degree alone will land you a dream job; what will is hard work and experience gained through internships and industry projects. Some courses, like the ones offered by www.Dimensionless.in, have resulted in stellar placements and continue to provide guidance even after the course is finished and you are a working professional in the job of your dreams. These courses offer:

  1. Individual GitHub Profile Creation & Mentoring (Coding Expertise)
  2. Training in Soft Skills
  3. Networking Connections on LinkedIn
  4. Instructors with Industry Experience (not academic professors!)

It's a simple yet potent formula to land you the job of your dreams. Compare the usual route to a data science dream job, a PhD from the US (costing upwards of Rs. 1,40,28,000 INR for five years, as a typical range), to a simple course at Rs. 25K to Rs. 50K (yes, INR), taken live and remotely (not recorded videos) from wherever you are in the world, with a mic on your end to ask the instructor about every doubt you have, and you have a remarkable product designed to land you a dream job within six months. Think the offer is too good to be true? Visit www.Dimensionless.in, and pay special attention to the feedback from past students of these same courses on the home page.

Last words: you never know what the future holds, and economy and convenience are both prudent and praiseworthy. All the best!

 

 

Top 10 Advantages of a Data Science Certification

Data science is a booming industry, with potentially millions of job openings by 2020, according to the latest analysts' business predictions. But what if you want to learn data science without the heavy cost of a postgraduate degree or a US-university MOOC specialization? What is the best way to prepare for this coming wave of opportunity and maximize your chances of a USD 100K+ annual job? There are many challenges standing in your way. Not only is the market saturated with an abundance of fresh talent, but most of the training you receive in college bears little relationship to the actual work you do on the job. With so many engineering graduates passing out of so many established institutions, such as the IITs, every year, how can you realistically hope to compete? There is one way to stand out from the rest of the competition: high-quality data science programs or courses. In this article, we list the top ten advantages of choosing such a course over other options, like a PhD or an online MOOC specialization from a US university (which are very tempting options, especially if you have the money for them).

Top Ten Advantages of Data Science Certification

1. Stick to Essentials, Cut the Fluff.

If you are a professional data scientist, no one expects you to derive AI algorithms from first principles. You also don't need to dig extensively into the (relatively) trivial history behind each algorithm, nor perform SVD (Singular Value Decomposition) or Gaussian elimination on a real matrix without a computer to assist you. There is so much material in an academic degree that is never used on the job! Yes, you need an intuitive idea of how the algorithms work, but unless you are going into ML research, there is not much use in knowing, say, Jacobians or Hessians in depth. Professional data scientists work in very different domains compared with their academic counterparts. Learn what you need on the job. If you try to cover everything mentioned in class, you have already lost the race. Focus on learning the bare essentials thoroughly; you always have Google and StackOverflow to assist you, as long as you are not writing an exam!

2. Learning from Instructors with Work Experience, not PhD scientists!

From whom should you receive training: PhD academics who have published extensively but never worked on a real professional project, or instructors with real-life professional project experience? Very often, the teachers and instructors in colleges and universities belong to the former category. The latter are rare and hard to find, and you are lucky, even remarkably so, if you get to study under them. They can teach you in the context of real-life job experience, which is exactly what you need most.

3. Working with the Latest Technology Stacks.

Who is better placed to help you land a job: teachers who teach what they studied ten years ago, or professionals who work with the latest tools available in the industry? It is undoubtedly true that people with industry experience can help you choose which technologies to learn and master. Academics, by comparison, may be working with technology stacks more than ten years old. Please try to stick with instructors who have work experience.

4. Individual Attention.

In a college or a MOOC with thousands of students, it is simply not possible for each student to get individual attention. In small-batch data science programs, however, every student can receive individual attention tailored to their needs, which is exactly what you want. Every student is different and will bring their own understanding to the projects available. This customized attention, available when batch sizes are under 30 or so, is the greatest advantage such students have over college and MOOC students.

5. GitHub Project Portfolio Guidance.

Every college lecturer will advise you to develop a GitHub project portfolio, but they cannot give your individual profile genuine attention: they have too many students and too many demands on their time to mentor you in designing and establishing your own project portfolio. Data science programs are different; it is genuinely possible for the instructors to mentor you individually on your project portfolio. Experienced industry professionals can even help you identify niches within your field in which you can shine and carve out a special brand for your own project specialties, so that you can really distinguish yourself and be a class apart from the rest of your competition.

6. Mentoring even After Getting Placed in a Company and Working by Yourself.

Trust me, no college professor will be able, or even available, to help you once you are placed in the industry, since your domains will be so different. It is a very different story with industry professionals who become instructors: you can go to them for guidance even after placement, which is simply not something most academic professors can offer unless they too have industry experience, which is very rare.

7. Placement Assistance.

People who have worked in the industry know the importance of company referrals in the placement process. A cold application to a company with no internal referral is one thing; having someone already established within the company you are applying to can be the difference between a successful and an unsuccessful recruitment process. Industry professionals have contacts at many companies, which puts them in a unique position to help you when placement opportunities arise.

8. Learn Critical but Non-Technical Job Skills, such as Networking, Communication, and Teamwork


While it is important to know the basics, one reason brilliant students do badly in the industry after getting a job is a lack of soft skills like communication and teamwork. A job in the industry demands much more than the bare skills studied in class: you need to communicate effectively and work well in teams. Industry professionals can guide you here in a way most professors cannot, because the professionals have been in that position themselves and learned these skills through their own job experience and work capacities.

9. Reduced Cost Requirements

It is one thing to be able to sponsor your own PhD fees; it is quite another to learn much the same practical skills for less than 1% of the cost of a PhD degree in, say, the USA. Not only is it financially less demanding, you also don't have to worry about paying off massive student loans from your paychecks, often at the cost of compromising your health or your family's needs. Why take out a Rs. 75 lakh student loan when you can get a comparable outcome from a course costing a small fraction of the price? The takeaways will still be the same. In many cases, you will even receive better training through the data science program than through an academic qualification, because your instructors have job experience.

10. Highly Reduced Time Requirements

A PhD takes, on average, five years; a data science program gets you job-ready in a few months. Which is better for you? This is especially relevant when you already have job experience in another domain, or when you are over 23-25 years old and a full PhD program could put you on the wrong side of 30 with almost no job experience. Consider the data science program instead: the time spent working in your 20s is critical to most companies hiring today, since, especially when you have less than 3-4 years of experience, they want to know you will be a good 'cultural fit' for the company environment.

Summary

Thus, it is easy to see that in many ways a data science program can serve you better than an academic degree. The critical takeaway from this article is that there is no need to spend Rs. 75 lakh or more for skills that you can acquire for around Rs. 35,000 at most. It really is a no-brainer: these data science programs offer true value for money. If you are interested, do check out the following data science programs, each of which has every one of the advantages listed above:

Data Science Programs Offered by Dimensionless.in

  1. Data Science with Python and R: https://dimensionless.in/data-science-using-r-python/ 
  2. Big Data Analytics and NLP: https://dimensionless.in/big-data-analytics-nlp/
  3. Deep Learning: https://dimensionless.in/deep-learning/

All the best and happy learning!

Must-Have Resources to Become a Data Scientist

Become a Professional

There is a multitude of data science resources out there, all claiming to be the "best possible introductory-to-advanced material and courseware on the subject of data science". I have made my share of mistakes in choosing which data science references to buy, keep, and use, but I will share what experience has shown me to be the most effective for each of these topics. This list was built by going through the resources one at a time, and it contains the following items:

eBooks (or Books – your choice):

  1. Developing Analytic Talent: Becoming a Data Scientist by Vincent Granville
  2. Introduction to Machine Learning with Python by Müller & Guido
  3. R for Data Science by Wickham & Grolemund
  4. Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron
  5. Statistics by Freedman, Pisani, & Purves

Websites:

  1. www.stackoverflow.com (for doubts and errors during coding)
  2. www.kaggle.com (for Data Science Competitions and worldwide rankings)

Online Courses:

  1. Dimensionless.in
    1. Data Science with Python and R: https://dimensionless.in/data-science-using-r-python/ 
    2. Big Data Analytics and NLP: https://dimensionless.in/big-data-analytics-nlp/
    3. Deep Learning: https://dimensionless.in/deep-learning/

Books for Data Science

1. Developing Analytic Talent: Becoming a Data Scientist by Vincent Granville

There are not many books I would recommend for a professional data scientist, but this one is written by an authority with 15 years of experience in the data science field, working on some seriously large-scale projects for the best companies in the world, and it shows. This single book contains some of the latest and best methods for working as a professional data scientist, and it does not just teach theory: every chapter has multiple case studies taken from industry experience. Vincent Granville is recognized worldwide as one of the best-known talents in data science. The level is a little advanced, so the book is not recommended for beginners; it is perfect for advanced-intermediate to professional data scientists who already know the basics and want to know how to work professionally.

2. Introduction to Machine Learning with Python – A Guide for Data Scientists (Müller & Guido)

This is a book for beginners, requiring just a basic knowledge of NumPy, pandas, and matplotlib. It is perhaps the most effective way to learn the scikit-learn data science library, since the authors are two of the core contributors to the scikit-learn open-source project. They know the library inside out; they both contributed heavily to creating it! The explanations are simple, and the time spent working through the exercises and source code in this book will pay off richly if you want to master scikit-learn and its associated libraries.
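For a taste of the workflow the book teaches, here is a minimal scikit-learn sketch on the bundled iris dataset (which the book itself uses early on), so it runs as-is:

```python
# The canonical scikit-learn workflow: split data, fit a model, score it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)   # a simple baseline classifier
model.fit(X_train, y_train)                   # learn from the training split
print("test accuracy:", model.score(X_test, y_test))
```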

3. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Wickham & Grolemund

This is another beginner-friendly book, teaching all the basics of R clearly and concisely to those with basic programming skills. R is a language built for manipulating raw data, and it is an excellent complement to your toolset if you already know Python and are preparing for a career as a data scientist. The IDE used is RStudio (also installable through the Anaconda distribution). Both authors are chief scientists on the RStudio software development team and members of the R Foundation. This book gets you up and running in R quickly and effectively.

4. Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron

 

This book has received massive acclaim from the data science community for the breadth of knowledge it provides, and it is one of the best books on the topic to date. The TensorFlow coverage is excellent, and the book teaches methodologies that get your data science project executed well from the start. Its TensorFlow (with some Keras) coverage is the simplest and easiest to understand of all the TensorFlow tutorials I have personally found, both on the Web and in the few available ebooks. If you want to work in deep learning but don't know how to get started, this book is for you: it covers deep learning as well!
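As a small, hedged taste of the Keras style the book covers (the data below is synthetic, purely for illustration), defining, compiling, and training a tiny network takes only a few lines:

```python
# A tiny Keras model on synthetic data: define, compile, fit, evaluate.
import numpy as np
import tensorflow as tf

# Synthetic data: 100 samples, 4 features, binary labels (illustration only).
X = np.random.rand(100, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)      # brief training on the toy data
print(model.evaluate(X, y, verbose=0))    # [loss, accuracy]
```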

5. Statistics, 4th Ed. by Freedman, Pisani, and Purves

There is a wide selection of Statistics books for data scientists, but this book is highly recommended. For many simple reasons:

  1. It teaches through real-life examples rather than equations
  2. All the material is based on real-world scenarios
  3. The inferential statistics coverage is excellent
  4. The design of experiments is the entry point of the book
  5. Tests of significance and p-values are clearly explained through examples, without formulas
  6. The material is presented in a way that makes the reader excellent at off-the-cuff estimation
  7. Most graphics in the book can be drawn easily by hand
  8. An excellent set of exercises and solutions with real-world applications
  9. Plain English, real-life stories, and concepts understood through applications and simple explanations
  10. Very few formulas
  11. Perfect for someone approaching the subject for the first time

And the icing on the cake is that it is a very enjoyable read! 🙂  

Websites for Data Science

1. www.stackoverflow.com

Once upon a time, a developer stuck on a programming problem would have to comb through several 500-page textbooks to find an answer. Not any more! StackOverflow is a question-and-answer platform covering nearly every programming language and library available, including Python, R, scikit-learn, TensorFlow, Keras, pandas, NumPy, SciPy, Theano, PyTorch, matplotlib, and dozens more. It shows the power of crowd-sourcing: it is much easier to find the answer to a problem among tens of thousands of practitioners than among the four or five teachers you might study under. You can simply paste the error message from your data science tool into StackOverflow's search and, more often than not, find fully worked out and clearly explained solutions to your problem.

The way I see it, StackOverflow was a defining moment in programming. Once upon a time, debugging was a lonely challenge; now, for 90% or more of common bugs and errors, StackOverflow has an answer explained in clear English, with corrected source code. What more could you want? These days anyone can become a developer in any language, thanks in no small part to this single site. The concept has become so popular that there is now a whole family of crowd-sourced Q&A sites under the Stack Exchange umbrella (stackexchange.com), including math.stackexchange.com and dozens of others, each serving a particular field, be it mathematics, English, or even Christianity!

2. www.kaggle.com

These days, an impressive profile on Kaggle can be enough to land you a job interview at the very least. Kaggle is a site that has hosted data science competitions for many years. The competition is immense and intense, but the tutorials and articles are equally powerful and instructive. If you want to be a data scientist, not having a decent Kaggle profile is inexcusable: Kaggle is a showcase of your data science skills to the entire world. Even if you don't rank very high, consistency and practice will get you there more often than not.

There is another side to it as well: a course designed purely to help you win data science competitions, available on Coursera (How to Win Data Science Competitions: Learn from Top Kagglers) at this link: https://www.coursera.org/learn/competitive-data-science. However, this course is not for beginners; it is only for those who already have a strong background in machine learning and its libraries, along with practical programming experience.

Courses in Data Science

Dimensionless.in

Dimensionless.in is an elite data science training company that imparts industry-level experience and knowledge to those with a real thirst to learn. Training starts from the basics, building a strong foundation, so that anyone with discipline and persistence can learn data science and become a data scientist. All the faculty are IIT alumni, the training is personalized to the needs of each student, and there are only 20-30 students in a batch.

Going by popular wisdom, this level of detail and attention is not available at most major data science training centers and course curators, which focus on marketing and volume (quantity). Dimensionless Technologies focuses not on quantity but on quality. The following three major courses are offered:

  1. Data Science with Python and R: https://dimensionless.in/data-science-using-r-python/ 
  2. Big Data Analytics and NLP: https://dimensionless.in/big-data-analytics-nlp/
  3. Deep Learning: https://dimensionless.in/deep-learning/

An in-depth explanation of each course is given at each of the links. Do check them out and see, in depth, the potential that is here to be tapped, for your benefit.

And Finally…

AI & ML (Artificial Intelligence & Machine Learning) is the future for nearly every industry. The question on every CEO's and CIO's mind will be this: why should we staff a division in our company for a role when an automated system, with a single one-time investment and operating costs negligible compared to paying salaries to 100 staff members (you can do the math), can do the same job more reliably, more efficiently, more consistently, and more accurately than employees? That burning question is racing through nearly every industry in the world right now. Thousands of jobs will be automated, and the biggest demand in all sectors will be for machine learning experts who also have deep domain knowledge of their company's field (hospitals, for example).

Don't be scared of the incoming changes. Change is humanity's best friend; without change, life would be boring. Changes are not just challenges, they are also opportunities for much higher-paying and much less laborious jobs than the ones you hold today. And if you happen to be a student reading this article, you now know which industry to focus on, completely. All the best, and remember to enjoy the process of learning. Regardless of your age, this is the best time ever to be alive, because domain knowledge is more widely available today than at any time in the past. Be enthusiastic. Be positive. Be disciplined. Be focused. And make the right choices at the right times; no, it is never too late when you have quality trainers ready to mentor you. May the thrill of learning a completely new concept with truly enlightened insight never leave you. Once again, all the best.
