Introduction to Julia
For those of you who don’t know, Julia is a multiple-paradigm (fully imperative, partially functional, and partially object-oriented) programming language designed for scientific and technical (read numerical) computing. It offers significant performance gains over Python (when used without optimization and vectorized computing using Cython and NumPy). Time to develop is reduced by a factor of 2x on average. Performance gains range in the range from 10x-30x over Python (R is even slower, so we don’t include it. R was not built for speed). Industry reports in 2016 indicated that Julia was a language with high potential and possibly the chance of becoming the best option for data science if it received advocacy and adoption by the community. Well, two years on, the 1.0 version of Julia was out in August 2018 (version 1.0), and it has the advocacy of the programming community and the adoption by a number of companies (see https://www.juliacomputing.com) as the preferred language for many domains – including data science.
While it shares many features with Python (and R) and various other programming languages like Go and Ruby, there are some primary factors that set Julia apart from the rest of the competition:
Advantages of Julia
- Performance
Basically, Julia is a compiled language, whereas Python and R are interpreted. That means that Julia code is executed directly on the processor as executable code. There are optimizations that can be done for compiler output that cannot be done with an interpreter. You could argue that Python, which is implemented in C as the Cython package, can be optimized to Julia-like levels of performance, can also be optimized if you know the language well. But the real answer to the performance question is that Julia gives you C-like speed without optimization and hand-crafted profiling techniques. If such performance is available without optimization in the first place, why go for Python? Juia could well be the solution to all your performance problems.
Having said that, Julia is still a teenager as far as growth and development maturity as a programming language is concerned. Certain operations like I/O cause the performance to drop rather sharply if not properly managed. From experience, I suggest that if you plan to implement projects in Julia, go for Jupyter Notebooks running Julia 1.0 as a kernel. The Juno IDE on the Atom editor is the recommended environment, but the lack of a debugger is a massive drawback if you want to use Julia code in production in a Visual Studio-like environment. Jupyter allows you to develop your code one feature at a time, and easy access to any variable and array values that you need to ‘watch’. The Julia ecosystem has an enthusiastic and broad community developing it, but since the update to 1.0 was less than six months ago, not all the packages in Julia from version 0.7 have been migrated completely, so you could come up with the odd package versioning problem while installing necessary packages for your project.
2. GPU Support
This is directly related to performance. GPU support is handled transparently by some packages like TensorFlow.jl and MXNet.jl. For developers in 2018 already using extremely powerful GPUs for various applications with CUDA or cryptocurrency mining (just to take two common examples), this is a massive bonus, and performance boosts can even go anywhere between 100x to even 1000x for certain operations and configurations. If you want certain specific GPU capacity, Julia has you covered with libraries like cuBLAS.jl and cuDNN.jl for vendor-specific and CUDANative.jl for hand-crafted GPU programming and support. Low-level CUDA programming is also possible in Julia with CUDAdrv.jl and the runtime library CUDArt.jl. Clearly, Julia has evolved well beyond standard GPU support and capabilities. And even as you read this article, an active community is at work to extend and polish the GPU support for Julia.
3. Smooth Learning Curve, and Extensive Built-in Functionality
Julia is remarkably simple to learn and enjoyable to use. If you are a Python program, you will feel completely at home using Julia. In Julia, everything is an expression and there is basic functional programming support. Higher order functions are supported by simple assignment (=) and the function operator -> (lambda in Python, => in C#). Multidimensional matrix support is excellent – functions for BLAS and LINPACK packages are included in the LinearAlgebra package of the standard library. Rows are separated with the comma(,) delimiter, and columns with the semicolon(;) during initialization. Matrix transpose, adjoint, factorization, matrix division, identity, slicing, linear equation system solving, triangulization, and many more functionalities are single function calls away. Even jagged multidimensional matrix support is available. Missing, nothing, any, all, isdef, istype, oftype, hash, Base as a parent class for every Julia object, isequal, and even undef (to represent uninitialized values) are just a few of the many keywords and functions available to increase the readability of your code. Other notable features include built-in support for arbitrary precision arithmetic, complex numbers, Dictionaries, Sets, BitStrings, BitArrays, and Pipelines. I have just scratched the surface here. The extensive built-in functionality is one of Julia’s best features, with many more easily available and well-documented modules.
4. Multiple Dispatch
This is a central core feature of the Julia programming language. Multiple dispatch or the multimethod functionality basically means that functions can be called dynamically at run-time to behave in different ways depending upon more than just the arguments passed (which is called function overloading) and can instead vary upon the objects being passed into it dynamically at runtime. This concept is fundamentally linked to the Strategy and Template design patterns in object-oriented terminology. We can vary the behaviour of the program depending upon any attribute as well as the calling object and the objects being passed into the function at runtime. This is one of the killer features of Julia as a language. It is possible to pass any object to a function and thus dynamically vary its behaviour depending upon any minute variation in its implementation, and in many cases, the standard library is built around this concept. Some research statistics indicate that the average lines of boilerplate code required in a complex system is reduced by a factor of three or more (!) by this language design choice. This simplifies development in highly complex software systems considerably and cleanly, without any unpredictable or erratic behaviour. It is, therefore, no surprise that Julia is receiving the acclaim now that it richly deserves.
5. Distributed and Parallel Computing Support
Julia supports parallel and distributed computing using multiple topologies transparently. There is support for coroutines, like in the Go programming language, which are helper functions that operate in parallel on multicore architectures. There is extensive support for threads and synchronization primitives have been carefully designed to maximize performance and minimize the risk of race conditions. Through simple text macros and decorators/annotations like @Parallel, any programmer can write code that executes in parallel. OpenMP support is also present for executing parallel code with compiler directives and macros. From the beginning of its existence, Julia was a language designed to support high-performance computing with multiple worker processes executing simultaneously. And this makes it much easier to use for parallel computing than (say) Python, which has always had the GIL (Global Interpreter Lock for threads) as a serious performance problem.
6. Interoperation with other programming languages (C, Java, Python, etc)
Julia can call C, Go, Java, MATLAB, R, and Python code using native wrapper functions – in fact, every commonly used programming language today has interoperability support with Julia. This makes working with other languages infinitely easier. The only major difference is that in Julia, array indexes begin at 1 (like R) whereas in many other languages it begins at 0 (C, C++, Java, Python, C#, and many more). The interoperability support makes life as a Julia developer much simpler and easier than if you were working in Python or C++/Java. In the real world, calling other libraries and packages is a part of the data scientist’s daily routine – part of a developer’s daily routine. Open source packages are at an advantage here since they can be modified as required for your problem domain. Interoperability with all the major programming languages is one of Julia’s strongest selling points.
Current Weaknesses
Julia is still developing despite the 1.0 release. Performance, the big selling point, is affected badly by using global variables and the first run of a program is always slow, compared to execution following the first. The core developers of the Julia language need to work on its package management system, and critically, migration of all the third-party libraries to version 1.0. Since this is something that will happen over time with active community participation (kind of like the Python 2.7 to 3.6 migration, but with much less difficulty), this is an issue that will resolve itself as time goes by. Also, there needs to be more consistency in the language for performance on older devices. Legacy systems without quality GPUs could find difficulties in running Julia with just CPU processing power. (GPUs are not required, but are highly recommended to run Julia and for the developer to be optimally productive, especially for deep learning libraries like Knet (Julia library for neural networks and deep learning)). The inclusion of support for Unicode 11.0 can be a surprise for those not accustomed to it, and string manipulation can be a headache. Because Julia needs to mature a little more as a language, you must also be sure to use a profiler to identify and handle possible performance bottlenecks in your code.
Conclusion
So you might ask yourself a question – if you’re a company running courses in R and Python, why would you publish an article that advocates another language for data science? Simple. If you’re new to data science and programming, learning Python and R is the best way to begin your exploration into the current industry immediately (it’s rather like learning C before C++). And Python and R are not going anywhere anytime soon. Since there is both a massive codebase and a ton of existing frameworks and production code that run on Python and to a lesser level, on R, the demand for data scientists who are skilled in Python will extend far into the future. So yes, it is fully worth your time to learn both Python and R. Furthermore, Julia has yet to be adopted industry-wide (although there are many Julia evangelists who promote it actively – e.g. see www.stochasticlifestyle.com). For at least ten years more, expect Python to be the main player as far as data science is concerned.
Furthermore, programming languages never fully die out. COBOL was designed by a committee led by Grace Hopper in 1959 and is still the go-to language for mainframe programming fifty years later. Considering that, expect the ubiquitously used Python and R to be market forces for at least two decades more. Even if more than 70% of the data science community turned to Julia as the first choice for data science, the existing codebase in Python and R will not disappear any time soon. Julia also requires more maturity as a language (as already mentioned) – some functions run slower than Python where the implementation has not been optimal and well tested, especially on older devices. Hence, Julia’s maturity as a programming language could still be a year or two away. So – reading this article, realize that Python and R are very much prerequisite skills in becoming a professional data scientist right now. But – do explore Julia a little in your free time. All the best!
For further information on choosing data science as a career, I highly recommend the following articles:
Cheers!
Your method of explaining everything in this paragraph is really nice,
all can simply be aware of it, Thanks a lot.
A caveat. The first time I executed Julia code on my Dell Inspiron 3542, it took 1300-odd seconds to execute (approximately 21 minutes). I read the Julia documentation and did some very basic optimization:
1. Convert disk reads to memory access
2. Ban all global variables, all variables strictly inside functions or passed as parameters
Then I ran the same code around twenty-thirty times, making minor changes. Eventually, after around 20-30-odd runs, the initial time of 1300 seconds became 64.77 seconds to go through the entire training and test set of the MNIST dataset. (60k + 10k = 70,000 records). Not only is it faster than Python’s scikit-learn implementation (2x+), it is also more accurate, scoring 95% accuracy as compared to Python’s 80% accuracy with scikit-learn. For more information on how to run Julia code for high performance, please visit the following page:
https://docs.julialang.org/en/v1/manual/performance-tips/index.html
Julia is quite a powerful tool.
Hello Thomas,
What Julia package did u use that is faster /more accurate than Python’s scikit-learn implementation?
Thanks
Hi,
I used a custom-made implementation of an MLP hand-coded in Julia. I made modifications that improved the base code in Julia. The code I started with was from a GitHub repository that has now been taken down. It was Not a package – it was hand-written. I think that could be why it was faster. 🙂