The field of Data Science has seen exponential growth in the last few years. Though the concept existed before, the recent hype is the result of the huge volumes of varied, unstructured data being generated across different industries and the enormous potential hidden beneath that data. On top of that, the massive computational power that modern-day computers possess has made it possible to mine such huge chunks of data.
Data Science is a field comprising several disciplines, from exploratory data analysis to predictive analytics. There are various tools and techniques that professionals use to extract information from data. However, a common misconception is to focus more on those tools than on the math behind the data modelling. People tend to put too much importance on Machine Learning algorithms instead of the Linear Algebra and Probability concepts required to extract relevant meaning from the data.
Thus, in this blog post, we will cover one of the prerequisites of Data Science, namely Linear Algebra, and some of the basic concepts that you should learn.
Understanding the Linear Algebra Concepts
To comprehend the underlying theory behind Machine Learning or Deep Learning, it is necessary to have sufficient knowledge of some Linear Algebra concepts. You cannot master state-of-the-art Machine Learning algorithms without knowing the math.
Below are some of the Linear Algebra concepts that you need to know for Machine Learning.
1. Matrices and Vectors
These are arguably two of the most important concepts that you will encounter throughout your Machine Learning journey. A vector is a one-dimensional array of numbers, whereas a matrix is a two-dimensional array of numbers, generally denoted by an uppercase letter.
In Machine Learning terms, the target variable of a supervised learning problem is usually stored as a vector, while the features of the data form a matrix. Several operations, such as multiplication, transposition, computing the rank, and taking the conjugate, can be performed on a matrix.
Two vectors of equal shape, i.e. with the same number of elements, can be added and subtracted.
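As a minimal sketch of these operations (using NumPy, which the post does not mention explicitly, so the exact arrays here are only illustrative):

```python
import numpy as np

# Feature matrix X (3 samples x 2 features) and target vector y (one label per sample)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([10.0, 20.0, 30.0])

print(X.T)                        # transpose of the matrix
print(np.linalg.matrix_rank(X))   # rank of the matrix
print(X @ np.array([0.5, 1.5]))   # matrix-vector multiplication

# Two vectors of the same shape can be added and subtracted element-wise
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(a + b, a - b)
```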
2. Diagonal Matrix
A matrix whose non-diagonal elements are all zero is known as a diagonal matrix. A diagonal matrix's inverse is easy to find, unlike that of a generic matrix, and multiplication of diagonal matrices is also easier. A diagonal matrix has no inverse if it is not square or if any of its diagonal entries is zero.
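A quick illustrative sketch with NumPy (not part of the original post) showing why diagonal matrices are easy to work with:

```python
import numpy as np

# A diagonal matrix: every off-diagonal entry is zero
D = np.diag([2.0, 4.0, 5.0])

# Its inverse is just the reciprocal of each diagonal entry
D_inv = np.diag(1.0 / np.diag(D))
print(np.allclose(D @ D_inv, np.eye(3)))  # True

# Multiplying two diagonal matrices only multiplies their diagonals element-wise
E = np.diag([1.0, 3.0, 7.0])
print(np.allclose(D @ E, np.diag(np.diag(D) * np.diag(E))))  # True
```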
3. Orthogonal Matrix
A matrix whose transpose multiplied by the matrix itself equals the identity matrix is known as an orthogonal matrix. The concept of orthogonality is important in Machine Learning, and specifically in Principal Component Analysis, which tackles the curse of dimensionality.
Orthogonal matrices are so useful because their inverse is equal to their transpose. Also, multiplying by an orthogonal matrix does not magnify the errors already present in the values, which is a very desirable behaviour for maintaining numerical stability.
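A small sketch of these properties (using NumPy; building Q via a QR decomposition is just one convenient way to obtain an orthogonal matrix and is not from the original post):

```python
import numpy as np

# Build an orthogonal matrix Q from the QR decomposition of a random matrix
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

# Q^T Q = I, so the inverse of Q is simply its transpose
print(np.allclose(Q.T @ Q, np.eye(3)))      # True
print(np.allclose(np.linalg.inv(Q), Q.T))   # True

# Multiplying a vector by Q preserves its length, so errors are not magnified
v = rng.normal(size=3)
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```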
4. Symmetric Matrix
One of the important concepts in Machine Learning and Linear Algebra is the symmetric matrix. Matrices in Linear Algebra are often used to hold values of the form f(vi, vj) computed for pairs of points. Such functions are often symmetric, and the matrices built from them are symmetric as well. For example, f could measure the feature distance between data points or the covariance between features. Some of the properties of a symmetric matrix are:
- The inverse of a symmetric matrix is also symmetric.
- Its eigenvalues contain no complex numbers; all of them are real.
- Even with repeated eigenvalues, n eigenvectors of a symmetric matrix S can be chosen to be orthogonal.
- Multiplying any matrix by its transpose forms a symmetric matrix.
- If the columns of a matrix are linearly independent, then the product of its transpose and the matrix itself is invertible.
- A symmetric matrix also admits a convenient factorization (its spectral decomposition), illustrated in the sketch below.
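A brief NumPy sketch (the specific random matrix is only illustrative, not from the post) verifying several of these properties at once:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(4, 3))   # a tall matrix with (almost surely) independent columns

# A^T A is always symmetric, and it is invertible when A has independent columns
S = A.T @ A
print(np.allclose(S, S.T))                      # True
print(np.linalg.matrix_rank(S) == S.shape[0])   # True, so S is invertible

# eigh is specialised for symmetric matrices: real eigenvalues and orthogonal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(S)
print(eigenvalues)                                            # all real
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))  # True

# Spectral factorization: S = Q Λ Q^T
print(np.allclose(eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T, S))  # True
```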
5. Eigenvalues and Eigenvectors
A vector whose direction does not change under a linear transformation, and which is only scaled by the magnitude of its eigenvalue, is known as an eigenvector. It is one of the most sought-after concepts in Data Science.
Av = λv
Here A is an (m x m) square matrix, v is an (m x 1) eigenvector, and λ (lambda) is the corresponding eigenvalue, which is a scalar.
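A tiny sketch of this relation using NumPy (the particular matrix is an arbitrary example, not from the post):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify Av = λv for every eigenpair (eigenvectors are stored as columns)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True
```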
Eigenvalues and eigenvectors form a foundation of computing and mathematics. A vector plotted on an XY chart points in a particular direction. Applying a linear transformation to certain vectors does not change their direction, which makes them extremely valuable in Machine Learning.
Both eigenvalues and eigenvectors are used to reduce noise in the data, and they can improve efficiency in computationally intensive tasks. They can also help reduce overfitting by eliminating strongly correlated features.
Both eigenvectors and eigenvalues have a broad set of uses. Image, sound, or textual data with a large set of features is often difficult to visualize, as it has more than three dimensions, and transforming such data using one-hot encoding or other methods is not space efficient at all. To resolve such scenarios, eigenvectors and eigenvalues are used to capture the information stored in a large matrix. Reducing the dimensions is key in computationally intensive tasks, and this leads us to the concept of PCA, which is described below.
Eigenvectors and eigenvalues are also used in facial recognition, and they help in better understanding data in non-linear motion dynamics.
6. Principal Component Analysis
A Machine Learning problem often suffers from the curse of dimensionality: the features of the data live in a high-dimensional space and are highly correlated. The problem that arises as a result is that it gets difficult to understand how each feature influences the target variable, because highly correlated features mean the target variable is influenced by both features rather than one of them. Another issue with higher-dimensional data is that you cannot visualize it, because at most you can plot 3-D data. Thus the behaviour of the model becomes harder to interpret as well.
PCA, or Principal Component Analysis, is a process by which you can reduce the dimension of your data, for example down to 2-D or 3-D. The reduction in dimension is done while keeping most of the variance (typically 95% to 99%) intact so that as little information as possible is lost. The data is reduced from a higher dimension to two or three independent principal components.
The math behind Principal Component Analysis follows the concept of orthogonality. The data is projected from the higher dimension onto a lower-dimensional sub-space, and the goal is to minimize the projection error between the data points and that sub-space. Minimizing the projection error maximizes the variance that is retained.
Once the number of principal components is decided (say two or three), the first principal component carries the maximum variance in the data, followed by the second component with slightly less variance, and so on. PCA is a very good technique to decrease the number of features and reduce the complexity of the model, as the sketch below shows.
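A minimal sketch of PCA, assuming scikit-learn is available (the toy dataset below is purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples with 5 features, 3 of which are linear combinations of the first 2
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

# Keep just enough principal components to retain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # far fewer than 5 columns
print(pca.explained_variance_ratio_)  # variance carried by each component, in decreasing order
```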
However, one must not use Principal Component Analysis as the first step to reduce overfitting. Overfitting is a state where the model is too complex and has high variance. To reduce overfitting, you should first try to increase the amount of data or, if possible, choose fewer features. If that doesn't work, the next best option is to use L1 or L2 regularization, which penalizes the coefficients that make the model complex. PCA should be the last technique used if none of the above-mentioned solutions works.
7. Singular Value Decomposition
Singular Value Decomposition is a matrix factorization method used in science, technology, and various other domains. It has grown in importance in recent times due to developments in machine learning and data mining. Representing a matrix as a product of other matrices is known as matrix factorization.
M = U * Σ * V*, where U is a unitary matrix, Σ is a diagonal matrix of singular values, and V* is the conjugate transpose of the unitary matrix V.
For each element, the row and column indices are interchanged as the result of the conjugate transpose (and, for complex entries, the value is conjugated).
In higher-dimensional raw data, Singular Value Decomposition can be used to untangle information. In Machine Learning, the Singular Value Decomposition concept is used to compute Principal Component Analysis. Some of the applications of Singular Value Decomposition are image processing, product recommendation, and processing natural data in Natural Language Processing.
However, Singular Value Decomposition differs from Principal Component Analysis in that SVD lets you decompose any matrix directly into special matrices, including a diagonal one, that are easy to analyse and manipulate. The data can be compressed into independent components as well, as the sketch below illustrates.
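A short NumPy sketch of the factorization above and of compression via a low-rank approximation (the matrix here is random and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))

# Factorize M into U, the singular values (the diagonal of Σ), and V* (conjugate transpose of V)
U, singular_values, Vt = np.linalg.svd(M, full_matrices=False)

# Multiplying U Σ V* back together recovers the original matrix
print(np.allclose(U @ np.diag(singular_values) @ Vt, M))  # True

# Keeping only the k largest singular values gives a compressed, low-rank approximation
k = 2
M_approx = U[:, :k] @ np.diag(singular_values[:k]) @ Vt[:k, :]
print(np.linalg.norm(M - M_approx))  # residual is small when the first k values dominate
```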
Conclusion
Machine Learning is on everyone's lips these days, but to master it you must know the math behind it and learn the Linear Algebra concepts that are used in any ML or Deep Learning project.
Dimensionless has several blogs and training courses to get you started with Python and Data Science in general.