# Introduction to SVM

*Raghav Aggiwal*

*February 7, 2017*

Machine learning is a new buzz in the industry. It has a wide range of applications which makes this field a lot more competitive. Staying in the competition requires you to have a sound knowledge of the existing and an intuition for the non-existing. Well, it’s relieving that getting familiar with the existing is not that difficult given the right strategy. Climbing up the ladder step by step is the best way to reach the sky.

Mastering data analytics is not that difficult and that mathematical either. You do not need a PhD to understand the fancier ML algorithms (Though inventing a new one might ask you for it). Most of us start out with regression and climb our way up. There is a quote, “Abundant data generally belittles the importance of algorithm”. But we are not always blessed with the abundance. So, we need to have a good knowledge of all the tools and an intuitive sense for their applicability. This post aims at explaining one more such tool, **Support Vector Machine**.

# Table of contents

- What is SVM?
- How does it work?
- Implementation in R.
- Pros and Cons?
- Applications

# What is SVM?

A Support Vector Machine is a yet another supervised machine learning algorithm. It can be used for both regression and classification purposes. But SVMs are more commonly used in classification problems (This post will focus only on classification). Support Vector machine is also commonly known as “Large Margin Classifier”.

# How does it work?

Support Vectors and Hyperplane

Before diving deep, let’s first undertand “What is a Hyperplane?”. A hyperplane is a flat subspace having dimensions one less than the dimensions of co-ordinate system it is represented in.

In a 2-D space, hyperplane is a line of the form \(A_0\) + \(A_1\)\(X_1\) + \(A_2\)\(X_2\) = 0 and in a m-D space, hyperplane is of the form \(A_0\) + \(A_1\)\(X_1\) + \(A_2\)\(X_2\) + …. + \(A_m\)\(X_m\) = 0

Support Vector machines have some special data points which we call “Support Vectors” and a separating hyperplane which is known as “Support Vector Machine”. So, essentially SVM is a frontier that best segregates the classes.

Support Vectors are **the data points nearest to the hyperplane, the points of our data set which if removed, would alter the position of the dividing hyperplane**. As we can see that there can be many hyperplanes which can segregate the two classes, the hyperplane that we would choose is the one with the highest margin.

The Kernel Trick

We are not always lucky to have a dataset which is lineraly separable by a hyperplane. Fortunately, SVM is capable of fitting non-inear boundaries using a simple and elegant method known as kernel trick. In simple words, it projects the data into higher dimension where it can be separated by a hyperplane and then project back to lower dimensions.

Here, we can imagine an extra feature ‘z’ for each data point “(x,y)” where \(z^{2} = x^{2}+y^{2}\)

We have in-built kernels like rbf, poly, etc. which projects the data into higher dimensions and save us the hard work.

SVM objective

Support Vector Machine try to achieve the following two classification goals simultaneously:

- Maximize the margin (see fig)
- Correctly classify the data points.

There is a loss function which takes into account the loss due to both, ‘a diminishing margin’ and ‘in-correctly classified data point’. There are hyperparameters which can be set for a trade off between the two.

Hyperparameters in case of SVM are:

**Kernel**– “Linear”, “rbf” (default), “poly”, etc. “rbf” and “poly” are mainly for non- linear hyper-plane.**C(error rate)**– Penalty for wrongly classified data points. It controls the trade off between a smoother decision boundary and conformance to test data.**Gamma**– Kernel coefficient for kernels (‘rbf’, ‘poly’, etc.). Higher values results in overfitting.

Note: Explaining the maths behind the algortihm is beyond the scope of this post.

Some examples of SVM classification

- A is the best hyperplane.
- Fitting non-linear boundary using Kernel trick.
- Trade off between smooth booundary and correct classification.