## Introduction to Support Vector Machine


`Machine learning` is the new buzz in the industry. Its wide range of applications makes the field highly competitive. Staying in the competition requires a sound knowledge of the existing tools and an intuition for the ones yet to come. Fortunately, getting familiar with the existing ones is not that difficult given the right strategy. Climbing the ladder step by step is the best way to reach the sky.

Mastering `data analytics` is neither that difficult nor that mathematical. You do not need a PhD to understand the fancier ML algorithms (though inventing a new one might ask for it). Most of us start out with regression and climb our way up. There is a quote, “Abundant data generally belittles the importance of algorithm”. But we are not always blessed with abundance, so we need a good knowledge of all the tools and an intuitive sense of their applicability. This post aims at explaining one more such tool, the Support Vector Machine (SVM).

## Table of contents

- What is SVM?
- How does it work?
- Implementation in R.
- Pros and Cons?
- Applications

## What is SVM?

A Support Vector Machine is yet another supervised machine learning algorithm. It can be used for both regression and classification, but SVMs are more commonly used in classification problems (this post focuses only on classification). The Support Vector Machine is also commonly known as a “Large Margin Classifier”.

## How does it work?

### Support Vectors and Hyperplane

Before diving deep, let’s first understand what a hyperplane is. A hyperplane is a flat subspace whose dimension is one less than that of the coordinate system it is represented in.

In a 2-D space, a hyperplane is a line of the form \(A_0 + A_1 X_1 + A_2 X_2 = 0\), and in an m-dimensional space, a hyperplane is of the form \(A_0 + A_1 X_1 + A_2 X_2 + \dots + A_m X_m = 0\).
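To make the equation concrete, here is a minimal sketch in base R. The coefficients and the three sample points are made-up values chosen purely for illustration. The sign of the left-hand side tells us on which side of the line a point lies.

```r
# Hypothetical coefficients of the line A0 + A1*X1 + A2*X2 = 0
# (here: X1 + X2 - 4 = 0)
A0 <- -4
A  <- c(1, 1)

# Evaluate the left-hand side of the hyperplane equation for one point
hyperplane_value <- function(point) A0 + sum(A * point)

p1 <- c(1, 1)   # lies on one side of the line
p2 <- c(3, 3)   # lies on the other side
p3 <- c(2, 2)   # lies exactly on the line

# -1 and 1 mark the two sides, 0 marks a point on the line itself
side <- sapply(list(p1, p2, p3), function(p) sign(hyperplane_value(p)))
print(side)
```

A linear classifier uses exactly this sign to assign a class to a new point.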

Support Vector Machines have some special data points which we call “Support Vectors” and a separating hyperplane which acts as the classifier. So, essentially, an SVM is a frontier that best segregates the classes.

Support Vectors are the data points nearest to the hyperplane: the points of our data set which, if removed, would alter the position of the dividing hyperplane. Since many hyperplanes can segregate the two classes, we choose the one with the largest margin, that is, the largest distance between the hyperplane and the nearest data points of either class.
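The idea of "nearest points" and "margin" can be sketched in base R. The line and the six points below are hypothetical values, and the line is simply given rather than learned. The sketch only measures how far each point is from that particular line; an SVM would search for the line that makes the smallest of these distances as large as possible.

```r
# Hypothetical separating line X1 + X2 - 4 = 0 and a toy data set
A0  <- -4
A   <- c(1, 1)
pts <- rbind(c(1, 1), c(0, 2), c(1.5, 2), c(3, 3), c(4, 2), c(2.5, 2))

# Perpendicular distance of each point from the hyperplane:
# |A0 + A . x| / ||A||
dist_to_plane <- as.vector(abs(A0 + pts %*% A) / sqrt(sum(A^2)))

# The margin of this line is the distance to its closest point(s)
margin  <- min(dist_to_plane)
nearest <- which(abs(dist_to_plane - margin) < 1e-9)

print(round(dist_to_plane, 3))
print(nearest)   # row numbers of the points closest to the line
```

Removing one of the far-away points would not change which points are closest, which is why only the nearest points matter for the position of the boundary.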

### The Kernel Trick

We are not always lucky enough to have a dataset which is linearly separable by a hyperplane. Fortunately, SVM is capable of fitting non-linear boundaries using a simple and elegant method known as the kernel trick. In simple words, it projects the data into a higher dimension where it can be separated by a hyperplane, and then projects the boundary back to the lower dimensions.

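Here, we can imagine an extra feature \(z\) for each data point \((x, y)\), where \(z^{2} = x^{2}+y^{2}\). Points close to the origin get a small \(z\) and points far from it get a large \(z\), so classes arranged as concentric rings can be separated by a plane of constant \(z\).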
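As a rough illustration of the extra feature, the base R sketch below builds a made-up data set of two concentric rings and adds `z` by hand. The ring radii, the seed and the threshold of 1.5 are assumptions picked for the example. A real kernel does not build such a feature explicitly, so treat this only as intuition for why a higher dimension helps.

```r
set.seed(42)
n <- 50

# Made-up data: an inner disc (radius 0 to 1) and an outer ring (radius 2 to 3)
angle  <- runif(2 * n, 0, 2 * pi)
radius <- c(runif(n, 0, 1), runif(n, 2, 3))
label  <- rep(c("inner", "outer"), each = n)

x <- radius * cos(angle)
y <- radius * sin(angle)

# No straight line in the (x, y) plane separates the two classes,
# but the extra feature z, with z^2 = x^2 + y^2, does the job
z <- sqrt(x^2 + y^2)

# In the z dimension a single threshold (a flat hyperplane) is enough
threshold <- 1.5
predicted <- ifelse(z > threshold, "outer", "inner")
accuracy  <- mean(predicted == label)
print(accuracy)
```

Plotting `x` against `y` and then `z` against `x` makes the change from a ring-shaped boundary to a straight one easy to see.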

There are built-in kernels like rbf, poly, etc. which project the data into higher dimensions and save us the hard work.

### SVM objective

A Support Vector Machine tries to achieve the following two classification goals simultaneously:

- Maximize the margin (see fig)
- Correctly classify the data points.

There is a loss function which takes into account the loss due to both a diminishing margin and incorrectly classified data points. There are hyperparameters which can be set to trade off between the two.
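The post does not spell out the loss function. One common textbook form is the soft-margin objective with hinge loss, and the base R sketch below assumes that form. The weights `w`, the offset `b` and the three points are hypothetical values, and `svm_objective` is a helper written for this example, not a library function.

```r
# Soft-margin objective (assumed textbook form):
#   0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b)))
svm_objective <- function(w, b, X, y, C) {
  scores <- as.vector(X %*% w + b)
  hinge  <- pmax(0, 1 - y * scores)   # zero for points safely on the correct side
  margin_term <- 0.5 * sum(w^2)       # a small ||w|| corresponds to a wide margin
  error_term  <- C * sum(hinge)       # penalty for margin violations
  c(margin_term = margin_term, error_term = error_term,
    total = margin_term + error_term)
}

X <- rbind(c(1, 1), c(3, 3), c(2.5, 2.5))
y <- c(-1, 1, -1)    # the third point sits on the wrong side of the line
w <- c(1, 1)
b <- -4

obj <- svm_objective(w, b, X, y, C = 1)
print(obj)
```

Raising `C` makes the error term weigh more, so the optimizer would accept a narrower margin to classify more training points correctly.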

Hyperparameters in case of SVM are:

- Kernel: “linear”, “rbf” (default), “poly”, etc. “rbf” and “poly” are mainly for non-linear hyperplanes.
- C (cost): Penalty for wrongly classified data points. It controls the trade-off between a smoother decision boundary and conformance to the training data.
- Gamma: Kernel coefficient for kernels such as ‘rbf’ and ‘poly’. Higher values can result in overfitting.

Note: Explaining the maths behind the algorithm is beyond the scope of this post.
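To get a feel for what gamma does, the base R sketch below evaluates the radial basis function in its usual form, \(K(u, v) = \exp(-\gamma \lVert u - v \rVert^2)\), for a few gamma values. The two points and the gamma values are arbitrary, and `rbf_kernel` is a helper defined here for illustration.

```r
# Radial basis function kernel: K(u, v) = exp(-gamma * ||u - v||^2)
rbf_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

u <- c(0, 0)
v <- c(1, 1)   # squared distance from u is 2

gammas     <- c(0.1, 1, 10)
similarity <- sapply(gammas, function(g) rbf_kernel(u, v, g))

# Similarity between the same two points shrinks as gamma grows
print(data.frame(gamma = gammas, similarity = signif(similarity, 3)))
```

With a large gamma, each training point influences only its immediate neighbourhood, so the boundary can bend around individual points. This is the intuition behind the overfitting mentioned above.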

### Some examples of SVM classification

- A is the best hyperplane.
- Fitting non-linear boundary using Kernel trick.
- Trade off between smooth boundary and correct classification.

## Implementation in R.

Below is a sample implementation in R using the IRIS dataset.

```r
# Using the iris dataset
head(iris, 3)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
```

```r
# For simplicity of visualization (2-D), let us use only two features,
# "Sepal.Length" and "Sepal.Width", for prediction of "Species"
iris.part = iris[, c(1, 2, 5)]
attach(iris.part)
head(iris.part, 3)
```

```
##   Sepal.Length Sepal.Width Species
## 1          5.1         3.5  setosa
## 2          4.9         3.0  setosa
## 3          4.7         3.2  setosa
```

```r
# Plot our data set (col = Species uses the default palette: black, red, green)
plot(Sepal.Width, Sepal.Length, col = Species)
legend(x = 3.9, y = 7.5, legend = c("Setosa", "versicolor", "virginica"),
       fill = c('black', 'red', 'green'))
```

```r
library(e1071)  # provides svm() and tune()

x <- subset(iris.part, select = -Species)  # features to use
y <- Species                               # feature to predict

# Create an SVM model.
# For simplicity, the data is not split into train and test sets.
# In practical scenarios, split the data into training, cross-validation and test sets.
model <- svm(Species ~ ., data = iris.part)
summary(model)
```

```
## 
## Call:
## svm(formula = Species ~ ., data = iris.part)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.5 
## 
## Number of Support Vectors:  86
## 
##  ( 10 40 36 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica
```

```r
# Predict the Species
y_pred <- predict(model, x)
```
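A quick way to check these predictions is a confusion matrix. The sketch below is self-contained and assumes the `e1071` package is installed. It refits the same two-feature model and compares predicted and actual species. Since the model is scored on the data it was trained on, the accuracy is optimistic.

```r
library(e1071)

iris.part <- iris[, c(1, 2, 5)]
model  <- svm(Species ~ ., data = iris.part)
y_pred <- predict(model, iris.part[, 1:2])

# Rows: predicted species, columns: actual species
conf_mat <- table(Predicted = y_pred, Actual = iris.part$Species)
print(conf_mat)

# Fraction of correctly classified points (on the training data itself)
train_accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(train_accuracy)
```

The off-diagonal cells show which species get confused with each other when only the two sepal measurements are used.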

```r
# Tune the SVM to find the best hyperparameters
tune_svm <- tune(svm, train.x = x, train.y = y,
                 kernel = "radial",
                 ranges = list(cost = 10^(-2:2), gamma = c(.25, .5, 1, 2)))
print(tune_svm)
```

```
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##   0.1   0.5
## 
## - best performance: 0.2066667
```
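Instead of copying the best values by hand, they can be read from the tuning result. The sketch below assumes the `e1071` package and uses the formula interface of `tune()`. The seed is an arbitrary choice added for repeatability, since the cross-validation folds are drawn at random and the best parameters can differ between runs.

```r
library(e1071)

iris.part <- iris[, c(1, 2, 5)]
set.seed(1)  # arbitrary seed; the folds are random, so results vary without it

tune_svm <- tune(svm, Species ~ ., data = iris.part, kernel = "radial",
                 ranges = list(cost = 10^(-2:2), gamma = c(.25, .5, 1, 2)))

best       <- tune_svm$best.parameters  # one-row data frame with cost and gamma
best_model <- tune_svm$best.model       # model refitted with those values

print(best)
print(tune_svm$best.performance)        # cross-validated error of the best setting
```

`best_model` can be passed straight to `predict()` or `plot()` in place of a manually refitted model.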

```r
# After you find the best cost and gamma, you can set the best found parameters
final_svm <- svm(Species ~ ., data = iris.part, kernel = "radial", cost = 1, gamma = 1)

# Plot the results
plot(final_svm, iris.part)
legend(x = 3.37, y = 7.5, legend = c("Setosa", "versicolor", "virginica"),
       fill = c('black', 'red', 'green'))
```

The crosses in the plot indicate support vectors.

```r
# Try changing the kernel to linear (gamma is not used by the linear kernel)
final_svm_linear <- svm(Species ~ ., data = iris.part, kernel = "linear", cost = 1, gamma = 1)

# Plot the results
plot(final_svm_linear, iris.part)
legend(x = 3.37, y = 7.5, legend = c("Setosa", "versicolor", "virginica"),
       fill = c('black', 'red', 'green'))
```

```r
# Try changing C and gamma
final_svm <- svm(Species ~ ., data = iris.part, kernel = "radial", cost = 100, gamma = 100)
# High C and gamma lead to overfitting

# Plot the results
plot(final_svm, iris.part)
legend(x = 3.37, y = 7.5, legend = c("Setosa", "versicolor", "virginica"),
       fill = c('black', 'red', 'green'))
```

I highly recommend that you play with this data set by changing kernels and trying different values of `cost` and `gamma`. This will increase your understanding of hyperparameter tuning.
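One way to structure such experiments is to hold out part of the data and compare training and test accuracy for each setting. The sketch below assumes the `e1071` package. The 100/50 split, the seed and the helpers `accuracy` and `fit_and_score` are choices made for this example. A training accuracy far above the test accuracy is a typical sign of overfitting.

```r
library(e1071)

iris.part <- iris[, c(1, 2, 5)]
set.seed(123)                                # arbitrary seed for a repeatable split
train_idx <- sample(nrow(iris.part), 100)    # 100 rows for training, 50 held out
train <- iris.part[train_idx, ]
test  <- iris.part[-train_idx, ]

# Share of rows whose predicted species matches the actual one
accuracy <- function(model, data) mean(predict(model, data) == data$Species)

# Fit a radial SVM with the given settings and score it on both subsets
fit_and_score <- function(cost, gamma) {
  m <- svm(Species ~ ., data = train, kernel = "radial", cost = cost, gamma = gamma)
  c(cost = cost, gamma = gamma,
    train_acc = accuracy(m, train), test_acc = accuracy(m, test))
}

results <- rbind(fit_and_score(1, 1), fit_and_score(100, 100))
print(results)
```

Adding more rows to `results` with other `cost` and `gamma` pairs gives a small table to compare settings side by side.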

## Pros and Cons?

### Pros:

- Memory efficient, as the decision function uses only a subset of the training points (the support vectors).
- Proven to work well on small and clean datasets.
- The solution is guaranteed to be a global minimum (it solves a convex quadratic problem).
- Non-linear decision boundaries can be obtained using the kernel trick.
- A controllable parameter lets us find an optimal balance between error rate and a large margin.
- Can capture much more complex relationships between data points without having to perform difficult transformations ourselves.

### Cons:

- Does not scale well to larger datasets, as training time is higher.
- Less effective for datasets with noise and overlapping classes.
- Complex data transformations and the resulting boundary plane are very difficult to interpret (black box magic).

## Applications

Support Vector Machine is a versatile algorithm and has successfully been implemented for various classification problems. Some examples are:

- Spam detection.
- Sentiment detection.
- Handwritten digit recognition.
- Image processing and image recognition.

## Additional resources:

I highly recommend that you go through the links below for an in-depth understanding of the maths behind this algorithm.
