__Everything you need to Know about Supervised Learning__

__Everything you need to Know about Supervised Learning__

*Supervised Learning is the Machine Learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.*

The evolution of Artificial Intelligence has changed
the entire world in terms of technology. It’s advancement are quicker than we
predicted with such exponential growth in Artificial Intelligence, Machine
Learning is becoming the most trending field of this century, it is starting to
redefine the way we live and the way we see the world.

So before diving into the

**Supervised Learning**directly, let’s briefly explore the concept of Machine Learning.###
**What is Machine Learning?**

- Machine Learning is a method of data analysis that
automates analytical model building. Using algorithms that iteratively learn
from data, Machine Learning allows computers to find hidden insights without
being explicitly programmed.

We can also say that Machine Learning is a subset of
Artificial Intelligence that focuses on getting Machines to make decisions by
feeding them data.

So now we came to know what is Machine Learning.

###
**What are the Types of Machine Learning?**

**-**Basically Machine Learning is divided into three parts.

###
**What is Unsupervised Learning?**

- Unsupervised Learning is a type of Machine Learning that looks for
previously undetected patterns in a dataset with no pre-existing labels and
minimum of human supervision. The Unsupervised Learning algorithms are not
provided any “Answers” to learn from; it must make sense of the data just by observations.

Go here and check out...Top 7 Essential Points about Unsupervised
Learning Every Artificial Intelligence developer should know.

###
**What is Reinforcement Learning?**

- Reinforcement Learning is a part of Machine Learning where an

*Agent*learns to behave in an environment by performing actions and seeing the results.
It basically performs an action and it either gets rewarded on the
actions or it gets punishments. Reinforcement Learning is all about taking appropriate action in order to maximize the rewards in a particular situation.

Reinforcement Learning is all about an

*Agent*who has been put in an unknown environment and it is going to use hit and run method in order to figure out the environment and come up with an appropriate outcome.
Now let’s discuss our today's topics which are

**Supervised Learning**.
Let’s have a look at our Agenda :

### (i) What is Supervised Learning?

### (ii) Supervised Learning Process

### (iii) Supervised Learning and it’s types

### (iv) Classification vs Regression

### (v) Supervised Learning vs Unsupervised Learning

### (vi) Supervised Learning Examples in Real Life

### (vii) Advantages of Supervised Learning

### (viii) Disadvantages of Supervised Learning

### (ix) Supervised Learning Example in Python

###
(i) __What is Supervised Learning?__

- Supervised Learning model has sets of input variables

- Supervised Learning algorithms are trained using labeled examples, such as an input where the desired output is known. That means within our dataset we are going to have some historical features with historical labels.

**(x)**and an output variable**(y)**. An algorithm identifies the mapping function between**(x)**and**(y)**variables. The relationship is**y=f(x)**.- Supervised Learning algorithms are trained using labeled examples, such as an input where the desired output is known. That means within our dataset we are going to have some historical features with historical labels.

- Let’s say for example we are creating an E-mail spam detection system. Now we will provide information such as a segment of text which has a category label. So we take a bunch of previous E-mails and someone has already gone by those E-mails and classified them using the correct labels as spam or ham

*(Non-Spam)*. So there were some E-mails which are were classified as the spam or ham

*(Non-Spam)*so the idea would be for the future text information such as future E-mails using the historical labeled data and Machine Learning algorithm which will learn using historical data in order to predict the new data whether it is a spam or ham.

Supervised Learning is commonly used in applications where
historical data predicts likely future events.

###
(ii) __Supervised Learning Process in Machine Learning__

So let’s go in and go through this step by step

*Get data and it actually depends on what domain you are working in
from where this data actually comes from. This data can come from your
customer's feedback, research or it would have been collected from some online
database, etc. So at some point the data has to be acquired.

*Once we actually acquire the data then we need to clean and format the data and often we do this using

**Pandas**.

*Then we split the data into Training data and Testing data. What we do here is, we take some portion of our data like 30% to make it our test data, and then we will take the remaining data which is 70% as our training data. Now what we are going to do here is we are going to use that specific training set on our machine learning model in order to fit a model to that training data.

*Then we want to know how our model performs. So then we run that test data through the model and compare the model predictions to the actual correct label that the test data have because remember we actually know the correct label for the test data. So we can run that test data features through the model, get our models predictions and compare it to the correct answer and then we can evaluate the model and then maybe you want to go back based off that performance and adjust the model parameters such as decreasing the amount of test size to 20% or 25%.

*And once we are satisfied by this we can then deploy the model to the real world.

###
(iii) __Supervised Learning and it’s types__

As we know from earlier Supervised Learning algorithms are trained
using labeled examples, such as an input where the desired output is known.

####
**Types of Supervised Learning :**

Supervised Learning is divided into 2 types.

#### (i) Classification :

- Classification is a process
of categorizing a given set of data into classes, It can be performed on both
structured or unstructured data. The process starts with predicting the class
of given data points. The classes are often referred to as target, label, or
categories. The classification predictive modeling is a task of approximating
the mapping function from input variables to discrete output variables and the
main goal is to identify which class or category the new data will fall into.

To simplify this, Let’s look at some examples: Is that email spam or not, Does a patient is suffering from heart diseases or not, etc.

Classification problems can be solved with a numerous amount of
algorithms. Any algorithm you choose it actually depends on the situation. Here
are some of the popular classification algorithms.

**K-Nearest Neighbor*

**Random Forest*

#### (ii) Regression :

- Regression Analysis is a

**predictive modeling**technique. It estimates the relationship between a dependent variable we can also call it as**target**& an independent variable which is also known as a**predictor**.
Regression algorithm goal is to predict the continuous number such
as sales, price, scores, etc. The equation for basic linear regression can be
also written as:

*y = m*x+c*

Where;

**: Dependent variable**

*y***: Independent variable**

*x***: y-Intercept**

*c***: Co-efficient**

*m*
For simple regression
problems such as this, the model predictions are represented by the line of
best fit. For models using two independent variables, the plane will be used.
Finally, for a model using more than two independent variables, a hyperplane will
be used.

To simplify this, Let’s look at some examples: Predicting house
prices, predicting student scores, etc.

There are many different types of Regression Algorithms. The three
most common are listed down below :

**Polynomial Regression*

###
(iv) __Regression vs Classification__

Let’s see the Difference between Regression and Classification.

###
(v) __Supervised Learning vs Unsupervised Learning__

Now Let’s see the difference between Supervised Learning and
Unsupervised Learning.

If you didn’t have knowledge of Unsupervised Learning don’t worry
just check out this article of ours.

###
(vi) __Supervised Learning Examples in Real Life__

So as we saw that there are 2 types of supervised learning, let’s
see their implementation in the real world.

#### (a) Real-world implementation of Supervised Learning with respect to Classification :

- Rating of a product or a movie.

- Fraud detection.

- Spam or not spam.

- Risk analysis.

#### (b) Real-world implementation of Supervised Learning with respect to Regression :

- House price prediction.

- The amount of Fraud.

- Probability of Risk.

- Detecting the increase or decrease of Crime Rates.

###
(vii) __Advantages of Supervised Learning__

* You can get
very

**specific**about the definition of the classes, which means that you can train the classifier in a way that has a**perfect decision boundary**to distinguish different classes accurately.
* You can specifically determine

**how many classes**you want to have.
* After training, you don’t necessarily need to keep the
training examples in a memory. You can keep the

**decision boundary as a mathematical formula**and that would be enough for classifying future inputs.###
(viii) __Disadvantages of Supervised Learning__

* Your decision
boundary

**might be overtrained**. This means that if your training set is not including some examples that you want to have in a class when you use those examples after training, you might not get the correct class label.
* When this an input which is not from any of the classes in
reality, then

**it might get a wrong class label**after classification.
* You have to select a lot of good examples from each class
while you are training the classifier. If you consider the classification of

**big data**that can be a real challenge.
* Training needs a lot of

**computation time**, so do the classification.
* You might need to use a
cloud and leave the training algorithm work overnight or nights before
obtaining a good decision boundary model.

###
(ix) __Supervised Learning Example in Python__

Let’s see Supervised Learning Examples where we will be using python. We will work on 2 examples, the first example will be related to Regression
and the second example will be related to the classification.

- Here we will be using Linear Regression to predict the house prices in Boston. If you want to know more about Linear Regression do check out this link and continue with this example.

Q.) Predict Boston House Prices Using Python and Linear Regression

#### (a) Supervised Learning (Regression)

- Here we will be using Linear Regression to predict the house prices in Boston. If you want to know more about Linear Regression do check out this link and continue with this example.

*Linear Learning*Q.) Predict Boston House Prices Using Python and Linear Regression

# Importing Essential Libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

#Importing Data set

from sklearn.datasets import load_boston

bos_data = load_boston()

bos_data = load_boston()

# Initializing the dataframe

data = pd.DataFrame(bos_data.data)

data.head()

data.head()

**Output:**

# Adding the feature names to the dataframe

data.columns = bos_data.feature_names

data.head()

data.head()

**Output:**

**CRIM**per capita crime rate by town

**ZN**proportion of residential land zoned for lots over 25,000 sq.ft.

**INDUS**proportion of non-retail business acres per town

**CHAS**Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

**NOX**nitric oxides concentration (parts per 10 million)

**RM**average number of rooms per dwelling

**AGE**proportion of owner-occupied units built prior to 1940

**DIS**weighted distances to five Boston employment centers

**RAD**index of accessibility to radial highways

**TAX**full-value property-tax rate per 10,000usd

**PTRATIO**pupil-teacher ratio by town

**B**1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

**LSTAT**% lower status of the population

# Checking the shape of the dataset

data.shape

**Output:**

(506, 13)

# Overall Description of the dataset

data.describe().transpose()

**Output:**

# Adding a target variable to the data frame

# Median value of owner-occupied homes in $1000s

data['PRICE'] = bos_data.target

# Finding out the correlation between the features

corr = data.corr()

corr.shape

corr.shape

**Output:**

(14, 14)

# Plotting the heatmap of correlation between features

plt.figure(figsize=(16,10))

sns.heatmap(corr, cbar=True, annot=True, cmap="BuPu")

plt.ylim(15,0)

sns.heatmap(corr, cbar=True, annot=True, cmap="BuPu")

plt.ylim(15,0)

**Output:**

# Splitting target variable and independent variables

X = data.drop(['PRICE'], axis = 1)

y = data['PRICE']

y = data['PRICE']

# Splitting to training and testing data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2)

# Import library for Linear Regression

# Create a Linear regressor

# Train the model using the training sets

from sklearn.linear_model import LinearRegression

model = LinearRegression())

model.fit(X_train, y_train)

model = LinearRegression())

model.fit(X_train, y_train)

**Output:**

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

# Value of y-intercept

model.intercept_

**Output:**

32.1816401311959

# Value of Co-efficient

model.coef_

**Output:**

array([-1.09214028e-01, 4.57894028e-02, 4.59321522e-02, 3.40092338e+00,

-1.79347006e+01, 4.21895547e+00, -8.53459799e-03, -1.43742353e+00,

3.09455894e-01, -1.23217028e-02, -8.44931385e-01, 8.04204265e-03,

-4.88716167e-01])

# Model prediction on train data

y_pred = model.predict(X_test)

# Check the model performance and accuracy using Mean Squared Error

from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_test,y_pred))

print(mean_squared_error(y_test,y_pred))

**Output:**

23.9922269542709

So this was the example of predicting house prices in Boston using Linear Regression which comes under the concept of Regression.

#### (b) Supervised Learning (Classification)

- A support vector machine is a popular classification algorithm. Basically the way support vector machine works is it draws a hyperplane in n dimension space such that it maximizes the margin between classification groups.

*Support Vector Machine*

*Q.) SVM on Breast Cancer Dataset*

# Importing Essential Data Set

import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

# Import Dataset

from sklearn import datasets

cancer_data = datasets.load_breast_cancer()

cancer_data = datasets.load_breast_cancer()

# Creating Train & Test Data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(cancer_data.data, cancer_data.target, test_size=0.2)

X_train, X_test, y_train, y_test = train_test_split(cancer_data.data, cancer_data.target, test_size=0.2)

# Import Support Vector Machine

from sklearn import svm

cls = svm.SVC(kernel='linear')

cls = svm.SVC(kernel='linear')

# Training the model

cls.fit(X_train,y_train)

# Predict the model

from sklearn import metrics

pred = cls.predict(X_test)

print("Accuracy :", metrics.accuracy_score(y_test,y_pred = pred))

pred = cls.predict(X_test)

print("Accuracy :", metrics.accuracy_score(y_test,y_pred = pred))

**Output:**

**Accuracy : 0.9473684210526315**

# Precision score

print("Precision :", metrics.precision_score(y_test,pred))

**Output:**

**Precision : 0.9506172839506173**

# Recall score

print("Recall :", metrics.recall_score(y_test,pred))

**Output:**

**Recall : 0.9746835443037974**

# Classification Report

from sklearn.metrics import classification_report

print("Classification Report :",classification_report(y_test,y_pred=pred))

print("Classification Report :",classification_report(y_test,y_pred=pred))

Top 9 Essential Concept Of Supervised Learning That Will Surely Make You An Expert Instantly
Reviewed by InfinityCodeX
on
Monday, May 04, 2020
Rating:

## No comments:

No Spamming and No Offensive Language