InfinityCodeX


Check out our blogs where we cover topics such as Python, Data Science, Machine Learning, Deep Learning. A Best place to start your AI career for beginner, intermediate peoples.


Everything you need to Know about Supervised Learning



 Supervised Learning is the Machine Learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.



The evolution of Artificial Intelligence has changed the entire world in terms of technology. It’s advancement are quicker than we predicted with such exponential growth in Artificial Intelligence, Machine Learning is becoming the most trending field of this century, it is starting to redefine the way we live and the way we see the world.

So before diving into the Supervised Learning directly, let’s briefly explore the concept of Machine Learning.

What is Machine Learning?

- Machine Learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, Machine Learning allows computers to find hidden insights without being explicitly programmed.
We can also say that Machine Learning is a subset of Artificial Intelligence that focuses on getting Machines to make decisions by feeding them data.


difference_between_AI_ML_DL


So now we came to know what is Machine Learning.

What are the Types of Machine Learning?

Basically Machine Learning is divided into three parts.

Types_of_ML


What is Unsupervised Learning?

- Unsupervised Learning is a type of Machine Learning that looks for previously undetected patterns in a dataset with no pre-existing labels and minimum of human supervision. The Unsupervised Learning algorithms are not provided any “Answers” to learn from; it must make sense of the data just by observations.

Go here and check out...Top 7 Essential Points about Unsupervised Learning Every Artificial Intelligence developer should know.


What is Reinforcement Learning?

- Reinforcement Learning is a part of Machine Learning where an Agent learns to behave in an environment by performing actions and seeing the results.
It basically performs an action and it either gets rewarded on the actions or it gets punishments. Reinforcement Learning is all about taking appropriate action in order to maximize the rewards in a particular situation.
Reinforcement Learning is all about an Agent who has been put in an unknown environment and it is going to use hit and run method in order to figure out the environment and come up with an appropriate outcome.

reinforcement_learning


Now let’s discuss our today's topics which are Supervised Learning.
Let’s have a look at our Agenda :

(i) What is Supervised Learning?

(ii) Supervised Learning Process

(iii) Supervised Learning and it’s types

(iv) Classification vs Regression

(v) Supervised Learning vs Unsupervised Learning

(vi) Supervised Learning Examples in Real Life

(vii) Advantages of Supervised Learning

(viii) Disadvantages of Supervised Learning

(ix) Supervised Learning Example in Python



(i) What is Supervised Learning?


Supervised Learning model has sets of input variables (x) and an output variable (y). An algorithm identifies the mapping function between (x) and (y) variables. The relationship is y=f(x).

- Supervised Learning algorithms are trained using labeled examples, such as an input where the desired output is known. That means within our dataset we are going to have some historical features with historical labels.

- Let’s say for example we are creating an E-mail spam detection system. Now we will provide information such as a segment of text which has a category label. So we take a bunch of previous E-mails and someone has already gone by those E-mails and classified them using the correct labels as spam or ham (Non-Spam). So there were some E-mails which are were classified as the spam or ham (Non-Spam) so the idea would be for the future text information such as future E-mails using the historical labeled data and Machine Learning algorithm which will learn using historical data in order to predict the new data whether it is a spam or ham.
Supervised Learning is commonly used in applications where historical data predicts likely future events.

(ii) Supervised Learning Process in Machine Learning


supervised_learning_process


So let’s go in and go through this step by step

*Get data and it actually depends on what domain you are working in from where this data actually comes from. This data can come from your customer's feedback, research or it would have been collected from some online database, etc. So at some point the data has to be acquired.

*Once we actually acquire the data then we need to clean and format the data and often we do this using Pandas.

*Then we split the data into Training data and Testing data. What we do here is, we take some portion of our data like 30% to make it our test data, and then we will take the remaining data which is 70% as our training data. Now what we are going to do here is we are going to use that specific training set on our machine learning model in order to fit a model to that training data.

*Then we want to know how our model performs. So then we run that test data through the model and compare the model predictions to the actual correct label that the test data have because remember we actually know the correct label for the test data. So we can run that test data features through the model, get our models predictions and compare it to the correct answer and then we can evaluate the model and then maybe you want to go back based off that performance and adjust the model parameters such as decreasing the amount of test size to 20% or 25%.

*And once we are satisfied by this we can then deploy the model to the real world.

(iii) Supervised Learning and it’s types


As we know from earlier Supervised Learning algorithms are trained using labeled examples, such as an input where the desired output is known.

Types of Supervised Learning :


Types_of_supervised_learning


Supervised Learning is divided into 2 types.

(i) Classification :

- Classification is a process of categorizing a given set of data into classes, It can be performed on both structured or unstructured data. The process starts with predicting the class of given data points. The classes are often referred to as target, label, or categories. The classification predictive modeling is a task of approximating the mapping function from input variables to discrete output variables and the main goal is to identify which class or category the new data will fall into.
To simplify this, Let’s look at some examples: Is that email spam or not, Does a patient is suffering from heart diseases or not, etc.

Classification_in_supervised_learning


Classification problems can be solved with a numerous amount of algorithms. Any algorithm you choose it actually depends on the situation. Here are some of the popular classification algorithms.

*K-Nearest Neighbor
*Random Forest

(ii) Regression :

- Regression Analysis is a predictive modeling technique. It estimates the relationship between a dependent variable we can also call it as target & an independent variable which is also known as a predictor.
Regression algorithm goal is to predict the continuous number such as sales, price, scores, etc. The equation for basic linear regression can be also written as:

y = m*x+c
Where;
y : Dependent variable
x : Independent variable
c : y-Intercept
m : Co-efficient

For simple regression problems such as this, the model predictions are represented by the line of best fit. For models using two independent variables, the plane will be used. Finally, for a model using more than two independent variables, a hyperplane will be used.
To simplify this, Let’s look at some examples: Predicting house prices, predicting student scores, etc.

Regression_in_supervised_learning


There are many different types of Regression Algorithms. The three most common are listed down below :

*Polynomial Regression

(iv) Regression vs Classification


Let’s see the Difference between Regression and Classification.

Regression_vs_Classification


(v) Supervised Learning vs Unsupervised Learning


Now Let’s see the difference between Supervised Learning and Unsupervised Learning.
If you didn’t have knowledge of Unsupervised Learning don’t worry just check out this article of ours.


Supervised_vs_Unsupervised_Learning


(vi) Supervised Learning Examples in Real Life


So as we saw that there are 2 types of supervised learning, let’s see their implementation in the real world.

(a) Real-world implementation of Supervised Learning with respect to Classification :


- Rating of a product or a movie.
- Fraud detection.
- Spam or not spam.
- Risk analysis.

(b) Real-world implementation of Supervised Learning with respect to Regression :


- House price prediction.
- The amount of Fraud.
- Probability of Risk.
- Detecting the increase or decrease of Crime Rates.

(vii) Advantages of Supervised Learning


* You can get very specific about the definition of the classes, which means that you can train the classifier in a way that has a perfect decision boundary to distinguish different classes accurately.
* You can specifically determine how many classes you want to have.
* After training, you don’t necessarily need to keep the training examples in a memory. You can keep the decision boundary as a mathematical formula and that would be enough for classifying future inputs.

(viii) Disadvantages of Supervised Learning


* Your decision boundary might be overtrained. This means that if your training set is not including some examples that you want to have in a class when you use those examples after training, you might not get the correct class label.
* When this an input which is not from any of the classes in reality, then it might get a wrong class label after classification.
* You have to select a lot of good examples from each class while you are training the classifier. If you consider the classification of big data that can be a real challenge.
* Training needs a lot of computation time, so do the classification.
* You might need to use a cloud and leave the training algorithm work overnight or nights before obtaining a good decision boundary model.


(ix) Supervised Learning Example in Python


Let’s see Supervised Learning Examples where we will be using python. We will work on 2 examples, the first example will be related to Regression and the second example will be related to the classification.



(a) Supervised Learning (Regression)



- Here we will be using Linear Regression to predict the house prices in Boston. If you want to know more about Linear Regression do check out this link and continue with this example.


                                               Linear Learning


Q.) Predict Boston House Prices Using Python and Linear Regression



# Importing Essential Libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#Importing Data set


from sklearn.datasets import load_boston
bos_data = load_boston()

# Initializing the dataframe


data = pd.DataFrame(bos_data.data)
data.head()

Output:


head_data


# Adding the feature names to the dataframe


data.columns = bos_data.feature_names
data.head()

Output:


labeled_data



CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centers
RAD index of accessibility to radial highways
TAX full-value property-tax rate per 10,000usd
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population



# Checking the shape of the dataset


data.shape

Output:

(506, 13)



# Overall Description of the dataset


data.describe().transpose()

Output:


data_transpose



# Adding a target variable to the data frame
# Median value of owner-occupied homes in $1000s



data['PRICE'] = bos_data.target

# Finding out the correlation between the features


corr = data.corr()
corr.shape

Output:

(14, 14)



# Plotting the heatmap of correlation between features


plt.figure(figsize=(16,10))
sns.heatmap(corr, cbar=True, annot=True, cmap="BuPu")
plt.ylim(15,0)

Output:


heat_map

# Splitting target variable and independent variables


X = data.drop(['PRICE'], axis = 1)
y = data['PRICE']

# Splitting to training and testing data


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2)

# Import library for Linear Regression
# Create a Linear regressor
# Train the model using the training sets



from sklearn.linear_model import LinearRegression
model = LinearRegression())
model.fit(X_train, y_train)

Output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)



# Value of y-intercept


model.intercept_

Output:

32.1816401311959



# Value of Co-efficient


model.coef_

Output:

array([-1.09214028e-01, 4.57894028e-02, 4.59321522e-02, 3.40092338e+00,
-1.79347006e+01, 4.21895547e+00, -8.53459799e-03, -1.43742353e+00,
3.09455894e-01, -1.23217028e-02, -8.44931385e-01, 8.04204265e-03,
-4.88716167e-01])



# Model prediction on train data


y_pred = model.predict(X_test)

# Check the model performance and accuracy using Mean Squared Error


from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test,y_pred))

Output:

23.9922269542709



So this was the example of predicting house prices in Boston using Linear Regression which comes under the concept of Regression.


(b) Supervised Learning (Classification)


- A support vector machine is a popular classification algorithm. Basically the way support vector machine works is it draws a hyperplane in n dimension space such that it maximizes the margin between classification groups.


- Here we will be using Support Vector Machine (SVM) to predict the Breast Cancer. If you want to know more about Support Vector Machine do check out this link and continue with this example.

                                            Support Vector Machine


Q.) SVM on Breast Cancer Dataset 



# Importing Essential Data Set


import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


# Import Dataset


from sklearn import datasets

cancer_data = datasets.load_breast_cancer()


# Creating Train & Test Data


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(cancer_data.data, cancer_data.target, test_size=0.2)


# Import Support Vector Machine


from sklearn import svm

cls = svm.SVC(kernel='linear')


# Training the model


cls.fit(X_train,y_train)

Output:




# Predict the model

from sklearn import metrics
pred = cls.predict(X_test)
print("Accuracy :", metrics.accuracy_score(y_test,y_pred = pred))

Output: 


Accuracy : 0.9473684210526315


# Precision score


print("Precision :", metrics.precision_score(y_test,pred))

Output: 


Precision : 0.9506172839506173


# Recall score


print("Recall :", metrics.recall_score(y_test,pred))

Output: 


Recall : 0.9746835443037974


# Classification Report


from sklearn.metrics import classification_report
print("Classification Report :",classification_report(y_test,y_pred=pred))

Output:




classification_report


Hey Guys! So this is Supervised Learning for you. If you have any doubts please leave a comment below & make sure you share this with your friends.

No comments:

No Spamming and No Offensive Language

Powered by Blogger.