Logistic Regression in Machine Learning
Q.1) What is Logistic Regression?
Logistic Regression is a statistical classification model for categorical dependent variables, that is, data whose output is binary, such as True/False, 0/1 or Yes/No. For example: is the patient suffering from coronavirus or not?
Logistic Regression is used to describe binary data & to explain the relationship between one dependent variable & one or more independent variables. These input (independent) variables can be discrete, continuous, nominal, ordinal, interval or ratio-level variables.
Logistic Regression is a predictive analysis, which means it predicts the output based on the given input data.
Q.2) Why is Logistic Regression called Logistic Regression?
Let's break the term Logistic Regression into 2 parts:
1. Logistic
2. Regression
Now let us understand both terms one by one:
1. Logistic: What is Logistic?
Here the term Logistic refers to the logistic function, the 'S'-shaped curve (also called the Sigmoid function) that maps any real number to a value between 0 and 1. It is not related to logistics, the organization & distribution of materials & products.
2. Regression: What is Regression?
Regression Analysis is a predictive modeling technique. It estimates the relationship between a dependent variable, also called the target, & an independent variable, also known as a predictor.
Now that we know the meaning of both terms, we can combine them: a predictive modeling technique which estimates the relationship between independent & dependent variables, passes the result through the logistic function & gives output in a binary format is known as Logistic Regression.
Q.3) What is the purpose of Logistic Regression?
Logistic Regression is used to predict the outcome of a categorical dependent variable, so the outcome should be discrete/categorical in nature, such as True/False, 0/1 or Yes/No.
Logistic Regression can be used as a tool of applied statistics & discrete data analysis.
The outcome given by Logistic Regression is expressed as a probability.
Logistic Regression helps in classifying the given data.
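To make the "probability" point concrete, here is a minimal sketch using scikit-learn. The toy data (hours studied vs. pass/fail) is made up purely for illustration and is not from the examples in this post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (feature) and pass/fail (label).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# One row per sample: [P(class 0), P(class 1)]; each row sums to 1.
probabilities = model.predict_proba(hours)

# Hard 0/1 labels derived from those probabilities.
labels = model.predict(hours)

print(probabilities.round(3))
print(labels)
```

So the model first produces a probability for each class, and the 0/1 answer is read off from it.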
Q.4) Why do we use Logistic Regression instead of Linear Regression?
In Linear Regression, the value of the dependent variable, that is, the value we want to predict, is continuous & can fall anywhere in a range. In Logistic Regression there can be only 2 values: either 0, i.e. False, or 1, i.e. True.
So in Logistic Regression we don't need values below 0 or above 1. We bend the parts of the line that go beyond this range so that it flattens out at 0 & at 1.
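To see why a plain straight line is a poor fit for a 0/1 target, here is a small sketch. The data is a hypothetical toy set, and `np.polyfit` is used only as a quick way to fit a line.

```python
import numpy as np

# Hypothetical toy data: one feature and a binary label.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit an ordinary straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)


def line(value):
    """Prediction of the fitted straight line for a given input."""
    return slope * value + intercept


# Outside the observed range the line leaves the 0..1 interval.
print(line(0.0))   # below 0
print(line(12.0))  # above 1
```

A prediction below 0 or above 1 cannot be read as a probability, which is why the line has to be flattened at both ends.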
Once the line is bent in this way, it takes the shape of an 'S' curve, also called the Sigmoid function or Sigmoid curve. The Sigmoid function converts any value from -∞ to ∞ into a value between 0 & 1, which can then be mapped to the discrete values 0 or 1.
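The Sigmoid function itself is short enough to write by hand. This is a minimal sketch with NumPy; the function name `sigmoid` and the sample inputs are my own choices.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = sigmoid(z)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle at 0.5.
print(p)
```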
Now let's say I get a value of 0.65. Where do I put it? This is where the concept of the Threshold Value comes in, which divides the graph into 2 halves.
The Threshold Value is the probability cut-off between winning, i.e. 1, & losing, i.e. 0.
To check whether a given value is 1 or 0, compare it with the Threshold Value: if the value is greater than the Threshold Value it is 1 (True), & if it is less than the Threshold Value it is 0 (False).
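The threshold rule can be sketched in a couple of lines. The helper name `classify` and the default threshold of 0.5 are assumptions for illustration; sending a value exactly equal to the threshold to 0 is also my choice, since the text only covers "greater" and "less".

```python
import numpy as np


def classify(probabilities, threshold=0.5):
    """Return 1 where the probability is above the threshold, else 0."""
    return (np.asarray(probabilities) > threshold).astype(int)


# 0.65 is above the default threshold of 0.5, so it becomes 1.
print(classify([0.2, 0.65, 0.9]))

# With a stricter threshold of 0.7 the same 0.65 becomes 0.
print(classify([0.65], threshold=0.7))
```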
To create this curve we need an equation, so let us see how one is formed to imitate this behaviour.
NOTE: The Logistic Regression equation is derived from the equation of a straight line.
Equation of a straight line:
y = m1*x1 + m2*x2 + ... + mn*xn + c        Range: -∞ to ∞
Let's try to derive the Logistic Regression equation from the straight-line equation. In Logistic Regression y is a probability, so:
y        Range: 0 to 1
Now, to get a range between 0 & ∞, let's transform y:
y/(1-y)        y = 0 gives 0 & y = 1 gives ∞        Range: 0 to ∞
Let us transform it further, to get a range between -∞ & ∞:
log[y/(1-y)] = m1*x1 + m2*x2 + ... + mn*xn + c        (final Logistic Regression equation)
Solving this for y gives y = 1/(1 + e^-(m1*x1 + m2*x2 + ... + mn*xn + c)), which is exactly the Sigmoid function.
Don't worry guys, you don't have to memorize these formulas; all this math is packed into the Sklearn library.
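If you want to convince yourself of the derivation, the following sketch checks numerically that the log-odds transform log[y/(1-y)] and the Sigmoid function undo each other. The function names are my own.

```python
import numpy as np


def sigmoid(z):
    """Turn a straight-line value z (any real number) into a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def logit(y):
    """Log-odds: turn a probability y in (0, 1) back into a real number."""
    return np.log(y / (1.0 - y))


# Straight-line values covering negative, zero and positive inputs.
z = np.linspace(-5.0, 5.0, 11)

# Applying sigmoid and then logit should give back the original values.
recovered = logit(sigmoid(z))
print(np.allclose(z, recovered))
```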
Q.5) What is the Difference between Linear Regression and Logistic Regression?
Linear Regression vs Logistic Regression :
Here are a few real-life examples; we will do practicals on 2 of them (marked ✔).
1. Is the mail spam or not?
2. Will it rain today or not?
3. Did the passenger survive the Titanic or not? ✔
4. Will that person purchase that apartment or not? ✔
5. Do you have coronavirus or not?
Q.7) Logistic Regression in Python :
Here are the 2 examples which we will be doing:
NOTE: There are 5 steps in solving a Logistic Regression problem:
(i) Import all the libraries:
In this step, we import all the libraries we will need.
(ii) Analyzing the Data:
In this step, we analyze the data using visualization libraries & functions.
(iii) Data Wrangling:
In this step, we modify the data as per the requirement.
(iv) Test & Train the Data:
In this step, we train the model on the data through the fit method & then test it.
(v) Check Accuracy:
In this step, we check the accuracy of our model.
1.) Will that person purchase that apartment or not?
Q.) A construction company has released a few apartments in the market. Using the previous data about the sales of their apartments, they want to predict the category of people who might be interested in buying them.
Step 1 : Import all the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
apt_data=pd.read_csv(r"E:\Git_repo\Logistic_House_data.csv")
apt_data.head()
output :
apt_data.info()
output :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
Gender       10 non-null object
Age          10 non-null int64
Salary       10 non-null int64
Purchased    10 non-null int64
dtypes: int64(3), object(1)
memory usage: 448.0+ bytes
Step 2 : Analyzing the Data
# How many of the people have bought the house
plt.figure(figsize=(7,7))
sns.countplot(x="Purchased",data=apt_data)
output :
plt.figure(figsize=(7,7))
sns.countplot(x="Purchased",hue="Gender",data=apt_data)
output :
Step 3 : Data Wrangling (Not required)
# Create a dummy variable for male & female
dummies_gen=pd.get_dummies(apt_data["Gender"],drop_first=True)
dummies_gen
output :
apt_data=pd.concat([apt_data,dummies_gen],axis=1)
apt_data
output :
apt_data=apt_data.drop("Gender",axis=1)
apt_data
output :
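To show what `get_dummies` with `drop_first=True` does, here is a small sketch on a hypothetical 4-row frame (not the apartment data). The `dtype=int` argument is my addition so the column shows 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical mini data set with only a Gender column.
toy = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# Without drop_first there would be one column per category (Female, Male).
# drop_first=True drops the first category (Female), leaving a single
# "Male" column: 1 means Male, 0 means Female.
dummies = pd.get_dummies(toy["Gender"], drop_first=True, dtype=int)

print(dummies)
```

One column is enough here because if a person is not Male in this data, they are Female, so the second column would carry no extra information.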
Step 4 : Test & Train the Data
X=apt_data.drop("Purchased",axis=1)
y=apt_data["Purchased"]
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
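# NOTE: this fits on the full data set, so the rows in X_test were already seen during training;
# fitting on X_train, y_train instead would give an honest test score
model.fit(X,y)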
output :
LogisticRegression(C=1.0, class_weight=None,
dual=False, fit_intercept=True,intercept_scaling=1,
l1_ratio=None, max_iter=100,multi_class='warn',
n_jobs=None, penalty='l2',random_state=None,
solver='warn', tol=0.0001,
verbose=0,warm_start=False)
model.score(X,y)
output :
0.7
pred=model.predict(X_test)
pred
output :
array([0, 1], dtype=int64)
from sklearn.metrics import classification_report
val=classification_report(y_test,pred)
# classification_report returns a string, so print it to keep the table layout
print(val)
output :
y_test.value_counts()
output :
1 1
0 1
Name: Purchased, dtype: int64
from sklearn import metrics
print(metrics.confusion_matrix(y_test,pred))
output :
[[1 0]
[0 1]]
print(metrics.recall_score(y_test,pred))
output :
1.0
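To see how the confusion matrix & recall score relate, here is a sketch on hypothetical labels (made up for illustration, not the apartment data). For binary 0/1 labels scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Flatten the 2x2 matrix into its four cells.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall: of all actual 1s, how many did the model find?
recall_by_hand = tp / (tp + fn)
# Precision: of all predicted 1s, how many were really 1?
precision_by_hand = tp / (tp + fp)

print(recall_by_hand, recall_score(y_true, y_pred))
print(precision_by_hand, precision_score(y_true, y_pred))
```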
Step 5 : Check Accuracy
from sklearn.metrics import accuracy_score
val2=accuracy_score(y_test,pred)
val2
output :
1.0
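With only 10 rows, a test set of 2 rows makes the accuracy jump around a lot. Below is a sketch of the same workflow on a larger synthetic data set where the model is fitted on the training split only. Everything in it is an assumption for illustration: the data is randomly generated, the "purchase rule" is invented, & the `StandardScaler` step, `random_state` & `stratify` arguments are my additions, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the apartment data (hypothetical values).
rng = np.random.default_rng(0)
n_rows = 200
age = rng.integers(20, 60, size=n_rows)
salary = rng.integers(20000, 100000, size=n_rows)
# Invented rule: older or better-paid people buy the apartment.
purchased = ((age > 40) | (salary > 70000)).astype(int)

X = np.column_stack([age, salary])
y = purchased

# Fixed random_state makes the split repeatable; stratify keeps the
# 0/1 ratio similar in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale the features, then fit on the training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(train_accuracy, test_accuracy)
```

The test accuracy here is measured on rows the model has never seen, which is the number you would normally report.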
Check out my GitHub :
Q.1) Dataset apartment:
Q.1) Code apartment :
Q.2) Dataset Titanic :
Q.2) Code Titanic :
The second example, Titanic survival, is one you have to do on your own. If you run into any trouble while doing it, do check my Titanic code, which is listed above.
Hey guys, enjoyed my content? Please do leave a comment below, practice the Titanic problem & comment your answer.