Linear Regression Algorithm
Hey guys, today we will be looking at the Linear Regression algorithm. Before diving into the topic, let's split the term into Linear and Regression.
What do we mean by Linear?
Linearity is the property of a mathematical relationship or function that can be graphically represented as a straight line.
What do we mean by Regression?
Regression analysis is a predictive modeling technique. It estimates the relationship between a dependent variable (also called the target) and an independent variable (also known as the predictor).
What is Linear Regression?
Linear Regression is one of the easiest algorithms in Machine Learning. It is a statistical model that attempts to show the relationship between two variables with a linear equation.
We covered almost all the basics of this in our previous blog, so please check it out: blog_10.2.
Before diving into the coding, let's understand some of the math behind Linear Regression. Don't worry, the math is only used to understand how the algorithm works. By the way, all these algorithms are wrapped inside sklearn (i.e. the scikit-learn library), so you don't have to understand all the complex math and stats formulas.
Now let's take an example:
Price (x) | Number of Chocolates (y)
1 | 2
2 | 4
3 | 5
4 | 4
5 | 5
Mean (x`)=3 | Mean (y`)=4
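The two means in the table can be checked with a few lines of Python. This is just a minimal sketch: the variable names `prices` and `chocolates` and the helper `mean` are made up for illustration and are not from any library.

```python
# Example data from the table above
prices = [1, 2, 3, 4, 5]       # x: price
chocolates = [2, 4, 5, 4, 5]   # y: number of chocolates

def mean(values):
    """Arithmetic mean of a list of numbers (illustrative helper)."""
    return sum(values) / len(values)

x_mean = mean(prices)
y_mean = mean(chocolates)
print(x_mean, y_mean)  # 3.0 4.0
```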
Now plot both means on the graph.
Our goal is to find the best-fit line using the least squares method. For that, we first have to find the regression line.
Now let's find the equation of our regression line.
Regression line: y=m*x+c, where m is the slope and c is the intercept.
Price (x) | Number of Chocolates (y) | (x-x`) | (y-y`) | (x-x`)^2 | (y-y`)^2
1 | 2 | -2 | -2 | 4 | 4
2 | 4 | -1 | 0 | 1 | 0
3 | 5 | 0 | 1 | 0 | 1
4 | 4 | 1 | 0 | 1 | 0
5 | 5 | 2 | 1 | 4 | 1
Mean (x`)=3 | Mean (y`)=4 | | | ∑=10 | ∑=6
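If you want to double-check the deviation columns and their sums, here is a small sketch in plain Python. The names `x_dev`, `y_dev`, `sum_x_dev_sq` and `sum_y_dev_sq` are only illustrative.

```python
x = [1, 2, 3, 4, 5]   # price
y = [2, 4, 5, 4, 5]   # number of chocolates

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Deviation of every point from its column mean
x_dev = [xi - x_mean for xi in x]
y_dev = [yi - y_mean for yi in y]

# Squared deviations, summed per column
sum_x_dev_sq = sum(d ** 2 for d in x_dev)
sum_y_dev_sq = sum(d ** 2 for d in y_dev)

print(x_dev)                       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(y_dev)                       # [-2.0, 0.0, 1.0, 0.0, 1.0]
print(sum_x_dev_sq, sum_y_dev_sq)  # 10.0 6.0
```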
Now we have to find the slope m, using m = ∑(x-x`)*(y-y`) / ∑(x-x`)^2:
(x-x`) | (y-y`) | (x-x`)*(y-y`)
-2 | -2 | 4
-1 | 0 | 0
0 | 1 | 0
1 | 0 | 0
2 | 1 | 2
 | | ∑=6
m = 6/10 = 0.6
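The same slope calculation can be sketched in Python. The names `sxy` and `sxx` are my own shorthand for the two sums (the sum of products and the sum of squared x deviations); they are not standard library names.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Numerator: sum of (x - x_mean) * (y - y_mean)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
# Denominator: sum of (x - x_mean)^2
sxx = sum((xi - x_mean) ** 2 for xi in x)

m = sxy / sxx
print(sxy, sxx, m)  # 6.0 10.0 0.6
```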
The regression line passes through the point of means (x`, y`), so substitute the values of m, x` and y` to find c:
y`=m*x`+c
c=y`-m*x`
c=4-(0.6*3)
c=4-1.8
c=2.2
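The substitution above is easy to reproduce in code. This sketch assumes the slope m = 0.6 from the previous step and simply plugs in the two means.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m = 0.6  # slope found in the previous step

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# The line passes through (x_mean, y_mean), so c = y_mean - m * x_mean
c = y_mean - m * x_mean
print(round(c, 2))  # 2.2
```

The result is rounded before printing because floating-point arithmetic can leave a tiny error in the last digits.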
Price (x) | y=m*x+c
1 | 2.8
2 | 3.4
3 | 4
4 | 4.6
5 | 5.2
 | ∑=20
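The predicted values in this table come from plugging each price into the line. A short sketch, assuming m = 0.6 and c = 2.2 from the steps above (the `predict` helper is hypothetical, not a library function):

```python
x = [1, 2, 3, 4, 5]
m, c = 0.6, 2.2  # slope and intercept from the steps above

def predict(price):
    """Predicted number of chocolates for a given price (illustrative helper)."""
    return m * price + c

predicted = [predict(xi) for xi in x]
print([round(p, 1) for p in predicted])  # [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(sum(predicted), 1))          # 20.0
```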
Now calculate the distance between the actual and predicted values. The best-fit line is the one that makes this distance as small as possible.
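To make "reducing the distance" concrete, here is a sketch that adds up the squared distances between actual and predicted values, which is the quantity the least squares method minimizes. The helper `sse` and the two alternative lines are made up for comparison only.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sse(m, c):
    """Sum of squared distances between actual y and predicted m*x + c."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

fitted = sse(0.6, 2.2)    # the line found above
flatter = sse(0.5, 2.5)   # hypothetical alternative line
steeper = sse(0.7, 1.9)   # hypothetical alternative line

print(round(fitted, 2), round(flatter, 2), round(steeper, 2))  # 2.4 2.5 2.5
```

Both alternative lines give a larger total than the fitted line, which is what we expect from a least squares fit.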
Now that we have calculated the best-fit line, it's time to check the goodness of fit, i.e. how well our model will perform.
For that we have to use the R-Squared method.
*R-Squared Method:
It is a statistical measure of how close the data are to the fitted regression line.
It is also known as the coefficient of determination, or the coefficient of multiple determination.
R-square = ∑(predicted y - y`)^2 / ∑(y-y`)^2
R-square = 3.6/6 = 0.6
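Here is a small sketch of where 3.6 and 6 come from, assuming the fitted line y = 0.6*x + 2.2 from earlier. The names `explained` and `total` are just labels I picked for the two sums.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2  # fitted slope and intercept

y_mean = sum(y) / len(y)
predicted = [m * xi + c for xi in x]

# Squared distance of each predicted value from the mean of y
explained = sum((p - y_mean) ** 2 for p in predicted)
# Squared distance of each actual value from the mean of y
total = sum((yi - y_mean) ** 2 for yi in y)

r_squared = explained / total
print(round(explained, 2), round(total, 2), round(r_squared, 2))  # 3.6 6.0 0.6
```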
Now we have found the R-Square, but guys, the best regression line we can get is the one where R-Square is 1, which means every predicted value equals the actual value.
*Are low R-Square values always bad?
Any field that attempts to predict human behaviour, such as psychology, typically has R-Square values lower than 50%, which tells us that humans are hard to predict. If your R-Square value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-Square, the significant coefficients still represent the mean change in the response for one unit of change in the predictor, while holding the other predictors in the model constant.
Hey guys, enjoyed my content? Please do leave a comment below, and do practice the Titanic problem and comment your answer.