Linear Regression Algorithm

Hey guys, today we will be looking at the concept of the Linear Regression Algorithm. Before diving directly into the topic, let's split the term into Linear & Regression.

What do we mean by Linear?

Linearity is the property of a mathematical relationship or function that can be graphically represented as a straight line.

What do we mean by Regression?

Regression Analysis is a predictive modeling technique. It estimates the relationship between a dependent variable, which we can also call the target, & an independent variable, which is also known as the predictor.

What is Linear Regression?

Linear Regression is one of the easiest algorithms in Machine Learning. It is a statistical model that attempts to show the relationship between 2 variables with a linear equation.

We covered almost all the basics of this in our previous blog, so please check it out: blog_10.2.

Before diving into the coding, let’s understand some of the math behind Linear Regression. Don’t worry, the math will only be used to understand how the algorithm works & by the way all these algorithms are wrapped inside sklearn (i.e. the scikit-learn library), so you don’t have to memorize all the complex math / stats formulas.

Now let's take an example:

Price (X)	Number of Chocolates (Y)
1	2
2	4
3	5
4	4
5	5

First, find the mean of x & the mean of y:

Price (x)	Number of Chocolates (y)
1	2
2	4
3	5
4	4
5	5
Mean (x`)=3	Mean (y`)=4
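
If you want to check the two means with code, here is a minimal Python sketch. The variable names `prices` & `chocolates` are just my own labels for the two columns of the table, not something from a library.

```python
# Data from the table: price (x) and number of chocolates (y)
prices = [1, 2, 3, 4, 5]
chocolates = [2, 4, 5, 4, 5]

# Mean = sum of the values divided by how many values there are
x_mean = sum(prices) / len(prices)
y_mean = sum(chocolates) / len(chocolates)

print(x_mean, y_mean)  # 3.0 4.0
```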

Now plot both the means on the graph.

See, our goal is to find the best fit line using the least squares method. For that we first have to find the regression line.

Now let's find the equation of our regression line.

Regression line: y=m*x+c, where m is the slope & c is the intercept.

Price (x)	Number of Chocolates (y)	(x-x`)	(y-y`)	(x-x`)^2	(y-y`)^2
1	2	-2	-2	4	4
2	4	-1	0	1	0
3	5	0	1	0	1
4	4	1	0	1	0
5	5	2	1	4	1
Mean (x`)=3	Mean (y`)=4			∑=10	∑=6

Now we have to find the slope m, using m = ∑(x-x`)(y-y`) / ∑(x-x`)^2:

(x-x`)	(y-y`)	(x-x`)(y-y`)
-2	-2	4
-1	0	0
0	1	0
1	0	0
2	1	2
		∑=6

m = 6/10 = 0.6
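
The slope calculation can be sketched in plain Python like this. It is only an illustration of the formula m = ∑(x-x`)(y-y`) / ∑(x-x`)^2; the names `sum_xy` & `sum_xx` are my own and not part of any library.

```python
prices = [1, 2, 3, 4, 5]
chocolates = [2, 4, 5, 4, 5]

x_mean = sum(prices) / len(prices)
y_mean = sum(chocolates) / len(chocolates)

# Numerator: sum of (x - x_mean) * (y - y_mean)
sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(prices, chocolates))

# Denominator: sum of (x - x_mean) squared
sum_xx = sum((x - x_mean) ** 2 for x in prices)

m = sum_xy / sum_xx
print(sum_xy, sum_xx, m)  # 6.0 10.0 0.6
```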

The regression line always passes through the point of means (x`, y`), so substitute m, x` & y` into the equation to find c:

y`=m*x`+c

c=y`-m*x`

c=4-(0.6*3)

c=4-1.8

c=2.2
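
The same substitution as a tiny Python sketch. The values of m & the two means are simply copied from the calculation above, and the result is rounded because floating point arithmetic can leave a tiny error in the last digits.

```python
m = 0.6       # slope found earlier
x_mean = 3    # mean of the prices
y_mean = 4    # mean of the number of chocolates

# The line passes through (x_mean, y_mean), so c = y_mean - m * x_mean
c = y_mean - m * x_mean

print(round(c, 2))  # 2.2
```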

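Now use y=m*x+c, i.e. y=0.6*x+2.2, to get the predicted value (yp) for each x:

Price (x)	Predicted (yp)
1	2.8
2	3.4
3	4.0
4	4.6
5	5.2
	∑=20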
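
Here is a short sketch that produces the predicted values from the line y=0.6*x+2.2. The list name `predicted` is my own choice, and each value is rounded to 1 decimal place only to hide floating point noise.

```python
m, c = 0.6, 2.2
prices = [1, 2, 3, 4, 5]

# Predicted number of chocolates for every price
predicted = [round(m * x + c, 1) for x in prices]

print(predicted)                 # [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(sum(predicted), 1))  # 20.0
```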

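Now calculate the distance between each actual value & its predicted value. The best fit line is the one that makes these distances (more precisely, the sum of their squares) as small as possible.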
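
As a rough illustration, the distances (often called residuals) & their squared sum for our example can be computed like this. The names `residuals` & `sse` (sum of squared errors) are my own labels, and the value 2.4 is what this data gives, it is not quoted from anywhere else in the post.

```python
chocolates = [2, 4, 5, 4, 5]            # actual values
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # values from y = 0.6*x + 2.2

# Distance between each actual value and its predicted value
residuals = [y - yp for y, yp in zip(chocolates, predicted)]

# Least squares minimizes the sum of the squared distances
sse = sum(r ** 2 for r in residuals)

print([round(r, 1) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sse, 2))                     # 2.4
```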

Now that we have calculated the best fit line, it's time to check the goodness of fit, i.e. how well our model performs.

For that we have to use the R-Squared method.

R-Squared Method:

It is a statistical measure of how close the data are to the fitted regression line.

It is also known as the coefficient of determination, or the coefficient of multiple determination.

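Now we have to compare the distance of the predicted values (yp) from the mean with the distance of the actual values (y) from the mean.

R-square = ∑(yp-y`)^2 / ∑(y-y`)^2

∑(yp-y`)^2 = 1.44 + 0.36 + 0 + 0.36 + 1.44 = 3.6

R-square = 3.6/6 = 0.6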
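
The R-Square value can be double checked with a few lines of Python. This sketch uses the ratio ∑(yp-y`)^2 / ∑(y-y`)^2, where yp are the predicted values; keep in mind that this particular ratio is only a valid R-Square for a least squares line that includes the intercept c. The names `explained` & `total` are my own.

```python
chocolates = [2, 4, 5, 4, 5]            # actual values
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # values from y = 0.6*x + 2.2

y_mean = sum(chocolates) / len(chocolates)

# Squared distance of the predicted values from the mean
explained = sum((yp - y_mean) ** 2 for yp in predicted)

# Squared distance of the actual values from the mean
total = sum((y - y_mean) ** 2 for y in chocolates)

r_squared = explained / total
print(round(explained, 2), total, round(r_squared, 2))  # 3.6 6.0 0.6
```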

Now we have found the R-Square, but guys, the best regression line we can get is the one where R-Square is 1, which means every actual point lies exactly on the line.
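
To see an R-Square of 1 in action, here is a small sketch that puts all the steps into one function & runs it on our chocolate data and on some made-up points that sit exactly on a straight line. The function name `fit_line` and the second data set are hypothetical, written only for this illustration.

```python
def fit_line(xs, ys):
    """Fit y = m*x + c by least squares and return (m, c, r_squared).

    Assumes the xs are not all equal and the ys are not all equal,
    otherwise one of the divisions below would be a division by zero.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_mean) ** 2 for x in xs)

    m = sum_xy / sum_xx
    c = y_mean - m * x_mean

    predicted = [m * x + c for x in xs]
    explained = sum((yp - y_mean) ** 2 for yp in predicted)
    total = sum((y - y_mean) ** 2 for y in ys)
    return m, c, explained / total


# Our chocolate data
m, c, r2 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(m, 2), round(c, 2), round(r2, 2))  # 0.6 2.2 0.6

# Made-up data lying exactly on y = 2*x + 1
m2, c2, r2_perfect = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(round(r2_perfect, 2))  # 1.0
```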

Are low R-Square values always bad?

Any field that attempts to predict human behaviour, such as psychology, typically has R-Square values lower than 50%, which tells us that humans are simply hard to predict. If your R-Square value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-Square, the significant coefficients still represent the mean change in the response for one unit of change in the predictor while holding the other predictors in the model constant.

Hey guys, enjoyed my content? Please do leave a comment below & do practice the Titanic problem & comment your answer.

InfinityCodeX