Gradient Descent And Cost Function In Machine Learning
Today we will be covering concepts such as Gradient Descent, the learning rate & Mean Squared Error, and we will write Python code implementing Gradient Descent.
Don't worry, we will look only at the basic equations, not the full derivation of every concept, because all the math is wrapped inside the APIs of TensorFlow & Sklearn, so there is little point in working through the whole math in detail & implementing it ourselves. The main benefit of studying the math is that it helps us apply a machine learning model in the right place, and we can get the same benefit by solving multiple examples & writing code.
Okay, enough talk. Let's get down to business.
In school we learned the equation y = m*x + c & solved problems related to it. Now let's take an example & understand the topics mentioned above.
e.g.:
y = 2x + 3,
where x = [1, 2, 3, 4, 5]
Now to derive "y" we just substitute the values of x one by one, which gives us:
y=[5,7,9,11,13]
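Just to make that concrete, here is the same substitution in a couple of lines of Python:

x = [1, 2, 3, 4, 5]
y = [2 * xi + 3 for xi in x]   # substitute each x into y = 2x + 3
print(y)                       # [5, 7, 9, 11, 13]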
In machine learning, we have the values of the independent variable & we get the values of the dependent variable with its help. But the actual thing we derive in machine learning is the equation itself, as in our previous example where we predicted house prices. Check out that blog (https://www.infinitycodex.in/2020/02/data-science-ss-104-linear-regression.html).
The equation we find is known as the prediction function, & once we know that prediction function we can predict the price of any house.
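For instance, if the prediction function learned from the house data were price = m * area + b, predicting would just be a substitution. A minimal sketch (the m & b values below are made up, purely for illustration):

m, b = 150.0, 20000.0              # hypothetical learned coefficients

def predict_price(area):
    return m * area + b            # prediction function: price = m*area + b

print(predict_price(3300))         # predicted price for a house of area 3300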
If you have checked out that blog, you saw the example of house_area & house_price, in which we drew a scatter plot that looked like this.
In it we tried to find the best fit line.
But did you know that we can actually pass multiple lines through that scatter plot, like this:
Now the question arises: which line among them is the best?
The best way to determine that is to first draw any random line on the plot; for example, we took this one:
Now we have to find "Δ", which is the distance between each actual data point & the corresponding point on the predicted line.
Now we use this formula:

MSE = (1/n) * Σ Δᵢ², where Δᵢ = yᵢ − (m*xᵢ + b)

In this formula we square every Δ so that we never get a negative number, then we sum them all & divide by n, where n = the number of Δ values we have. This gives us the mean squared error.
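As a quick sketch, this is how the mean squared error could be computed in Python (the helper name mse & the m, b values are mine, just for illustration):

import numpy as np

def mse(x, y, m, b):
    yp = m * x + b                  # predicted y for the line y = m*x + b
    return np.mean((y - yp) ** 2)   # average of the squared Δ values

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
print(mse(x, y, 2, 3))   # 0.0, the true line has no error
print(mse(x, y, 0, 0))   # 89.0, a bad random line gives a large error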
MEAN SQUARED ERROR:
The MSE is what we use here as our cost function.
The algorithm that minimizes the mean squared error is known as Gradient Descent.
GRADIENT DESCENT COST FUNCTION:
Gradient Descent definition: Gradient Descent is an algorithm that finds the best fit line for a given training data set.
Gradient Descent is basically a trial & error approach. Below I have shown a 3D plot in which the coefficient (m) & the intercept (b, which we earlier called c) are plotted against the MSE.
The surface you see above is the value of the MSE. In Gradient Descent we start with random values; for example, let's take m = 0 & b = 0, which will be at the top of the graph you see above. We then have to move toward the bottom of that graph by adjusting the values of m & b. The circle you see at the bottom of the graph is the minimum, where the error is least.
Every line has different values of m & b, which give different MSE values & plot at different locations on the graph. The closer the point moves toward the minimum (where the cost is lowest), also known as the minima (the gradient descent cost function's global minimum), the better the line we get on the scatter plot. The red line we plotted actually reached the minimum at the bottom, so we can say that the line plotted on the graph is the best fit line.
Here is a helpful visualization. In this one we kept the value of b constant & changed only the value of m.
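You can reproduce that kind of curve yourself. A small sketch of my own, just to illustrate the idea: fix b at its true value of 3 & plot the MSE for a range of m values; the curve is U-shaped with its bottom at m = 2:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
b = 3                                           # keep the intercept fixed
ms = np.linspace(-1, 5, 100)                    # try many values of m
costs = [np.mean((y - (m * x + b)) ** 2) for m in ms]
plt.plot(ms, costs)                             # U-shaped cost curve, minimum at m = 2
plt.xlabel("m")
plt.ylabel("MSE")
plt.show()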
Iteration means trial & error. The steps we take toward the minima / the bottom are known as iterations. The more iterations we run, the better the result we get.
Now let's talk about how to take steps toward the minima (the gradient descent cost function's global minimum).
Our first idea might be to take steps of a fixed size, like this:
The main disadvantage of this is that we may sometimes overshoot & miss the minima (the gradient descent cost function's global minimum).
Our second idea is to keep decreasing the size of the steps we take while approaching the minima (the gradient descent cost function's global minimum).
Now the question is: by how much should we decrease the step size?
The answer is that, first of all, we draw a tangent (the slope) at each point.
Now, I will not cover the math in detail, but everyone should just remember these formulas; they are nothing but the partial derivatives of the MSE with respect to m & b:

∂MSE/∂m = −(2/n) * Σ xᵢ * (yᵢ − ŷᵢ)
∂MSE/∂b = −(2/n) * Σ (yᵢ − ŷᵢ)

To understand the concepts completely, visit this website, which includes the concepts of derivatives & partial derivatives.
How will we use these formulas? Initially we start with random values of m & b, then we take a step of a certain size, & the formula for the step is:

m = m − learning rate * (∂MSE/∂m)
b = b − learning rate * (∂MSE/∂b)

The learning rate controls how big a step we take.
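As a minimal sketch (using our example arrays, a starting guess of m = b = 0, & an illustrative learning rate of 0.01), one gradient descent step looks like this:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
m, b = 0.0, 0.0                          # initial guess for the line
n = len(x)
rate = 0.01                              # learning rate (illustrative value)

yp = m * x + b                           # predicted y at the current m & b
md = -(2 / n) * np.sum(x * (y - yp))     # ∂MSE/∂m
bd = -(2 / n) * np.sum(y - yp)           # ∂MSE/∂b
m = m - rate * md                        # step m towards the minima
b = b - rate * bd                        # step b towards the minima
print(m, b)                              # both move from 0 towards 2 & 3

Repeating this step many times is exactly what the full code below does.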
GRADIENT DESCENT COST FUNCTION EXAMPLE:
Now let's take an example where we implement gradient descent & its cost function in Python.
code:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def gradient_descent(x, y):
    # step 1: initially take random values of m & b; here I took 0
    m_curr = 0
    b_curr = 0
    # step 2: n is the number of Δ values, which equals the number of x values we have
    n = len(x)
    # step 3: the learning rate
    rate = 0.01
    # step 4: the number of iterations we want
    it_num = 1000
    # step 5: plot the actual data points
    plt.scatter(x, y, color="black")
    # step 6: iterate it_num times
    for i in range(it_num):
        yp = m_curr * x + b_curr            # y predicted by the current line
        plt.plot(x, yp, color="yellow")     # draw the current predicted line
        md = -(2/n) * sum(x * (y - yp))     # partial derivative of MSE w.r.t. m
        bd = -(2/n) * sum(y - yp)           # partial derivative of MSE w.r.t. b
        m_curr = m_curr - rate * md         # step m towards the minima
        b_curr = b_curr - rate * bd         # step b towards the minima
    print("m:", m_curr, "b:", b_curr)       # final coefficients after all iterations

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)
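When you run this, you should see the yellow lines sweep toward the black data points as m & b improve, & the printed values should approach 2 & 3, the true coefficients of y = 2x + 3. Try changing rate or it_num & watch how the convergence behaves.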
output:
Here is the link to my Jupyter notebook:
Hey guys, did you find our content informative? Please leave a comment below & don't forget to share it with your friends.