Gradient Descent And Cost Function In Machine Learning
Today we will be covering concepts such as Gradient Descent, the learning rate & Mean Squared Error, and we will write Python code implementing Gradient Descent.
Don't worry, we will look only at the basic equations, not the full derivation of every concept, because all the math is wrapped inside the APIs of TensorFlow & Sklearn, so there is little point in working through the whole math in detail & implementing it ourselves. The main benefit of studying the math is that it helps us apply a machine learning model in the right place, and we can get the same benefit by solving multiple examples & writing code.
Okay, enough talk. Let's get down to business.
In school we learned the equation y = m*x + c & solved problems related to it. Now let's take an example & understand the topics mentioned above.
e.g.:
y = 2x + 3,
where x = [1, 2, 3, 4, 5]
Now to derive "y" we just substitute the values of x one by one, which gives us:
y=[5,7,9,11,13]
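Just to make that concrete, here is the same substitution in a couple of lines of Python:

x = [1, 2, 3, 4, 5]
y = [2 * xi + 3 for xi in x]   # substitute each x into y = 2x + 3
print(y)                       # [5, 7, 9, 11, 13]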
In machine learning, we have the values of the independent variable & we get the values of the dependent variable with its help. But the actual thing we derive in machine learning is the equation itself, as in our previous example where we predicted house prices. Check out that blog (https://www.infinitycodex.in/2020/02/data-science-ss-104-linear-regression.html).
The equation we find is known as the prediction function, & once we know that prediction function we can predict the price of any house.
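For instance, if the prediction function learned from the house data were price = m * area + b, predicting would just be a substitution. A minimal sketch (the m & b values below are made up, purely for illustration):

m, b = 150.0, 20000.0              # hypothetical learned coefficients

def predict_price(area):
    return m * area + b            # prediction function: price = m*area + b

print(predict_price(3300))         # predicted price for a house of area 3300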
If you have checked out that blog, you saw the example of house_area & house_price, in which we drew a scatter plot that looked like this.
In it we tried to find the best fit line.
But did you know that we can actually pass multiple lines through that scatter plot, like this:
Now the question arises: which line among them is the best?
The best way to determine that is to first draw any random line on the plot; for example, we took this one:
Now we have to find "Δ", which is the distance between each actual data point & the corresponding point on the predicted line.
Now we use this formula:

MSE = (1/n) * Σ Δᵢ², where Δᵢ = yᵢ − (m*xᵢ + b)

In this formula we square every Δ so that we never get a negative number, then we sum them all & divide by n, where n = the number of Δ values we have. This gives us the mean squared error.
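As a quick sketch, this is how the mean squared error could be computed in Python (the helper name mse & the m, b values are mine, just for illustration):

import numpy as np

def mse(x, y, m, b):
    yp = m * x + b                  # predicted y for the line y = m*x + b
    return np.mean((y - yp) ** 2)   # average of the squared Δ values

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
print(mse(x, y, 2, 3))   # 0.0, the true line has no error
print(mse(x, y, 0, 0))   # 89.0, a bad random line gives a large error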
MEAN SQUARED ERROR:
The MSE is what we use here as our cost function.
The algorithm that minimizes the mean squared error is known as Gradient Descent.
GRADIENT DESCENT COST FUNCTION:
Gradient Descent definition: Gradient Descent is an algorithm that finds the best fit line for a given training data set.
Gradient Descent is basically a trial & error approach. Below I have shown a 3D plot in which the coefficient (m) & the intercept (b, which we earlier called c) are plotted against the MSE.
The surface you see above is the value of the MSE. In Gradient Descent we start with random values; for example, let's take m = 0 & b = 0, which will be at the top of the graph you see above. We then have to move toward the bottom of that graph by adjusting the values of m & b. The circle you see at the bottom of the graph is the minimum, where the error is least.
Every line has different values of m & b, which give different MSE values & plot at different locations on the graph. The closer the point moves toward the minimum (where the cost is lowest), also known as the minima (the gradient descent cost function's global minimum), the better the line we get on the scatter plot. The red line we plotted actually reached the minimum at the bottom, so we can say that the line plotted on the graph is the best fit line.
Here is a helpful visualization. In this one we kept the value of b constant & changed only the value of m.
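You can reproduce that kind of curve yourself. A small sketch of my own, just to illustrate the idea: fix b at its true value of 3 & plot the MSE for a range of m values; the curve is U-shaped with its bottom at m = 2:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
b = 3                                           # keep the intercept fixed
ms = np.linspace(-1, 5, 100)                    # try many values of m
costs = [np.mean((y - (m * x + b)) ** 2) for m in ms]
plt.plot(ms, costs)                             # U-shaped cost curve, minimum at m = 2
plt.xlabel("m")
plt.ylabel("MSE")
plt.show()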
Iteration means trial & error. The steps we take toward the minima / the bottom are known as iterations. The more iterations we run, the better the result we get.
Now let's talk about how to take steps toward the minima (the gradient descent cost function's global minimum).
Our first idea might be to take steps of a fixed size, like this:
The main disadvantage of this is that we may sometimes overshoot & miss the minima (the gradient descent cost function's global minimum).
Our second idea is to keep decreasing the size of the steps we take while approaching the minima (the gradient descent cost function's global minimum).
Now the question is: by how much should we decrease the step size?
The answer is that, first of all, we draw a tangent (the slope) at each point.
Now, I will not cover the math in detail, but everyone should just remember these formulas; they are nothing but the partial derivatives of the MSE with respect to m & b:

∂MSE/∂m = −(2/n) * Σ xᵢ * (yᵢ − ŷᵢ)
∂MSE/∂b = −(2/n) * Σ (yᵢ − ŷᵢ)

To understand the concepts completely, visit this website, which includes the concepts of derivatives & partial derivatives.
How will we use these formulas? Initially we start with random values of m & b, then we take a step of a certain size, & the formula for the step is:

m = m − learning rate * (∂MSE/∂m)
b = b − learning rate * (∂MSE/∂b)

The learning rate controls how big a step we take.
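As a minimal sketch (using our example arrays, a starting guess of m = b = 0, & an illustrative learning rate of 0.01), one gradient descent step looks like this:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
m, b = 0.0, 0.0                          # initial guess for the line
n = len(x)
rate = 0.01                              # learning rate (illustrative value)

yp = m * x + b                           # predicted y at the current m & b
md = -(2 / n) * np.sum(x * (y - yp))     # ∂MSE/∂m
bd = -(2 / n) * np.sum(y - yp)           # ∂MSE/∂b
m = m - rate * md                        # step m towards the minima
b = b - rate * bd                        # step b towards the minima
print(m, b)                              # both move from 0 towards 2 & 3

Repeating this step many times is exactly what the full code below does.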
GRADIENT DESCENT COST FUNCTION EXAMPLE:
Now let's take an example where we implement gradient descent & its cost function in Python.
code:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def gradient_descent(x, y):
    # step 1: initially take random values of m & b; here I took 0
    m_curr = 0
    b_curr = 0
    # step 2: n is the number of Δ values, which equals the number of x values we have
    n = len(x)
    # step 3: the learning rate
    rate = 0.01
    # step 4: the number of iterations we want
    it_num = 1000
    # step 5: plot the actual data points
    plt.scatter(x, y, color="black")
    # step 6: iterate it_num times
    for i in range(it_num):
        yp = m_curr * x + b_curr            # y predicted by the current line
        plt.plot(x, yp, color="yellow")     # draw the current predicted line
        md = -(2/n) * sum(x * (y - yp))     # partial derivative of MSE w.r.t. m
        bd = -(2/n) * sum(y - yp)           # partial derivative of MSE w.r.t. b
        m_curr = m_curr - rate * md         # step m towards the minima
        b_curr = b_curr - rate * bd         # step b towards the minima
    print("m:", m_curr, "b:", b_curr)       # final coefficients after all iterations

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)
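When you run this, you should see the yellow lines sweep toward the black data points as m & b improve, & the printed values should approach 2 & 3, the true coefficients of y = 2x + 3. Try changing rate or it_num & watch how the convergence behaves.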
output:
Here is the link to my Jupyter notebook:
Hey guys, did you find our content informative? Please leave a comment below & don't forget to share it with your friends.