Linear Regression

Keerthi
Mar 7, 2022

Everyone wants to try their hand at Machine Learning at some point in their software career. The first algorithm most books and online courses start with is linear regression. Let us decode it!

Src: Analytics Insight

Linear + Regression

Linear = arranged in a straight line.

Regression = the process of estimating the relationship between a dependent variable and one or more independent variables.

So, the idea of understanding the relationship between 2 variables by fitting a straight line is what we call linear regression. Let us take an example: the price of a house with respect to the size of the house.

Src: Image by author

The relationship between these two variables Size_of_house and Price_of_house can be plotted as:

Src: Image by author

Say someone asks you to predict the house price given house_size = 3 (10 sq mt), and that person has also given the answer beforehand: house_price = 3 (10 L). Here comes linear regression to predict this value.

First, we need to define the relationship between house_size and house_price so that we can make use of it for the task at hand. As anyone can tell from the plot, it is a straight line (remember: Linear regression). The equation of a straight line is usually written this way:

y = mx + b

Following the usual convention in math, let us call the unknown term y. Here the unknown is house_price.

y = house_price & x = house_size.

predicted_house_price = m(house_size) + b

Wait… who are these m and b? Why do they keep following us? m and b are constants: m is the slope of the line and b is its intercept. For now, let us assume m = 5 and b = 7, with house_size = 3 (10 sq mt). Let us substitute all these values in the above equation.

predicted_house_price = (5 × 3) + 7 = 22 (10 L)
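
In code, this substitution is a one-liner. A minimal Python sketch (the helper name is mine), using the same illustrative constants m = 5 and b = 7:

# Prediction with the assumed constants m = 5 and b = 7 (illustrative only)
def predict_house_price(house_size, m=5, b=7):
    return m * house_size + b  # y = mx + b

print(predict_house_price(3))  # prints 22, i.e., 220 L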

We got the house_price for 3 (10 sq mt) as 220 L😳 (Seriously…)

Can anyone guess why it is not even close to 3 (10 L)?

Right, it is because of the constants we chose. We should select m and b in such a way that predicted_house_price is close enough to actual_house_price. But how?

Cost Function

The cost function measures how far the output of our equation is from the actual output. For regression, the common cost function we use is Mean Squared Error (MSE).

Mean Squared Error Equation

In text form: J(θ₀, θ₁) = 1/(2m) × Σᵢ (ŷᵢ − yᵢ)²

Let us unpack this,

  • θ₀ = b (the intercept) and θ₁ = m (the slope)
  • m in the 1/(2m) factor is the size of the input data, i.e., the number of training examples (an unlucky clash with the slope m; in the code below we will call it n)
  • ŷ is the predicted value for the iᵗʰ term
  • y is the actual value for the iᵗʰ term
  • The mean of (predicted − actual) is taken (mean = sum of the terms / number of the terms)
  • We square the difference so that positive and negative errors do not cancel out, and larger errors are penalised more
  • The whole equation is multiplied by 1/2 because halving it does not change where the minimum is, and it cancels the 2 that appears when we take the derivative, saving a little computation
  • The predicted value (ŷ) is obtained using ŷ = mx + b, as the sketch below shows
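
Putting those pieces together, here is a minimal sketch of the cost function in Python. The toy dataset is hypothetical, chosen to match the sample points used later in this article:

import numpy as np

def mse_cost(m, b, x, y):
    # Half mean squared error: J = 1/(2n) * sum((y_hat - y)^2)
    y_hat = m * x + b          # predicted values, y_hat = mx + b
    n = len(x)                 # number of training examples
    return np.sum((y_hat - y) ** 2) / (2 * n)

# Hypothetical toy dataset: house_size (10 sq mt) vs house_price (10 L)
x = np.array([1.0, 1.8, 3.0])
y = np.array([2.5, 2.8, 3.0])
print(mse_cost(1.0, 0.0, x, y))  # cost for m = 1, b = 0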

To see why we are talking about the cost function and how it relates to the selection of the constants m and b, let us draw some graphs!

For simplicity, let us always take b = 0, try different values for m, and substitute them into the cost function equation.

m = 1, b = 0

By convention, we use the units house_price (10 L) & house_size (10 sq mt).

Linear equation: ŷ = m(3) + b = 1 × 3 + 0 = 3; actual y = 3

J(m=1, b=0) = 1/2 × [(3 − 3)²] = 0

Src: Image by author
Cost function vs constants

This clearly shows that the right values of the constants play an important role in making the right prediction, because every change in the constant m causes a corresponding change in the cost function J (which is nothing but our yardstick for validating the linear regression results). Also, when the cost function hits its minimum (0), we get exact results.
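
For the curious, here is a rough sketch of how such a cost-versus-m curve can be traced, assuming the single illustrative point (3, 3) and b fixed at 0:

# Cost J for different slopes m, with b = 0 and one data point (3, 3)
for m in [0.0, 0.5, 1.0, 1.5, 2.0]:
    y_hat = m * 3 + 0
    J = 0.5 * (y_hat - 3) ** 2
    print(f"m = {m}: J = {J}")
# J bottoms out at 0 for m = 1, the slope that passes through the point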

So, minimising the cost function is the way to predict accurate results for new data.

Gradient Descent Algorithm

It is an optimisation algorithm used to minimise a function by iteratively moving in the direction of steepest descent.

To understand what gradient descent is, imagine you are standing at the top of a hill and have to reach the foothills; the twist here is that you are blindfolded😎. How will you get there?

So, you will take one step at a time in the direction of the downward slope and gradually reach the foothills. This is exactly what gradient descent does to minimise the cost function.

Gradient Descent Algorithm

In text form: repeat until convergence, θᵢ := θᵢ − α × ∂J/∂θᵢ (updating all θᵢ simultaneously)

Unpacking,

  • α is the learning rate (a small positive number): if α is small, we take small steps towards the bottom of the hill, so gradient descent is slow. If α is too big, we might overshoot the bottom and gradient descent can fail to converge. α can be kept constant because the derivative term itself shrinks as we approach the minimum, so the steps naturally get smaller.
  • Each θᵢ value gets updated until we reach the minimum (for linear regression with MSE, the cost surface is convex, so this minimum is also the global one).

For linear regression, after taking the derivative of the MSE, the equations become:

Gradient Descent for MSE

In text form (writing n for the number of examples):
m := m − α × (1/n) × Σ (ŷᵢ − yᵢ) × xᵢ
b := b − α × (1/n) × Σ (ŷᵢ − yᵢ)
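
As a minimal Python sketch (the function name, learning rate, and dataset shape are my assumptions):

import numpy as np

def gradient_step(m, b, x, y, alpha=0.01):
    # One gradient descent update for the half-MSE cost
    n = len(x)
    error = (m * x + b) - y        # y_hat - y
    dm = np.sum(error * x) / n     # dJ/dm
    db = np.sum(error) / n         # dJ/db
    return m - alpha * dm, b - alpha * db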

By the way, for folks who are wondering where this magic comes from🤔:

Gradient descent equation derivation (Src: Image by author)
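
In text form, the derivation is just the chain rule applied to the half-MSE cost (writing n for the number of examples to avoid the clash with the slope m):

J = 1/(2n) × Σ (ŷᵢ − yᵢ)², with ŷᵢ = m·xᵢ + b

∂J/∂m = 1/(2n) × Σ 2(ŷᵢ − yᵢ) × xᵢ = (1/n) × Σ (ŷᵢ − yᵢ) × xᵢ
∂J/∂b = 1/(2n) × Σ 2(ŷᵢ − yᵢ) × 1 = (1/n) × Σ (ŷᵢ − yᵢ)

The 2 coming from the square cancels the 1/2 we introduced earlier, which is exactly why the 1/2 was added.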

Task

We will use the above equations to minimise the cost function so that the line we fit to our dataset is close to the actual values and can be used for future predictions.

Step 1: Choose some values for the constants m & b randomly

Wait… but there are infinitely many numbers out there, how am I supposed to choose one!😅

In Maths, m = slope (change in y / change in x)

The slope of the line is calculated as the change in y divided by the change in x, so the calculation looks like:

(x1, y1) = (1, 2.5) and (x2, y2) = (3, 3)
m = (y2 - y1) / (x2 - x1)
m = (3 - 2.5) / (3 - 1) = 0.25

In Maths, b = y-intercept / bias

The y-intercept / bias can be calculated using the point-slope formula y - y1 = m(x - x1):

y - 2.5 = 0.25(x - 1)
y = 0.25x + 2.25
b = 2.25
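
The same arithmetic as a quick Python sketch, using the two points chosen above:

# Initial guess for m and b from two sample points
x1, y1 = 1.0, 2.5
x2, y2 = 3.0, 3.0
m = (y2 - y1) / (x2 - x1)   # slope = 0.25
b = y1 - m * x1             # point-slope form rearranged: b = 2.25
print(m, b)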

Step 2: Use these constant values to make a prediction (plug them into y = mx + b)

So, using the above intuition as a base,

m = 0.25 & b = 2.25 (Src: Image by author)

Our first guess at the constants is not so bad😁, as we got 2 of the predictions exactly right!

Step 3: Use gradient descent to optimise and find better m & b values (plug them into the Gradient Descent for MSE equations)

On substitution, we get m = 0.24 and b = 2.23.

m = 0.24 & b = 2.23 (Src: Image by author)

This goes on until the predicted y is close to the actual y, i.e., until the MSE is small.
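
Putting Steps 1-3 together, here is a minimal end-to-end sketch; the toy dataset, learning rate, and iteration count are all illustrative:

import numpy as np

# Hypothetical toy dataset: house_size (10 sq mt) vs house_price (10 L)
x = np.array([1.0, 1.8, 3.0])
y = np.array([2.5, 2.8, 3.0])

m, b = 0.25, 2.25             # Step 1: initial guess from two points
alpha = 0.01                  # learning rate
for _ in range(1000):         # Step 3: repeat the gradient updates
    error = (m * x + b) - y   # Step 2: predictions minus actuals
    m -= alpha * np.sum(error * x) / len(x)
    b -= alpha * np.sum(error) / len(x)

print(m, b)  # fitted slope and intercept for the blue line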

Linear regression for the sample dataset (line in blue)

Now we can use the blue line to predict the unknown dependent variable house_price given house_size.

Quick example: house_size = 1.8 (10 sq mt) → house_price ≈ 2.8 (10 L)

Here, I have attempted to explain linear regression to the best of my knowledge. Soon I will be back with an actual dataset example to build a proper linear regression model.

Please let me know about any suggestions or clarifications in comments.
