Multiple Linear Regression and Polynomial Regression

3rd part of the Machine Learning brief article series (A to Z)

Multiple Linear Regression:

In the previous article, we saw Linear Regression (if you are new to the series, kindly check out that article first and then continue here), which maps a linear relationship between X and Y. There, the target variable (Y) depends on only one variable X, but in the real world this is rarely the case: the target variable usually depends on many variables (features). For example, if you want to determine the price of a house, there are many factors that affect the price. So we need to modify the hypothesis (equation) so that it can handle multiple input values (independent variables).

The hypothesis of Multiple Linear Regression is

hθ(x) = θ0 + θ1·x1 + θ2·x2 + … + θn·xn

The linear regression hypothesis is modified to suit many independent variables. Can you infer anything from the above hypothesis? It is nothing but the equation of a plane. In Linear Regression, a line tries to fit the data, whereas in Multiple Linear Regression, a plane tries to fit the data.

Everything else is similar to Linear Regression. Let us implement Multiple Linear Regression from scratch in Python:


import numpy as np

def hypothesis(X,theta):
    '''
    X : np.array. shape - (m,n)
    theta : np.array. shape - (n,1)

    return : np.array (m,1)
    '''

    return X.dot(theta)

def error(X,Y,theta):
    '''
    X : np.array. shape - (m,n)
    Y : np.array. shape - (m,1)
    theta : np.array. shape - (n,1)

    return : float
    '''

    Y_ = hypothesis(X,theta)
    e = np.sum((Y-Y_)**2)

    return e/(2*X.shape[0])

def gradient(X,Y,theta):
    '''
    X : np.array. shape - (m,n)
    Y : np.array. shape - (m,1)
    theta : np.array. shape - (n,1)

    return : np.array (n,1)
    '''

    Y_ = hypothesis(X,theta)
    grad = np.dot(X.T,(Y_ - Y))

    return grad/X.shape[0]

def gradient_descent(X, Y, learning_rate = 0.1, max_iters=100):

    n = X.shape[1]
    theta = np.zeros((n,1))
    error_list = []

    for i in range(max_iters):
        e = error(X,Y,theta)
        error_list.append(e)

        # gradient descent update step
        grad = gradient(X,Y,theta)
        theta = theta - learning_rate*grad

    return theta,error_list
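
Here is a minimal usage sketch (not from the original notebook; it assumes synthetic data and a column of ones prepended to X so that theta[0] acts as the intercept):

# Hypothetical usage example (assumed synthetic data, not from the article)
m = 100
X_raw = np.random.rand(m, 2)                    # two input features
Y = 3 + 2*X_raw[:, [0]] - 1.5*X_raw[:, [1]]     # target built from a known plane

ones = np.ones((m, 1))                          # bias column -> theta[0] is the intercept
X_train = np.hstack([ones, X_raw])              # shape (m, 3)

theta, error_list = gradient_descent(X_train, Y, learning_rate=0.1, max_iters=1000)
print(theta.ravel())                            # should move toward [3, 2, -1.5]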

Visually, Multiple Linear Regression looks like this:

[Image: a plane fitted through the data points]

Polynomial Regression:

Polynomial Regression models the relationship between the dependent variable (Y) and the independent variable (X) as an nth-degree polynomial. This is the formula of Polynomial Regression:

y = θ0 + θ1·x + θ2·x² + … + θn·xⁿ

Why the need for polynomial regression?

Suppose you have a dataset where the relationship between X and Y is non-linear. In such a scenario, Polynomial Regression is more helpful.

Let us see the Example of this scenario:

First, I import the dependencies:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

To explain Polynomial Regression, I am going to create a dataset that has a non-linear relationship between X and Y:

np.random.seed(1) 
X = np.random.rand(50,1) 
y = 0.7*(X**5) -  2.1*(X**4) +  2.3*(X**3) +  0.2*(X**2) +  0.3* X +  0.4*np.random.rand(50,1) # no data in world is perfect 


fig = plt.figure() 
plt.scatter(X, y) 
plt.xlabel("x") 
plt.ylabel("y")
plt.show()

[Image: scatter plot of X vs y showing the non-linear relationship]

Let us now predict the value of y using simple Linear Regression:

from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)
outputs = model.predict(X)


fig = plt.figure() 
plt.scatter(X, y) 
plt.plot(X, outputs, color='red')
plt.xlabel("x") 
plt.ylabel("y")
plt.show()
print(model.score(X,y))

[Image: scatter plot with the fitted regression line in red]

This is the result of Linear Regression; the score (R²) is approximately 89%.

Let us do the same with Polynomial Regression

I am going to add a second-degree feature to the dataset and predict the result:

X_deg2 = np.hstack([X, X**2])

X_deg2.shape

(50, 2)

Again, I create a model and fit it on the augmented dataset. Note that this is still linear regression, only on the expanded features [x, x²]:

model_deg2 = LinearRegression().fit(X_deg2, y)
outputs = model_deg2.predict(X_deg2)


fig = plt.figure() 
plt.scatter(X, y) 
plt.scatter(X, outputs, color='red')
plt.xlabel("x") 
plt.ylabel("y")
plt.show()
print(model_deg2.score(X_deg2,y))

0.937213227713278

[Image: scatter plot with the degree-2 predictions in red]

From this image, we can say that this model is somewhat better than the previous one. The score of this model is around 93%, which is noticeably higher than the previous model.
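
As a small side note (a sketch that is not in the original notebook), you can read back the learned quadratic from the fitted model, since the model is linear in the expanded features [x, x²]:

# Inspect the learned coefficients: y ≈ intercept + coef[0]*x + coef[1]*x**2
print(model_deg2.intercept_, model_deg2.coef_)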

Similarly, we can create a third-degree polynomial as well:

X_deg3 = np.hstack([X, X**2, X**3])
model_deg3 = LinearRegression().fit(X_deg3, y)
output = model_deg3.predict(X_deg3)

fig = plt.figure()
plt.scatter(X, y, label="samples")
plt.scatter(X, output, label="prediction")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
print(model_deg3.score(X_deg3, y))

0.9384895307987052

[Image: scatter plot of samples and degree-3 predictions]

This model also fits well; the score is around 94%.

As we can see, adding features increases the score, but only up to a certain level (beyond that, the model starts to overfit). In this way, Polynomial Regression can be used for non-linear datasets.

Another easy way to create polynomial features from a dataset is scikit-learn's PolynomialFeatures class:

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(5)
X_poly = poly.fit_transform(X)
X_poly.shape

(50, 6)
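
From here (a minimal sketch, assuming the same X and y as above), the transformed features can be fed straight into LinearRegression, just like the hand-built ones. Note that PolynomialFeatures(5) also adds the bias column of ones, which is why the shape is (50, 6):

# Fit a degree-5 model on the transformed features (sketch, not from the article)
model_deg5 = LinearRegression().fit(X_poly, y)
print(model_deg5.score(X_poly, y))   # R^2 on the training data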

This is all about Multiple Linear Regression and Polynomial Regression. In the next article, I will cover Logistic Regression. Happy learning!

If you like my work, kindly like and leave your comments.
