Regression is the process of estimating the relationship between a dependent (or target)
variable and one or more independent (or predictor) variables. It is widely used
in inferential analysis and is a handy technique for forecasting future trends in data.
Example: Consider an HR head who wants to fix the salary of a new employee. To
finalize the salary, the head considers various parameters such as the level of
education, number of years of experience, last position held, expertise level, etc.
If the salary is predicted using only one parameter, say 'no. of years of
experience', this type of regression is called Simple Linear Regression (one
target and one predictor variable). If multiple parameters, say 'level of
education', 'no. of years of experience' and 'last position held', are used to
fix the salary, it becomes Multiple Regression (single target, multiple
predictor variables). Strictly speaking, the term Multivariate Regression is
reserved for models with more than one target variable.
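As a rough illustration of the difference, the sketch below fits one model on a single predictor and another on two predictors. The tiny data set, the numeric coding of education and the use of LinearRegression are assumptions made for illustration; none of them come from 'Salaries.csv'.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience, education level (coded 1-3), salary
experience = np.array([1, 2, 3, 4, 5, 6], dtype=float)
education = np.array([1, 1, 2, 2, 3, 3], dtype=float)
salary = 20000 + 5000 * experience + 3000 * education

# One target, one predictor column
simple = LinearRegression().fit(experience.reshape(-1, 1), salary)

# One target, two predictor columns
X_multi = np.column_stack([experience, education])
multiple = LinearRegression().fit(X_multi, salary)

print("Coefficients with one predictor: ", simple.coef_)
print("Coefficients with two predictors:", multiple.coef_)
```

The only structural change between the two fits is the number of columns in the feature array; the target stays a single column in both cases.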
Irrespective of the model you choose for performing Simple Linear Regression,
you need to complete the following steps.
- Prepare the training data: This step may involve operations such as data cleaning and transformation.
- Create the model for prediction: During this step, the model of your choice is initialized and configured.
- Train the model: During this step, the model is trained on the data prepared in the first step.
- Deploy the model for prediction: This step accepts the test data and predicts the value of the target variable.
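The four steps above can be sketched in a few lines. This is a minimal illustration only: the data values are made up, and LinearRegression is used as a stand-in for whichever model you choose.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 1: prepare the training data (hypothetical values, one row has a missing entry)
years = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
salary = np.array([30000.0, 35000.0, 41000.0, 45000.0, 50000.0])
keep = ~np.isnan(years)                 # data cleaning: drop rows with missing experience
X_train = years[keep].reshape(-1, 1)    # scikit-learn expects a 2-D feature array
y_train = salary[keep]

# Step 2: create and configure the model
model = LinearRegression()

# Step 3: train the model
model.fit(X_train, y_train)

# Step 4: predict the target for new, unseen input
X_new = np.array([[3.0]])
y_new = model.predict(X_new)
print("Predicted salary:", y_new)
```

Swapping in another scikit-learn regressor should only require changing the line in step 2, since the fit and predict calls follow the same pattern.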
In this article, let us explore three simple ways of performing Simple Linear
Regression using the following models:
- Random Forest
- Support Vector Machine (SVM)
- Multi Layer Perceptron (MLP)
Let us consider the training data from the file 'Salaries.csv'.
Problem Statement: Using this data, we want to predict the salary of a new person
(target variable) using the parameter 'no. of years of experience' (predictor variable).
Let us explore these regression models.
1. Random Forest: A random forest is an ensemble that consists of many decision trees. It uses bagging and feature randomness when building each individual tree. For a regression task, the forest's prediction is the average of the predictions made by the individual trees (for classification, it is the class predicted by the most trees).
The 'sklearn' library in Python can be used to create the random forest as shown below.
Python3
# Prediction using Random Forest
# Importing the libraries
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
Now let us initialize the training data set.
Python3
data = pd.read_csv('Salaries.csv')
x = data.iloc[:, 1:2].values # so x=Yrs. of Experience
y = data.iloc[:, 2].values # so y= Salary in Rs.
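The slice `1:2` in the first `iloc` call matters because scikit-learn expects the features as a 2-D array, while the target can stay 1-D. The sketch below shows the resulting shapes on a small stand-in DataFrame; the column names and values are assumptions, since the actual layout of 'Salaries.csv' is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for 'Salaries.csv'; column names and values are assumptions
data = pd.DataFrame({
    "Position": ["Analyst", "Engineer", "Manager"],
    "Experience": [1.5, 4.0, 8.0],
    "Salary": [30000, 52000, 90000],
})

x = data.iloc[:, 1:2].values      # slice keeps the column dimension: shape (3, 1)
y = data.iloc[:, 2].values        # single index drops it: shape (3,)
x_flat = data.iloc[:, 1].values   # shape (3,); needs reshape(-1, 1) before fit()

print(x.shape, y.shape, x_flat.shape)
```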
The next step is to initialize the Random Forest model and feed the training dataset to it.
Python3
# Create a Random Forest model. Default no. of trees = 100
model = RandomForestRegressor()
# Train the model using the training data
model.fit(x, y)
Once the model is trained, you can use it for the task of prediction. Let us try
to predict the salary of a person who has 7.4 years of experience.
Python3
# Predict the salary for the test input (change the value to try other inputs)
Y_pred = model.predict(np.array([7.4]).reshape(1, 1))
print("Predicted Salary=", Y_pred)
Output: Predicted Salary= [82500.]
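To see how the ensemble arrives at a single number, the sketch below compares the forest's prediction with the mean of the individual trees' predictions. In scikit-learn's RandomForestRegressor the two agree, because the regressor averages its trees. The data here is synthetic and only stands in for 'Salaries.csv'.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the salary data (values are made up)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30).reshape(-1, 1)
y = 30000 + 7000 * x.ravel() + rng.normal(0, 2000, size=30)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

query = np.array([[7.4]])
# Each fitted tree is available in forest.estimators_
tree_preds = np.array([tree.predict(query)[0] for tree in forest.estimators_])

print("Forest prediction:  ", forest.predict(query)[0])
print("Mean over the trees:", tree_preds.mean())
print("Spread across trees:", tree_preds.std())
```

The spread across trees gives a rough sense of how much the individual trees disagree for this input.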
2. Support Vector Machine (SVM): A support vector machine (SVM)
is a supervised machine learning model that can be used for both
classification and regression. Once an SVM model has been trained on a set of
labeled training data, it is able to predict the target for new inputs. SVM
models use kernel functions to capture non-linear relationships without
explicitly computing a high-dimensional feature mapping, which keeps the
computation manageable.
The 'sklearn' library in Python can be used to create the SVM as shown below.
Python3
# Prediction using SVM
import numpy as np
import pandas as pd
from sklearn import svm
data = pd.read_csv('Salaries.csv')
x = data.iloc[:, 1:2].values # so x=Yrs. of Experience
y = data.iloc[:, 2].values # so y= Salary in Rs.
# Create an SVM. The default kernel is RBF; for a linear kernel use
# model = svm.SVC(kernel='linear')
# Note: svm.SVC is a classifier, so each distinct salary is treated as a
# class label; svm.SVR is the regression counterpart.
model = svm.SVC()
# Train the model using the training data
model.fit(x, y)
# Predict the salary for the test input
y_pred = model.predict(np.array([7.4]).reshape(1, 1))
print("Predicted Salary=", y_pred)
Output: Predicted Salary= [80000]
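Because `svm.SVC` is a classifier, it can only return one of the salary values seen during training. For a continuous prediction, scikit-learn provides the regressor `svm.SVR`. The following is a minimal sketch of that variant, not the code that produced the output above: the data is synthetic, and scaling both the feature and the target is an assumption made because SVR is sensitive to the scale of its inputs.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the salary data (values are made up)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)   # years of experience
y = 30000 + 7000 * x.ravel()                       # salary

# Standardize the feature inside the pipeline and the target via the wrapper
svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="linear")),
    transformer=StandardScaler(),
)
svr.fit(x, y)

y_pred = svr.predict(np.array([[7.4]]))
print("Predicted Salary=", y_pred)
```

Unlike the classifier, this model can return a salary that lies between the training values.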
3. Multi Layer Perceptron (MLP): It is one of the most
common neural network models used in machine learning. A multi-layer
perceptron consists of layers of interconnected neurons that pass information
to each other. The MLP is a feedforward neural network, which means that the
data is transmitted from the input layer to the output layer in the forward
direction. The connections between the layers are assigned weights, and the
weight of a connection specifies its importance. 'Backpropagation' is used to
compute how the prediction error changes with each weight, and the weights are
adjusted iteratively until they converge to values that predict the target well.
The 'sklearn' library in Python can be used to create the MLP regressor as shown below.
Python3
# prediction using NN: MLP
from sklearn.neural_network import MLPRegressor
import pandas as pd
import numpy as np
data = pd.read_csv('Salaries.csv')
x = data.iloc[:, 1:2].values # so x=Yrs. of Experience
y = data.iloc[:, 2].values # so y= Salary in Rs.
# Create the MLPRegressor model
nn = MLPRegressor(solver='lbfgs', alpha=1e-1,
                  hidden_layer_sizes=(5, 2), random_state=0)
# Train the model using the training set
nn.fit(x, y)
# Predict the salary of a person who has experience of 7.4 years
y_pred = nn.predict(np.array([7.4]).reshape(1, 1))
print("Predicted Salary=", y_pred)
Output: Predicted Salary= [88285.71344169]
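To make the feedforward idea concrete, the sketch below repeats the prediction by hand: the input is multiplied by each layer's weights, the bias is added, and ReLU (the MLPRegressor default activation) is applied on the hidden layers. The data is synthetic and the helper `forward` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the salary data (values are made up)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 30000 + 7000 * x.ravel()

nn = MLPRegressor(solver="lbfgs", alpha=1e-1,
                  hidden_layer_sizes=(5, 2), random_state=0)
nn.fit(x, y)  # may emit a ConvergenceWarning on unscaled data

def forward(model, inputs):
    """Pass inputs through the layers by hand: ReLU on hidden layers, identity on output."""
    activation = inputs
    last = len(model.coefs_) - 1
    for i, (weights, bias) in enumerate(zip(model.coefs_, model.intercepts_)):
        activation = activation @ weights + bias
        if i < last:
            activation = np.maximum(activation, 0.0)  # ReLU on hidden layers only
    return activation.ravel()

query = np.array([[7.4]])
print("Layer weight shapes:", [w.shape for w in nn.coefs_])
print("Manual forward pass:", forward(nn, query))
print("nn.predict:         ", nn.predict(query))
```

The weight shapes follow directly from `hidden_layer_sizes=(5, 2)`: one input feeds 5 neurons, which feed 2 neurons, which feed the single output.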
Conclusion: The three models discussed produce different predictions for the
same input (82500, 80000 and about 88286), which indicates that they differ in
accuracy. Prediction accuracy is therefore an important criterion when
selecting a model for a prediction task.
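A single prediction does not show which model is more accurate. One common way to compare models, sketched here under stated assumptions, is to hold out part of the data and measure the error on it. The data is synthetic, `svm.SVR` is used in place of the classifier because this is a regression comparison, and the hyperparameters (including `C=1000.0`) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the salary data (values are made up)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=60).reshape(-1, 1)
y = 30000 + 7000 * x.ravel() + rng.normal(0, 3000, size=60)

# Hold out a quarter of the rows so accuracy is measured on unseen data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "SVM (SVR)": SVR(kernel="linear", C=1000.0),
    "MLP": MLPRegressor(solver="lbfgs", alpha=1e-1,
                        hidden_layer_sizes=(5, 2), random_state=0),
}

errors = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    errors[name] = mean_absolute_error(y_test, model.predict(x_test))
    print(f"{name}: mean absolute error = {errors[name]:.0f}")
```

The model with the lowest held-out error on your own data would be the natural candidate, though the ranking can change with the data set and the hyperparameters.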