डॉ. रुपाली कुलकर्णी : Classification : Iris Dataset : Predicting Class Labels

Classification: Classification is the process of categorizing a given set of data into classes. It can be performed on both structured and unstructured data. A classifier predicts the class of each given data point. The classes are often referred to as targets, labels, or categories.

Random forest classifier: A random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each tree in the forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.
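The voting idea can be illustrated with a minimal sketch in plain Python. The function name `majority_vote` and the list of votes are hypothetical, made up only for illustration; they are not part of any library. Note also that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so this sketch shows the concept, not that library's internals.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees.

    If two labels tie, Counter.most_common returns the one seen first.
    """
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical predictions from five individual trees for one flower
votes = [
    "Iris-setosa",
    "Iris-versicolor",
    "Iris-versicolor",
    "Iris-virginica",
    "Iris-versicolor",
]

print(majority_vote(votes))  # prints Iris-versicolor
```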

Iris Dataset: The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.

Attribute Information:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
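The attributes and class counts listed above can be checked quickly with the copy of the Iris dataset that ships with scikit-learn. This is only a sketch: it uses `load_iris` instead of the `Iris.csv` file used in this post, so the column names and label spellings may differ from those in the CSV.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset bundled with scikit-learn
iris = load_iris()

# Four numeric attributes per flower
print(iris.feature_names)

# Three classes (types of iris plant)
print(iris.target_names)

# Number of rows and attributes
print(iris.data.shape)

# Number of instances in each class
print(np.bincount(iris.target))
```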

Below is the code to create a Random Forest Classifier that classifies custom samples supplied by the user. The output is the class label (plant type: Setosa/Versicolour/Virginica).

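import numpy as np
import pandas as pd

# Load the dataset. The column indices below assume that Iris.csv has an
# Id column first, then the four measurements, then the class label.
data = pd.read_csv('Iris.csv')
data_points = data.iloc[:, 1:5]  # the four feature columns
labels = data.iloc[:, 5]         # the class label column

# Split: 80% of the rows for training, 20% for testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data_points, labels, test_size=0.2)

# Classify using Random Forest
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()
random_forest.fit(x_train, y_train)

# score() returns a fraction between 0 and 1, so multiply by 100 for a percentage
print('Training data accuracy: {:.2f}%'.format(random_forest.score(x_train, y_train) * 100))
print('Testing data accuracy: {:.2f}%'.format(random_forest.score(x_test, y_test) * 100))

# Predict for user input: one row per sample, in the same column order as the
# training data. Reusing the training column names avoids a feature-name warning.
X_new = pd.DataFrame(
    np.array([[3, 2, 1, 0.2], [4.9, 2.2, 3.8, 1.1], [5.3, 2.5, 4.6, 1.9]]),
    columns=data_points.columns,
)

# Classification of the species from the input vectors
classify = random_forest.predict(X_new)
print("Classification of species: {}".format(classify))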

The output is the predicted class label for each input sample.
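Besides the label itself, it can be useful to see how confident the forest is about each sample. The sketch below is self-contained and rests on a few assumptions that are not in the code above: it loads the data with scikit-learn's `load_iris` rather than from `Iris.csv`, it fixes `random_state=42` so the run is repeatable, and it stratifies the split so each class is equally represented. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: use the bundled dataset instead of Iris.csv
iris = load_iris()

# Assumption: fixed seed and stratified split for a repeatable 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Same three user-supplied samples as in the post
X_new = np.array([[3, 2, 1, 0.2], [4.9, 2.2, 3.8, 1.1], [5.3, 2.5, 4.6, 1.9]])

predicted = model.predict(X_new)            # integer class index per sample
probabilities = model.predict_proba(X_new)  # one probability per class per sample

for sample, label, proba in zip(X_new, predicted, probabilities):
    # Map the integer class index back to the species name
    print(sample, iris.target_names[label], np.round(proba, 2))
```

Each row of `probabilities` sums to 1, and the predicted label is the class with the highest value in that row.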


Tuesday, August 23, 2022
