Classification Models with Python Code from Scratch

This is a step-by-step tutorial on machine learning classification models with Python code, using the scikit-learn library. I explain each step with code and the compiled results. To evaluate the models, we use the publicly available Kaggle salary classification dataset, which has two classes and 15 features. It contains a few null values, which we fill in using different methods. We also apply a feature selection method and train seven popular machine learning models to cover the workflow end to end.

In this section, we import the main libraries: numpy, pandas, sklearn, seaborn, and matplotlib.

#Import important Libraries

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler  #library to normalize
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score, auc, precision_recall_curve, roc_curve
from sklearn.linear_model import LogisticRegression   
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
import pandas as pd                     
import numpy as np                      
import seaborn as sns                   
from matplotlib import pyplot as plt

In this part we read the dataset and find its number of rows and columns.

df = pd.read_csv("salary.csv")
nRow, nCol = df.shape                
print(nRow)
print(nCol)

Here we do some basic preprocessing to organize the data. The tolist() function converts the column index into a Python list of column names.

df=pd.DataFrame(df)          
df_1=df.columns.tolist()              
print(df_1)

The head() function shows the first rows of the dataset, here the first five, which gives a quick look at the data.

df.head(5)

This call reports the number of null values in each feature.

df.isnull().sum()
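
As a small illustration of what this call returns, the sketch below builds a toy DataFrame and counts the missing entries per column. The column names and values are made up for the example and are not taken from the salary dataset.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not the salary dataset).
toy = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "workclass": ["Private", None, "State-gov", None],
    "hours": [40, 38, 45, 50],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# The share of missing values per column is often easier to judge than raw counts.
missing_share = toy.isnull().mean()
print(missing_share)
```

Both NaN in a numeric column and None in a string column are counted as missing.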

This function converts categorical data into numerical values using LabelEncoder so that the models can interpret it easily.

from sklearn.preprocessing import LabelEncoder

def labelencoder(df):
    # Convert categorical (string) columns into integer codes
    # so that the models can interpret them easily.
    for c in df.columns:
        if df[c].dtype == 'object':                   # only string columns are encoded
            df[c] = df[c].fillna('N')                 # missing categories become their own class 'N'
            lbl = LabelEncoder()
            df[c] = lbl.fit_transform(df[c].astype(str))
    return df

 


The data after applying the label encoder:

data1=labelencoder(df)                           
data1
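
To see what label encoding does to a single column, here is a minimal sketch on a made-up series of category values (the values are illustrative, not read from the dataset).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up category values for illustration.
toy = pd.Series(["Private", "State-gov", "Private", "Self-emp"])

encoder = LabelEncoder()
codes = encoder.fit_transform(toy)

# classes_ holds the sorted unique labels; each label's code is its index in classes_.
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps the integer codes back to the original strings.
restored = encoder.inverse_transform(codes)
print(list(restored))
```

The integer codes imply an order that the categories do not really have. Tree-based models usually tolerate this, while for linear or distance-based models one-hot encoding is a common alternative.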

We separate the salary column as the target and treat all the other columns as features.

Labels=data1['salary'] 
dataX=data1.drop('salary',axis=1)

There are various methods for handling missing and NaN values. Here we use SimpleImputer to fill the missing values with the mean strategy, which replaces each missing value with the mean of its column.

from sklearn.impute import SimpleImputer                                                                                                                          
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(dataX)
X = imputer.transform(dataX)
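
The following sketch shows mean imputation on a tiny hand-made array, so the filled values can be checked by hand. The numbers are invented for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Tiny made-up array with one missing value in each column.
toy = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(toy)

# statistics_ stores the per-column means of the observed values:
# (1 + 3) / 2 = 2 and (10 + 20) / 2 = 15.
print(imputer.statistics_)
print(filled)
```

SimpleImputer also accepts strategy="median" and strategy="most_frequent", which may suit skewed or encoded categorical columns better than the mean.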

For feature selection, recursive feature elimination (RFE) and Boruta are widely used. In this tutorial we use Boruta with a random forest classifier. It keeps the useful features and drops the less useful ones.

from sklearn.ensemble import RandomForestClassifier
model =RandomForestClassifier(max_depth=1) 
from boruta import BorutaPy
feat_selector = BorutaPy(model, n_estimators='auto', verbose=1, random_state=101)
feat_selector.fit(X,Labels)
print(feat_selector.support_) 
print(feat_selector.ranking_) 
X_filtered1 = feat_selector.transform(X)
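
For comparison, here is a minimal sketch of the other method mentioned above, RFE, using scikit-learn's own implementation. This is not the method used in this tutorial; the synthetic data from make_classification and the choice of keeping 3 features are assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in data: 8 features, of which 3 are informative.
X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

estimator = RandomForestClassifier(n_estimators=50, random_state=0)

# RFE repeatedly fits the estimator and removes the least important feature
# until only n_features_to_select remain.
selector = RFE(estimator, n_features_to_select=3)
selector.fit(X_demo, y_demo)

print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # rank 1 marks a selected feature

X_demo_selected = selector.transform(X_demo)
print(X_demo_selected.shape)
```

Unlike Boruta, RFE needs the number of features to keep up front, so that value has to be chosen or tuned.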

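Here we normalize all the selected features using MinMaxScaler. For each feature column, MinMaxScaler computes Final_selected = (Final_selected - Final_selected.min()) / (Final_selected.max() - Final_selected.min()), which gives values between 0 and 1. Subtracting the mean and dividing by the standard deviation is a different technique, standardization, which is what StandardScaler does.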

Final_selected=X_filtered1
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))                        
Final_selected = scaler.fit_transform(Final_selected)
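
To make the difference between min-max scaling and standardization concrete, the sketch below applies both to one small made-up column and checks each against its formula written out by hand.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One made-up feature column.
x = np.array([[10.0], [20.0], [30.0], [50.0]])

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1].
minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)
manual_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: (x - mean) / std, result has mean 0 and unit variance.
# StandardScaler uses the population standard deviation (ddof=0), like np.std.
standard = StandardScaler().fit_transform(x)
manual_standard = (x - x.mean()) / x.std()

print(minmax.ravel())
print(standard.ravel())
```

The min-max result for this column is 0, 0.25, 0.5, and 1, while the standardized values are centered on 0 and can be negative.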

This step splits the dataset into training and testing sets, with a test size of 20%.

X_train, X_test,y_train, y_test = train_test_split(Final_selected,Labels, test_size=0.20,random_state=9)

print('training features =',X_train.shape)
print('testing features =',X_test.shape)
print('training labels =',y_train.shape)
print('testing labels =',y_test.shape)
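
When the classes are imbalanced, train_test_split can also keep the class ratio the same in both parts through its stratify parameter. The sketch below shows this on synthetic data; the 80/20 class balance and the array sizes are assumptions chosen for the example, and stratification is an optional addition, not something the split above uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 4 features, 80 of class 0 and 20 of class 1.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo keeps the 80/20 class ratio in both the training and the test part.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=9, stratify=y_demo
)

print('training features =', X_tr.shape)
print('testing features =', X_te.shape)
print('class 1 in training labels =', int(y_tr.sum()))
print('class 1 in testing labels =', int(y_te.sum()))
```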

Now our dataset is well organized and ready to feed to the models. Preprocessing is often the difficult part of AI and machine learning work. We now try different models one by one and look at each model's performance in terms of accuracy and the confusion matrix. Note that this is a two-class classification problem. The classification report also gives precision, recall, and F1-score. Let's start.

Random Forest (RF) Classifier

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)       
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
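
To show how the numbers in the report relate to the confusion matrix, the sketch below computes precision, recall, F1-score, and accuracy by hand from a small made-up set of true and predicted labels and compares them with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels for illustration: 6 samples of class 0 and 4 of class 1.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# For a binary problem, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)          # of the predicted positives, how many are right
recall = tp / (tp + fn)             # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
print(accuracy, accuracy_score(y_true, y_pred))
```

Here the matrix has 5 true negatives, 1 false positive, 1 false negative, and 3 true positives, so precision, recall, and F1-score are all 0.75 and accuracy is 0.8.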

Support Vector Machine (SVM) Classifier

from sklearn.svm import SVC
from sklearn import svm
#model = svm.LinearSVC(multi_class="ovr")
model = svm.SVC(kernel='rbf', gamma=7.9, C=20, decision_function_shape='ovo')
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
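
SVMs are sensitive to feature scale. One common way to keep the scaler fitted on the training part only is scikit-learn's Pipeline, which is imported at the top of this tutorial. The sketch below is a minimal example under assumptions: synthetic data from make_classification stands in for the salary features, and gamma="scale" is used instead of a hand-tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the salary dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.20, random_state=9)

# fit() learns the min and max from the training data only; predict() then
# reuses those values on the test data before calling the SVM.
svm_pipeline = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(0, 1))),
    ("svm", SVC(kernel="rbf", C=20, gamma="scale")),
])
svm_pipeline.fit(X_tr, y_tr)

prediction = svm_pipeline.predict(X_te)
demo_accuracy = accuracy_score(y_te, prediction)
print(demo_accuracy)
```

Because the scaler never sees the test rows during fitting, the test score is not influenced by information from the test set.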

Decision Tree (DT) Classifier

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))

Gradient Boosting Machine (GBM) Classifier

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(random_state=101)
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))

K-Nearest Neighbors (KNN) Classifier

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=2)
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
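
The number of neighbors is a tunable parameter, and an even value can produce tied votes in a two-class problem. A minimal sketch of comparing a few odd values with 5-fold cross-validation is shown below. The candidate values and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the salary dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

candidate_k = [1, 3, 5, 7, 9]   # odd values avoid tied votes with two classes
mean_scores = {}
for k in candidate_k:
    # cross_val_score returns one accuracy value per fold; we keep the mean.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_demo, y_demo, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print('best k =', best_k)
```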

Extreme Gradient Boosting (XGB) Classifier

from xgboost import XGBClassifier
model = XGBClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))

Multi-Layer Perceptron (MLP) Classifier

from sklearn.neural_network import MLPClassifier
MLP = MLPClassifier(max_iter=1500,activation='relu', learning_rate_init=0.001,shuffle=True,
                    learning_rate='constant', beta_1=0.999, beta_2=0.9 , momentum=0.88,
                    power_t=0.9, solver='lbfgs', alpha=1e-6, random_state=101)
MLP.fit(X_train, y_train)
prediction = MLP.predict(X_test)

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
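
The model sections above repeat the same fit, predict, and report pattern. A minimal sketch of how that pattern could be wrapped in one helper is shown below. The name evaluate_models is hypothetical, synthetic data from make_classification stands in for the salary dataset, and only three of the classifiers are included to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return {name: (accuracy, f1)} measured on the test split."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        prediction = model.predict(X_test)
        results[name] = (accuracy_score(y_test, prediction),
                         f1_score(y_test, prediction))
    return results


# Synthetic stand-in data (not the salary dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.20, random_state=9)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=101),
    "DT": DecisionTreeClassifier(random_state=101),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}
results = evaluate_models(models, X_tr, X_te, y_tr, y_te)

# Print the models from highest to lowest test accuracy.
for name, (acc, f1) in sorted(results.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:>4}: accuracy={acc:.3f}  f1={f1:.3f}")
```

Any other classifier with fit and predict methods can be added to the models dictionary in the same way.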

Conclusion:

In this tutorial, I tried to cover the traditional machine learning classifiers with complete Python code to help you understand a classification problem. It should be helpful for beginners and machine learning aspirants, and it can help you build your basics in machine learning and data science. Throughout, I discussed data loading, data preparation, and model development. This was a classification problem; we will share a regression problem in the coming days. Keep in touch.

If you have any questions, ideas, or anything else about this tutorial, please comment.

 
