# Classification model with Python code from Scratch

This tutorial walks step by step through machine learning classification models in Python using the scikit-learn library. Each step is explained alongside its code. To evaluate the models, we use the publicly available Kaggle salary classification dataset, which has two classes and 15 features. The dataset contains a few null values, which we fill in. We also apply a feature selection method and train seven popular machine learning models to cover the workflow end to end.

In this section, we import the main libraries: numpy, pandas, scikit-learn, seaborn and matplotlib.

```python
# Import important libraries

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler  # scalers used to normalize features
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score, auc, precision_recall_curve, roc_curve
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
```

In this part we read the dataset and find its number of rows and columns.

```python
df = pd.read_csv("salary.csv")
nRow, nCol = df.shape  # (number of rows, number of columns)
print(nRow)
print(nCol)
```

Here we do some light preprocessing to organize the data. The `tolist()` function converts the column index into a Python list.

```python
df = pd.DataFrame(df)
df_1 = df.columns.tolist()  # list of column names
print(df_1)
```

`head` displays the first rows of the dataset for a quick look at its contents.

`df.head(5)`

This call reports the number of null values in each feature.

`df.isnull().sum()`
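To make the output of this call concrete, here is a minimal sketch on a small hypothetical DataFrame. The column names and values are made up for illustration and are not taken from the salary dataset. It shows the per-column null count and the same figure as a percentage of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "age": [25, np.nan, 47, 38],
    "workclass": ["Private", None, "State-gov", "Private"],
    "salary": ["<=50K", ">50K", "<=50K", ">50K"],
})

null_counts = toy.isnull().sum()        # missing values per column
null_share = toy.isnull().mean() * 100  # same, as a percentage of rows

print(null_counts)
print(null_share.round(1))
```

Looking at the share as well as the count can help when deciding whether to fill a column or drop it.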

This function uses `LabelEncoder` to convert categorical data into numerical values that the models can work with more easily.

```python
from sklearn.preprocessing import LabelEncoder

def labelencoder(df):
    # Convert categorical (string) columns into integer codes.
    for c in df.columns:
        if df[c].dtype == 'object':
            df[c] = df[c].fillna('N')            # treat missing strings as their own category
            lbl = LabelEncoder()
            lbl.fit(list(df[c].values))
            df[c] = lbl.transform(df[c].values)  # replace strings with integer codes
    return df
```

The data after label encoding:

```python
data1 = labelencoder(df)
data1
```
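As a small illustration of what `LabelEncoder` does to a single column, the sketch below encodes a hypothetical series (the values are made up, not taken from the dataset). It fills the missing entry with `'N'`, as the function above does, and shows that the integer codes follow the sorted order of the class labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column with one missing value.
toy = pd.Series(["Private", "State-gov", None, "Private"])
filled = toy.fillna("N")               # missing value becomes its own category

encoder = LabelEncoder()
codes = encoder.fit_transform(filled)  # integer code per row

print(list(encoder.classes_))          # classes in sorted order
print(codes.tolist())

# The mapping is reversible, which helps when reading model output.
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because the codes are assigned alphabetically, they carry no real ordering between categories; tree-based models usually tolerate this, while distance-based models may not.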

We separate the `salary` column as the target and treat all other columns as features.

```python
Labels = data1['salary']               # target
dataX = data1.drop('salary', axis=1)   # features
```

There are various ways to handle missing and NaN values. Here we use `SimpleImputer` to fill the missing values with the column mean.

```python
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(dataX)
X = imputer.transform(dataX)
```
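The mean strategy can be checked by hand on a tiny array. The sketch below uses made-up numbers, not the salary data: each NaN is replaced by the mean of the non-missing values in its column, and the fitted means are available in `statistics_`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up 3x2 array with one missing value per column.
toy = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(toy)

print(imputer.statistics_)  # per-column means learned during fit
print(filled)               # NaNs replaced by those means
```

For columns that hold label-encoded categories, a mean of the codes may not be meaningful; `strategy="most_frequent"` is one alternative worth considering in that case.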

For feature selection, recursive feature elimination (RFE) and Boruta are widely used. In this tutorial we use Boruta with a random forest classifier. It keeps the useful features and drops the less useful ones.

```python
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

model = RandomForestClassifier(max_depth=1)
feat_selector = BorutaPy(model, n_estimators='auto', verbose=1, random_state=101)
feat_selector.fit(X, Labels.values)         # BorutaPy expects NumPy arrays
print(feat_selector.support_)               # boolean mask of confirmed features
print(feat_selector.ranking_)               # rank 1 = selected
X_filtered1 = feat_selector.transform(X)    # keep only the selected columns
```
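RFE is mentioned above as the other common option. As a rough sketch of how it compares, the example below runs scikit-learn's `RFE` with a random forest on synthetic data from `make_classification`. The data, the number of features to keep (5) and the forest size are assumptions chosen for illustration, not settings from this tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in data: 10 features, 4 of them informative.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=4,
    n_redundant=0, random_state=101,
)

estimator = RandomForestClassifier(n_estimators=50, random_state=101)
# Drop one feature per round until 5 remain (an arbitrary choice here).
selector = RFE(estimator, n_features_to_select=5, step=1)
selector.fit(X_demo, y_demo)

print(selector.support_)   # boolean mask of kept features
print(selector.ranking_)   # rank 1 = kept
X_reduced = selector.transform(X_demo)
print(X_reduced.shape)
```

Unlike Boruta, RFE needs the target number of features up front, which is the main practical difference between the two.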

Here we normalize the selected features using `MinMaxScaler`. It rescales each feature as (x - x.min()) / (x.max() - x.min()), which gives values between 0 and 1. Subtracting the mean and dividing by the standard deviation is a different operation, standardization, which `StandardScaler` performs.

```python
from sklearn.preprocessing import MinMaxScaler

Final_selected = X_filtered1
scaler = MinMaxScaler(feature_range=(0, 1))
Final_selected = scaler.fit_transform(Final_selected)
```
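The difference between min-max scaling and standardization is easy to verify on a single made-up column. The sketch below computes both by hand with NumPy and compares them against `MinMaxScaler` and `StandardScaler`; the numbers are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

col = np.array([[20.0], [30.0], [40.0], [60.0]])  # one made-up feature

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1].
minmax_manual = (col - col.min(axis=0)) / (col.max(axis=0) - col.min(axis=0))
minmax_sklearn = MinMaxScaler(feature_range=(0, 1)).fit_transform(col)

# Standardization: (x - mean) / std, result has mean 0 and unit variance.
standard_manual = (col - col.mean(axis=0)) / col.std(axis=0)
standard_sklearn = StandardScaler().fit_transform(col)

print(minmax_manual.ravel())
print(standard_manual.ravel())
```

Standardized values are not bounded to [0, 1], so the two scalers are not interchangeable when a model or later step expects that range.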

This step splits the dataset into training and test sets, with 20% held out for testing.

```python
X_train, X_test, y_train, y_test = train_test_split(Final_selected, Labels, test_size=0.20, random_state=9)

print('training features =', X_train.shape)
print('testing features =', X_test.shape)
print('training labels =', y_train.shape)
print('testing labels =', y_test.shape)
```
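One common variation, sketched here on synthetic data rather than the salary dataset, is to pass `stratify` so both splits keep the same class ratio, and to fit the scaler on the training split only so that no information from the test set reaches the model. The array sizes and the 75/25 class balance below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(9)
X_demo = rng.normal(size=(100, 3))       # synthetic features
y_demo = np.array([0] * 75 + [1] * 25)   # imbalanced synthetic labels

# stratify keeps the 75/25 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=9, stratify=y_demo
)

# Learn min and max from the training split, then apply to both splits.
scaler = MinMaxScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr.shape, X_te.shape)
print(int(y_tr.sum()), int(y_te.sum()))  # positives in each split
```

With this ordering, test values can fall slightly outside [0, 1], which is expected because the scaler never saw them.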

Now our dataset is organized and ready to feed to the models. Preprocessing is often the difficult part of AI and machine learning work. Next we try different models one by one and look at their performance in terms of accuracy and the confusion matrix. Note that this is a two-class classification problem. The classification report also gives precision, recall and F1-score. Let's start.
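Before running the models, it may help to see how precision, recall and F1-score come out of a two-class confusion matrix. The sketch below uses ten made-up labels and predictions, computes the metrics by hand from the four cells, and checks them against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up ground truth and predictions for a two-class problem.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # of predicted positives, how many are right
recall = tp / (tp + fn)      # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp)
print(precision, recall, f1, accuracy)
```

In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, which is worth keeping in mind when reading the printed matrices below.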

### Random Forest (RF) Classifier

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### Support Vector Machine (SVM) Classifier

```python
from sklearn.svm import SVC
from sklearn import svm
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

#model = svm.LinearSVC(multi_class="ovr")
model = svm.SVC(kernel='rbf', gamma=7.9, C=20, decision_function_shape='ovo')
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### Decision Tree (DT) Classifier

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### Gradient Boosting Machine (GBM) Classifier

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

model = GradientBoostingClassifier()  # create the GBM model before fitting
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### K-Nearest Neighbors (KNN) Classifier

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

model = KNeighborsClassifier(n_neighbors=2)
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### Extreme Gradient Boosting (XGB) Classifier

```python
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

model = XGBClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```

### Multi-Layer Perceptron (MLP) Classifier

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Note: with solver='lbfgs', the parameters learning_rate, learning_rate_init,
# shuffle, momentum, power_t, beta_1 and beta_2 are ignored; they only apply
# to the 'sgd' and/or 'adam' solvers.
MLP = MLPClassifier(max_iter=1500, activation='relu', learning_rate_init=0.001, shuffle=True,
                    learning_rate='constant', beta_1=0.999, beta_2=0.9, momentum=0.88,
                    power_t=0.9, solver='lbfgs', alpha=1e-6, random_state=101)
MLP.fit(X_train, y_train)
prediction = MLP.predict(X_test)

print(classification_report(y_test, prediction))
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
```
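Since every model above follows the same fit, predict and score pattern, the comparison can also be written as a loop. The following is a minimal sketch on synthetic data from `make_classification`, not the salary dataset. The `models` dictionary, the `results` structure and the fixed `random_state` values are assumptions for illustration, and only a few of the scikit-learn classifiers are included to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the prepared features.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=9
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=9
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=9),
    "Decision Tree": DecisionTreeClassifier(random_state=9),
    "Gradient Boosting": GradientBoostingClassifier(random_state=9),
    "KNN": KNeighborsClassifier(n_neighbors=2),
}

results = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred),
    }

# Print the models from best to worst accuracy.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:18s} accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```

Keeping the scores in one dictionary makes it easier to compare models side by side than reading seven separate reports.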

## Conclusion:

In this tutorial, I tried to cover the traditional machine learning classifiers with complete Python code to help you understand a classification problem. It should be helpful for beginners and aspiring machine learning practitioners, and it can help you build your basics in machine learning and data science. Along the way I discussed data loading, data preparation, and model development. This was a classification problem; we will share a regression problem in the coming days. Keep in touch.

If you have any questions, ideas or anything else about this tutorial, please comment.
