Machine Learning Algorithm in Python

What is a Support Vector Machine?

Machine learning is a branch of artificial intelligence and data science in which a model uses data and algorithms to improve its accuracy over time, much as a human learns from experience. The human brain can be thought of as a kind of learned model, and a trained machine similarly makes decisions without human intervention. Many techniques and algorithms support this learning process, enabling a machine to make decisions based on what it has learned from its training data. Support Vector Machine (SVM) is a strong machine learning classifier; many people prefer it because it often achieves good accuracy with comparatively little computing power. SVM can be used for both classification and regression tasks, although it is most widely used for classification.
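As a rough illustration of SVM handling both task types, the sketch below fits scikit-learn's SVC (classification) and SVR (regression) on a tiny made-up dataset. The data and variable names are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Hypothetical toy data: one feature, six samples
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])              # labels for classification
y_value = np.array([0.0, 1.1, 1.9, 3.2, 3.9, 5.1])  # targets for regression

# Same family of model, two different tasks
classifier = SVC(kernel="linear").fit(X, y_class)
regressor = SVR(kernel="linear").fit(X, y_value)

label = classifier.predict([[4.5]])[0]  # a class label
value = regressor.predict([[4.5]])[0]   # a continuous value
print(label, round(float(value), 2))
```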

It is one of the most popular machine learning classifiers and can make fast decisions once trained, which makes it applicable to real-time systems. Machine learning and AI are rapidly digitizing the world. People now learn what they need from the internet, and book reading has declined sharply as a result of these technologies.

You can also learn more about machine learning algorithms from scikit-learn.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
sns.set()

 

Import all the required libraries. You can import any of the scikit-learn classifiers, for instance GaussianNB, KNeighborsClassifier, and RandomForestClassifier; gradient boosting classifiers such as XGBoost are provided by separate packages.

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))
scores = {}  # training accuracy of each model, keyed by model name

 

First load your data from a CSV file; it may contain features, sensor values, or anything else. Then preprocess it, for example by filling NaN or missing values with the mean or median.
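A minimal sketch of this kind of preprocessing is shown here, assuming a small hand-made DataFrame in place of a real CSV file. The column names are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a CSV loaded with pd.read_csv
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "port": ["S", "C", None, "S"],
})

# Numeric column: fill gaps with the median (less sensitive to outliers than the mean)
df["age"] = df["age"].fillna(df["age"].median())
# Categorical column: fill gaps with the most frequent value
df["port"] = df["port"].fillna(df["port"].mode()[0])

missing_after = int(df.isna().sum().sum())
print(df)
```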

Then divide the dataset into a training set and a testing set, for example 70% training and 30% testing. scikit-learn has a built-in helper for this split, train_test_split. X_train and X_test are then used to train and test the different classifiers.
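The 70/30 split can be sketched with scikit-learn's train_test_split as follows. The toy arrays are assumptions for illustration; with real data you would pass your own feature matrix and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# test_size=0.3 keeps 30% for testing; stratify=y preserves the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, which helps when comparing several classifiers on the same data.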

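This is the logistic regression model (the function is named LinearRegressionModel, but it fits a LogisticRegression classifier). It takes X_train, y_train, and X_test from your dataset and returns the predictions for X_test.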

def LinearRegressionModel(X_train, y_train, X_test):
    # Despite the name, this fits a logistic regression classifier
    classifier = LogisticRegression(random_state=0)
    classifier.fit(X_train, y_train)
    # Predict results for the test set
    y_pred = classifier.predict(X_test)
    # Accuracy is measured on the training data
    scores['LogisticRegression'] = classifier.score(X_train, y_train)
    return y_pred

 

Similarly, the kernel SVM function takes the same parameters, X_train, y_train, and X_test. You can use a linear kernel or an RBF kernel, and you can also change the gamma, C, and random_state values to see how the result changes.

def KenelSVMModel(X_train, y_train, X_test):
    # RBF-kernel SVM; try kernel='linear' or other gamma values to compare
    classifier = SVC(kernel='rbf', random_state=0, C=1.0, gamma=0.1)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # Accuracy is measured on the training data
    scores['KernelSVM'] = classifier.score(X_train, y_train)
    return y_pred
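To see how the kernel and gamma choices affect the result, a small experiment along these lines may help. It uses synthetic data from make_moons rather than the dataset in this post, and the list of settings is only an assumption for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; not the dataset used in this post)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
# gamma is ignored by the linear kernel, so "scale" is just a placeholder there
for kernel, gamma in [("linear", "scale"), ("rbf", 0.1), ("rbf", 1.0), ("rbf", 10.0)]:
    clf = SVC(kernel=kernel, gamma=gamma, C=1.0, random_state=0)
    clf.fit(X_tr, y_tr)
    # Score on held-out data rather than on the training data
    results[(kernel, gamma)] = clf.score(X_te, y_te)

for setting, accuracy in results.items():
    print(setting, round(accuracy, 3))
```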

 

This is the random forest model function. It takes the same inputs, X_train, y_train, and X_test, and returns the predicted result.

def RFModel(X_train, y_train, X_test):
    # 100 trees, split quality measured by entropy
    classifier = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # Accuracy is measured on the training data
    scores['RandomForest'] = classifier.score(X_train, y_train)
    return y_pred

 

Similarly, NaiveBayesModel takes the same inputs, X_train, y_train, and X_test, and follows the same process.

def NaiveBayesModel(X_train, y_train, X_test):
    # Gaussian naive Bayes with default settings
    classifier = GaussianNB()
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # Accuracy is measured on the training data
    scores['NaiveBayes'] = classifier.score(X_train, y_train)
    return y_pred

 

This is K-nearest neighbours (KNN). It classifies each sample by a majority vote among the k nearest training points (here k = 5 with Euclidean distance) and returns the predicted result for your data.

def KNNModel(X_train, y_train, X_test):
    # Minkowski metric with p=2 is the Euclidean distance
    classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # Accuracy is measured on the training data
    scores['KNN'] = classifier.score(X_train, y_train)
    return y_pred
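The idea behind KNN can be sketched from scratch in a few lines. This is a simplified illustration, not how KNeighborsClassifier is implemented internally; the function name knn_predict and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Classify one point by majority vote among its k nearest neighbours."""
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([0.3, 0.3]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([4.8, 5.0]), k=3)
print(pred_a, pred_b)
```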

 

Import the training dataset and remove a few variables that won't be much help in prediction. Fill in NaN values with the median (Age) and the most frequent value (Embarked).

dataset_original_train = pd.read_csv('../input/train.csv', header=0)
colNamesToDrop = ['PassengerId', 'Name', 'Ticket', 'Cabin']
dataset_train = dataset_original_train.drop(colNamesToDrop, axis=1)
# Fill missing Embarked with "S" and missing Age with the median
dataset_train["Embarked"] = dataset_train["Embarked"].fillna("S")
dataset_train["Age"] = dataset_train["Age"].fillna(dataset_train["Age"].median())

 

Import the test dataset. We can drop or fill any missing values in the whole dataset in the same way.

dataset_original_test = pd.read_csv('../input/test.csv', header=0)
dataset_test = dataset_original_test.drop(colNamesToDrop, axis=1)
# Assign the result back instead of using inplace=True on a column
dataset_test["Age"] = dataset_test["Age"].fillna(dataset_test["Age"].median())
dataset_test["Fare"] = dataset_test["Fare"].fillna(dataset_test["Fare"].median())
dataset_test["Embarked"] = dataset_test["Embarked"].fillna("S")

 

We can convert the DataFrame to arrays and extract the feature matrices X_train and X_test and the target vector y_train from our dataset (the test file contains no labels, so there is no y_test).

X_train = dataset_train.iloc[:, 1:8].values  # feature columns
y_train = dataset_train.iloc[:, 0].values    # target column
X_test = dataset_test.iloc[:].values

 

 

Encode the categorical data of both the train and test datasets, converting label or string values into numeric values so that the models can use them.

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Label-encode column 1; fit on train and reuse the same mapping on test
labelencoder_X = LabelEncoder()
X_train[:, 1] = labelencoder_X.fit_transform(X_train[:, 1])
X_test[:, 1] = labelencoder_X.transform(X_test[:, 1])
# One-hot encode column 6. OneHotEncoder's categorical_features argument was
# removed in scikit-learn 0.22, so ColumnTransformer selects the column instead.
onehotencoder = ColumnTransformer([("onehot", OneHotEncoder(), [6])],
                                  remainder="passthrough", sparse_threshold=0)
X_train = onehotencoder.fit_transform(X_train)
X_test = onehotencoder.transform(X_test)

 

Avoid the dummy variable trap by dropping the first one-hot column.

X_train = X_train[:, 1:]
X_test = X_test[:, 1:]

Perform feature scaling on both the train and test datasets.

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
# Fit the scaler on the training data only, then reuse it for the test data
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
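What the scaler does can be checked on a tiny example. The arrays below are made up for illustration: after fitting on the training data each column has mean 0 and standard deviation 1, and the test data is transformed with the training statistics rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two features on very different scales
X_train_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test_demo = np.array([[2.0, 400.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learns mean and std from train
X_test_scaled = scaler.transform(X_test_demo)        # reuses the train mean and std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```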

 

When your preprocessed data is ready, call the functions above to perform training and prediction. Each classifier is called by its function name and returns its predicted result.

Linear_ypred = LinearRegressionModel(X_train,y_train,X_test)
kSVM_ypred = KenelSVMModel(X_train,y_train,X_test)
RF_ypred = RFModel(X_train,y_train,X_test)
NB_ypred = NaiveBayesModel(X_train,y_train,X_test)
KNN_ypred = KNNModel(X_train,y_train,X_test)

df = pd.DataFrame([[key, value] for key, value in scores.items()], columns=["Algorithm", "Score"])
# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x="Algorithm", y="Score", hue="Algorithm", data=df, kind="bar")

 

Even though the RF model has a score of 97%, I feel that it is overfitting, since the score is measured on the training data.
Hence I would like to use the SVM model.
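One way to check a suspicion of overfitting is to compare the training score with a cross-validated score. The sketch below does this on synthetic data from make_classification, which is an assumption for illustration and not the dataset used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with some label noise (flip_y), used only for illustration
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=0)

clf = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)

# Score on the same data the model was trained on
train_score = clf.fit(X, y).score(X, y)
# Mean score over 5 held-out folds
cv_scores = cross_val_score(clf, X, y, cv=5)
cv_mean = cv_scores.mean()

print("training score:", round(train_score, 3))
print("cross-validated score:", round(cv_mean, 3))
```

A large gap between the two numbers suggests that the training score overstates how well the model will do on unseen data.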

result = pd.DataFrame({
    "PassengerId": dataset_original_test["PassengerId"],
    "Survived": kSVM_ypred
})
result.to_csv('submission.csv', index=False)

 

The algorithms above are among the most frequently used classifiers in many applications.

KNN decides on the basis of the k nearest neighbours: if k is 2, it checks the two nearest neighbours of a sample and assigns the most common class among them; if k is 3, it checks the three nearest, and so on. This differs from clustering, which groups unlabelled data on the basis of similarity. Similarly, a decision tree builds a tree and then, on the basis of conditions on the features, moves from the root toward the leaves to make a decision.
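The root-to-leaf conditions of a decision tree can be printed with scikit-learn's export_text. This sketch uses the built-in iris dataset and a depth limit of 2, both chosen here only to keep the printed tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree so the printed rules stay short
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Text view of the learned conditions, from the root down to the leaves
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)

# Each prediction follows one root-to-leaf path
first_prediction = tree.predict(iris.data[:1])[0]
print(first_prediction)
```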

Random forest combines many decision trees and aggregates their votes, which often makes its decisions more accurate than those of a single tree.
