Machine Learning Algorithm in Python
What is a Support Vector Machine?
Machine learning is a branch of artificial intelligence and data science in which algorithms learn from data and improve their accuracy over time, much as a human does. A trained model works a little like a brain: once it has been trained, it can make decisions without human intervention. Many techniques and algorithms support this learning process, letting a machine make decisions based on its training and on what it has learned from that training. The Support Vector Machine (SVM) is a strong machine learning classifier. Many people prefer it because it is often accurate while needing relatively little computing power. SVM can be used for both classification and regression tasks, although it is most widely used for classification.
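To make the classification and regression point concrete, here is a minimal sketch using scikit-learn's SVC and SVR. The data is synthetic (generated with make_classification and make_regression) and the kernel choices are assumptions made only for illustration, not settings from this article.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.svm import SVC, SVR

# Synthetic data, used only for illustration.
X_cls, y_cls = make_classification(n_samples=200, n_features=4, random_state=0)
X_reg, y_reg = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)

# SVC handles classification: it predicts a class label (0 or 1 here).
clf = SVC(kernel="rbf").fit(X_cls, y_cls)
class_predictions = clf.predict(X_cls[:5])

# SVR handles regression: it predicts a continuous value.
reg = SVR(kernel="linear").fit(X_reg, y_reg)
value_predictions = reg.predict(X_reg[:5])

print(class_predictions)
print(value_predictions)
```

The two estimators share the same fit and predict interface, so switching between the tasks is mostly a matter of changing the class.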
It is a popular classifier that can make accurate decisions quickly, which makes it applicable to real-time systems. Machine learning and AI are moving the world toward digital form. People now learn from the internet according to their needs, and book reading is declining sharply as a result of all these technologies.
You can also study more about machine learning and its algorithms at: scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

sns.set()
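Import all the required libraries. You can import other scikit-learn classifiers in the same way, for instance GaussianNB, KNeighborsClassifier, RandomForestClassifier or GradientBoostingClassifier. XGBoost is also an option, but it comes from a separate package rather than from scikit-learn.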
from subprocess import check_output

# List the files available in the input directory.
print(check_output(["ls", "../input"]).decode("utf8"))

# Collect each classifier's score, keyed by model name.
scores = {}
First load your data from a CSV file. It may contain features, sensor values or anything else. Then preprocess it: if there are NaN or missing values, fill them using the mean or the median.
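The loading and filling step can be sketched as follows. The column names (sensor_a, sensor_b, label) and the CSV content are hypothetical, and the text is read from memory so the example runs on its own; in practice you would pass a file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content with two missing values.
csv_text = """sensor_a,sensor_b,label
1.0,10.0,0
2.0,,1
,30.0,0
4.0,40.0,1
"""
data = pd.read_csv(io.StringIO(csv_text))

# Count the missing values per column before filling them.
missing_before = data.isna().sum()
print(missing_before)

# Fill one column with its mean and the other with its median.
data["sensor_a"] = data["sensor_a"].fillna(data["sensor_a"].mean())
data["sensor_b"] = data["sensor_b"].fillna(data["sensor_b"].median())

print(data)
```

The median is usually the safer choice when a column has outliers, because a few extreme values can pull the mean away from the typical value.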
Then divide the dataset into a training set and a testing set, for example 70% training and 30% testing. scikit-learn has a built-in function for this split, train_test_split. X_train and X_test are then used to train and test the different classifiers.
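A 70/30 split could look like the sketch below. The arrays are made up for illustration, and the random_state and stratify arguments are optional choices assumed here to make the split repeatable and class-balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples with 2 features each, and alternating class labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# test_size=0.3 keeps 70% of the rows for training and 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```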
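This is a logistic regression model (the function name says linear regression, but the code uses LogisticRegression). The function takes X_train, y_train and X_test from your dataset and returns the predictions for X_test.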
def LinearRegressionModel(X_train, y_train, X_test):
    # Despite the function name, this fits a logistic regression classifier.
    classifier = LogisticRegression(random_state=0)
    classifier.fit(X_train, y_train)
    # Predict results for the test set.
    y_pred = classifier.predict(X_test)
    # Accuracy measured on the training data.
    scores['LogisticRegression'] = classifier.score(X_train, y_train)
    return y_pred
The SVM function takes the same parameters, X_train, y_train and X_test. You can use a linear kernel or an RBF kernel, and you can also change the gamma and random_state values to check the results with different settings.
def KenelSVMModel(X_train, y_train, X_test):
    classifier = SVC(kernel='rbf', random_state=0, C=1.0, gamma=0.1)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    scores['KernelSVM'] = classifier.score(X_train, y_train)
    return y_pred
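Trying different kernels and gamma values by hand gets tedious, so one option is to let scikit-learn's GridSearchCV do it with cross-validation. This is only a sketch: the data is synthetic and the grid values are assumptions, not tuned settings for the Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# gamma only matters for the RBF kernel, so the linear kernel gets its own grid.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1.0, 10.0]},
    {"kernel": ["rbf"], "C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
]

# Each combination is scored with 5-fold cross-validation.
search = GridSearchCV(SVC(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Because every candidate is scored on held-out folds, the best score is a fairer estimate than accuracy on the training data.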
This is the random forest model function. It also takes X_train, y_train and X_test as input and returns the predicted result.
def RFModel(X_train, y_train, X_test):
    classifier = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    scores['RandomForest'] = classifier.score(X_train, y_train)
    return y_pred
Similarly, NaiveBayesModel takes X_train, y_train and X_test, and the same process occurs.
def NaiveBayesModel(X_train, y_train, X_test):
    classifier = GaussianNB()
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    scores['NaiveBayes'] = classifier.score(X_train, y_train)
    return y_pred
This is K-nearest neighbours (KNN). It classifies each sample on the basis of the nearest data points in the training set, and it also returns the predicted result for your data.
def KNNModel(X_train, y_train, X_test):
    classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    scores['KNN'] = classifier.score(X_train, y_train)
    return y_pred
Import the train dataset and remove a few variables that won't be much help in prediction. Fill in NaN values with the median (Age) and the most repeated value (Embarked).
dataset_original_train = pd.read_csv('../input/train.csv', header=0)
colNamesToDrop = ['PassengerId', 'Name', 'Ticket', 'Cabin']
dataset_train = dataset_original_train.drop(colNamesToDrop, axis=1)
dataset_train["Embarked"] = dataset_train["Embarked"].fillna("S")
# Assign the result back: calling fillna(inplace=True) on a selected column is unreliable in recent pandas.
dataset_train["Age"] = dataset_train["Age"].fillna(dataset_train["Age"].median())
Import the test dataset. We can drop or fill any missing values across the whole dataset.
dataset_original_test = pd.read_csv('../input/test.csv', header=0)
dataset_test = dataset_original_test.drop(colNamesToDrop, axis=1)
# Assign the results back instead of using fillna(inplace=True) on a selected column.
dataset_test["Age"] = dataset_test["Age"].fillna(dataset_test["Age"].median())
dataset_test["Fare"] = dataset_test["Fare"].fillna(dataset_test["Fare"].median())
dataset_test["Embarked"] = dataset_test["Embarked"].fillna("S")
We can convert the DataFrame to arrays and get the feature matrices X_train and X_test and the target y_train from our dataset.
X_train = dataset_train.iloc[:, 1:8].values
y_train = dataset_train.iloc[:, 0].values
X_test = dataset_test.iloc[:].values
Encode the categorical data of both the train and test datasets. This converts label or string values into numeric values so that the models can work with them.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Fit each label encoder on the training column only, then reuse the
# same mapping for the test column.
for col in (1, 6):
    labelencoder_X = LabelEncoder()
    X_train[:, col] = labelencoder_X.fit_transform(X_train[:, col])
    X_test[:, col] = labelencoder_X.transform(X_test[:, col])

# Recent scikit-learn releases removed OneHotEncoder(categorical_features=...).
# ColumnTransformer one-hot encodes column 6 and places its columns first.
onehotencoder = ColumnTransformer(
    [('onehot', OneHotEncoder(), [6])],
    remainder='passthrough', sparse_threshold=0)
X_train = onehotencoder.fit_transform(X_train).astype(float)
X_test = onehotencoder.transform(X_test).astype(float)
Avoid the dummy variable trap by dropping one dummy column, then perform feature scaling on both the train and test datasets.
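# Drop the first dummy column to avoid the dummy variable trap.
X_train = X_train[:, 1:]
X_test = X_test[:, 1:]

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data and apply the same scaling to the test data.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)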
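The same two ideas, dropping one dummy column and scaling with statistics learned from the training data, can also be sketched with pandas. The two small frames below are made-up values, and using pd.Categorical with get_dummies is an alternative shown for illustration, not the approach used in this article.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows, used only for illustration.
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"],
                      "Fare": [7.25, 71.28, 8.05, 53.10]})
test = pd.DataFrame({"Embarked": ["Q", "S"],
                     "Fare": [7.90, 26.00]})

# Fix the category list so train and test get identical dummy columns,
# even when a category is missing from one of them.
categories = ["C", "Q", "S"]
for frame in (train, test):
    frame["Embarked"] = pd.Categorical(frame["Embarked"], categories=categories)

# drop_first=True removes the first dummy column ("C"), which avoids the trap.
train_encoded = pd.get_dummies(train, columns=["Embarked"], drop_first=True)
test_encoded = pd.get_dummies(test, columns=["Embarked"], drop_first=True)

# Learn the mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(train_encoded.astype(float))
X_test_scaled = scaler.transform(test_encoded.astype(float))

print(list(train_encoded.columns))
print(X_test_scaled.shape)
```

With three categories, two dummy columns carry all the information: a row with zeros in both columns must belong to the dropped category.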
When your preprocessed data is completely prepared, call the functions above to perform training and prediction. Each classifier is called by its function name and returns its predicted result.
Linear_ypred = LinearRegressionModel(X_train, y_train, X_test)
kSVM_ypred = KenelSVMModel(X_train, y_train, X_test)
RF_ypred = RFModel(X_train, y_train, X_test)
NB_ypred = NaiveBayesModel(X_train, y_train, X_test)
KNN_ypred = KNNModel(X_train, y_train, X_test)

# Plot the collected scores as a bar chart, one bar per algorithm.
df = pd.DataFrame([[key, value] for key, value in scores.items()], columns=["Algorithm", "Score"])
# factorplot was renamed to catplot in newer seaborn releases.
sns.catplot(x="Algorithm", y="Score", hue="Algorithm", data=df, kind="bar")
Even though I see the RF model has a score of 97%, I feel that it is overfitting, since these scores are measured on the training data.
Hence I would like to consider the SVM model.
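One way to check a suspicion of overfitting is to compare accuracy on the training data with cross-validated accuracy. The sketch below uses synthetic, deliberately noisy data (flip_y=0.2 is an assumption chosen to make the gap visible), so the numbers it prints say nothing about the Titanic results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with about 20% of the labels randomly reassigned.
X, y = make_classification(n_samples=300, n_features=8, flip_y=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)

# Accuracy on the same rows the model was trained on.
train_accuracy = model.fit(X, y).score(X, y)

# Mean accuracy over 5 folds, each scored on rows held out from training.
cv_accuracy = cross_val_score(model, X, y, cv=5).mean()

print(round(train_accuracy, 3), round(cv_accuracy, 3))
```

A large gap between the two numbers is a sign that the model has memorised the training data rather than learned a pattern that generalises.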
result = pd.DataFrame({
    "PassengerId": dataset_original_test["PassengerId"],
    "Survived": kSVM_ypred
})
result.to_csv('submission.csv', index=False)
The algorithms above are among the most frequently used in many applications.
KNN decides on the basis of the k nearest neighbours: if k is 2 it checks the two nearest neighbours, if k is 3 it checks three, and so on, and it assigns the majority class among them. This is different from clustering, which groups unlabelled data on the basis of similarity. A Decision Tree builds a tree and then, on the basis of the condition at each node, moves from the root toward the leaves to make a decision.
Random Forest combines multiple decision trees and often makes more accurate decisions than a single tree.
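The neighbour vote in KNN can be seen on a tiny example. The six one-dimensional points and their labels below are invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups on a line: class 0 near 0-2, class 1 near 10-12.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# With k=3, each prediction is the majority class of the 3 nearest points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predictions = knn.predict([[1.5], [10.5]])

print(predictions)  # [0 1]
```

The point 1.5 has 1.0, 2.0 and 0.0 as its three nearest neighbours, all of class 0, while 10.5 has 10.0, 11.0 and 12.0, all of class 1.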
Further reading: XAI, Blockchain Technology, Importance of Coding, Transfer Learning