In this tutorial, I am going to talk about one of the most popular traditional machine learning models, random forest, and walk through Python code from scratch with an example.
Random forest is a combination of N decision trees that vote to produce the final output. This classifier is used extensively for both classification and regression problems. It builds decision trees on different samples of the data, then takes the majority vote for classification and the average of the tree outputs for regression.
Random forest is one of the two primary algorithms when it comes to traditional machine learning. I say traditional machine learning because I want to stress that this is not a neural-network-based approach; it is not deep learning. Why is it great? It works amazingly well if you have sparse data or not a lot of training data. If you have thousands of labeled images, go ahead and use a neural network approach. But if you only have tens of images, or even just a few pixels that you manually painted to create labels, then random forest does an excellent job.
I started off by saying this is one of the two algorithms; the second one is support vector machines. I would say random forest even beats support vector machines for image processing problems; at least it did on most of the images I tried to segment. I have segmented everything from light microscope to electron microscope, FIB-SEM, and even X-ray microscope or micro-CT images, and across all of these image types random forest consistently beat support vector machines. And don't even think of other algorithms like naive Bayes or other traditional machine learning methods.
I don't even know why people use those. I think they are there for teaching purposes, so people can teach what naive Bayes is and what the history of machine learning looks like. But if you really want to get your hands dirty, random forest is not a bad place to start. You can also try support vector machines, but let's understand this first. Let's start by looking at the terminology itself: random and forest.
A forest is a collection of trees, and that is exactly what this is: a collection of a bunch of decision trees. Random forest is a supervised machine learning method, which means we need labeled data. A decision tree keeps splitting the data until all of its possible branches end in leaf nodes. When you invoke the random forest algorithm in Python, a split-quality measure (Gini impurity by default in scikit-learn's classifier) is what it uses to calculate these splits. Now, if life is great with a decision tree, why are we even talking about random forests? The main disadvantage of a decision tree is that it suffers from overfitting.
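To make the idea of many trees voting a bit more concrete, here is a minimal sketch of a hand-rolled forest. It is not scikit-learn's actual implementation. I am assuming synthetic data from make_classification (not the dataset used in this tutorial), and the helper names fit_toy_forest and predict_toy_forest are made up for illustration. Each tree is trained on a bootstrap sample of the rows, and the final label is the majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (assumption: not the tutorial's dataset)
X_toy, y_toy = make_classification(n_samples=600, n_features=10, n_informative=4,
                                   flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=0)


def fit_toy_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_toy_forest(trees, X):
    """Majority vote over the trees; labels are assumed to be 0/1 (hypothetical helper)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)   # a tie would go to class 1


# A single fully grown tree, for comparison with the voted forest
single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
toy_forest = fit_toy_forest(X_tr, y_tr)

tree_train_acc = (single_tree.predict(X_tr) == y_tr).mean()
tree_test_acc = (single_tree.predict(X_te) == y_te).mean()
forest_test_acc = (predict_toy_forest(toy_forest, X_te) == y_te).mean()

print("Single tree train accuracy:", tree_train_acc)
print("Single tree test accuracy: ", tree_test_acc)
print("Voted forest test accuracy:", forest_test_acc)
```

A gap between the single tree's train and test accuracy is the overfitting mentioned above. Averaging many trees usually narrows that gap, though the exact numbers depend on the data and the random seed.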
import pandas as pd
from matplotlib import pyplot as plt
#df is assumed to be a pandas DataFrame that has already been loaded (the loading step is not shown)
#Drop the irrelevant columns
df.drop(['Image An', 'user'], axis=1, inplace=True)
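#Handle missing values (assumption: drop incomplete rows; the original step is not shown)
df = df.dropna()
#Convert non-numeric data to numeric (assumption: integer-encode every non-numeric column)
for col in df.select_dtypes(exclude='number').columns:
    df[col] = pd.factorize(df[col])[0]
#Define dependent variable ('label' is a hypothetical column name; use your own target column)
Y = df['label'].values
#Define independent variables
X = df.drop(labels=['label'], axis=1)
#Split data into train and test dataset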
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=20)
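#Random forest model
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#Assumption: default hyperparameters; the original model settings are not shown
model = RandomForestClassifier(random_state=20)
#Train on the training split, then predict on the held-out test split
model.fit(X_train, Y_train)
prediction_test = model.predict(X_test)
print("Accuracy = ", accuracy_score(Y_test, prediction_test))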
#Test accuracy for various test sizes and see how it gets better with more training data
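The comment above can be turned into a small experiment. This is only a sketch. I am assuming synthetic data from make_classification instead of the tutorial's DataFrame, and the names X_demo, y_demo, clf and results are made up so they do not clash with the variables in the main script. It loops over a few test sizes and records the accuracy for each.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the tutorial's dataset)
X_demo, y_demo = make_classification(n_samples=1000, n_features=10,
                                     n_informative=5, random_state=20)

results = {}
for test_size in (0.1, 0.2, 0.3, 0.4, 0.5):
    # A larger test_size leaves less data for training
    Xa, Xb, ya, yb = train_test_split(X_demo, y_demo,
                                      test_size=test_size, random_state=20)
    clf = RandomForestClassifier(n_estimators=50, random_state=20).fit(Xa, ya)
    results[test_size] = accuracy_score(yb, clf.predict(Xb))
    print(f"test_size={test_size:.1f}  train rows={len(Xa)}  accuracy={results[test_size]:.3f}")
```

The trend is often noisy on a single split, so do not expect accuracy to drop at every step as the training set shrinks. Repeating the loop with several random_state values gives a fairer picture.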
#One amazing feature of random forest is that it provides info on feature importances
#Get numerical feature importances, paired with the column names
feature_list = list(X.columns)
feature_imp = pd.Series(model.feature_importances_, index=feature_list).sort_values(ascending=False)
#Let us print them in a nice format, most important feature first
print(feature_imp)
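Since pyplot is already imported at the top, the importances can also be shown as a bar chart. The sketch below is self-contained and rests on a few assumptions of mine: synthetic data, made-up column names (feature_0 to feature_5), a hypothetical output file name, and the non-interactive Agg backend so it also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with made-up column names (assumption: not the tutorial's dataset)
X_arr, y_arr = make_classification(n_samples=500, n_features=6,
                                   n_informative=3, random_state=20)
X_plot = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(6)])

rf = RandomForestClassifier(n_estimators=50, random_state=20).fit(X_plot, y_arr)

# Same pattern as in the script: importances indexed by column name, largest first
imp = pd.Series(rf.feature_importances_, index=X_plot.columns).sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
imp.plot.bar(ax=ax)
ax.set_ylabel("Importance")
ax.set_title("Random forest feature importances")
fig.tight_layout()
fig.savefig("feature_importances.png")  # hypothetical output file name
```

The importances reported by scikit-learn are normalized to sum to 1, so each bar can be read as a feature's share of the total.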