Confusion Matrix python plot
Classification is the process of assigning data instances to classes. In a typical machine learning (ML) workflow, you frame the problem, clean the data and engineer feature variables, train the model, measure its performance, and improve it by minimizing a cost function. But how do we evaluate a classifier's performance, and which metrics matter most? An easy, general answer is to compare actual and predicted values. However, as we will see, accuracy alone does not tell the whole story.
Let's take a look at the MNIST dataset to understand the problem.
# Importing the dataset.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)

# Creating independent and dependent variables.
X, y = mnist['data'], mnist['target']
y = y.astype('uint8')  # fetch_openml returns labels as strings; cast to integers.

# Splitting the data into training set and test set.
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Training a binary classifier: is the digit a 5 or not?
y_train_5 = (y_train == 5)  # True for all 5s, False for all other digits.
y_test_5 = (y_test == 5)
# Building a dumb classifier that classifies every single image as "not-5".
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)

never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
If you run this code in an IDE, you'll get an array of accuracies, each above 90%. This is simply because only about 10% of the images are 5s, so a classifier that always predicts "not-5" is right about 90% of the time.
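The snippets below call `sgd_clf`, which is not defined in this excerpt. Presumably it is a stochastic gradient descent classifier trained on the 5-vs-not-5 task, roughly like the following sketch (using scikit-learn's small built-in digits dataset as a stand-in for MNIST so it runs quickly; the variable names match the rest of the post):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Small built-in digits dataset as a quick stand-in for MNIST.
digits = load_digits()
X_train, y_train = digits.data, digits.target
y_train_5 = (y_train == 5)  # binary target: 5 vs. not-5

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
pred = sgd_clf.predict(X_train[:10])  # boolean predictions
```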
The confusion matrix:
A better way to assess a classifier's performance is the confusion matrix. The general idea is to count how many instances of class A have been classified as class B. For example, to find out how many times the classifier confused images of 5s with 3s, look at the 5th row and 3rd column of the confusion matrix.
# Creating some predictions.
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
You could make predictions on the test set instead. However, you should only use the test set at the very end of your project, once you have a classifier ready for launch.
# Constructing the confusion matrix.
from sklearn.metrics import confusion_matrix
confusion_matrix(y_train_5, y_train_pred)
In a confusion matrix, each row represents an actual class and each column represents a predicted class. The confusion matrix gives you a lot of information, but sometimes you may prefer a more concise metric.
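Since the post's title promises a plot, here is a minimal sketch of plotting a confusion matrix with scikit-learn's ConfusionMatrixDisplay. The labels here are hypothetical stand-ins for the 5-vs-not-5 task, and it assumes matplotlib is installed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Hypothetical actual/predicted labels for the binary task.
y_true = [0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=["not-5", "5"])
disp.plot(cmap="Blues")
plt.savefig("confusion_matrix.png")
```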
- Precision
Precision = TP / (TP + FP)
where TP is the number of true positives and FP is the number of false positives.
A trivial way to achieve perfect precision is to make a single positive prediction and make sure it is correct. But this would be useless, because the classifier would ignore all other positive instances.
- Recall
Recall = TP / (TP + FN)
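To make the two formulas concrete, here is a tiny worked example with made-up labels (TP = 2, FP = 1, FN = 2), checked against scikit-learn's metric functions:

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels: 4 actual positives; the classifier predicts 3 positives,
# 2 of them correctly (TP = 2, FP = 1, FN = 2).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 4
```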
# Finding precision and recall
from sklearn.metrics import precision_score, recall_score
precision_score(y_train_5, y_train_pred)
recall_score(y_train_5, y_train_pred)
# To compute the F1 score, simply call the f1_score() function:
from sklearn.metrics import f1_score
f1_score(y_train_5, y_train_pred)
The F1 score favors classifiers with similar precision and recall. That is not always what you want: in some contexts precision matters most, while in others recall does. If you trained a classifier to detect videos that are safe for kids, you would probably prefer one that rejects many good videos (low recall) but keeps only safe videos (high precision). In such cases, it may even be worth adding a human pipeline to verify the classifier's selections. On the other hand, if you trained a classifier to detect shoplifters in surveillance images, it is probably fine if it has only 30% precision, as long as it has 99% recall. Unfortunately, you can't have it both ways: increasing precision tends to reduce recall, and vice versa. This is called the precision/recall trade-off.
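The trade-off above can be seen by sweeping the decision threshold over a classifier's scores: raising the threshold trades recall for precision. A minimal sketch with made-up scores follows; in the real pipeline you would obtain the scores with cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function"):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical decision scores: one positive (score -0.5) sits below
# some negatives, so no single threshold separates the classes perfectly.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5])

# precision/recall at every possible threshold over the scores.
precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precisions, recalls, thresholds):
    print(f"threshold={t:+.1f}  precision={p:.2f}  recall={r:.2f}")
```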
Read more :: Machine learning algorithm