ML | A Detailed Guide to Using Sklearn Voting Classifiers

A voting classifier is a machine learning model that trains an ensemble of several models and predicts the output class that receives the most support from them.

It simply aggregates the predictions of each classifier passed to the Voting Classifier and outputs the class that wins the vote. The idea is that, instead of building separate dedicated models and checking the accuracy of each, we build a single model that trains all of them and predicts the output based on their combined vote for each output class.

The voting classifier supports two types of voting.

  1. Hard voting: In hard voting, the predicted output class is the one that receives the majority of the votes, i.e., the class predicted by the largest number of classifiers. Suppose three classifiers predict the output classes (A, A, B). The majority predicted A, so A is the final prediction.
  2. Soft voting: In soft voting, the output class is the one with the highest average probability across the classifiers. Suppose three models are given some input and assign the probabilities A = (0.30, 0.47, 0.53) and B = (0.20, 0.32, 0.40). The average for class A is 0.4333 and for class B is 0.3067, so the winner is class A because it has the highest probability averaged over the classifiers.
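
The two rules can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic using the numbers from the list above; the helper names hard_vote and soft_vote are made up for this sketch and are not part of scikit-learn.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class's probabilities; return (winning class, averages)."""
    averages = {label: sum(p) / len(p) for label, p in probabilities.items()}
    winner = max(averages, key=averages.get)
    return winner, averages


# Hard voting: three classifiers predict A, A and B.
hard_winner = hard_vote(["A", "A", "B"])

# Soft voting: per-class probabilities from the three classifiers.
soft_winner, averages = soft_vote(
    {"A": [0.30, 0.47, 0.53], "B": [0.20, 0.32, 0.40]}
)

print(hard_winner)  # A
print(soft_winner, round(averages["A"], 4), round(averages["B"], 4))  # A 0.4333 0.3067
```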

Note:

Make sure to pass a variety of models to the voting classifier, so that errors made by one model can be compensated for by the others.
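
A toy illustration of why variety matters, under made-up assumptions: the labels and the three "models" below are hypothetical hand-written prediction lists, not trained classifiers. Each one is wrong on a different sample, so the majority vote recovers every label, whereas three copies of the same model simply repeat its mistake.

```python
from collections import Counter

# Hypothetical ground truth and predictions from three imaginary models.
y_true = ["A", "B", "A", "B", "A", "B"]
model_1 = ["B", "B", "A", "B", "A", "B"]  # wrong on sample 0
model_2 = ["A", "B", "B", "B", "A", "B"]  # wrong on sample 2
model_3 = ["A", "B", "A", "B", "A", "A"]  # wrong on sample 5


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def majority_vote(*models):
    """Per-sample majority label across the given prediction lists."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*models)]


diverse = majority_vote(model_1, model_2, model_3)
identical = majority_vote(model_1, model_1, model_1)

print(round(accuracy(model_1, y_true), 3))  # 0.833
print(accuracy(diverse, y_true))            # 1.0
print(round(accuracy(identical, y_true), 3))  # 0.833
```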

Code: Python code that implements the voting classifier

# import libraries
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# load the iris dataset (all four features)
iris = load_iris()
X = iris.data[:, :4]
Y = iris.target

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=42)

# group/ensemble of models as (name, estimator) pairs
# lbfgs fits a multinomial model for multiclass targets by default
estimator = []
estimator.append(('LR', LogisticRegression(solver='lbfgs', max_iter=200)))
# probability=True is needed so that SVC can take part in soft voting
estimator.append(('SVC', SVC(gamma='auto', probability=True)))
estimator.append(('DTC', DecisionTreeClassifier()))

# Voting Classifier with hard voting
vot_hard = VotingClassifier(estimators=estimator, voting='hard')
vot_hard.fit(X_train, y_train)
y_pred = vot_hard.predict(X_test)

# use the accuracy_score metric to measure accuracy
# (%.2f keeps the fractional part; %d would truncate the score to an integer)
score = accuracy_score(y_test, y_pred)
print("Hard Voting Score %.2f" % score)

# Voting Classifier with soft voting
vot_soft = VotingClassifier(estimators=estimator, voting='soft')
vot_soft.fit(X_train, y_train)
y_pred = vot_soft.predict(X_test)

# use accuracy_score on the soft-voting predictions
score = accuracy_score(y_test, y_pred)
print("Soft Voting Score %.2f" % score)

Output:

Hard Voting Score 1.00
Soft Voting Score 1.00

Example:

Input  : 4.7, 3.2, 1.3, 0.2
Output : Iris Setosa
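
A prediction like the one in this example could be obtained roughly as follows. This is a self-contained sketch rather than the article's exact script: it uses hard voting (so SVC does not need probability=True), fixes random_state on the decision tree for repeatability, and introduces the names voter, sample and predicted_name purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=42
)

# Hard voting only needs class labels from each estimator.
voter = VotingClassifier(
    estimators=[
        ("LR", LogisticRegression(max_iter=200)),
        ("SVC", SVC(gamma="auto")),
        ("DTC", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

# One flower: sepal length, sepal width, petal length, petal width (cm).
sample = [[4.7, 3.2, 1.3, 0.2]]
predicted_index = voter.predict(sample)[0]

# Map the numeric class index back to the species name.
predicted_name = iris.target_names[predicted_index]
print(predicted_name)  # setosa
```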

In practice, soft voting often gives higher accuracy, because it averages the probabilities from all of the estimators instead of counting only their final labels. On the basic iris dataset, which these models already fit almost perfectly, the two results do not differ much.
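
To see how the two schemes can disagree on a single sample, consider the following sketch. The probabilities are invented for illustration and do not come from the iris models: two classifiers lean slightly towards B, while the third is very confident in A.

```python
# Hypothetical class probabilities from three imaginary classifiers.
probabilities = [
    {"A": 0.45, "B": 0.55},
    {"A": 0.45, "B": 0.55},
    {"A": 0.90, "B": 0.10},
]

# Hard voting: each classifier casts one vote for its most likely class.
votes = [max(p, key=p.get) for p in probabilities]
hard_winner = max(set(votes), key=votes.count)

# Soft voting: average the probabilities per class, then take the largest.
averages = {
    label: sum(p[label] for p in probabilities) / len(probabilities)
    for label in ("A", "B")
}
soft_winner = max(averages, key=averages.get)

print(votes, hard_winner)  # ['B', 'B', 'A'] B
print({k: round(v, 2) for k, v in averages.items()}, soft_winner)  # {'A': 0.6, 'B': 0.4} A
```

Hard voting picks B because two of the three classifiers prefer it, while soft voting picks A because the third classifier's confidence outweighs the two narrow preferences for B.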
