ENSEMBLING TECHNIQUES

Ex No: Implement Ensembling Techniques


Aim

 To write a Python program to implement ensembling techniques.

Algorithm

1.     Import the necessary modules and packages.

2.     Load the dataset.

3.     Define the base models (KNN, Decision Tree, Logistic Regression).

4.     Combine the base models into ensembles and train them on the dataset.

5.     Predict the category of new data points.


DETAIL VIEW:

This code trains and evaluates various machine learning classifiers, including individual base classifiers and ensemble techniques, on the Digits dataset from sklearn. Here's a breakdown of what's happening at each step:

1. Loading the Data

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)
  • The Digits dataset is loaded, which contains images of handwritten digits (0-9).
  • train_test_split splits the dataset into a training set (80% of the data) and a test set (20% of the data). This is done using X_train, X_test for features and y_train, y_test for labels.
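
For orientation, the standard sklearn Digits dataset contains 1,797 samples, each an 8x8 image flattened into 64 features. A quick shape check (a small illustrative snippet, not part of the program itself):

from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)    # (1797, 64): 1797 flattened 8x8 images
print(digits.target.shape)  # (1797,): one digit label (0-9) per image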

2. Defining the Base Classifiers

knn = KNeighborsClassifier()
dtc = DecisionTreeClassifier()
lr = LogisticRegression()
  • KNeighborsClassifier (KNN), DecisionTreeClassifier (DTC), and LogisticRegression (LR) are defined as base classifiers. These will be used both individually and as components of ensemble methods.

3. Defining the Ensemble Classifiers

bagging = BaggingClassifier(estimator=dtc, n_estimators=10, random_state=42)
boosting = AdaBoostClassifier(estimator=dtc, n_estimators=10, random_state=42)
stacking = StackingClassifier(estimators=[('knn', knn), 
('dtc', dtc), ('lr', lr)], final_estimator=lr)

voting_hard = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='hard')

voting_soft = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='soft')
  • Bagging: Several copies of the same base estimator (here, the decision tree) are trained on different bootstrap samples of the training data, and their predictions are combined by majority vote. This mainly reduces variance.
  • Boosting: AdaBoost trains the base estimators one after another, increasing the weight of training samples that earlier estimators misclassified, so later estimators concentrate on the harder cases.
  • Stacking: The outputs of multiple base classifiers (knn, dtc, and lr) are used as inputs to a final estimator (here, lr), which learns how to combine them. It is a more complex ensemble method, where predictions from multiple classifiers become the features of another model.
  • Voting: This combines multiple classifiers using either hard voting (majority vote) or soft voting (averaging the predicted probabilities). In hard voting, the final prediction is the class predicted most often; in soft voting, it is the class with the highest average probability (see the small sketch after this list).
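
As a small illustration of the difference between the two voting modes, consider a toy sketch with made-up numbers (not part of the program above):

import numpy as np

# Hard voting: each classifier casts one vote; the majority class wins.
votes = [7, 1, 7]                              # e.g. knn, dtc, lr each predict a digit
hard_prediction = np.bincount(votes).argmax()  # -> 7 (two of the three votes)

# Soft voting: average the predicted class probabilities, then take the argmax.
proba_knn = np.array([0.10, 0.60, 0.30])   # hypothetical probabilities over 3 classes
proba_dtc = np.array([0.20, 0.30, 0.50])
proba_lr  = np.array([0.15, 0.25, 0.60])
avg_proba = (proba_knn + proba_dtc + proba_lr) / 3
soft_prediction = avg_proba.argmax()       # -> class 2 (highest average probability)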

4. Training the Models

knn.fit(X_train, y_train)
dtc.fit(X_train, y_train)
lr.fit(X_train, y_train)
bagging.fit(X_train, y_train)
boosting.fit(X_train, y_train)
stacking.fit(X_train, y_train)
voting_hard.fit(X_train, y_train)
voting_soft.fit(X_train, y_train)
  • Each model is trained using the training data (X_train, y_train).

5. Making Predictions

knn_pred = knn.predict(X_test)
dtc_pred = dtc.predict(X_test)
lr_pred = lr.predict(X_test)
bagging_pred = bagging.predict(X_test)
boosting_pred = boosting.predict(X_test)
stacking_pred = stacking.predict(X_test)
voting_hard_pred = voting_hard.predict(X_test)
voting_soft_pred = voting_soft.predict(X_test)
  • Predictions are made on the test data (X_test) for each model.

6. Calculating the Average and Weighted Average Predictions

weights = [0.3, 0.3, 0.4]
average_pred = np.mean([knn_pred, dtc_pred, lr_pred], axis=0).astype(int)
weighted_average_pred = np.average([knn_pred, dtc_pred, lr_pred], axis=0, weights=weights).astype(int)
  • Average Prediction: The predicted class labels of knn, dtc, and lr are averaged element-wise. Since the predictions are discrete classes (not continuous values), the result is cast back to an integer with .astype(int) (note that this truncates rather than rounds).
  • Weighted Average Prediction: This is similar, but each model's predictions are weighted by a predefined set of weights (weights = [0.3, 0.3, 0.4]). This gives the most importance to one model (here, lr), as it is assigned the highest weight. A small worked example follows below.
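
To make the arithmetic concrete, here is a toy example for a single test sample with made-up predictions (illustrative only):

import numpy as np

knn_p, dtc_p, lr_p = 7, 8, 7   # hypothetical predicted digits for one sample
weights = [0.3, 0.3, 0.4]

print(np.mean([knn_p, dtc_p, lr_p]))                      # 7.33..., so astype(int) gives 7
print(np.average([knn_p, dtc_p, lr_p], weights=weights))  # 0.3*7 + 0.3*8 + 0.4*7 = 7.3, so 7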

7. Calculating Accuracy Scores

print("KNN accuracy:", accuracy_score(y_test, knn_pred))
print("Decision tree accuracy:", accuracy_score(y_test, dtc_pred))
print("Logistic regression accuracy:", accuracy_score(y_test, lr_pred))

print("\nNORMAL ENSEMBLE TECHNIQUES:")
print("Hard voting accuracy:", accuracy_score(y_test, voting_hard_pred))
print("Soft voting accuracy:", accuracy_score(y_test, voting_soft_pred))

print("Average accuracy:", accuracy_score(y_test, average_pred))
print("Weighted average accuracy:", accuracy_score(y_test, weighted_average_pred))

print("\nADVANCED ENSEMBLE TECHNIQUES:")
print("Bagging accuracy:", accuracy_score(y_test, bagging_pred))
print("Boosting accuracy:", accuracy_score(y_test, boosting_pred))
print("Stacking accuracy:", accuracy_score(y_test, stacking_pred))
  • The accuracy of each model's predictions is calculated using the accuracy_score() function, comparing the predicted labels (knn_pred, dtc_pred, etc.) with the true labels (y_test).
  • The accuracy for individual base classifiers is printed, followed by the accuracy of normal ensemble techniques (hard and soft voting).
  • The accuracy of the averaged base-classifier predictions (average_pred, weighted_average_pred) is also displayed.
  • Finally, the accuracy for advanced ensemble techniques (bagging, boosting, and stacking) is printed.
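
For reference, the step-by-step snippets above can be assembled into a single runnable script. The following is a sketch of the program described here (it assumes scikit-learn 1.2 or newer, where BaggingClassifier and AdaBoostClassifier accept the estimator= keyword used above; the max_iter setting for LogisticRegression is an addition to avoid a convergence warning on this data):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.metrics import accuracy_score

# 1. Load the data and split it 80/20 into training and test sets
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

# 2. Base classifiers
knn = KNeighborsClassifier()
dtc = DecisionTreeClassifier()
lr = LogisticRegression(max_iter=5000)

# 3. Ensemble classifiers built from the base classifiers
bagging = BaggingClassifier(estimator=dtc, n_estimators=10, random_state=42)
boosting = AdaBoostClassifier(estimator=dtc, n_estimators=10, random_state=42)
stacking = StackingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)],
                              final_estimator=lr)
voting_hard = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='hard')
voting_soft = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='soft')

# 4. Train every model on the training set
for model in (knn, dtc, lr, bagging, boosting, stacking, voting_hard, voting_soft):
    model.fit(X_train, y_train)

# 5. Predict on the test set
knn_pred = knn.predict(X_test)
dtc_pred = dtc.predict(X_test)
lr_pred = lr.predict(X_test)
bagging_pred = bagging.predict(X_test)
boosting_pred = boosting.predict(X_test)
stacking_pred = stacking.predict(X_test)
voting_hard_pred = voting_hard.predict(X_test)
voting_soft_pred = voting_soft.predict(X_test)

# 6. Simple and weighted averages of the base-classifier predictions
weights = [0.3, 0.3, 0.4]
average_pred = np.mean([knn_pred, dtc_pred, lr_pred], axis=0).astype(int)
weighted_average_pred = np.average([knn_pred, dtc_pred, lr_pred], axis=0, weights=weights).astype(int)

# 7. Accuracy scores
print("KNN accuracy:", accuracy_score(y_test, knn_pred))
print("Decision tree accuracy:", accuracy_score(y_test, dtc_pred))
print("Logistic regression accuracy:", accuracy_score(y_test, lr_pred))
print("\nNORMAL ENSEMBLE TECHNIQUES:")
print("Hard voting accuracy:", accuracy_score(y_test, voting_hard_pred))
print("Soft voting accuracy:", accuracy_score(y_test, voting_soft_pred))
print("Average accuracy:", accuracy_score(y_test, average_pred))
print("Weighted average accuracy:", accuracy_score(y_test, weighted_average_pred))
print("\nADVANCED ENSEMBLE TECHNIQUES:")
print("Bagging accuracy:", accuracy_score(y_test, bagging_pred))
print("Boosting accuracy:", accuracy_score(y_test, boosting_pred))
print("Stacking accuracy:", accuracy_score(y_test, stacking_pred))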

Output Example:

KNN accuracy: 0.9833333333333333
Decision tree accuracy: 0.9444444444444444
Logistic regression accuracy: 0.9777777777777777

NORMAL ENSEMBLE TECHNIQUES:
Hard voting accuracy: 0.9777777777777777
Soft voting accuracy: 0.9777777777777777
Average accuracy: 0.9611111111111111
Weighted average accuracy: 0.9777777777777777

ADVANCED ENSEMBLE TECHNIQUES:
Bagging accuracy: 0.9861111111111112
Boosting accuracy: 0.9861111111111112
Stacking accuracy: 0.9888888888888889

Conclusion:

  • Individual Classifiers: This section shows the performance of the base classifiers (KNN, Decision Tree, and Logistic Regression) on the test data.
  • Ensemble Techniques: It demonstrates how combining classifiers through voting, bagging, boosting, and stacking can improve performance. In this run, the advanced ensemble techniques (bagging, boosting, and stacking) outperform every individual classifier.
  • Averages: The simple and weighted average predictions combine the individual classifier outputs directly. Giving a higher weight to a stronger model (here, lr) helps: the weighted average scores higher than the simple average in this run.

Summary:

This program evaluates both individual base classifiers and several ensemble methods on a standard classification dataset, giving insight into how combining classifiers can improve model accuracy.
