Ex. No.: Implement Ensembling Techniques
Aim
To write a Python program to implement ensembling techniques.
Algorithm
1. Import the necessary modules and packages.
2. Load the dataset.
3. Define the models (KNN, Decision Tree, Logistic Regression).
4. Combine the models and train them on the dataset.
5. Predict the class of new data points.
DETAIL VIEW:
This code trains and evaluates various machine learning classifiers, including individual base classifiers and ensemble techniques, on the Digits dataset from sklearn. Here's a breakdown of what's happening at each step:
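The import block is not shown in the post; the snippets below rely on imports along these lines (a sketch of what the code assumes, not necessarily the author's exact import list):
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.metrics import accuracy_score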
1. Loading the Data
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)  # 80/20 split; random_state assumed for reproducibility
- The Digits dataset is loaded, which contains images of handwritten digits (0-9). train_test_split splits the dataset into a training set (80% of the data) and a test set (20% of the data), producing X_train, X_test for the features and y_train, y_test for the labels.
2. Defining the Base Classifiers
knn = KNeighborsClassifier()
dtc = DecisionTreeClassifier()
lr = LogisticRegression()
- KNeighborsClassifier (KNN), DecisionTreeClassifier (DTC), and LogisticRegression (LR) are defined as base classifiers. These will be used both individually and as components of ensemble methods.
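All three classifiers are used with their default hyperparameters. A practical side note (not in the original post): on the raw Digits pixel features, LogisticRegression with its default max_iter=100 often emits a ConvergenceWarning; if that happens, a common fix is simply to raise the iteration limit:
lr = LogisticRegression(max_iter=5000)  # assumed tweak so the default lbfgs solver can converge on the Digits features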
3. Defining the Ensemble Classifiers
bagging = BaggingClassifier(estimator=dtc, n_estimators=10, random_state=42)
boosting = AdaBoostClassifier(estimator=dtc, n_estimators=10, random_state=42)
stacking = StackingClassifier(estimators=[('knn', knn),
('dtc', dtc), ('lr', lr)],
final_estimator=lr)
voting_hard = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='hard')
voting_soft = VotingClassifier(estimators=[('knn', knn), ('dtc', dtc), ('lr', lr)], voting='soft')
- Bagging: Several decision trees (copies of dtc) are trained on bootstrap samples of the training data and their predictions are aggregated; here n_estimators=10 trees are used.
- Boosting: AdaBoost also builds an ensemble of decision trees, but sequentially, with each new tree paying more attention to the samples the previous trees misclassified.
- Stacking: The outputs of multiple base classifiers (knn, dtc, and lr) are combined using a final estimator (here, lr). It is a more complex ensemble method, where the predictions of the base classifiers are used as inputs to another model; scikit-learn's StackingClassifier fits the final estimator on cross-validated predictions of the base classifiers.
- Voting: This combines multiple classifiers using either hard voting (majority vote) or soft voting (averaging the predicted probabilities). In hard voting, the final prediction is the class predicted most frequently, while in soft voting it is the class with the highest average predicted probability.
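For intuition, soft voting can be reproduced by hand from the fitted base models. The sketch below is not part of the original code and assumes the three classifiers have already been trained (as in step 4); its result should closely match voting_soft.predict(X_test):
# Manual soft voting: average the class probabilities of the fitted base
# models and pick the class with the highest mean probability.
probs = (knn.predict_proba(X_test) +
         dtc.predict_proba(X_test) +
         lr.predict_proba(X_test)) / 3
manual_soft_pred = probs.argmax(axis=1)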
4. Training the Models
knn.fit(X_train, y_train)
dtc.fit(X_train, y_train)
lr.fit(X_train, y_train)
bagging.fit(X_train, y_train)
boosting.fit(X_train, y_train)
stacking.fit(X_train, y_train)
voting_hard.fit(X_train, y_train)
voting_soft.fit(X_train, y_train)
- Each model is trained using the training data (X_train, y_train).
5. Making Predictions
knn_pred = knn.predict(X_test)
dtc_pred = dtc.predict(X_test)
lr_pred = lr.predict(X_test)
bagging_pred = bagging.predict(X_test)
boosting_pred = boosting.predict(X_test)
stacking_pred = stacking.predict(X_test)
voting_hard_pred = voting_hard.predict(X_test)
voting_soft_pred = voting_soft.predict(X_test)
- Predictions are made on the test data (X_test) for each model.
6. Calculating the Average and Weighted Average Predictions
weights = [0.3, 0.3, 0.4]
average_pred = np.mean([knn_pred, dtc_pred, lr_pred], axis=0).astype(int)
weighted_average_pred = np.average([knn_pred, dtc_pred, lr_pred], axis=0, weights=weights).astype(int)
- Average Prediction: The predictions of knn, dtc, and lr are averaged. Since the predictions are discrete class labels (not continuous values), the result is converted back to an integer with .astype(int) (note that this truncates the average rather than rounding it).
- Weighted Average Prediction: This is similar, but each model's predictions are weighted by a predefined set of weights (weights = [0.3, 0.3, 0.4]). This gives more importance to one model (in this case, lr), as it is assigned the highest weight.
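As a toy illustration of the arithmetic (the numbers below are made up, not taken from the post), np.average applies the weights across the three prediction arrays before the integer conversion:
import numpy as np
toy_preds = np.array([[3, 3, 3],   # hypothetical knn predictions
                      [3, 3, 8],   # hypothetical dtc predictions
                      [3, 3, 3]])  # hypothetical lr predictions
print(np.average(toy_preds, axis=0, weights=[0.3, 0.3, 0.4]).astype(int))
# Output: [3 3 4] -- third sample: 0.3*3 + 0.3*8 + 0.4*3 = 4.5, truncated to 4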
7. Calculating Accuracy Scores
print("KNN accuracy:", accuracy_score(y_test, knn_pred))
print("Decision tree accuracy:", accuracy_score(y_test, dtc_pred))
print("Logistic regression accuracy:", accuracy_score(y_test, lr_pred))
print("\nNORMAL ENSEMBLE TECHNIQUES:")
print("Hard voting accuracy:", accuracy_score(y_test, voting_hard_pred))
print("Soft voting accuracy:", accuracy_score(y_test, voting_soft_pred))
print("Average accuracy:", accuracy_score(y_test, average_pred))
print("Weighted average accuracy:", accuracy_score(y_test, weighted_average_pred))
print("\nADVANCED ENSEMBLE TECHNIQUES:")
print("Bagging accuracy:", accuracy_score(y_test, bagging_pred))
print("Boosting accuracy:", accuracy_score(y_test, boosting_pred))
print("Stacking accuracy:", accuracy_score(y_test, stacking_pred))
- The accuracy of each model's predictions is calculated using the accuracy_score() function, comparing the predicted labels (knn_pred, dtc_pred, etc.) with the true labels (y_test).
- The accuracy of the individual base classifiers is printed first, followed by the accuracy of the normal ensemble techniques (hard and soft voting).
- The accuracy scores of the averaged predictions from the base classifiers (average_pred, weighted_average_pred) are also displayed.
- Finally, the accuracy of the advanced ensemble techniques (bagging, boosting, and stacking) is printed.
Output Example:
KNN accuracy: 0.9833333333333333
Decision tree accuracy: 0.9444444444444444
Logistic regression accuracy: 0.9777777777777777
NORMAL ENSEMBLE TECHNIQUES:
Hard voting accuracy: 0.9777777777777777
Soft voting accuracy: 0.9777777777777777
Average accuracy: 0.9611111111111111
Weighted average accuracy: 0.9777777777777777
ADVANCED ENSEMBLE TECHNIQUES:
Bagging accuracy: 0.9861111111111112
Boosting accuracy: 0.9861111111111112
Stacking accuracy: 0.9888888888888889
Conclusion:
- Individual Classifiers: This section shows the performance of the base classifiers (KNN, Decision Tree, and Logistic Regression) on the test data.
- Ensemble Techniques: It demonstrates how combining classifiers through techniques like voting, bagging, boosting, and stacking can improve performance. In most cases, the ensemble techniques perform better than the individual classifiers.
- Averages: The simple and weighted average predictions combine the individual classifier outputs to further improve performance, especially when certain models perform better on specific types of data.
Summary:
This program evaluates both individual base classifiers and several ensemble methods on a standard classification dataset, giving insight into how combining classifiers can improve model accuracy.