DECISION TREE MODULE - DT

Ex No 6                                   Build Decision Tree


Aim

To build a decision tree classifier and visualize the resulting tree

Algorithm

Build decision tree

1.     Import necessary packages and libraries

2.     Load the dataset

3.     Create the decision tree classifier and train it on the dataset

4.     Predict the category of new data

5.     Print the graph for the decision tree
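Putting the steps together, here is a minimal end-to-end sketch of the exercise (it uses tree.export_text to print the tree as plain text, so it runs even without the Graphviz binary installed; the graphical export_graphviz version is covered in the explanation):

```python
# End-to-end sketch: load data, train a decision tree,
# predict a new sample, and print the learned tree.
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()                            # step 2: load the dataset
clf = tree.DecisionTreeClassifier()           # step 3: create the classifier...
clf.fit(iris.data, iris.target)               # ...and train it on the dataset

y_pred = clf.predict([[6.7, 3.0, 5.2, 2.3]])  # step 4: predict a new sample
print("Prediction is:", iris.target_names[y_pred])

# step 5: print the tree (text form; export_graphviz draws a graphical version)
print(tree.export_text(clf, feature_names=iris.feature_names))
```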

EXPLANATION:

This Python script uses the Iris dataset and a Decision Tree Classifier to predict the species of a flower given certain attributes. It also generates and visualizes the decision tree. Below is a breakdown of each section of the code:


1. Loading the Iris Dataset

from sklearn.datasets import load_iris
from sklearn import tree
import graphviz

iris = load_iris()
X, y = iris.data, iris.target
targets = iris.target_names
  • load_iris(): This function from sklearn.datasets loads the famous Iris dataset. It contains 150 samples of iris flowers, each with four features:
    • Sepal length
    • Sepal width
    • Petal length
    • Petal width
    These features are stored in iris.data (a 2D NumPy array).
  • iris.target: This contains the target values (species of iris) corresponding to each sample. The species are encoded as integers: 0 = setosa, 1 = versicolor, 2 = virginica.
  • iris.target_names: This is an array of species names corresponding to the target values.
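A quick way to confirm the shapes and labels described above (a small side check, not part of the original script):

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)          # (150, 4): 150 samples, 4 features each
print(iris.target.shape)        # (150,): one integer label per sample
print(list(iris.target_names))  # ['setosa', 'versicolor', 'virginica']
print(iris.feature_names)       # the four measurement names, in cm
```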

2. Training the Decision Tree Classifier

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
  • tree.DecisionTreeClassifier(): This initializes the Decision Tree classifier from sklearn.tree. By default, it uses the Gini impurity criterion for splitting.
  • clf.fit(X, y): This trains the Decision Tree classifier on the features (X) and the target labels (y). After this line, the classifier has learned how to predict the species based on the attributes of the flowers.
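To verify the default splitting criterion and see how well the tree fits its own training data (a side check; an unconstrained tree typically memorizes the training set):

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier()       # criterion defaults to 'gini'
clf.fit(iris.data, iris.target)

print(clf.criterion)                      # 'gini'
print(clf.score(iris.data, iris.target))  # training accuracy; 1.0 here, since
                                          # the tree grows until its leaves are pure
```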

3. Predicting for a New Sample

X_pred = [6.7, 3.0, 5.2, 2.3]  # A new flower sample
y_pred = clf.predict([X_pred])
print("Prediction is: {}".format(targets[y_pred]))
  • X_pred: This is a new flower sample represented by a list of four values: [6.7, 3.0, 5.2, 2.3], which correspond to the sepal length, sepal width, petal length, and petal width of a new iris flower.
  • clf.predict([X_pred]): The classifier uses the learned decision tree to predict the species of the flower given the input sample X_pred.
  • print("Prediction is: {}".format(targets[y_pred])): This line retrieves the name of the predicted species by indexing the targets array with the predicted label (y_pred). It prints the species' name corresponding to the predicted class.
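Note that predict also accepts several samples at once, one row of measurements per sample. For example (the setosa-like measurements below are illustrative values taken from the dataset itself):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

samples = np.array([
    [5.1, 3.5, 1.4, 0.2],        # short, narrow petals: setosa-like
    [6.7, 3.0, 5.2, 2.3],        # the sample used in this post
])
labels = clf.predict(samples)    # array of integer class labels
print(iris.target_names[labels]) # ['setosa' 'virginica']
```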

4. Exporting and Visualizing the Decision Tree

dot_data = tree.export_graphviz(
    clf, 
    out_file=None,
    feature_names=iris.feature_names, 
    class_names=iris.target_names,
    filled=True, 
    rounded=True, 
    special_characters=True
)
graph = graphviz.Source(dot_data)
graph
  • tree.export_graphviz(...): This function converts the decision tree into a DOT format string, which is used for creating visualizations with Graphviz. Here's what each parameter means:

    • clf: The trained decision tree model.
    • out_file=None: Specifies not to write the output to a file (we'll render it directly).
    • feature_names=iris.feature_names: Uses the feature names from the Iris dataset (['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']) for the labels of the nodes.
    • class_names=iris.target_names: Uses the species names (['setosa', 'versicolor', 'virginica']) for the labels of the classes.
    • filled=True: This fills the nodes with color based on the majority class in the node (makes the tree more visually intuitive).
    • rounded=True: Gives the nodes rounded corners.
    • special_characters=True: Ensures that special characters (like <=) are displayed properly.
  • graphviz.Source(dot_data): This renders the decision tree from the DOT string generated above. The resulting graph shows the feature tested at each node, the threshold for each split, and the class labels at the leaves.

    The graph object is displayed automatically when it is the last expression in a Jupyter notebook cell, or in any environment that can render Graphviz output; elsewhere, you can call graph.render() to save the drawing to a file.
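If Graphviz is not installed, tree.export_text offers a plain-text rendering of the same tree with no extra dependencies:

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

# Each indented line shows a split condition; leaf lines show the predicted class
print(tree.export_text(clf, feature_names=iris.feature_names))
```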


What will happen when this code runs?

  1. Prediction Output: For the sample input [6.7, 3.0, 5.2, 2.3], the classifier will predict the species based on the decision tree it has learned. This particular sample also appears in the training set labelled as virginica, so the fully grown tree predicts:

    Prediction is: ['virginica']
    
  2. Decision Tree Visualization: The decision tree will be displayed, showing how the model makes decisions at each node:

    • The feature at each node (e.g., petal length, sepal width).
    • The threshold (e.g., if petal length <= 2.45).
    • The class at the leaf node (the predicted species).

    The tree will provide a detailed look into how the model splits the data based on features and the criteria it used for each split.
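After training, the fitted tree also exposes a few attributes that summarize its structure, which is useful alongside the visualization (exact numbers can vary slightly between scikit-learn versions because of tie-breaking during splits):

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

print(clf.get_depth())           # depth of the learned tree
print(clf.get_n_leaves())        # number of leaf nodes
print(clf.feature_importances_)  # per-feature contribution to the splits;
                                 # the petal measurements dominate on Iris
```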


Conclusion:

This script trains a Decision Tree classifier on the Iris dataset, makes predictions for new data, and visualizes the decision tree structure. The decision tree provides an interpretable way to understand how the model is making decisions based on the features of the dataset.


