How to Implement Decision Trees in Python (Train, Test, Evaluate, Explain)

Learn how to implement decision trees in Python using scikit-learn. Explore essential parameters, tree pruning, feature importance, and performance metrics for building accurate and interpretable models. Perfect for machine learning beginners.


Table of Contents:

  • Introduction to Decision Trees in Python with Scikit-Learn
    • What You Will Learn in This Tutorial
    • Recap: Theoretical Basics of Decision Trees
  • Setting Up: Importing the Dataset
    • Using Built-in Datasets in Scikit-Learn
    • Overview of the Breast Cancer Dataset for Classification
  • Preprocessing the Data: Splitting into Training and Testing Sets
    • How to Split Data in Scikit-Learn
    • Features (X) and Target (Y): What They Mean in Our Dataset
  • Training a Simple Decision Tree Classifier
    • How to Import the DecisionTreeClassifier in Scikit-Learn
    • Fitting the Classifier to the Training Data
  • Making Predictions with the Decision Tree Model
    • Predicting on the Test Dataset: A Step-by-Step Guide
    • Understanding Prediction Probabilities with predict_proba
  • Evaluating Model Performance: Accuracy, Confusion Matrix, and Precision
    • Using Scikit-Learn’s Built-in Accuracy Metric
    • Generating a Confusion Matrix
    • Understanding Precision, Recall, and F1 Score
  • Fine-Tuning Your Decision Tree: Important Parameters to Know
    • Adjusting the Depth of the Tree with max_depth
    • Exploring Stopping Criteria: min_samples_split, min_samples_leaf, and More
    • Criterion for Splitting: Gini vs Entropy
  • Randomization in Decision Trees: Splitter, Max Features, and Random State
    • How the Splitter Parameter Affects the Tree’s Decisions
    • Limiting the Number of Features with max_features
    • Ensuring Consistent Results with random_state
  • Handling Imbalanced Classes with Class Weights
    • Why Class Weights Matter in Classification Tasks
    • How to Set Class Weights in Scikit-Learn Decision Trees
  • Interpreting Decision Trees: Feature Importance and Visualization
    • Visualizing the Importance of Features
    • Plotting the Decision Tree Structure
  • Pruning Decision Trees: Reducing Overfitting
    • Understanding Minimal Cost Complexity Pruning (ccp_alpha)
    • When and How to Prune a Decision Tree in Scikit-Learn
  • Conclusion: Key Takeaways on Using Decision Trees in Scikit-Learn
    • Recap of the Steps to Implement a Decision Tree
    • Final Thoughts on Fine-Tuning and Evaluating Your Model


Introduction to Decision Trees in Python with Scikit-Learn

What You Will Learn in This Tutorial

Hey! In the last article, we learned the theoretical details of decision trees: how they learn and how they are trained.

Recap: Theoretical Basics of Decision Trees

In this article, I will show you how to implement them with scikit-learn using Python. We will look into some of the parameters you can use and some very helpful built-in functions. Generally, we will try to get a better understanding of how you can start using decision trees today in a Python script or a Jupyter notebook.

Setting Up: Importing the Dataset

Using Built-in Datasets in Scikit-Learn

We have a bunch of things to look into, but the first thing we need to do is import a dataset. For this one, I'm using a built-in dataset from scikit-learn. If you want other datasets, just google "scikit-learn datasets" and you'll find a list of datasets that are built into scikit-learn and can easily be imported. This one is called the breast cancer dataset.
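
Here's a minimal sketch of that import (the variable names are mine, not necessarily the ones from the original notebook):

```python
from sklearn.datasets import load_breast_cancer

# Load the built-in breast cancer dataset
data = load_breast_cancer()

X = data.data    # 569 rows x 30 measurement columns
y = data.target  # 0 = malignant, 1 = benign
print(data.feature_names[:5])
```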

Overview of the Breast Cancer Dataset for Classification

It's a classification dataset. These are all the measurements taken on a tumor, a mass collected from a breast. Of course, we are not doctors, but we don't really have to be; we're just using this dataset to understand how decision trees can be implemented with scikit-learn.

Preprocessing the Data: Splitting into Training and Testing Sets

So this is a dataset with 30 columns, each of which is a measurement of a tumor, and the target variable we are learning is whether a data point, one tumor, is benign or malignant. Benign means it's not cancerous; I don't really know the exact medical terms, but on a very simple level, if it's benign everything is fine, and if it's malignant it is cancerous and needs to be treated.

Before we feed this dataset to our decision tree, we of course need to divide it into training and testing data. Just to give you a structure: first, I'm going to show you how to train a simple default decision tree, and then we'll go into the details of how you can change it and what you need. But let's come back to the training and testing separation.

Features (X) and Target (Y): What They Mean in Our Dataset

Inside scikit-learn, there is a built-in function to separate training and testing datasets. You might remember this from your other machine learning training: X means all the features you give the model to determine whether a data point is benign or malignant, which here is all the columns of the dataset, and y is the target value, which is going to be either 0 or 1 depending on whether the tumor is benign or malignant.

This built-in function only needs the X and the y. The test size, as I showed here, determines how much of the data you would like to set aside for testing. For the training dataset, we give the model all the examples together with the answers, so it can train and learn.

Whereas for the test set, we're only going to give it the features, the columns, and it's going to create the predictions itself. That's how we test whether our model is performing well or not.
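
Something like this, assuming a test_size of 0.35 to match the roughly 200 test rows mentioned below (the exact split used in the original isn't shown):

```python
from sklearn.model_selection import train_test_split

# Hold out part of the rows for testing; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=42
)
print(X_train.shape, X_test.shape)
```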

Training a Simple Decision Tree Classifier

How to Import the DecisionTreeClassifier in Scikit-Learn

So how do you train the decision tree and make predictions? It's very simple: you import the DecisionTreeClassifier, the model itself, from the scikit-learn library and create an instance of it. Once we start putting in parameters, this is where they will go. Then all you have to do is call fit on the classifier, giving it the training X values and y values, and it creates a nice little decision tree model for you. That's all.

Right now we're using the default parameters. If you print the model, it shows you all the parameter values being used, and you'll see they are the defaults: max_depth is None, max_features is None, and so on. We're going to learn what these mean in a second. If you want the regressor instead, it's very simple: you just import DecisionTreeRegressor. And how do you find out how to import these things? Through the documentation, via Google.
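
The whole training step is only a few lines (again with my own variable names):

```python
from sklearn.tree import DecisionTreeClassifier

# Create a classifier with all default parameters and fit it
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Shows the parameters in use -- all defaults here,
# e.g. max_depth=None, max_features=None, criterion='gini'
print(clf.get_params())
```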

Making Predictions with the Decision Tree Model

Predicting on the Test Dataset: A Step-by-Step Guide

Basically, by just searching for "scikit-learn decision tree regressor" or "scikit-learn decision tree classifier", the documentation is going to be the first or second page to pop up. In the scikit-learn documentation, you just need to scroll down to the examples; they tell you how to import everything, and you can basically copy and paste the code into your notebook.

Now we've trained our model, but of course you want to make some predictions with it. There are two ways to get predictions. The first is by giving it the X test dataset: the data points whose answers we haven't shown the model, so we can see the answers it comes up with. Let's look at what that looks like. My whole dataset had 569 rows, and we set aside nearly 200 for testing.

When I make a prediction, it gives me the class number it's predicting for each test row: for the first one it predicts malignant, for the second one benign, and so on. This is just a way to see, in general, how your model is predicting on new information.
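
A sketch of that first way, continuing with the names from the snippets above:

```python
# Ask the fitted tree for a class label for every held-out row
predictions = clf.predict(X_test)

print(len(predictions))   # ~200 test rows
print(predictions[:5])    # e.g. [0, 1, 1, ...] -- 0 = malignant, 1 = benign
```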

Understanding Prediction Probabilities with predict_proba

Another way to see predictions is using predict_proba, which gives you the probability for each of the classes. Let's see: for the first data point, the model predicts class 0 with probability 1 and class 1 with probability 0. There is a reason all our probabilities are 0 or 1: we did not set any early stopping criteria, so our tree just grew and grew until there was no other way to split.

If we set a very simple stopping criterion, say a max_depth of 4, so the maximum depth the tree can reach is four, and then run the predictions again, the predict output will still look the same: it tells us either 0 or 1 based on which class has the higher probability. But now look at the probabilities.

They're a little bit different now. The model is less sure which class it should pick, because we stopped the tree before it was able to grow to the point where every leaf is pure, containing only one class. If this doesn't make sense to you, go back and read the first article on decision tree theory; then what I mean here will make more sense.
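
In code, that change might look roughly like this:

```python
# Cap the tree's depth so growth stops before every leaf is pure
clf = DecisionTreeClassifier(max_depth=4)
clf.fit(X_train, y_train)

# Each row is [P(class 0), P(class 1)]; with the capped tree these
# are often strictly between 0 and 1 instead of exactly 0 or 1
proba = clf.predict_proba(X_test)
print(proba[:3])
```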

Evaluating Model Performance: Accuracy, Confusion Matrix, and Precision

Using Scikit-Learn’s Built-in Accuracy Metric

We have the probabilities and the predictions for each of the test instances. But how do we check whether they are good or not? We have to compare them to the actual labels, and we do that using performance metrics. Most of the performance metrics you will need are already built into scikit-learn. The first one is accuracy.

Generating a Confusion Matrix

For example, it says this model has an accuracy of 0.92 or 0.93. If you want more detail, you can compute the confusion matrix. It shows that when the true class is 0 and it's predicted as 0, that happens 69 times; when the class is 1 and it's predicted as 1, that happens 105 times; and the off-diagonal cells are the wrongly classified instances.
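
Roughly like this (the counts quoted above are from the author's run and will vary with the split):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

predictions = clf.predict(X_test)

print(accuracy_score(y_test, predictions))  # e.g. ~0.92-0.93

# Rows are true classes, columns are predicted classes;
# the diagonal holds the correctly classified counts
print(confusion_matrix(y_test, predictions))
```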

Understanding Precision, Recall, and F1 Score

Another metric I can check is the precision score. If you want recall, let me just show you how I find these things: I google "scikit-learn recall", the documentation is the first thing that pops up, and I look at the examples. It tells me what I need to import to calculate recall; my true values are called y_test and my predictions are called predictions, and then I get my recall score too.

There is also a nice function called classification_report, where you can basically see precision, recall, and F1 score all together, separately for malignant and benign, along with the macro average, weighted average, etc. (There might also be a similar report for regression.) I will not go into the details of what these metrics are in this article, because this one is about decision trees.
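
A sketch of those metric calls (class 0 is malignant and class 1 is benign in this dataset):

```python
from sklearn.metrics import precision_score, recall_score, classification_report

print(precision_score(y_test, predictions))
print(recall_score(y_test, predictions))

# Precision, recall, and F1 per class, plus macro and weighted averages
print(classification_report(y_test, predictions,
                            target_names=["malignant", "benign"]))
```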

I can make a separate article about that later, but let me know if you would like to learn about it. Before we go into feature importance and other things, I want to show you some of the parameters that decision trees have. Let's go back to our model: right now we're only using max_depth=4. Let me pull up the list of stopping criteria, all the settings.

Fine-Tuning Your Decision Tree: Important Parameters to Know

Adjusting the Depth of the Tree with max_depth

These are settings you can change to stop the tree from growing all the way to its maximum possible depth. max_depth tells the tree how deep it can be. When there is one node, one decision, that's a depth of one.

When that decision node splits, your tree has a depth of two; when those nodes split again, you get a depth of three, and so it goes further and further. Sometimes decision trees can grow to be very deep. These are not things you can know beforehand; you can't really say, "Oh, I want my max_depth to be three, because I know that's going to give me the best results." No, most of the time you try different values and see which one works best. So max_depth bounds the depth of the tree.
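
One way to "try different values" is a small loop like this (my own sketch, not from the original):

```python
# Compare held-out accuracy across a few candidate depths
for depth in [2, 3, 4, 6, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(f"max_depth={depth}: test accuracy {clf.score(X_test, y_test):.3f}")
```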

Exploring Stopping Criteria: min_samples_split, min_samples_leaf, and More

There are some other parameters here, such as min_samples_split and min_samples_leaf. As you can see, these are all stopping criteria: they change when the tree stops growing. You can find them all in the documentation, each a different way of stopping the tree's growth, and you can go and read about them.

That is, they stop the training process. I think if you have more than one stopping criterion set up, training simply stops at the first one it reaches. So go ahead, learn more about them and try them out; you'll see the difference they make in the performance and the shape of the tree.
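
For example (the specific values here are illustrative, not from the original):

```python
# Each of these halts splitting early in its own way; whichever
# condition is hit first stops further growth of that branch
clf = DecisionTreeClassifier(
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_leaf_nodes=20,     # cap the total number of leaves
)
clf.fit(X_train, y_train)
```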

Criterion for Splitting: Gini vs Entropy

The next parameters that matter to us are not stopping criteria but settings that change the approach of the decision tree. The first one is the criterion. What is a criterion? In the previous article, if you remember, I talked about two different algorithms you can use, the CART algorithm and the ID3 algorithm, and they use different metrics to decide which feature to split the dataset on.

The criterion determines that metric. The default is gini, but you can also use entropy. Again, if you don't know what these are and you're curious, go back to the first article, the one about decision tree theory; then you'll know what I'm talking about here.
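
Switching the metric is a one-argument change:

```python
# Use entropy (information gain) instead of the default Gini impurity
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)
```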

Randomization in Decision Trees: Splitter, Max Features, and Random State

How the Splitter Parameter Affects the Tree’s Decisions

So criterion is the parameter that helps determine how to grow the tree, or what to use to make the splits. The second thing we can use to shape how the decision tree is grown or trained is the splitter.

Basically, you have two options: you either choose the best split based on the criterion, or you choose it randomly. That's also an option. You can say, you know what, I want to go crazy and pick which feature to split on in a random way. You can do that; it's a possibility.
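
For example:

```python
# splitter="best" (the default) evaluates the candidate splits exhaustively;
# "random" instead picks the best of randomly drawn split points
clf = DecisionTreeClassifier(splitter="random", random_state=0)
clf.fit(X_train, y_train)
```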

Limiting the Number of Features with max_features

Another important one to know is max_features. In this dataset we have 30 columns. You can say: every time you want to make a split, only consider 20 of them. The decision tree will then decide which 20 at random, compare their entropy or information gain to each other, and select the best one.

So maybe the actual best feature is outside that group of 20, but the tree will still choose only the best one within the group. That's an option available to you: you can set the maximum number of features considered per split to fewer than the total number of columns you have.

Ensuring Consistent Results with random_state

That introduces some more randomization, and random_state controls it: when you consider fewer features than the number of columns you have, there needs to be some randomness. If you give random_state an integer, the randomization will produce the same results every single time; if you don't give it anything, it's going to be different every time.

So if I add a parameter like random_state=5, I'm going to get the same tree over and over again, even when there is randomization involved. I think this is clear. These are all the things you can do to change the approach of the tree.
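
Both together might look like this:

```python
# Consider only 20 randomly chosen features at each split; fixing
# random_state makes that random choice reproducible between runs
clf = DecisionTreeClassifier(max_features=20, random_state=5)
clf.fit(X_train, y_train)
```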

Handling Imbalanced Classes with Class Weights

Why Class Weights Matter in Classification Tasks

There is one other thing that's important to know, and that is class_weight. This one is specific to classification; it doesn't exist in regression. When you're doing classification, you're going to have two, three, or maybe more classes that your model tries to predict.

How to Set Class Weights in Scikit-Learn Decision Trees

Sometimes one of those classes might be a little more important than the others, and with class_weight you can tell the model this. You're basically saying: hey, it's worse to make an error predicting class one than to make a mistake predicting class two. The model will then take that into consideration in how it grows, in its Gini index or entropy calculations.
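
A sketch, with weights that are purely illustrative:

```python
# Make errors on class 0 (malignant) weigh five times as much as errors
# on class 1 (benign); "balanced" would instead weight classes inversely
# to their frequency in the training data
clf = DecisionTreeClassifier(class_weight={0: 5, 1: 1})
clf.fit(X_train, y_train)
```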

Those are all the relevant parameters. The next thing we can look at is feature importance. As I said in the previous article, one of the best things about decision trees is that they are interpretable (a very hard word to say): you can actually understand how the decision tree is deciding.

Interpreting Decision Trees: Feature Importance and Visualization

Visualizing the Importance of Features

One of the things that comes with this is that you can actually see which features are more important than others. So let's go here: I'm getting all my features, these are all the features I have, and it's very simple. You just access the classifier's feature importances.

Let me show you what this creates: it gives me a list, corresponding to the list of features, of how important each feature is in determining the result of your prediction.

Plotting the Decision Tree Structure

When I put it into a data frame, I can see the feature importances like this. A very common thing to do is turn this into a plot; then it's easier to understand that, say, worst perimeter is one of the best features for determining whether a tumor is benign or malignant. This is very important for really understanding your model, and it's not really possible with most other machine learning algorithms, so it's a big plus for decision trees.
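
Put together, that might look like this (assuming pandas and matplotlib are available):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Pair each feature name with its importance score and sort for readability
importances = pd.Series(clf.feature_importances_, index=data.feature_names)
importances.sort_values().plot.barh(figsize=(8, 10))
plt.tight_layout()
plt.show()
```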

Another way to see how your model is working is to create a plot of the tree itself, an actual tree of how it decides. This is very simple. For example, when one data point comes in, the root node asks: is worst perimeter greater than or lower than 110.25? If it's lower, go down one branch.

If it's greater, go down the other branch, and then you hit another decision point, and another. As far as depth goes, you can see the first level, the second, the third, the fourth, and this whole bottom row is the fifth level. Now that I've actually counted it: I set max_depth to 4, so the counting probably starts from zero, then one, two, three, four.

We can actually see how the tree changes when we change the parameters. We set max_depth to 4; now let's remove it, have the default tree grow as much as possible, and look at how it turns out. This is a much busier tree. The depth maybe doesn't even go that much further, probably this is about as deep as the tree needs to grow, but it's a much denser tree.
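
The plot itself comes from scikit-learn's plot_tree (the figure size here is my guess):

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

fig, ax = plt.subplots(figsize=(20, 10))
plot_tree(clf, feature_names=data.feature_names,
          class_names=data.target_names, filled=True, ax=ax)
plt.show()
```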

Pruning Decision Trees: Reducing Overfitting

Understanding Minimal Cost Complexity Pruning (ccp_alpha)

One other important thing that I nearly forgot to mention is pruning. It's a very widely used way of making sure the tree is not overfitting to your dataset. How do you do it with scikit-learn? With a parameter, actually: it's called ccp_alpha.

We can go and see its definition in the documentation. Basically, there is an algorithm that runs on top of the decision tree algorithm to prune it, called minimal cost complexity pruning. I'm not going to go into details, but if you want to read more, the documentation links to a description of the details and the general math behind it. The default value is zero, and when you give it a value bigger than zero, it's going to prune your tree.
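
Something like this, where the alpha value is only illustrative (the value used in the original isn't shown):

```python
# Any ccp_alpha > 0 triggers minimal cost complexity pruning
clf = DecisionTreeClassifier(ccp_alpha=0.01)
clf.fit(X_train, y_train)

# The pruned tree is much shallower and has far fewer leaves
print(clf.get_depth(), clf.get_n_leaves())
```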

Conclusion: Key Takeaways on Using Decision Trees in Scikit-Learn

Recap of the Steps to Implement a Decision Tree

We saw that our tree was a little bit big, so if I give it this value, I expect the tree to be pruned a bit. This is what we had before, and here it is after pruning; as you can see, that value was apparently already even too big.

So now I have a very nice and simple pruned tree. Let's see how the accuracy changed, I'm curious. I almost forgot that I have to run the predictions again. Let's see now.
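
Re-running the evaluation is just a repeat of the earlier calls:

```python
from sklearn.metrics import accuracy_score

# Predict again with the pruned tree and compare the accuracy
predictions = clf.predict(X_test)
print(accuracy_score(y_test, predictions))
```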

Final Thoughts on Fine-Tuning and Evaluating Your Model

It looks like the accuracy increased; that's perfect. I guess our tree was overfitting to our dataset. I think that's all we have on decision trees. As I said, it's very simple: just import them.

Import one of the datasets built into scikit-learn and just play around with it. Change some of the settings and see how it changes your accuracy. I hope this was helpful and that you learned something. Thanks for reading, and I'll see you around.
