Unpacking the Power of Decision Trees: A Comprehensive Guide

Learn everything you need to know about decision trees, including a Python example

Jan 5, 2022

10 min read

The Decision Tree is a machine learning algorithm that takes its name from its tree-like structure and is used to represent multiple decision stages and the possible response paths. The decision tree provides good results for classification tasks or regression analyses.

What do we use Decision Trees for?

The tree structure is used not only to visualize the individual decision levels but also to put them in a specific order. To make a prediction for an individual data point, for example a classification, we follow the branches that match its observed values until we arrive at the target value.

The decision trees are used for classifications or regressions depending on the target variable. If the last value of the tree can be mapped to a continuous scale, we speak of a regression tree. On the other hand, if the target variable belongs to a category, we speak of a classification tree.
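The distinction can be sketched with Scikit-Learn, which provides one estimator per tree type. The toy data below is made up purely for illustration; only the type of the target variable differs between the two models.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data (made up for illustration): one feature, four observations.
X = [[0.0], [1.0], [2.0], [3.0]]

# Target variable is a category -> classification tree.
y_class = ["no", "no", "yes", "yes"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)

# Target variable lies on a continuous scale -> regression tree.
y_reg = [0.1, 0.9, 2.1, 2.9]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)

class_prediction = clf.predict([[2.6]])[0]  # a category label
value_prediction = reg.predict([[2.6]])[0]  # a number on a continuous scale
print(class_prediction, value_prediction)
```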

Due to this simple structure, this type of decision making is very popular and is used in a wide variety of fields:

Business management: Opaque cost structures can be illustrated with a tree structure that makes clear which decisions entail which costs.
Medicine: Decision trees are helpful for patients to find out whether they should seek medical help.
Machine Learning and Artificial Intelligence: In this area, decision trees are used to learn classification or regression tasks and then make predictions.

Structure of a Decision Tree

A tree essentially consists of three major components: Root, Branches, and Nodes. To better understand these components, let’s take a closer look at an example tree that helps us decide whether or not to exercise outside today.

The top node "Weather" is the so-called root node, which is used as the basis for the decision. Decision trees always have exactly one root node, so that the entry point for all decisions is the same. The so-called branches with the decision options hang from this node. In our case, the weather can be either cloudy, sunny, or rainy. Two of the branches ("sunny" and "rainy") lead to further nodes, at which a new decision has to be made. Only the branch "cloudy" leads directly to a result (leaf). So from our tree, we can already read that we should always go outside for sports when the weather is cloudy.

In sunny or rainy weather, on the other hand, we have to consider a second factor, which depends on the weather. In sunny weather, we reach the node "Humidity", where we can choose between "high" and "normal". If the humidity is high, we end up at the "No" leaf. In sunny weather paired with high humidity, it is therefore not advisable to do sports outside.

If the weather is rainy, we are in another branch within our decision tree. Then we have to make a decision at the node "wind". The decision options here are "strong" or "weak". Again, we can read two rules: If it is raining but the wind is weak, we can do sports outside. If it is raining coupled with strong wind, on the other hand, we should stay at home.

This very simple example can of course be further extended and refined. For the nodes "humidity" and "wind", for example, one could consider replacing the subjective decision options with concrete rules (strong wind = wind speeds > 10 km/h) or subdividing the branches even more finely.
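Read as code, the example tree is nothing more than nested conditions. The following is a minimal sketch; the function name and arguments are hypothetical, the wind rule uses the 10 km/h threshold suggested above, and the outcome "yes" for normal humidity is an assumption, since the text only spells out the "high" case.

```python
def exercise_outside(weather: str, humidity: str = "normal", wind_kmh: float = 0.0) -> bool:
    """Walk the example tree from the root ("Weather") down to a leaf."""
    if weather == "cloudy":
        # Leaf directly below the root: always go outside.
        return True
    if weather == "sunny":
        # Decision node "Humidity"; "normal" -> yes is an assumption.
        return humidity != "high"
    if weather == "rainy":
        # Decision node "Wind", with the concrete rule strong = more than 10 km/h.
        return wind_kmh <= 10
    raise ValueError(f"unknown weather: {weather!r}")


print(exercise_outside("sunny", humidity="high"))
print(exercise_outside("rainy", wind_kmh=5))
```

Each path from the root to a leaf corresponds to one rule, which is what makes small trees easy to read.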

What is Pruning?

Decision trees can quickly become complex and confusing in real-world use cases since in most scenarios more than two decisions are needed to end up with one result. To prevent this, trained decision trees are often pruned.

Reduced error pruning is a bottom-up algorithm that starts at the leaves and gradually works its way to the root. In each step, one decision node together with the subtree below it is replaced by a leaf. Then, the pruned tree is compared with the original, typically on held-out validation data, to see whether the prediction accuracy has deteriorated. If this is not the case, the tree is shortened by this node and the complexity of the decision tree is reduced.
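Pruning after training can be sketched with Scikit-Learn. Note that this sketch does not implement reduced error pruning; it swaps in minimal cost-complexity pruning, the post-pruning method Scikit-Learn exposes through the `ccp_alpha` parameter, and borrows the same acceptance idea: a smaller tree is kept only if accuracy on a validation split has not deteriorated. The dataset (iris), split size, and seeds are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fully grown tree as the baseline.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
baseline_accuracy = full_tree.score(X_val, y_val)

# Candidate pruning strengths in ascending order; a larger alpha yields a smaller tree.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

pruned_tree = full_tree
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    # Keep the smaller tree only if validation accuracy has not deteriorated.
    if candidate.score(X_val, y_val) >= baseline_accuracy:
        pruned_tree = candidate

print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)
```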

In addition to shortening the tree after training, there are also methods to keep the complexity low before or during training. A popular approach is the so-called early stopping rule. During training, a decision is made after each newly created node as to whether the tree should continue to grow at this point, i.e. whether it is a decision node, or whether it is a result node. In many cases, the so-called Gini Impurity is used as a criterion.

Simply put, it expresses the probability that a label will be set incorrectly at this node if it is simply assigned randomly, i.e. based on the distribution of classes at this node. The smaller this value, the higher the probability that we can stop the tree at this point without having to fear large losses in the accuracy of the model.
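For a node with class proportions p_1, ..., p_k, the Gini Impurity is 1 minus the sum of the squared proportions. The sketch below computes it by hand and then shows how early stopping might be configured in Scikit-Learn. The helper `gini_impurity` is hypothetical, and the parameter values are illustrative assumptions, not recommendations.

```python
from collections import Counter

from sklearn.tree import DecisionTreeClassifier


def gini_impurity(labels):
    """Probability of mislabeling a random sample at this node when the label
    is drawn at random from the node's own class distribution."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


pure_node = gini_impurity(["setosa"] * 10)                      # only one class
mixed_node = gini_impurity(["setosa"] * 5 + ["virginica"] * 5)  # 50/50 mix
print(pure_node, mixed_node)

# Early stopping: these constructor arguments stop the tree from growing further.
early_stopped = DecisionTreeClassifier(
    criterion="gini",
    max_depth=3,                 # never grow deeper than three levels
    min_samples_leaf=5,          # every leaf must keep at least five samples
    min_impurity_decrease=0.01,  # split only if impurity drops by at least this much
)
```

A pure node has an impurity of 0, so splitting it further cannot improve the model; the more evenly the classes are mixed, the higher the value.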

Advantages and Disadvantages of Decision Trees

The simple and understandable structure makes the decision tree a popular choice in many use cases. However, the following advantages and disadvantages should be weighed before using this model.

Decision trees as part of Random Forests

Random Forest is a supervised machine learning algorithm that is composed of individual decision trees. Such a type of model is called an ensemble model since an "ensemble" of independent models is used to compute a result. In practice, this algorithm is used for various classification tasks or regression analyses. The advantages are the usually short training time and the traceability of the procedure.

The Random Forest consists of a large number of these decision trees, which work together as a so-called ensemble. Each individual decision tree makes a prediction, such as a classification result, and the forest uses the result supported by most of the decision trees as the prediction of the entire ensemble. Why are multiple decision trees so much better than a single one?

The Random Forest works because of the so-called wisdom of crowds principle. It says that the combined decision of many decision trees is better than the result of a single tree. This principle holds for many use cases and was first observed at a fair.

In the early 20th century, it was common for oxen to be sold at fairs, where their weight had to be determined. In 1906, one such ox was shown to eight hundred different people, who were asked to guess the weight of the animal. In the end, the median of the eight hundred individual guesses was only about 1 % away from the actual weight. Not a single individual estimate came that close to the actual result, meaning that the crowd as a whole produced a better estimate than any single person.

This finding carries over to other fields such as the Random Forest: the aggregated prediction of several decision trees outperforms that of an individual tree.
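The voting idea can be sketched by hand. This is a simplified illustration, not Scikit-Learn's `RandomForestClassifier` (which averages the class probabilities of its trees rather than counting hard votes). The number of trees, the seeds, and the use of the iris data are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Train each tree on its own bootstrap sample (rows drawn with replacement)
# and let it consider only a random subset of features at every split.
trees = []
for seed in range(25):
    sample = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    trees.append(tree.fit(X[sample], y[sample]))

# Every tree votes for every flower: rows = trees, columns = flowers.
votes = np.stack([tree.predict(X) for tree in trees]).astype(int)

# The class with the most votes per flower is the prediction of the ensemble.
ensemble_prediction = np.array(
    [np.bincount(column, minlength=3).argmax() for column in votes.T]
)
ensemble_accuracy = (ensemble_prediction == y).mean()
print(ensemble_accuracy)
```

The accuracy here is measured on the training data, so it only shows that the voting mechanism works, not how well the ensemble generalizes.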

However, there is one prerequisite for this to hold. The decisions of the individual trees must not be correlated; otherwise, the errors of one tree would not be compensated by another. Let’s get back to our fair example.

The median estimate of the weight will only be better than a single guess if the participants did not coordinate their guesses in any way, meaning that their estimates are uncorrelated. Otherwise, the estimates of some people could influence others, and the wisdom of the crowd would be lost.
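A small simulation illustrates why correlation matters. All numbers are hypothetical and not the historical figures: every guesser makes an independent error, and in the second crowd everyone additionally shares the same error, for example because they all heard the same rumor.

```python
import numpy as np

rng = np.random.default_rng(42)
true_weight = 1200.0  # hypothetical weight, not the historical value
n_guessers = 800

# Uncorrelated crowd: everyone errs independently of everyone else.
independent_guesses = true_weight + rng.normal(0.0, 100.0, size=n_guessers)

# Correlated crowd: the same individual noise plus an error shared by all.
shared_error = 80.0  # assumed common bias
correlated_guesses = independent_guesses + shared_error

independent_error = abs(np.median(independent_guesses) - true_weight)
correlated_error = abs(np.median(correlated_guesses) - true_weight)
print(independent_error, correlated_error)
```

Independent errors largely cancel out in the median, while the shared error survives the aggregation no matter how many people guess.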

Train a Decision Tree in Python

The Scikit-Learn Python module provides a variety of tools needed for data analysis, including the decision tree. Among other things, it is based on the data formats known from NumPy. To create a decision tree in Python, we use the module and the corresponding example from the documentation.

The so-called Iris Dataset is a popular training dataset for creating a classification algorithm. It is an example from biology and deals with the classification of iris plants. For each flower, the length and width of the petal and of the so-called sepal are available. Based on these four measurements, the model learns which of the three iris species a specific flower belongs to.

With the help of Scikit-Learn, a decision tree can be trained in just a few lines of code:
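A minimal sketch of these few lines, following the iris example from the Scikit-Learn documentation, might look like this. Fixing `random_state` is an addition made here for reproducibility.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
# X: four measurements per flower, Y: the class (0, 1 or 2) of each flower.
X, Y = iris.data, iris.target

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, Y)

print(clf.get_depth(), clf.get_n_leaves())
```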

So we can train a decision tree relatively easily by defining the input variable X and the classes Y to be predicted, and training the Scikit-Learn decision tree on them. With the method "predict_proba" and concrete values, a classification can then be made:
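The prediction step could look roughly like this. The measurements of the flower are made up for illustration and are not taken from the original example.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Made-up flower: sepal length, sepal width, petal length, petal width (in cm).
flower = [[5.0, 3.4, 1.5, 0.2]]

probabilities = clf.predict_proba(flower)  # one column per class
predicted_class = iris.target_names[probabilities[0].argmax()]
print(probabilities, predicted_class)
```

`predict_proba` returns one probability per class; the class with the highest value is the prediction of the tree.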

So this flower with the made-up values would belong to the first class according to our Decision Tree. This species is called "Iris Setosa".

How to interpret Decision Trees?

With the help of Matplotlib, the trained decision tree can be drawn.
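One way to draw the tree is Scikit-Learn's `plot_tree` function, which renders onto a Matplotlib axis. In this sketch, the figure size, the headless "Agg" backend, and the output filename `iris_tree.png` are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: render to a file instead of a window
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 8))
annotations = tree.plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,  # color each tile by its majority class
    ax=ax,
)
fig.savefig("iris_tree.png")  # hypothetical output filename
```

In an interactive session, the backend line can be dropped and `plt.show()` used instead of saving to a file.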

The trained decision tree for our data has a total of five decision levels:

For a simple interpretation of this tree, we are interested in the values in the first and the last line of each tile. The tree is read from top to bottom. This means that in the first decision level we check whether the length of the petal is less than or equal to 2.45 cm. The conditions are always formulated so that "True" leads to the left branch and "False" to the right branch.

So, if a concrete flower has a petal length that is less than or equal to 2.45 cm, we are in the left branch (in the orange tile), which is also a result leaf. Thus we know that in this case, the flower belongs to the class "Setosa".

If, on the other hand, the petal is longer, we go along the right branch and are faced with another decision, namely whether the petal has a maximum width of 1.75 cm. We work our way through the tree until we reach a result leaf that provides information about the classification.
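The same rules can also be read without a plot. As a sketch, Scikit-Learn's `export_text` prints the tree as indented conditions, and the fitted `tree_` attribute exposes the feature and threshold of every node.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Text version of the tree: one line per condition, indented by decision level.
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)

# The root node is stored at index 0 of the fitted tree.
root_feature = iris.feature_names[clf.tree_.feature[0]]
root_threshold = clf.tree_.threshold[0]
print(root_feature, root_threshold)
```

Depending on the random seed, the root may split on the petal length or on the petal width, because both separate the class "Setosa" from the other two equally well.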

This is what you should take with you

Decision trees are another machine learning algorithm that is mainly used for classifications or regressions.
A tree consists of the starting point, the so-called root, the branches representing the decision possibilities, and the nodes with the decision levels.
To reduce the complexity and size of a tree, we apply so-called pruning methods that reduce the number of nodes.
Decision Trees are well suited to vividly represent decision-making and make it explainable.
However, when training, one has to pay attention to many details in order to obtain a meaningful model.

Written By

Niklas Lang

Algorithms, Data Science, Decision Tree, Machine Learning, Random Forest

URL: https://towardsdatascience.com/a-complete-guide-to-decision-trees-ac8656a0b4bb/