In Machine Learning, Decision Trees work by splitting the data into subsets based on feature values, essentially dividing the input space into regions with similar output values. There are several algorithms for building decision trees, each with its own way of deciding how to split the data and when to stop splitting. In this article, I’ll take you through the Decision Tree algorithms in Machine Learning that you should know.
Types of Decision Tree Algorithms in Machine Learning
Let’s understand the types of Decision Tree algorithms in Machine Learning you should know, how they work, and when to use which one.
ID3 (Iterative Dichotomiser 3)
The ID3 algorithm selects the attribute that offers the highest Information Gain, aiming to produce the most homogeneous child nodes possible. It uses Entropy, a measure of disorder or unpredictability, to assess the purity of a node. Information Gain then measures how much entropy is reduced after the dataset is split on an attribute. The process is recursive, with splits occurring until no further gain is possible or other stopping criteria are met.
ID3 is particularly effective for problems where the attributes are categorical, since it does not natively handle continuous variables or missing data.
For example, in a marketing application where the goal is to classify customers based on categorical attributes like gender, membership status, or product preferences, ID3 can efficiently partition the data to predict customer behaviour or classify types of customers. However, it’s less suited for problems involving many continuous variables, and its lack of pruning mechanisms makes it prone to overfitting.
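To make Entropy and Information Gain concrete, here’s a minimal sketch of how ID3 scores a candidate split. The customer records below are made-up toy data for illustration, using the marketing attributes mentioned above:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (0 = pure, 1 = 50/50 for two classes)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting `rows` on the attribute at `attr_index`."""
    n = len(labels)
    # Partition the labels by the value of the chosen attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted

# Toy data: (Membership Status, Product Preference) -> Likely to Purchase
rows = [("Gold", "Books"), ("Gold", "Electronics"), ("Silver", "Books"),
        ("None", "Clothes"), ("None", "Books"), ("Silver", "Clothes")]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]

print(information_gain(rows, labels, 0))  # split on Membership Status
print(information_gain(rows, labels, 1))  # split on Product Preference
```

On this toy data, splitting on Membership Status gives the higher gain, so ID3 would place it at the root; with real data the winner depends entirely on the dataset.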
Let’s understand how ID3 works with an example. Let’s say a company wants to classify its customers based on two categorical attributes: “Membership Status” (Gold, Silver, None) and “Product Preference” (Books, Electronics, Clothes). The target variable is “Likely to Purchase” (Yes, No). In this problem:
- The root node will start with the “Membership Status” attribute, as it provides the highest Information Gain for classifying customers by their likelihood to purchase.
- The tree will then split into three branches: Gold, Silver, and None.
- Under each membership status, the tree will further split based on “Product Preference”.
- The leaves of the tree will indicate the prediction of “Likely to Purchase” based on the combination of membership status and product preference.
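The finished tree from the steps above can be represented as nested dictionaries and walked from root to leaf. The specific Yes/No leaves below are hypothetical, chosen to mirror the walkthrough rather than learned from real data:

```python
# Hypothetical ID3 tree mirroring the walkthrough: Membership Status at the
# root, Product Preference under each branch, leaf strings as predictions.
tree = {
    "attribute": "Membership Status",
    "branches": {
        "Gold": {"attribute": "Product Preference",
                 "branches": {"Books": "Yes", "Electronics": "Yes", "Clothes": "Yes"}},
        "Silver": {"attribute": "Product Preference",
                   "branches": {"Books": "Yes", "Electronics": "No", "Clothes": "No"}},
        "None": {"attribute": "Product Preference",
                 "branches": {"Books": "No", "Electronics": "No", "Clothes": "No"}},
    },
}

def predict(node, example):
    """Follow the branch matching each attribute value until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

print(predict(tree, {"Membership Status": "Gold", "Product Preference": "Books"}))
print(predict(tree, {"Membership Status": "None", "Product Preference": "Clothes"}))
```

This is exactly how a trained decision tree is used at prediction time: one attribute test per level, ending at a leaf.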
C4.5 (Successor of ID3)
Building on ID3, C4.5 introduces several enhancements, such as the ability to handle both continuous and categorical data, deal with missing values, and implement tree pruning to mitigate overfitting. It uses the Information Gain Ratio to counteract the bias toward attributes with more levels. For continuous attributes, C4.5 decides on a threshold that best divides the dataset according to the Information Gain Ratio.
C4.5’s adaptability makes it suitable for a broad range of applications, including those that require dealing with mixed data types or missing information.
For instance, in medical diagnosis, where datasets often contain a mix of patient data types (e.g., blood pressure readings, which are continuous, and symptoms presence, which is categorical) and missing values are common, C4.5 can efficiently handle such complexities. It’s also a good choice when a balance between accuracy and model complexity is necessary, thanks to its pruning capabilities.
Let’s see how C4.5 will solve a problem with an example. Let’s say we have a dataset that contains patient data for diagnosing a particular disease. Attributes include “Age” (continuous), “Symptom Presence” (categorical: Present, Absent), and “Blood Pressure” (continuous). The target variable is “Disease” (Positive, Negative). In this problem:
- The root node might be “Symptom Presence” if it offers the highest Information Gain Ratio after considering the split’s purity and the number of outcomes.
- For patients with symptoms present, the tree could next consider “Blood Pressure” using a threshold value to split patients into high and low blood pressure groups.
- In the absence of symptoms, “Age” might be considered, with a different threshold to categorize patients into higher or lower risk categories.
- The leaves will represent the final classification of “Disease” based on the path taken through the tree.
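The two ideas that distinguish C4.5, the Gain Ratio and threshold selection for continuous attributes, can be sketched as follows. The patient values are invented purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information Gain divided by Split Information, penalizing many-valued attributes."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    weighted = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - weighted
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """Try midpoints between sorted continuous values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy patient data (made up for illustration).
symptom = ["Absent", "Absent", "Present", "Present", "Absent", "Present"]
bp = [120, 135, 160, 150, 115, 170]
disease = ["Negative", "Negative", "Positive", "Positive", "Negative", "Positive"]

print(gain_ratio(symptom, disease))   # categorical attribute
print(best_threshold(bp, disease))    # continuous attribute -> (threshold, gain)
```

On this toy data the blood-pressure threshold lands at the midpoint between the highest Negative and lowest Positive reading, which is how C4.5 discretizes continuous attributes on the fly.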
CART (Classification and Regression Trees)
CART employs a binary splitting approach, where each node is split into two child nodes, and it uses Gini Impurity for classification tasks and Mean Squared Error for regression. This approach is straightforward yet powerful, as it allows CART to construct highly interpretable models. The algorithm builds a fully grown tree and then prunes it using cost-complexity pruning to find the optimal subtree, which addresses the overfitting problem.
CART’s versatility in handling both classification and regression tasks makes it a go-to algorithm for many real-world problems.
For example, in a real estate application predicting both the category of housing (e.g., apartment, duplex, single-family home) and the price of properties, CART can manage these tasks within the same framework. Its binary splitting nature also makes it particularly useful for datasets where binary decisions are inherent to the problem structure, such as in loan approval processes, where decisions are fundamentally yes/no.
Let’s see how CART will solve a problem with an example. Let’s say we have a dataset with features like “Location” (Urban, Suburban, Rural), “Size” (continuous, in square feet), and “Age” of the property (continuous, in years). The task is to predict both the category of housing (classification: Apartment, Duplex, Single-family) and its price (regression). In this problem:
- The root node might split the data based on “Location”, as it’s a significant factor in both pricing and housing type.
- For urban locations, the next split could be on “Size” to differentiate between apartments and duplexes, with further splits estimating price ranges.
- In suburban and rural areas, “Age” might be a deciding factor, with newer homes being more likely single-family and possibly commanding higher prices.
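CART’s binary splitting can be sketched with one search routine that works for both tasks, swapping Gini Impurity for classification and MSE for regression. The property sizes, types, and prices below are made-up toy numbers:

```python
def gini(labels):
    """Gini impurity: probability of mislabeling a randomly drawn sample."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def mse(values):
    """Mean squared error around the mean, CART's regression criterion."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_binary_split(values, targets, impurity):
    """Try midpoints on a numeric feature; return the (threshold, score) with
    the lowest weighted impurity. `impurity` is gini or mse."""
    n = len(values)
    pairs = sorted(zip(values, targets))
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Classification: housing type by size in square feet (toy numbers).
size = [600, 750, 900, 2000, 2400, 3100]
htype = ["Apartment", "Apartment", "Apartment",
         "Single-family", "Single-family", "Single-family"]
print(best_binary_split(size, htype, gini))

# Regression: price in $1000s by size, using MSE as the criterion.
price = [150, 170, 200, 420, 480, 560]
print(best_binary_split(size, price, mse))
```

The same search logic serves both tasks; only the impurity function changes, which is why CART handles classification and regression in one framework.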
Choosing the Right Algorithm
Below is a table comparing ID3, C4.5, and CART that will help you select the right Decision Tree algorithm for your problem:
| Feature | ID3 | C4.5 | CART |
|---|---|---|---|
| Handles Continuous Data | No | Yes | Yes |
| Handles Categorical Data | Yes | Yes | Yes |
| Type of Tree | Multiway | Multiway | Binary |
| Metric | Information Gain | Information Gain Ratio | Gini Impurity (Classification), MSE (Regression) |
| Pruning | No | Yes | Yes |
| Overfitting | Prone | Less Prone | Less Prone with Pruning |
All Decision Tree algorithms have their strengths and weaknesses, and the choice among them should be based on the specific requirements of your dataset and the problem you are trying to solve. Here are some takeaways that will help you choose the right algorithm:
- For categorical data: ID3 or C4.5 can be effective, with C4.5 offering improvements in handling continuous data and pruning.
- For a mix of continuous and categorical data: C4.5 or CART are better choices, with CART providing a more generalized approach suitable for both classification and regression.
- To avoid overfitting: CART or C4.5, with their pruning capabilities, are preferable.
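In practice, note that scikit-learn’s tree module implements an optimized variant of CART (binary splits, Gini/MSE criteria); ID3 and C4.5 are not included. A quick example with the built-in iris dataset, where `ccp_alpha` enables CART’s cost-complexity pruning:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="gini" is CART's classification metric; ccp_alpha > 0 turns on
# cost-complexity pruning to control overfitting.
clf = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Raising `ccp_alpha` prunes the tree more aggressively, trading training accuracy for a simpler, better-generalizing model.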
Summary
So ID3, C4.5, and CART are the Decision Tree algorithms you should know. ID3 splits on the attribute with the highest Information Gain and works best with purely categorical data. C4.5 builds on ID3 by handling both continuous and categorical data, dealing with missing values, and pruning the tree to mitigate overfitting. CART uses binary splits, with Gini Impurity for classification and Mean Squared Error for regression, and prunes the fully grown tree via cost-complexity pruning.
I hope you liked this article on the types of Decision Tree algorithms in Machine Learning you should know. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.