Learning ML Part II: Decision Trees.

Machine Learning Part II: Decision trees.

This is part II of my blog for Machine Learning algorithms.  The core of my learning will be based on Georgia Techs ML class which can be found on Udacity , It is the same class that students take to earn credit in their online masters program. This week we will cover decision trees and what they are used for. A decision tree is just a series of choices that lead to a certain conclusion. For example let's say that you are going to play ball if it is sunny outside, and it is not muddy, and if the humidity is low. However you will not play ball if any of those cases are not true. Each conclusion (to play or not to play) is based on series of true or false values conditions.  Here is another example below:


The above tree shows a "20 questions" style deduction where each successive question tries to narrow down what the conclusion is (twenty questions). It asks more general questions first before narrowing down to the more specific questions. Would narrowing down from specific to general make more sense? Probably not. If your first question was to ask "is it a pig"? You may have a small chance of getting it right off the bat but most likely you get a no and it will tell you nothing about what you can ask next to narrow your guesses down further.

Making Decision Trees

So Making a Decision tree is fairly simple. A and B is a condition it is expressed as below:

A or B is a condition and it is expressed as below.

Algorithm for making descion trees.

An algorithm for building Decision trees ID3:


  1. Find the best attribute
  2. Assign A as a decision attribute for node
  3. For each value of A create a descendant of node
  4. For each value of A create a descendant of nodes
  5. Sort training examples of leaves
  6. If examples perfectly classified then stop.
  7. Else iterate over leaves

Example of ID3

How would you convert a table into a decision tree using ID3?  The ID3 algorithm builds decision trees using a top down greedy approach. The example table below is broken down into attributes and classifications.  A attribute is a decision node and classification is the end outcome of tree. The attributes are Outlook, Temp, Humidity, and wind.  The classification is whether we play or not. We said that the first step for ID3 is to find the "best" attribute. The best value is one in which you get the best information gain.

Information gain splits the data into two halves. Low information gain is one in which classifications (Yes or No) is evenly distributed and high information gain is one in which classification can be split into less evenly distributed groups, it is more opinionated. To actually calculate this value we will use entropy.

Entropy is is a measure of how homogeneous a dataset is.  For a binary classification, it ranges from 0 to log2(2).  Where 0 is all data in the set is the same.

Entropy formula :

For a more specific case the entropy forumula becomes this:

From entropy we can obtain gain, which measures the reduction in tormentor that results from partitioning the data on that attribute.


Finally, Here is an example of how Gain is calculated across all attributes. The attribute with a highest gain will become the root node.

After the root is found the Tree should look like this:

The process should be repeated over again (ie. find the Gains for each attribute and get the highest and set it as the root), but this time for subset of the training data that has the outlook of Sunny. That is the basic idea of descision trees and ID3. Look out next week for more machine learning algorithms.

-Joe Kurokawa

Leave a Comment