The FP-Growth (Frequent Pattern Growth) algorithm efficiently mines frequent itemsets from large transactional datasets. Unlike the Apriori algorithm, which suffers from high computational cost due to candidate generation and multiple database scans, FP-Growth avoids these inefficiencies by compressing the data into an FP-Tree (Frequent Pattern Tree) and extracting patterns directly from it.
Data Compression: First, FP-Growth compresses the dataset into a smaller structure called the Frequent Pattern Tree (FP-Tree). This tree stores itemsets (collections of items) and their frequencies, so there is no need to generate candidate sets as Apriori does.
Mining the Tree: The algorithm then examines this tree to identify patterns whose frequency meets a minimum support threshold. It does this by breaking the tree down into smaller "conditional" trees, one for each item, which keeps each mining step small.
Generating Patterns: Once the tree is built and analyzed, the algorithm outputs the frequent patterns (itemsets), from which rules describing relationships between items can then be derived.
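The compression step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `FPNode` class, the `build_fp_tree` function, and the sample transactions are all invented for this sketch, and ties in item frequency are broken alphabetically, which is only one possible choice.

```python
from collections import Counter

class FPNode:
    """One node of the FP-Tree: an item, its count, and its child nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Rank frequent items by descending count (ties broken alphabetically).
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: pos for pos, item in enumerate(ordered)}
    # Pass 2: insert each transaction as a path, sharing common prefixes.
    root = FPNode(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

# Hypothetical transactions, invented for this sketch.
transactions = [
    ["Bread", "Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Butter"],
    ["Bread", "Milk"],
    ["Bread", "Milk"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
for item, child in root.children.items():
    print(item, child.count)  # one line per branch under the root
```

With this sample data, the five transactions collapse into two branches under the root, because transactions that start with the same frequent items share the same path and only increment its counts.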
Imagine you're organizing a party and want to know popular food combinations without asking every guest repeatedly.
List the food items each guest brought (the transactions).
Count the items and remove infrequent ones (filter by support).
Group items in order of popularity and create a tree where paths represent common combinations.
Instead of repeatedly asking guests, you explore this tree to discover patterns. For example, you might find that pizza and pasta often come together, or that cake and pasta are also a common pair.
This is exactly how FP-Growth finds frequent patterns efficiently.
Working of the FP-Growth Algorithm
Problem Statement: Consider a small grocery store transaction dataset. Each entry shows the set of items purchased together by a customer:
A conditional pattern base contains all prefix paths leading to a specific item. Let's examine the paths ending with Butter.
Paths that end with Butter:
Bread → Milk → Butter (1 occurrence)
Bread → Butter (1 occurrence)
Milk → Butter (1 occurrence)
Thus, the conditional pattern base for Butter is:
[ (Bread, Milk): 1, (Bread): 1, (Milk): 1 ]
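The same pattern base can be reproduced without walking a tree, because each prefix path in the FP-Tree corresponds to the frequency-ordered items that precede the target in some transaction. The sketch below relies on that shortcut. The function name `conditional_pattern_base`, the item order, and the transactions are assumptions made for illustration; the transactions were chosen so that they yield the three Butter paths listed above.

```python
from collections import Counter

def conditional_pattern_base(transactions, target, item_order):
    """Collect the ordered prefix that precedes `target` in each transaction.

    `item_order` lists the frequent items from most to least frequent and
    is assumed to contain `target`.
    """
    rank = {item: pos for pos, item in enumerate(item_order)}
    base = Counter()
    for t in transactions:
        if target not in t:
            continue
        # Keep frequent items only, sorted in the global frequency order.
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        prefix = tuple(ordered[:ordered.index(target)])
        if prefix:  # an empty prefix contributes no path
            base[prefix] += 1
    return dict(base)

# Hypothetical transactions, invented for this sketch.
transactions = [
    ["Bread", "Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Butter"],
    ["Bread", "Milk"],
    ["Bread", "Milk"],
]
item_order = ["Bread", "Milk", "Butter"]  # assumed frequency order

base = conditional_pattern_base(transactions, "Butter", item_order)
print(base)
```

A tree-based implementation would instead follow the parent links upward from every Butter node; grouping ordered prefixes is simply an easier way to see where the counts come from.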
Step 5: Build Conditional FP-Trees
Using the conditional pattern base, construct a smaller FP-tree for each item to identify frequent patterns involving that item. Butter's conditional FP-tree input:
(Bread, Milk): 1
(Bread): 1
(Milk): 1
Count all items:
Bread: 2
Milk: 2
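These counts come from adding up each item's occurrences across the prefix paths, weighted by the path count. A minimal sketch, assuming the pattern base is stored as a dictionary from path tuples to counts (a representation chosen here for illustration):

```python
from collections import Counter

# Conditional pattern base for Butter, copied from the text above.
pattern_base = {("Bread", "Milk"): 1, ("Bread",): 1, ("Milk",): 1}
min_support = 2

# Each item on a path inherits that path's count.
counts = Counter()
for path, count in pattern_base.items():
    for item in path:
        counts[item] += count

# Keep only the items that reach the support threshold.
frequent_items = {item: c for item, c in counts.items() if c >= min_support}
print(frequent_items)  # {'Bread': 2, 'Milk': 2}
```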
Since both meet the support threshold (≥ 2), each can be combined with Butter to form a frequent pattern:
{Butter, Bread}
{Butter, Milk}
The three-item set {Butter, Bread, Milk} does not qualify: Bread and Milk appear together in only one prefix path, (Bread, Milk): 1, so its support is 1, which is below the threshold.
Repeat the process for Milk and Bread as needed.
Step 6: Extract All Frequent Itemsets
From the FP-tree and conditional trees, we get these frequent itemsets:
{Bread}
{Milk}
{Butter}
{Bread, Milk}
{Bread, Butter}
{Milk, Butter}
All of these appear at least 2 times in the transactions.
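A result list like this can be cross-checked on a small dataset by counting every item combination directly. The sketch below uses exhaustive counting, not FP-Growth itself, so it is only practical for tiny inputs. The function name `frequent_itemsets_bruteforce` and the transactions are assumptions made for illustration; the transactions are hypothetical and were chosen to be consistent with Butter's conditional pattern base shown earlier.

```python
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, min_support):
    """Count every item combination directly against the transactions."""
    items = sorted({i for t in transactions for i in t})
    sets = [set(t) for t in transactions]
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            # Support = number of transactions containing every item in combo.
            support = sum(1 for s in sets if set(combo) <= s)
            if support >= min_support:
                result[combo] = support
    return result

# Hypothetical transactions, invented for this sketch.
transactions = [
    ["Bread", "Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Butter"],
    ["Bread", "Milk"],
    ["Bread", "Milk"],
]

result = frequent_itemsets_bruteforce(transactions, min_support=2)
for itemset, support in result.items():
    print(set(itemset), support)
```

Comparing this output with the itemsets produced by the conditional trees is a quick sanity check that no pattern was missed or wrongly included.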