
URL: https://www.geeksforgeeks.org/machine-learning/association-rule/


Association Rule

Last Updated : 2 May, 2026

Association rules are a fundamental concept used to find relationships, correlations or patterns within large sets of data items. They describe how often itemsets occur together in transactions and express implications of the form:

$$X \Rightarrow Y$$

where X and Y are disjoint sets of items. This rule suggests that when the items in X appear, the items in Y tend to appear as well. Association rules originated in market basket analysis and help retailers and analysts understand customer behavior by discovering item associations in transaction data. For example, the rule

$$\{\text{Bread}, \text{Butter}\} \Rightarrow \{\text{Milk}\}$$

indicates that customers who buy bread and butter also tend to buy milk.

Key Components

  • Antecedent (X): The "if" part representing one or more items found in transactions.
  • Consequent (Y): The "then" part, representing the items likely to be purchased when antecedent items appear.

Rules are evaluated based on metrics that quantify their strength and usefulness:

Rule Evaluation Metrics

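1. Support: The fraction of transactions that contain all items in both X and Y.

$$\text{Support}(X \Rightarrow Y) = \frac{\text{number of transactions containing } X \cup Y}{\text{total number of transactions}}$$

Support measures how frequently the combination appears in the data.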

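2. Confidence: The probability that a transaction containing X also contains Y.

$$\text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$$

Confidence measures the reliability of the inference.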

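3. Lift: The ratio of the observed support of X and Y together to the support expected if X and Y were independent.

$$\text{Lift}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)} = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)}$$

  • Lift > 1 implies a positive association: the items occur together more often than expected under independence.
  • Lift = 1 implies independence.
  • Lift < 1 implies a negative association: the items occur together less often than expected.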

Example Transaction Data

| Transaction ID | Items |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |

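Consider the rule:

$$\{\text{Milk}, \text{Diaper}\} \Rightarrow \{\text{Beer}\}$$

Calculations:

  • Support = 2/5 = 0.4 (transactions 3 and 4 contain Milk, Diaper and Beer)
  • Confidence = Support({Milk, Diaper, Beer}) / Support({Milk, Diaper}) = 0.4 / 0.6 ≈ 0.67
  • Lift = Confidence / Support({Beer}) = 0.67 / 0.6 ≈ 1.11 > 1 (positive association)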
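A short script makes these calculations reproducible. It is a plain-Python sketch (not a library API): the helper names `support`, `confidence` and `lift` are illustrative, and the transactions are the five rows of the example table, scored for the rule {Milk, Diaper} ⇒ {Beer}.

```python
# The five example transactions from the table, one set of items per transaction
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Support(X ∪ Y) / Support(X): how often Y appears when X does."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence(X => Y) / Support(Y); values above 1 mean a positive association."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Milk", "Diaper"}, {"Beer"}
print(f"support    = {support(x | y, transactions):.2f}")   # support    = 0.40
print(f"confidence = {confidence(x, y, transactions):.2f}")  # confidence = 0.67
print(f"lift       = {lift(x, y, transactions):.2f}")        # lift       = 1.11
```

Changing `x` and `y` scores any other rule on the same data, which is a quick way to build intuition for how lift separates genuinely associated items from items that are simply common.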

Implementation

Let's walk through a complete implementation in Python, from loading the data to visualizing the mined rules.

Step 1: Install and Import Libraries

We will install and import the required libraries, such as pandas, mlxtend, matplotlib and networkx.

Step 2: Load and Preview Dataset

We will load the dataset into a pandas DataFrame and preview its first few rows.

Output:

πŸ‘ Screenshot-2025-09-04-125254
Dataset
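The dataset file used in this step is not named here, so the sketch below reads a small inline CSV as a stand-in. Its layout (one row per transaction/item pair, with hypothetical columns `transaction_id` and `item`) is an assumption; with a real file, pass its path to `pd.read_csv` instead of the `io.StringIO` object.

```python
import io

import pandas as pd

# Hypothetical stand-in data: the example transactions in "long" format,
# one row per (transaction, item) pair
csv_text = """transaction_id,item
1,Bread
1,Milk
2,Bread
2,Diaper
2,Beer
2,Eggs
3,Milk
3,Diaper
3,Beer
3,Coke
4,Bread
4,Milk
4,Diaper
4,Beer
5,Bread
5,Milk
5,Diaper
5,Coke
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                                 # (rows, columns)
print(df.head())                                # first five rows
print(df["item"].nunique(), "distinct items")   # size of the item catalog
```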

Step 3: Prepare Data for Apriori Algorithm

Apriori expects a one-hot encoded table: each row is a transaction, each column is an item and each cell is True if the item appears in that transaction, otherwise False.

Output:

πŸ‘ preparing-data
Preparing Data for Apriori Algorithm
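Assuming the data arrives in long format (one row per transaction/item pair, with hypothetical column names `transaction_id` and `item`), `pd.crosstab` builds the one-hot table in one line. If the data is instead a list of item lists, mlxtend's `TransactionEncoder` (`fit(...).transform(...)` together with its `columns_` attribute) produces the same kind of Boolean table.

```python
import pandas as pd

# Assumed long-format input: the example transactions, one row per (transaction, item)
df = pd.DataFrame({
    "transaction_id": [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    "item": ["Bread", "Milk",
             "Bread", "Diaper", "Beer", "Eggs",
             "Milk", "Diaper", "Beer", "Coke",
             "Bread", "Milk", "Diaper", "Beer",
             "Bread", "Milk", "Diaper", "Coke"],
})

# crosstab counts each (transaction, item) pair; "> 0" turns counts into True/False,
# so an item listed twice in the same transaction is still just True
basket = pd.crosstab(df["transaction_id"], df["item"]) > 0
print(basket)
```

Rows are transactions and columns are items sorted alphabetically, which is the Boolean layout Apriori works on.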

Step 4: Generate Frequent Itemsets

In this step we:

  • find itemsets that appear in at least 1% of all transactions;
  • set use_colnames=True so that itemsets are reported by item name instead of column index.

Output:

πŸ‘ Frequent-items
Frequent Itemsets
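With mlxtend, this step is a single call such as `apriori(df, min_support=0.01, use_colnames=True)`, where `df` stands for the one-hot table. To show what happens inside, here is an illustrative plain-Python sketch of the level-wise Apriori search (join, prune, count); it is not mlxtend's code. It runs on the five example transactions with `min_support=0.4` (2 of 5 transactions), because a 1% threshold on so little data would keep every itemset that occurs at all.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    frequent = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            frequent[frozenset([item])] = s
    level = list(frequent)

    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset (the Apriori property)
        candidates = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        ]
        # Count: keep the candidates that meet the support threshold
        level = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                level.append(c)
        k += 1
    return frequent


transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]

frequent = apriori_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(f"{s:.2f}  {sorted(itemset)}")
```

The prune step is what makes Apriori efficient: a candidate such as {Bread, Milk, Coke} is discarded without scanning the data because its subset {Bread, Coke} is already known to be infrequent.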

Step 5: Generate Association Rules

In this step we:

  • extract rules with a confidence of at least 30%;
  • get back a rules DataFrame with columns such as antecedents, consequents, support, confidence and lift.

Output:

πŸ‘ association-rule
Result
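In mlxtend this step is typically `association_rules(frequent_itemsets, metric="confidence", min_threshold=0.3)`; note that some mlxtend releases changed this function's signature (a `num_itemsets` argument), so check the documentation for your installed version. The sketch below shows the underlying idea in plain Python plus pandas: split every frequent itemset into antecedent ⇒ consequent and score each split. The frequent itemsets are a brute-force stand-in for the Step 4 output (fine for six items, far too slow for a real catalog), and `generate_rules` is an illustrative name.

```python
from itertools import combinations

import pandas as pd

transactions = [frozenset(t) for t in (
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
)]
n = len(transactions)


def support(itemset):
    return sum(itemset <= t for t in transactions) / n


# Brute-force stand-in for Step 4: every itemset with support >= 0.4
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= 0.4:
            frequent[frozenset(combo)] = s


def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent => consequent and score it."""
    rows = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):  # antecedent sizes 1 .. len(itemset) - 1
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rows.append((antecedent, consequent, supp, conf, lift))
    columns = ["antecedents", "consequents", "support", "confidence", "lift"]
    rules = pd.DataFrame(rows, columns=columns)
    return rules.sort_values("lift", ascending=False).reset_index(drop=True)


rules = generate_rules(frequent, min_confidence=0.3)
print(rules.head())
```

Sorting by lift rather than confidence pushes rules whose consequent is merely popular further down the list.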

Step 6: Visualize Top Frequent Items

In this step we:

  • plot the 10 most frequently purchased items;
  • get a quick view of the most popular products in the dataset.

Output:

πŸ‘ Scatter-plot
Visualizing Top Frequent Items
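A minimal sketch of this chart with pandas and matplotlib, assuming the transactions are available as a list of item lists (here, the five example transactions). Because each item appears at most once per transaction, counting occurrences gives the number of transactions containing each item. The output file name `top_items.png` is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]

# Number of transactions each item appears in (items are unique within a transaction)
item_counts = pd.Series([item for t in transactions for item in t]).value_counts()
top_items = item_counts.head(10)  # at most 10 bars; this toy data has only 6 items

fig, ax = plt.subplots(figsize=(8, 4))
top_items.sort_values().plot.barh(ax=ax)  # ascending order puts the largest bar on top
ax.set_xlabel("Number of transactions")
ax.set_title("Top 10 most frequent items")
fig.tight_layout()
fig.savefig("top_items.png")  # or plt.show() in an interactive session
```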

Step 7: Scatter Plot of Rules (Support vs Confidence)

In this step we:

  • plot each rule's support against its confidence;
  • color each point by lift to show the strength of the rule.

Output:

πŸ‘ scatter-plot
Scatter Plot
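The sketch below draws this kind of plot with matplotlib. As a simplification, it scores only single-item rules {x} ⇒ {y} computed directly from the five example transactions, rather than the full rules table; the `rule` label column and the file name `rules_scatter.png` are hypothetical.

```python
from itertools import permutations

import matplotlib.pyplot as plt
import pandas as pd

transactions = [set(t) for t in (
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
)]
n = len(transactions)


def support(itemset):
    return sum(set(itemset) <= t for t in transactions) / n


# Simplification: score every single-item rule {x} => {y} whose items co-occur
items = sorted(set().union(*transactions))
rows = []
for x, y in permutations(items, 2):
    s_xy = support({x, y})
    if s_xy > 0:  # skip pairs that never appear together
        conf = s_xy / support({x})
        rows.append({"rule": f"{x} => {y}", "support": s_xy,
                     "confidence": conf, "lift": conf / support({y})})
rules = pd.DataFrame(rows)

fig, ax = plt.subplots(figsize=(7, 5))
points = ax.scatter(rules["support"], rules["confidence"], c=rules["lift"], cmap="viridis")
fig.colorbar(points, ax=ax, label="lift")  # color encodes rule strength
ax.set_xlabel("Support")
ax.set_ylabel("Confidence")
ax.set_title("Association rules: support vs confidence")
fig.tight_layout()
fig.savefig("rules_scatter.png")  # or plt.show() in an interactive session
```

Points in the upper right with a bright color are both common and reliable, which makes them the usual first candidates to inspect.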

Step 8: Heatmap of Confidence for Selected Rules

In this step we:

  • display confidence values for pairs of top antecedent and consequent itemsets;
  • use the color scale to spot high-confidence rules quickly.

Output:

πŸ‘ heatmap
Heatmap
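One way to build such a heatmap, sketched here as a swapped-in technique rather than the method behind the figure, uses a co-occurrence matrix: multiplying the one-hot table by its transpose counts how many transactions contain each pair of items, and dividing each row by that item's own count gives confidence({i} ⇒ {j}). It covers single-item antecedents and consequents only, uses the five example transactions and writes to a hypothetical `confidence_heatmap.png`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]

# One-hot table (rows = transactions, columns = items) as 0/1 integers
onehot = pd.get_dummies(pd.Series(transactions).explode()).groupby(level=0).any().astype(int)

co = onehot.T @ onehot                # co.loc[i, j] = transactions containing both i and j
counts = np.diag(co)                  # diagonal = transactions containing item i
confidence = co.div(counts, axis=0)   # row i, column j: confidence({i} => {j})
confidence = confidence.mask(np.eye(len(confidence), dtype=bool))  # hide i => i

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(confidence.values, cmap="YlGnBu", vmin=0, vmax=1)
ax.set_xticks(range(len(confidence.columns)))
ax.set_xticklabels(confidence.columns, rotation=45, ha="right")
ax.set_yticks(range(len(confidence.index)))
ax.set_yticklabels(confidence.index)
ax.set_xlabel("Consequent")
ax.set_ylabel("Antecedent")
fig.colorbar(im, ax=ax, label="confidence")
fig.tight_layout()
fig.savefig("confidence_heatmap.png")  # or plt.show() in an interactive session
```

The matrix is not symmetric: confidence({Beer} ⇒ {Diaper}) is 1.0 because every Beer transaction also contains Diaper, while confidence({Diaper} ⇒ {Beer}) is only 0.75.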

Use Cases

Common applications of association rules include:

  • Market Basket Analysis: Identifies products often bought together to improve store layouts and promotions (e.g., bread and butter).
  • Recommendation Systems: Suggests related items based on buying patterns (e.g., accessories with laptops).
  • Fraud Detection: Detects unusual transaction patterns indicating fraud.
  • Healthcare Analytics: Finds links between symptoms, diseases and treatments (e.g., symptom combinations predicting a disease).

Advantages

  • Interpretable and Easy to Explain: Rules offer clear "if-then" relationships that non-technical stakeholders can understand.
  • Unsupervised Learning: Works well on unlabeled data to find hidden patterns without prior knowledge.
  • Flexible Data Types: Effective on transactional, categorical and binary data.
  • Helps in Feature Engineering: Can be used to create new features for downstream supervised models.

Limitations

  • Large Number of Rules: Can generate many rules, including trivial or redundant ones, making interpretation hard.
  • Support Threshold Sensitivity: High support thresholds miss interesting but infrequent patterns; low thresholds generate too many rules.
  • Not Suitable for Continuous Variables: Requires discretization or binning before use with numerical attributes.
  • Computationally Expensive: Performance degrades on very large or dense datasets due to combinatorial explosion.
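  • Statistical Significance: High confidence alone doesn't guarantee a meaningful rule (a consequent that appears in most transactions gets high confidence regardless of the antecedent, which lift helps expose); domain knowledge is essential to validate findings.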