What is Support and Confidence in Data Mining?

Last Updated : 23 Jul, 2025

Support and confidence are two important metrics in data mining because they tell us how strong the patterns and trends we identify in data are. In this article we will learn about them.

What is Support?

Support is the relative frequency of an itemset in a dataset, that is, the fraction of transactions that contain it. It is used to identify frequent itemsets, which can then be used to generate association rules. For example, if we set the support threshold to 5%, then any itemset that occurs in more than 5% of the transactions in the dataset is considered a frequent itemset.
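The thresholding idea can be sketched in a few lines of Python. This is a brute-force illustration, not the Apriori algorithm or any library's API: the function name `frequent_itemsets`, the `max_size` parameter, and the toy transactions are all assumptions made for this example. It uses a strict "more than" comparison to match the wording above, although some tools treat the threshold as inclusive.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support exceeds min_support.

    Brute force: every combination of up to `max_size` items is counted.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if candidate <= t)
            support = count / n
            if support > min_support:
                result[candidate] = support
    return result


# Hypothetical toy data: 20 transactions.
transactions = (
    [{"bread", "butter"}] * 6
    + [{"bread"}] * 4
    + [{"milk"}] * 9
    + [{"jam"}] * 1
)

frequent = frequent_itemsets(transactions, min_support=0.05)
for itemset, value in frequent.items():
    print(sorted(itemset), f"{value:.0%}")
```

Here `jam` appears in exactly 5% of the transactions, so it does not pass a strict 5% threshold, while `bread`, `butter`, `milk`, and the pair `{bread, butter}` do.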

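Formula for Support:

$$\text{Support}(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}}$$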

Where:

X is the item or combination of items.
Numerator is the number of transactions that contain X.
Denominator is the total number of transactions in the dataset.

Example:

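Consider a store dataset of 100 transactions. If 30 of these transactions include both bread and butter, then the support for the itemset {bread, butter}, and therefore for the rule "bread → butter", would be:

$$\text{Support}(\{\text{bread}, \text{butter}\}) = \frac{30}{100} = 0.30 = 30\%$$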

This means that 30% of the transactions in the dataset contain both bread and butter.
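The same calculation can be checked with a small Python sketch. The helper name `support` and the contents of the 70 transactions that do not contain both items are assumptions made for illustration; only the counts (30 out of 100) come from the example above.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)


# 100 transactions: 30 contain both bread and butter.
# The other 70 are filled with a made-up item so the total is 100.
transactions = [{"bread", "butter"}] * 30 + [{"milk"}] * 70

bread_butter_support = support({"bread", "butter"}, transactions)
print(f"{bread_butter_support:.0%}")  # 30%
```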

What is Confidence?

Confidence is a measure that indicates how likely it is that item Y will appear in a transaction given that item X is already in the transaction. It is a way of evaluating the strength of association between two items.

Formula for Confidence:

$$\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$$

Where:

X is the item or itemset that is already present.
Y is the item or itemset that we are trying to predict.
Support(X ∪ Y) is the support of the combination of both items X and Y.
Support(X) is the support of item X alone.

Example:

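In a dataset with 100 transactions, if 40 transactions contain bread and 20 transactions contain both bread and butter, then the confidence for the rule "bread → butter" would be:

$$\text{Confidence}(\text{bread} \rightarrow \text{butter}) = \frac{\text{Support}(\text{bread} \cup \text{butter})}{\text{Support}(\text{bread})} = \frac{20/100}{40/100} = 0.50 = 50\%$$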

This means that when bread is bought there is a 50% chance that butter will be bought as well.
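A minimal Python sketch of this calculation follows. The function names `count` and `confidence` and the filler transactions are assumptions for illustration. The sketch divides raw counts rather than supports, which gives the same ratio because the total number of transactions cancels out.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both_count = count(antecedent | consequent, transactions)
    return both_count / antecedent_count


# 100 transactions: 40 contain bread, and 20 of those also contain butter.
# The remaining 60 hold a made-up item so the total is 100.
transactions = (
    [{"bread", "butter"}] * 20
    + [{"bread"}] * 20
    + [{"milk"}] * 60
)

bread_to_butter = confidence({"bread"}, {"butter"}, transactions)
print(f"{bread_to_butter:.0%}")  # 50%
```

Note that the rule is directional: in this toy data every butter transaction also contains bread, so the confidence of "butter → bread" is 100% even though "bread → butter" is only 50%.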

How Do Support and Confidence Work Together?

Support and confidence work together to show how strong and useful a rule or pattern is in data analysis.

High Support means that an item or combination of items appears a lot in the dataset.
High Confidence means that if one item is present there's a strong chance that another item will be present too.

But high support does not imply high confidence, and vice versa. For example, an item may appear in many transactions (high support) while its link to another item is weak (low confidence). Conversely, a pair of items may be rare (low support) yet almost always be bought together (high confidence).
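The two cases can be made concrete with a small Python sketch. Both datasets below are invented for illustration, as are the helper names `support` and `confidence`.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(X ∪ Y) / Support(X) for the rule X -> Y."""
    antecedent, consequent = set(antecedent), set(consequent)
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Case 1 (hypothetical): bread is in 80 of 100 transactions,
# but butter accompanies it in only 8 of them.
common = [{"bread", "butter"}] * 8 + [{"bread"}] * 72 + [{"milk"}] * 20
bread_support = support({"bread"}, common)                        # high: 0.8
bread_butter_conf = confidence({"bread"}, {"butter"}, common)     # low: about 0.1

# Case 2 (hypothetical): caviar is in only 2 of 100 transactions,
# but champagne accompanies it every time.
rare = [{"caviar", "champagne"}] * 2 + [{"milk"}] * 98
pair_support = support({"caviar", "champagne"}, rare)             # low: 0.02
caviar_champagne_conf = confidence({"caviar"}, {"champagne"}, rare)  # high: 1.0
```

This is why the two measures are usually applied together: support filters out patterns that are too rare to matter, and confidence filters out rules that are too unreliable.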

The table below summarizes the key differences between Support and Confidence:

Aspect	Support	Confidence
Definition	Measures how often an itemset appears in a dataset.	Measures the likelihood that an itemset appears in a transaction, given that another itemset appears in it.
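Formula	Support(X) = (Number of transactions containing X) / (Total number of transactions)	Confidence(X → Y) = Support(X ∪ Y) / Support(X)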
Purpose	Identifies itemsets that occur frequently in the dataset.	Evaluates the strength of an association rule.
Threshold Usage	Often used with a threshold to identify itemsets that occur frequently enough to be of interest.	Often used with a threshold to identify rules that are strong enough to be of interest.
Interpretation	Interpreted as the percentage of all transactions in which an itemset appears.	Interpreted as the percentage of transactions containing the first itemset that also contain the second itemset.
Usage in Data Mining	Used for identifying frequent itemsets.	Used for evaluating association rules.
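The two thresholds in the table can be combined in a short Python sketch that keeps only single-item rules passing both. This is an illustrative brute-force filter, not a full rule-mining algorithm; the function name `strong_rules`, the threshold values, and the toy data are assumptions, and both comparisons are treated as inclusive ("at least") here.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def strong_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for single-item rules x -> y
    that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        pair_support = support({x, y}, transactions)
        if pair_support < min_support:
            continue  # the itemset {x, y} is not frequent enough
        # x comes from the data, so support({x}) is always greater than zero.
        rule_confidence = pair_support / support({x}, transactions)
        if rule_confidence >= min_confidence:
            rules.append((x, y, pair_support, rule_confidence))
    return rules


# Hypothetical data: 40 bread transactions, 20 of which also contain butter.
transactions = (
    [{"bread", "butter"}] * 20
    + [{"bread"}] * 20
    + [{"milk"}] * 60
)

rules = strong_rules(transactions, min_support=0.1, min_confidence=0.8)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.0%}, confidence={c:.0%}")
```

With these thresholds only "butter → bread" survives: the pair {bread, butter} is frequent enough in both directions, but "bread → butter" has a confidence of only 50%.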