A concept hierarchy organizes data into multiple levels of abstraction-ranging from detailed (low-level) values to more general (high-level) concepts. It allows users to drill down for detailed analysis or roll up for summaries, helping simplify large datasets and improving pattern discovery.
Note: Data mining refers to extracting useful patterns, relationships, and knowledge from large datasets using techniques from statistics, machine learning, and AI.
Example of a Concept Hierarchy
The diagram below shows a hierarchy for the Location dimension.
The root node Location is generalized into Countries (USA, India).
Countries break into States (New York, Gujarat).
States further break down into Cities.
Such hierarchies allow a user to analyze data at the level they needβcity-level detail or country-level summaries.
Example: Age β Age Range (0β12 Child, 13β19 Teen, 20β60 Adult, >60 Senior)
4. Rule-Based Hierarchy
Defined using user-created rules or conditions. Example rule:
IF income > 10,00,000 β High income; ELSE β Low/Medium income
5. Better Representation of Domain Knowledge
Hierarchies capture real-world relationships (e.g., Product β Brand β Category), making the data model easier to understand.
Types of Concept Hierarchies
1. Explicitly Defined Hierarchies
These are manually designed by domain experts or database designers. Example:
Department β Faculty β University
2. Implicitly Defined Hierarchies
These are formed automatically based on:
Attribute values
Numerical ranges
Data types
Relationships in the schema
Methods to Generate Concept Hierarchies
1. Schema-Based Generation: Hierarchy is derived from the database schema itself. Examples:
Primary keyβforeign key relationships
Aggregation levels already built in star/snowflake schemas
2. Rule-Based Generation: Hierarchies designed using user-defined rules or metadata. Example rule:
if age < 18 β Young; if 18β60 β Adult; else β Senior
3. Data-Based Generation: Hierarchies created using data distribution. Methods include:
Clustering
Binning
Range partitioning
Frequency-based grouping
Need of Concept Hierarchy in Data Mining
There are several reasons why a concept hierarchy is useful in data mining:
Better Data Analysis: Hierarchies organize data into manageable levels, making patterns easier to identify. They support summarization and help analysts look at data from different perspectives.
Improved Data Exploration & Visualization: Tree-like structures make dashboards more interactive-users can move between summary and detailed views easily through drill-down or roll-up operations.
Faster and More Accurate Algorithms: Many data mining algorithms work better on generalized data. Hierarchies reduce noise and help algorithms detect clearer patterns.
Data Cleaning & Preprocessing: Concept hierarchies help identify: Outliers, Inconsistent values and Missing ranges, which leads to cleaner, higher-quality datasets.
Better Representation of Domain Knowledge: Hierarchies capture real-world relationships (e.g., Product β Brand β Category), making the data model easier to understand.