VOOZH about

URL: https://www.geeksforgeeks.org/machine-learning/fisher-score-for-feature-selection/

⇱ Fisher Score for Feature Selection - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Fisher Score for Feature Selection

Last Updated : 23 Jul, 2025

The Fisher Score is a simple used to select important features for classification tasks. It works by comparing how much a feature varies between different classes versus how much it varies within the same class. Features that show big differences between classes, but are consistent within each class, are considered useful for classification.

Mathematical Definition

For a given feature and a dataset with classes, the Fisher Score is defined as:

Where:

  • : Number of samples in class
  • : Mean of feature over the entire dataset
  • : Mean of feature in class
  • : Standard deviation of feature in class

How It Works

  • Top of the formula (numerator): Measures how different the feature values are between different classes (between-class variance).
  • Bottom of the formula (denominator): Measures how much the feature values vary within the same class (within-class variance).

A feature gets a high Fisher Score if the classes are clearly separated by it and it is consistent within each class.

Use Case Example: Fisher Score in Text Classification

Suppose you are building a spam email classifier using a bag-of-words model. Each email is converted into a vector where each feature represents the frequency of a specific word (e.g., “offer”, “free”, “meeting”, etc.).

Now, you want to select the most informative words that help differentiate between the two classes: spam and not spam.

Here's how Fisher Score helps:

  • For each word (feature), compute the mean frequency of that word in spam emails and in non-spam emails.
  • Also compute the overall mean and standard deviations within each class.
  • Apply the Fisher Score formula to get a score for each word.
  • Words like “free” or “offer” may have high Fisher Scores, indicating they appear frequently in spam and rarely in non-spam, making them highly discriminative.
  • Words like “meeting” or “attached” might appear in both classes similarly, giving them low Fisher Scores, so they can be dropped.

Fisher Score promotes features with high between-class variance (different means across classes) and low within-class variance (consistency within each class), which is ideal for discriminating features.

Python Example

Here’s a basic example using sklearn to compute Fisher scores with the SelectKBest method and f_classif (which computes ANOVA F-statistics, equivalent to Fisher scores):

Output:

👁 fisher
Fisher Score

Fisher Score vs Other Methods

Feature Selection Method

Type

Dependency on Class Labels

Computational Cost

Interpretability

Fisher Score

Filter

Yes

Low

High

Mutual Information

Filter

Yes

Moderate

Medium

Recursive Feature Elim

Wrapper

Yes

High

Medium

PCA

Unsupervised

No

Moderate

Low

Pros and Cons

Pros:

  • Fast and computationally efficient
  • Works well for high-dimensional data
  • Simple to interpret

Cons:

  • Considers each feature independently (ignores interaction)
  • May not capture complex feature relationships
  • Assumes normality and equal variances across classes (in some formulations)
Comment