Feature Engineering in R Programming

Last Updated : 13 Dec, 2025

Feature engineering in R means creating new features or transforming existing ones so that machine learning models perform better. It covers cleaning, transforming, scaling, encoding and selecting features.

Helps models learn the patterns in the data more easily
Reduces noise and irrelevant variation
Converts raw data into inputs that algorithms can use
Works with both numeric and categorical features

In R, this can be done with base R functions or with packages such as dplyr, tidyr, caret and data.table.

Sample Dataset

Output: [figure: Sample Dataset]

This dataset has:

Numeric features: age, income
Categorical features: gender, city

We will use this small dataset to illustrate each technique.
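A minimal sketch of how a data frame with these four columns could be built in base R. The values are hypothetical placeholders chosen for illustration; the article's actual values appear only in its output image.

```r
# Hypothetical sample data: the values are illustrative, not the article's originals
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),                             # numeric
  income = c(40000, 52000, 75000, 88000, 61000),              # numeric
  gender = c("Male", "Female", "Female", "Male", "Female"),   # categorical (binary)
  city   = c("A", "B", "C", "A", "B"),                        # categorical (3 levels)
  stringsAsFactors = FALSE
)

str(df)    # shows the type of each column
print(df)
```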

1. Handling Missing Values

Real-world datasets often contain missing values. For demonstration, one value in income is set to NA and then filled in:

Output: [figure: Dataset After Handling Missing Values]

Explanation:

mean(..., na.rm = TRUE) calculates the mean while ignoring NA values.
The missing entry is then replaced with this average income (mean imputation).
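The imputation described above can be sketched in base R as follows. The data values, and the choice of the third row as the missing entry, are illustrative assumptions.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Introduce a missing value for demonstration
df$income[3] <- NA

# Mean of the observed incomes, ignoring the NA
mean_income <- mean(df$income, na.rm = TRUE)

# Mean imputation: replace every NA in income with that mean
df$income[is.na(df$income)] <- mean_income
print(df)
```

Mean imputation is simple but shrinks the spread of the column; median(..., na.rm = TRUE) is a common alternative when the column is skewed.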

2. Encoding Categorical Variables

Label Encoding (for binary categories: gender)

Output: [figure: Dataset After Label Encoding]

Explanation:

Male = 1
Female = 0
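A minimal sketch of this mapping with ifelse(), assuming the hypothetical sample values below and storing the result in a column named gender_num.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Label encoding for a binary variable: Male -> 1, Female -> 0
df$gender_num <- ifelse(df$gender == "Male", 1, 0)
print(df[, c("gender", "gender_num")])
```

A 0/1 code is safe for a two-level variable. For variables with more than two categories, integer codes would imply an order that does not exist, which is why city is one-hot encoded instead.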

One-Hot Encoding (for multi-class: city)

Output: [figure: Dataset After One-Hot Encoding]

Explanation:

City A, B and C become separate columns:

cityA
cityB
cityC

Each row gets 1 in the column for its own city and 0 in the other two.
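One way to produce these columns is base R's model.matrix(), sketched here with hypothetical sample values. Dropping the intercept with "- 1" gives every city level its own indicator column, named cityA, cityB and cityC.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# One-hot encoding: "- 1" removes the intercept so that all three
# levels of city get a 0/1 column (cityA, cityB, cityC)
city_dummies <- model.matrix(~ city - 1, data = df)

# Attach the indicator columns to the data frame
df <- cbind(df, city_dummies)
print(df)
```

caret's dummyVars() offers a similar result when a reusable encoder is needed for new data.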

3. Feature Scaling

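Scaling puts numeric features on comparable ranges, so that a feature with large values (income) does not dominate one with small values (age).

Standard scaling (z-score) replaces each value x with (x - mean) / sd, so the result has mean = 0 and sd = 1: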

Output: [figure: Dataset After Standard Scaling]

Explanation:

This matters most for algorithms that rely on distances or margins, such as KNN and SVM, because they are sensitive to the scale of their inputs.
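A minimal sketch of standard scaling with base R's scale(), using hypothetical sample values. The new column names age_scaled and income_scaled follow the final dataset listed later in the article.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# scale() subtracts the mean and divides by the standard deviation.
# It returns a one-column matrix, so as.numeric() turns it back into a vector.
df$age_scaled    <- as.numeric(scale(df$age))
df$income_scaled <- as.numeric(scale(df$income))

print(round(df[, c("age", "age_scaled", "income", "income_scaled")], 3))
```

When building a model, the mean and sd should be computed on the training data only and reused for the test data; otherwise information from the test set leaks into training.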

4. Binning (Feature Transformation)

Create age groups:

Output: [figure: Dataset After Feature Transformation]

Explanation:

Converts continuous age into categories (age_group)
Helps models capture patterns that depend on age ranges rather than exact ages
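Binning can be sketched with base R's cut(). The break points (30 and 45) and the labels Young, Middle and Senior are illustrative assumptions, not values taken from the article.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# cut() assigns each age to an interval; with the default right = TRUE
# the intervals are (0, 30], (30, 45] and (45, Inf].
# Break points and labels are assumed for illustration.
df$age_group <- cut(df$age,
                    breaks = c(0, 30, 45, Inf),
                    labels = c("Young", "Middle", "Senior"))

print(df[, c("age", "age_group")])
print(table(df$age_group))   # count of rows per age group
```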

5. Feature Construction

Create a new feature: income per year of age

Output: [figure: Dataset After Feature Construction]
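The ratio feature can be computed with a single vectorized division, sketched here with hypothetical sample values and a new column named income_per_age.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Ratio feature: income per year of age (element-wise division)
df$income_per_age <- df$income / df$age
print(df[, c("age", "income", "income_per_age")])
```

If age could ever be zero, the division should be guarded, since dividing a positive number by 0 gives Inf in R.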

6. Removing Skewness

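Income data is often right-skewed, with a few very large values. A log transformation compresses large values more than small ones, which reduces this skew: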

Output: [figure: Dataset After Removing Skewness]

Explanation:

Reduces the influence of very large values
Makes the distribution closer to symmetric
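A minimal sketch of the log transformation with base R's log(), using hypothetical sample values and a new column named income_log.

```r
# Hypothetical sample data (illustrative values)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, 75000, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Natural log: large incomes are compressed more than small ones.
# log() needs strictly positive values; log1p(x), i.e. log(1 + x),
# also handles zeros.
df$income_log <- log(df$income)
print(df[, c("income", "income_log")])
```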

7. Final Cleaned Feature-Enhanced Dataset

After all steps, the dataset contains:

Original variables (age, income, gender, city)
Encoded variables (gender_num, cityA, cityB, cityC)
Scaled variables (age_scaled, income_scaled)
Transformed variables (income_log, age_group)
Constructed feature (income_per_age)

This feature-rich dataset is now ready for modeling.
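The full sequence of steps above can be combined into one base R script. This is a sketch under the same assumptions as before: the data values, the missing row, and the age break points and labels are hypothetical.

```r
# Hypothetical sample data (illustrative values, with one missing income)
df <- data.frame(
  age    = c(25, 32, 47, 51, 38),
  income = c(40000, 52000, NA, 88000, 61000),
  gender = c("Male", "Female", "Female", "Male", "Female"),
  city   = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# 1. Missing values: mean imputation
df$income[is.na(df$income)] <- mean(df$income, na.rm = TRUE)

# 2. Encoding: label encoding for gender, one-hot encoding for city
df$gender_num <- ifelse(df$gender == "Male", 1, 0)
df <- cbind(df, model.matrix(~ city - 1, data = df))

# 3. Standard scaling
df$age_scaled    <- as.numeric(scale(df$age))
df$income_scaled <- as.numeric(scale(df$income))

# 4. Binning (assumed break points and labels)
df$age_group <- cut(df$age, breaks = c(0, 30, 45, Inf),
                    labels = c("Young", "Middle", "Senior"))

# 5. Feature construction
df$income_per_age <- df$income / df$age

# 6. Log transformation
df$income_log <- log(df$income)

str(df)
```

Imputation runs first so that the later steps do not propagate NA into the derived columns.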


URL: https://www.geeksforgeeks.org/r-language/feature-engineering-in-r-programming/