How to do Conditional Mutate in R

Last Updated : 11 Sep, 2025

In the R programming language, mutate() is a function from the dplyr package used to create, modify, and delete columns in a dataset. It is most often used to create columns that are functions of existing variables.

R mutate() function syntax:

mutate(.data, ...)
Parameters:
.data: a data frame or tibble
...: name-value pairs, where each value is an expression computed from existing columns

Here we create a simple dataset and apply a basic mutate() operation to see how it works: mutate() adds a new column that holds the squared values of an existing column.
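A minimal sketch of this step could look like the following. The column names id, value, and squared are assumptions made for illustration, since the printed output carries no headers.

```r
library(dplyr)

# Hypothetical column names; the values are taken from the output shown below.
data <- data.frame(id = 1:5, value = c(10, 15, 20, 25, 30))
print("Original Dataset")
print(data)

# mutate() keeps the existing columns and appends the new one.
mutated <- data %>% mutate(squared = value^2)
print("Mutated Dataset")
print(mutated)
```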

Output:

[1] Original Dataset
1 1 10
2 2 15
3 3 20
4 4 25
5 5 30
[1] Mutated Dataset
1 1 10 100
2 2 15 225
3 3 20 400
4 4 25 625
5 5 30 900

Before learning about conditional mutate in R, we should know the relational operators available in R.

Operator	Is TRUE if
A < B	A is less than B
A <= B	A is less than or equal to B
A > B	A is greater than B
A >= B	A is greater than or equal to B
A == B	A is equal to B
A != B	A is not equal to B
A %in% B	A is an element of B
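As a quick illustration, these operators work element by element on vectors, which is what makes them usable as conditions inside mutate(). The vectors a and b below are made up for this sketch.

```r
a <- c(1, 5, 10)
b <- 5

less_than  <- a < b              # TRUE FALSE FALSE
less_equal <- a <= b             # TRUE TRUE FALSE
equal_to   <- a == b             # FALSE TRUE FALSE
not_equal  <- a != b             # TRUE FALSE TRUE
is_member  <- a %in% c(5, 10)    # FALSE TRUE TRUE

# Comparing a missing value gives NA rather than TRUE or FALSE.
na_compare <- NA > b             # NA
```

The last line matters later: a condition that evaluates to NA does not count as a match in case_when().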

Conditional Mutate in R

With mutate(), we can create and modify columns by applying conditions to the existing columns of a dataset.

Two functions are commonly used for conditional mutate in R:

case_when()
ifelse()

case_when() function in mutate()

case_when() is used inside mutate() to create and modify columns based on conditions, for example to categorize values or to replace specific ones. Its syntax is simple.

Syntax:

case_when(X ~ Y)
Parameters:
X: condition to be tested
~: tilde, separating the condition from the value
Y: value to be set where the condition is TRUE

Here X is the condition applied to the data, '~' is the tilde, and Y on its right is the value inserted into the column wherever X is TRUE. Several X ~ Y pairs can be listed, separated by commas, and they are evaluated in order.
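To make the syntax concrete, here is a small sketch that applies case_when() to a plain vector outside of mutate(). The vector x and the labels are invented for illustration.

```r
library(dplyr)

x <- c(-2, 0, 3)

# Each element gets the value from the first condition that is TRUE for it.
sign_label <- case_when(
  x < 0  ~ "negative",
  x == 0 ~ "zero",
  x > 0  ~ "positive"
)
print(sign_label)
```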

Let's learn about case_when() in detail with some examples.

Install Necessary Libraries

tibble: the tibble package is used to create and manipulate data frames.
dplyr: the dplyr package provides the mutate() function used in the following sections.
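Assuming the packages come from CRAN, installing and loading them typically looks like this. The install line is commented out because it only needs to run once.

```r
# install.packages(c("tibble", "dplyr"))  # run once if the packages are not installed yet

library(tibble)  # tibble() for building data frames
library(dplyr)   # mutate(), select(), case_when()
```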

Create a simple Dataset

Here, we create a simple dataset on which to perform conditional mutate operations. It includes the ID, Name, Age, Gender, and Education of 10 people, male and female. Two Age values are deliberately set to NA so that we can see how missing values are handled inside mutate().
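A sketch that rebuilds this dataset from the values in the printed output is shown here. The variable name data is an assumption.

```r
library(tibble)

# Values copied from the printed output; the name `data` is assumed.
data <- tibble(
  ID = 1:10,
  Name = c("Alice", "Bob", "Charlie", "David", "Eva",
           "Frank", "Grace", "Hank", "Ivy", "Jack"),
  Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33),
  Gender = c("Female", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male"),
  Education = c("Bachelor's", "High School", "Bachelor's", "PhD", "Master's",
                "High School", "PhD", "Master's", "Bachelor's", "PhD")
)
print(data)
```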

Output:

A tibble: 10 × 5
      ID Name      Age Gender Education
   <int> <chr>   <dbl> <chr>  <chr>
 1     1 Alice      25 Female Bachelor's
 2     2 Bob        18 Male   High School
 3     3 Charlie    22 Male   Bachelor's
 4     4 David      NA Male   PhD
 5     5 Eva        35 Female Master's
 6     6 Frank      16 Male   High School
 7     7 Grace      24 Female PhD
 8     8 Hank       NA Male   Master's
 9     9 Ivy        27 Female Bachelor's
10    10 Jack       33 Male   PhD

Select a column and mutate using case_when()

We select the Age column from the dataset using the select() function and save it in a separate variable, age_data, so that the original dataset is left untouched. We then create a new column 'Age_Group' using mutate() and case_when(): people aged 18 or below are labelled 'Child' and people above 18 are labelled 'Adult'.
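The step described above might be written as follows. The tibble data here is a reduced stand-in with only ID and Age, so that the sketch runs on its own.

```r
library(dplyr)
library(tibble)

# Stand-in for the full dataset (assumed name `data`), keeping only what is needed.
data <- tibble(ID = 1:10, Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

# Work on a copy that holds only the Age column.
age_data <- data %>% select(Age)

age_data <- age_data %>%
  mutate(Age_Group = case_when(
    Age <= 18 ~ "Child",
    Age > 18  ~ "Adult"
  ))
print(age_data)
```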

Output:

A tibble: 10 × 2
     Age Age_Group
   <dbl> <chr>
 1    25 Adult
 2    18 Child
 3    22 Adult
 4    NA NA
 5    35 Adult
 6    16 Child
 7    24 Adult
 8    NA NA
 9    27 Adult
10    33 Adult

Here, rows with a missing Age get NA in Age_Group because neither condition evaluates to TRUE for NA. People aged 18 and below are labelled Child, and those above 18 are labelled Adult. We will handle NA values in the next sections.

The TRUE default argument

TRUE is used as the final condition in case_when() and acts as the default case: if none of the earlier conditions is TRUE for a row, the value to the right of TRUE is applied.

Here we create a new column 'Is_Child': people aged 18 or below are labelled 'Child', and everyone else is labelled 'Not Child' through the TRUE condition.
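A sketch of this default case, using an Age-only tibble with the same values as the dataset (the name age_data follows the text):

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

age_data <- age_data %>%
  mutate(Is_Child = case_when(
    Age <= 18 ~ "Child",
    TRUE      ~ "Not Child"   # default: every row not matched above, including NA
  ))
print(age_data)
```

In dplyr 1.1.0 and later, the same default can also be written with the .default argument, as in case_when(Age <= 18 ~ "Child", .default = "Not Child").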

Output:

A tibble: 10 × 3
     Age Age_Group Is_Child
   <dbl> <chr>     <chr>
 1    25 Adult     Not Child
 2    18 Child     Child
 3    22 Adult     Not Child
 4    NA NA        Not Child
 5    35 Adult     Not Child
 6    16 Child     Child
 7    24 Adult     Not Child
 8    NA NA        Not Child
 9    27 Adult     Not Child
10    33 Adult     Not Child

Because of the TRUE condition, people aged 18 or below are labelled Child, while everyone else, including the rows where Age is NA, is labelled Not Child.

TRUE must be placed after all other conditions in case_when(). If it comes first, every row matches it and receives the value set for TRUE. Here is a demonstration.
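A sketch of the misplaced TRUE, again on an Age-only tibble (the name age_data follows the text):

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

# TRUE is listed first, so it matches every row before Age <= 18 is ever checked.
age_data <- age_data %>%
  mutate(Is_Child = case_when(
    TRUE      ~ "Not Child",
    Age <= 18 ~ "Child"
  ))
print(age_data)
```

Every row, including the 16- and 18-year-olds, ends up as "Not Child".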

Matching NAs with is.na()

We add a dedicated condition for NA values in case_when() using the is.na() function. Here, we create a new column 'New_Age_Group' based on three conditions: people aged 18 or below are labelled 'Child', those above 18 are labelled 'Adult', and rows with NA in Age are labelled 'Age Missing'.
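One way to write these three conditions, sketched on an Age-only tibble with the dataset's values:

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    is.na(Age) ~ "Age Missing",  # catch missing values explicitly
    Age <= 18  ~ "Child",
    Age > 18   ~ "Adult"
  ))
print(age_data)
```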

Output:

A tibble: 10 × 4
     Age Age_Group Is_Child  New_Age_Group
   <dbl> <chr>     <chr>     <chr>
 1    25 Adult     Not Child Adult
 2    18 Child     Not Child Child
 3    22 Adult     Not Child Adult
 4    NA NA        Not Child Age Missing
 5    35 Adult     Not Child Adult
 6    16 Child     Not Child Child
 7    24 Adult     Not Child Adult
 8    NA NA        Not Child Age Missing
 9    27 Adult     Not Child Adult
10    33 Adult     Not Child Adult

Here you can observe that rows with NA in Age are labelled Age Missing, and the remaining rows follow the other conditions.

Keeping default values of a variable

We can modify specific elements of a column and keep the original values for the rest by using the TRUE condition. First, we create a new column 'Education_Level' from the Education column with case_when(), labelling Master's and PhD as 'Post Graduate' and specifying nothing for the remaining values.
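A sketch of this first attempt, without a TRUE condition. The tibble data is a reduced stand-in holding only Name and Education.

```r
library(dplyr)
library(tibble)

# Reduced stand-in for the dataset (assumed name `data`).
data <- tibble(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva",
           "Frank", "Grace", "Hank", "Ivy", "Jack"),
  Education = c("Bachelor's", "High School", "Bachelor's", "PhD", "Master's",
                "High School", "PhD", "Master's", "Bachelor's", "PhD")
)

# No TRUE condition: rows that match nothing become NA.
data <- data %>%
  mutate(Education_Level = case_when(
    Education %in% c("Master's", "PhD") ~ "Post Graduate"
  ))
print(data)
```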

Output:

A tibble: 10 × 6
      ID Name      Age Gender Education   Education_Level
   <int> <chr>   <dbl> <chr>  <chr>       <chr>
 1     1 Alice      25 Female Bachelor's  NA
 2     2 Bob        18 Male   High School NA
 3     3 Charlie    22 Male   Bachelor's  NA
 4     4 David      NA Male   PhD         Post Graduate
 5     5 Eva        35 Female Master's    Post Graduate
 6     6 Frank      16 Male   High School NA
 7     7 Grace      24 Female PhD         Post Graduate
 8     8 Hank       NA Male   Master's    Post Graduate
 9     9 Ivy        27 Female Bachelor's  NA
10    10 Jack       33 Male   PhD         Post Graduate

In the above example, both Master's and PhD are categorized as Post Graduate, while the remaining values become NA because no TRUE condition was supplied.

Here is an example that uses the TRUE condition to keep the original values of a column. We pass the Education column as the value for TRUE, so every row not matched earlier keeps its value from Education.
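Sketched on a reduced stand-in for the dataset (only Name and Education), this could look like:

```r
library(dplyr)
library(tibble)

# Reduced stand-in for the dataset (assumed name `data`).
data <- tibble(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva",
           "Frank", "Grace", "Hank", "Ivy", "Jack"),
  Education = c("Bachelor's", "High School", "Bachelor's", "PhD", "Master's",
                "High School", "PhD", "Master's", "Bachelor's", "PhD")
)

data <- data %>%
  mutate(Education_Level = case_when(
    Education %in% c("Master's", "PhD") ~ "Post Graduate",
    TRUE ~ Education   # fall back to the row's own Education value
  ))
print(data)
```

The fallback works because Education and "Post Graduate" are both character values; case_when() expects all right-hand sides to share a compatible type.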

Output:

A tibble: 10 × 6
      ID Name      Age Gender Education   Education_Level
   <int> <chr>   <dbl> <chr>  <chr>       <chr>
 1     1 Alice      25 Female Bachelor's  Bachelor's
 2     2 Bob        18 Male   High School High School
 3     3 Charlie    22 Male   Bachelor's  Bachelor's
 4     4 David      NA Male   PhD         Post Graduate
 5     5 Eva        35 Female Master's    Post Graduate
 6     6 Frank      16 Male   High School High School
 7     7 Grace      24 Female PhD         Post Graduate
 8     8 Hank       NA Male   Master's    Post Graduate
 9     9 Ivy        27 Female Bachelor's  Bachelor's
10    10 Jack       33 Male   PhD         Post Graduate

Here you can observe that all the remaining rows keep their original values from the Education column.

Multiple conditions, Multiple variables

Here, we apply multiple conditions across multiple variables (columns) using the case_when() function. The conditions combine the 'Education' and 'Gender' variables: males with a Master's or PhD are categorized as 'Recruit to Male Category', females with a Master's or PhD are categorized as 'Recruit to Female Category', and the default TRUE condition is set to 'Not Recruited'.
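A sketch of these combined conditions, on a reduced stand-in for the dataset with only the columns involved:

```r
library(dplyr)
library(tibble)

# Reduced stand-in for the dataset (assumed name `data`).
data <- tibble(
  Gender = c("Female", "Male", "Male", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male"),
  Education = c("Bachelor's", "High School", "Bachelor's", "PhD", "Master's",
                "High School", "PhD", "Master's", "Bachelor's", "PhD")
)

# `&` combines conditions on two different columns, row by row.
data <- data %>%
  mutate(Recruitment_Category = case_when(
    Gender == "Male"   & Education %in% c("Master's", "PhD") ~ "Recruit to Male Category",
    Gender == "Female" & Education %in% c("Master's", "PhD") ~ "Recruit to Female Category",
    TRUE ~ "Not Recruited"
  ))
print(data)
```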

Output:

A tibble: 10 × 7
      ID Name      Age Gender Education   Education_Level Recruitment_Category
   <int> <chr>   <dbl> <chr>  <chr>       <chr>           <chr>
 1     1 Alice      25 Female Bachelor's  Bachelor's      Not Recruited
 2     2 Bob        18 Male   High School High School     Not Recruited
 3     3 Charlie    22 Male   Bachelor's  Bachelor's      Not Recruited
 4     4 David      NA Male   PhD         Post Graduate   Recruit to Male Category
 5     5 Eva        35 Female Master's    Post Graduate   Recruit to Female Category
 6     6 Frank      16 Male   High School High School     Not Recruited
 7     7 Grace      24 Female PhD         Post Graduate   Recruit to Female Category
 8     8 Hank       NA Male   Master's    Post Graduate   Recruit to Male Category
 9     9 Ivy        27 Female Bachelor's  Bachelor's      Not Recruited
10    10 Jack       33 Male   PhD         Post Graduate   Recruit to Male Category

Order of priority of conditions

In case_when(), the order of the conditions is crucial, because each row takes the value of the first condition that is TRUE for it. To illustrate, we create the column 'New_Age_Group' from the Age column with conditions in this order: age 18 or below is 'Child', below 30 is 'Young Adult', below 100 is 'Older Adult', and missing values are labelled 'Age Missing'.

The conditions are listed hierarchically, from the narrowest range to the widest.
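A sketch of the ordered conditions, on an Age-only tibble with the dataset's values:

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

# Narrowest range first; a row stops at the first condition it satisfies.
age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    Age <= 18  ~ "Child",
    Age < 30   ~ "Young Adult",
    Age < 100  ~ "Older Adult",
    is.na(Age) ~ "Age Missing"
  ))
print(age_data)
```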

Output:

A tibble: 10 × 4
     Age Age_Group Is_Child  New_Age_Group
   <dbl> <chr>     <chr>     <chr>
 1    25 Adult     Not Child Young Adult
 2    18 Child     Not Child Child
 3    22 Adult     Not Child Young Adult
 4    NA NA        Not Child Age Missing
 5    35 Adult     Not Child Older Adult
 6    16 Child     Not Child Child
 7    24 Adult     Not Child Young Adult
 8    NA NA        Not Child Age Missing
 9    27 Adult     Not Child Young Adult
10    33 Adult     Not Child Older Adult

If we change the order and place the 'Age less than 100' condition first, the output changes significantly: every non-missing value in the new column is now set to 'Older Adult'.
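The reordered version can be sketched like this, on an Age-only tibble with the dataset's values:

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

# The widest condition is first, so it captures every non-missing age.
age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    Age < 100  ~ "Older Adult",
    Age <= 18  ~ "Child",
    Age < 30   ~ "Young Adult",
    is.na(Age) ~ "Age Missing"
  ))
print(age_data)
```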

Here the condition "Age less than 100" has the highest priority, and since every non-missing age satisfies it, the later conditions are never reached. As a result, all values in the output, except for NA values, are categorized as 'Older Adult'. To avoid this:

Write the conditions in the correct order of priority, from the most specific to the most general.
Use closed bounds (a lower and an upper limit in each condition) so that the ranges do not overlap.
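As a sketch of the second suggestion, each condition below has both a lower and an upper limit, so the result no longer depends on the order in which the conditions are listed. The exact cut-offs follow the age groups used earlier.

```r
library(dplyr)
library(tibble)

age_data <- tibble(Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33))

# Non-overlapping ranges: deliberately listed widest-first to show order no longer matters.
age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    Age >= 30 & Age < 100 ~ "Older Adult",
    Age > 18 & Age < 30   ~ "Young Adult",
    Age <= 18             ~ "Child",
    is.na(Age)            ~ "Age Missing"
  ))
print(age_data)
```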

Note: The TRUE condition should always be placed last, after all other conditions.

ifelse() function in mutate()

ifelse() is similar to case_when(), but it takes a single condition together with a value for the FALSE case. It is used inside mutate() to create and modify columns based on that condition: where the condition is TRUE, the element is set to one value; otherwise, it is set to another.

Syntax:

ifelse(Con, X, Y)
Parameters:
Con: Condition
X: value to be returned if condition is TRUE
Y: value to be returned if condition is FALSE

Here, we create a new column 'Army_Eligibility' using the ifelse() function. Individuals whose height is greater than 165 are marked as eligible for the army; everyone else is marked as not eligible.
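The dataset built earlier has no height column, so the sketch below uses a small made-up tibble. The name people, the Height values, and the labels "Eligible" and "Not Eligible" are all assumptions for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical data: heights are invented for this example.
people <- tibble(
  Name = c("Alice", "Bob", "Charlie"),
  Height = c(160, 172, 165)
)

people <- people %>%
  mutate(Army_Eligibility = ifelse(Height > 165, "Eligible", "Not Eligible"))
print(people)
```

When more than two outcomes are needed, ifelse() calls can be nested, for example with an outer ifelse(is.na(Age), "Age Missing", ...) that handles missing values before the inner comparison runs.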

Output:

A tibble: 10 × 8
      ID Name      Age Gender Education   Education_Level Recruitment_Category      New_Education
   <int> <chr>   <dbl> <chr>  <chr>       <chr>           <chr>                     <chr>
 1     1 Alice      25 Female Bachelor's  Bachelor's      Not Recruited             College or H…
 2     2 Bob        18 Male   High School High School     Not Recruited             High School
 3     3 Charlie    22 Male   Bachelor's  Bachelor's      Not Recruited             College or H…
 4     4 David      NA Male   PhD         Post Graduate   Recruit to Male Category  Age Missing
 5     5 Eva        35 Female Master's    Post Graduate   Recruit to Female Catego… College or H…
 6     6 Frank      16 Male   High School High School     Not Recruited             High School
 7     7 Grace      24 Female PhD         Post Graduate   Recruit to Female Catego… College or H…
 8     8 Hank       NA Male   Master's    Post Graduate   Recruit to Male Category  Age Missing
 9     9 Ivy        27 Female Bachelor's  Bachelor's      Not Recruited             College or H…
10    10 Jack       33 Male   PhD         Post Graduate   Recruit to Male Category  College or H

Conclusion

In conclusion, conditional mutate in R relies on two functions: case_when() and ifelse(). Both create or modify columns based on the provided conditions. case_when() assigns a value wherever one of its conditions is TRUE and returns NA where none matches, while ifelse() takes one condition with explicit values for both the TRUE and the FALSE case. We learned how to use both functions inside mutate(), how to apply multiple conditions to a single variable or to several variables, and why the order of the conditions matters. The TRUE condition should always be given last.

URL: https://www.geeksforgeeks.org/r-language/how-to-do-conditional-mutate-in-r/