Factors in the R programming language represent categorical data, such as "male" or "female" for gender. Although they resemble character vectors, factors are stored as integers with corresponding labels. They are useful for data that has a fixed set of possible values, known as levels. By default the levels are sorted alphabetically, and once a factor is created it can only contain those predefined levels.
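The integer storage mentioned above can be seen with a short sketch. The variable name `gender` and its values are assumptions chosen for illustration.

```r
# Illustrative factor; the values are assumed for this sketch
gender <- factor(c("female", "male", "male", "female"))

# The labels (levels), sorted alphabetically by default
levels(gender)

# The underlying integer codes; each code indexes into levels(gender)
as.integer(gender)
```

Here "female" is the first level and "male" the second, so each element is stored as the position of its label in the level set.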
To create a factor in R, we use the factor() function, which converts a vector into a factor. This takes two steps: first create a vector, then pass it to factor().
Example: Creating a Gender Factor
Let's create a factor for gender from a character vector containing the values "female" and "male".
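A minimal sketch that would produce the output below, assuming the variable names `x` and `gender`:

```r
# Step 1: create a character vector
x <- c("female", "male", "male", "female")
print(x)

# Step 2: convert the vector into a factor
gender <- factor(x)
print(gender)
```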
Output
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
Levels can also be predefined by the programmer through the levels argument of factor(), which sets both the allowed values and their order.
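One way this might look, assuming the same four values as before and a third level, "transgender", that does not occur in the data:

```r
# Declare the levels, and their order, explicitly
gender <- factor(
  c("female", "male", "male", "female"),
  levels = c("female", "transgender", "male")
)
print(gender)
```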
Output
[1] female male male female
Levels: female transgender male
The levels of a factor can be inspected with the levels() function.
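A short sketch, assuming a factor named `gender` built from the same values used earlier:

```r
gender <- factor(c("female", "male", "male", "female"))

# levels() returns the level labels as a character vector
print(levels(gender))
# [1] "female" "male"
```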
The function is.factor() checks whether a variable is a factor and returns TRUE if it is.
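For instance, assuming the factor `gender` from the earlier example:

```r
gender <- factor(c("female", "male", "male", "female"))

# TRUE for a factor, FALSE for a plain character vector
print(is.factor(gender))
```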
Output
[1] TRUE
The class() function can also be used for this check; for a factor it returns "factor".
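A sketch of the same check with class(), again assuming the factor `gender`:

```r
gender <- factor(c("female", "male", "male", "female"))

# class() reports the object's class attribute
print(class(gender))
```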
Output
[1] "factor"
Elements of a factor are accessed by index. If gender is a factor, then gender[i] returns its i-th element along with the factor's levels.
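As an illustration, the following sketch accesses the third element; the index 3 is an assumption chosen to match the output below.

```r
gender <- factor(c("female", "male", "male", "female"))

# Access a single element; the result is still a factor with the same levels
print(gender[3])
```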
Output
[1] male
Levels: female male
More than one element can be accessed at a time by passing a vector of indices.
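A possible example, assuming the indices 2 and 4 (any pair selecting a "male" and then a "female" would give the same output):

```r
gender <- factor(c("female", "male", "male", "female"))

# Pass a vector of indices to select several elements at once
print(gender[c(2, 4)])
```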
Output
[1] male female
Levels: female male
After a factor is created, its elements can be modified, but any newly assigned value must be one of the predefined levels.
Example
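A minimal sketch, assuming the second element is the one being changed:

```r
gender <- factor(c("female", "male", "male", "female"))

# "female" is an existing level, so the assignment succeeds
gender[2] <- "female"
print(gender)
```

Assigning a value that is not a level would instead produce NA with a warning.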
Output
[1] female female male female
Levels: female male
To select every element of the factor gender except the i-th, use gender[-i]. To assign a value that is not among the predefined levels, first add that value to the levels.
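Adding a new level before assigning it might look like this; the level name "other" and the index 3 are taken from the output below.

```r
gender <- factor(c("female", "male", "male", "female"))

# "other" is not yet a level, so extend the levels first
levels(gender) <- c(levels(gender), "other")

# The assignment is now valid
gender[3] <- "other"
print(gender)
```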
Output
[1] female male other female
Levels: female male other
To remove an element, subset the factor with a negative index inside square brackets.
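A sketch that drops the third element, assuming the original four-element factor:

```r
gender <- factor(c("female", "male", "male", "female"))

# A negative index keeps everything except that position
gender <- gender[-3]
print(gender)
```

The levels are unchanged by the removal; droplevels() can be used afterwards if unused levels should be discarded.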
Output
[1] female male female
Levels: female male
A data frame in R is similar to a 2D array, where each column represents a variable and each row holds one set of values for those variables. Columns that hold categorical data can be stored as factors.
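The following sketch builds a data frame matching the output below. The name `employee` is an assumption, and `stringsAsFactors = TRUE` is set explicitly because, since R 4.0.0, data.frame() no longer converts character columns to factors by default.

```r
age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender", "female",
            "male", "female", "transgender")

# Convert character columns to factors while building the data frame
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)

# Confirm that the gender column is stored as a factor
print(is.factor(employee$gender))
```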
Output
  age salary      gender
1  40 103200        male
2  49 106200        male
3  48 150200 transgender
4  40  10606      female
5  67  10390        male
6  52  14070      female
7  53  10220 transgender
[1] TRUE
In this article, we explored the concept of factors in R, how to create and modify them, and how they are used in data frames to represent categorical data efficiently.