Sampling from a population is a core technique in statistics and data analysis. It allows us to draw conclusions about a large group (the population) by examining a smaller, representative subset (the sample). In the R programming language, we can perform random sampling to obtain a sample from a population, which is useful for applications such as hypothesis testing, data visualization, and model building.
Sampling with Replacement
When we sample with replacement, each selected item is returned to the population before the next item is drawn. In R, we can specify this behavior using the replace argument in the sample() function.
1. Creating a Vector and Sampling with Replacement
We create a numeric vector and randomly sample values with replacement.
sample: Used to draw random samples from the given vector.
replace: Decides whether to allow repeated values in the sample.
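A minimal sketch of this step, assuming a hypothetical vector named population that holds the values 10 through 50 (the original vector is not shown). Because the draw is random, the printed values change from run to run and need not match the output listed for this example.

```r
# Hypothetical population vector (assumed for illustration)
population <- c(10, 20, 30, 40, 50)

# Draw 3 values; replace = TRUE returns each value to the pool,
# so the same value can be picked more than once
drawn <- sample(population, size = 3, replace = TRUE)

print(drawn)
```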
Output:
[1] 20 30 40
2. Creating a Data Frame and Sampling Rows with Replacement
We create a data frame and draw a sample of rows from it with replacement.
data.frame: Used to create tabular data.
nrow: Returns the number of rows in the data frame.
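The row-sampling step can be sketched as follows. The data frame df, its columns id and score, and the sample size of 3 are assumptions made for illustration and are not taken from the original example.

```r
# Hypothetical data frame (column names and values assumed for illustration)
df <- data.frame(
  id = 1:5,
  score = c(72, 85, 90, 66, 78)
)

# Draw 3 row indices from 1..nrow(df); replace = TRUE allows repeats
row_idx <- sample(nrow(df), size = 3, replace = TRUE)

# Subset the data frame using the sampled row indices
sampled_rows <- df[row_idx, ]

print(sampled_rows)
```

If the same index is drawn twice, the corresponding row appears twice in sampled_rows, and R makes the row names unique by adding a suffix such as 2.1.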
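Sampling without Replacement
When we sample without replacement, a selected item is not returned to the population, so it cannot be drawn again. We demonstrate how to perform random sampling without replacement using basic R functions.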
1. Sampling from a Vector without Replacement
We randomly select unique elements from a vector without repetition.
sample: Used to draw values randomly.
replace: Set to FALSE to avoid repetition.
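A minimal sketch, assuming a hypothetical vector of the integers 1 to 10 and a sample size of 5 (the original vector is not shown). The values drawn are random, so a run of this code need not reproduce the output listed for this example.

```r
# Hypothetical vector to sample from (assumed for illustration)
values <- 1:10

# Draw 5 distinct values; replace = FALSE means no value can repeat.
# Asking for more values than the vector holds would raise an error.
picked <- sample(values, size = 5, replace = FALSE)

print(picked)
```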
Output:
[1] 7 8 4 2 1
2. Shuffling a Deck and Drawing Cards without Replacement
We simulate shuffling a deck of cards and draw a hand without repetition.
sample: Randomizes the order of card indices.
length: Provides the number of elements to shuffle.
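The shuffle-and-draw idea can be sketched as below, assuming the deck is represented by the card indices 1 to 52 and the hand has 5 cards. Both choices are assumptions for illustration. The hand is random, so it will generally differ from the output listed for this example.

```r
# Hypothetical deck: cards represented by the indices 1..52
deck <- 1:52

# sample(n) with a single number returns a random permutation of 1..n,
# which is a shuffle without replacement
shuffled <- sample(length(deck))

# Take the first 5 positions of the shuffled order as the hand
hand <- deck[shuffled[1:5]]

print(hand)
```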
Output:
[1] 21 29 1 34 2
Random Sampling Using the dplyr Package
The dplyr package in R is used for data manipulation and transformation. It provides many functions that make it simpler to work with data frames and data tables. With dplyr, random sampling can be performed using the sample_n() function, which selects a fixed number of rows, and the sample_frac() function, which selects a fraction of the rows.
1. Sampling Rows from a Data Frame using dplyr
We use the dplyr package to randomly sample a fixed number of rows.
library: Loads external packages.
data.frame: Creates structured tabular data.
sample_n: Randomly selects a fixed number of rows.
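A minimal sketch of both dplyr functions, assuming the dplyr package is installed. The data frame df, its columns, and the sample sizes are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data frame (column names and values assumed for illustration)
df <- data.frame(
  id = 1:6,
  group = c("a", "b", "a", "b", "a", "b")
)

# Randomly select a fixed number of rows (3) without replacement
fixed_n <- sample_n(df, 3)

# Randomly select a fraction of the rows (50% of 6 rows = 3 rows)
half <- sample_frac(df, 0.5)

print(fixed_n)
print(half)
```

Both functions sample without replacement by default and accept replace = TRUE to allow repeated rows. In dplyr 1.0.0 and later they are marked as superseded by slice_sample(), although they continue to work.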