VOOZH about

URL: https://www.geeksforgeeks.org/r-language/execute-sql-queries-on-a-dataframe-using-r/

⇱ Execute SQL queries on a dataframe using R - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Execute SQL queries on a dataframe using R

Last Updated : 28 Jun, 2025

In R Programming Language we can use the sqldf package in R to execute SQL queries on a data frame. This can be useful for performing various data manipulation tasks using SQL syntax. The sqldf package provides a way to write SQL queries as strings and apply them to a data frame, allowing us to perform operations such as filtering, sorting, aggregation, joining and more.

Installing the Required Packages

To use the sqldf package, we first need to install it using the install.packages() function.

Applying SQL Queries using sqldf

We will be performing some SQL operations on a sample data frame using the sqldf in R programming language. To do that we will first create a sample data frame.

Creating a Sample Dataset

We will create a data frame df with four columns (id, name, year_of_exp and role) each containing respective vectors for five individuals. We will first define vectors for to store the values for all the columns. Then, we will combine these vectors into a data frame using the data.frame() function with stringsAsFactors = FALSE to prevent automatic conversion of character vectors to factors. The result will be a table of data representing the individuals' IDs, names, years of experience and job roles.

Output:

👁 df
Creating Dataset

1. Subsetting

We will execute an SQL query to select specific columns. The sqldf() function is used to execute this query in R programming language.

  • SELECT: Selects specific columns (id, name) from the df data frame.

Output:

👁 subset
Subsetting

2. Sorting

We will execute an SQL query to order the data frame by year_of_exp in descending order.

  • ORDER BY: Orders the rows by the year_of_exp column in descending order (DESC).

Output:

👁 sort
Sorting

3. Aggregation

We will execute an SQL query to calculate the average years of experience.

  • AVG: Calculates the average value of the year_of_exp column and names it as avg_exp.

Output:

👁 aggregate
Aggregation

4. Joining

We will execute an SQL query to combine data from two data frames based on a common column.

  • LEFT JOIN: Combines rows from two data frames (df and df2) where matching id values exist, retaining all rows from df.

Output:

👁 join
Joining

5. Grouping

We will execute an SQL query to group the rows by role and calculate the average years of experience for each role.

  • GROUP BY: Groups the rows by the role column and calculates the average year_of_exp for each role.

Output:

👁 groupby
Grouping

6. Filtering

We will execute an SQL query to select rows where year_of_exp is greater than 5.

  • WHERE: Filters rows where the year_of_exp is greater than 5.

Output:

👁 filter
Filtering

7. Updating

We will execute an SQL query to update the year_of_exp column in the df data frame. Here we updated the year_of_exp column of id=1.

  • CASE WHEN: Applies conditional logic
  • WHEN id = 1 checks if id is 1.
  • THEN: If true, adds 1 to year_of_exp.
  • ELSE: If false, keeps year_of_exp unchanged.
  • AS: Renames the new year_of_exp column.
  • END: Closes the CASE expression.

Output:

👁 updating
Updating

8. Deleting

We will execute an SQL query to delete the row from data frame where years of experience is less than 5.

Note: sqldf doesn't directly support DELETE in the same way as other SQL environments

  • WHERE: Filters and selects rows with year_of_exp less than or equal to 5, effectively deleting others.

Output:

👁 delete
Deleting

In this article ,we learned how to use sqldf to run SQL commands on a data frame in R programming Language.

Comment
Article Tags:

Explore