![]() |
VOOZH | about |
Data Manipulation and Querying (DML + Query Processing) in SQL focuses on how data is retrieved and modified. DML commands (INSERT, UPDATE, DELETE, MERGE) handle data changes, while SELECT with joins, subqueries, grouping, and set operators enables complex querying. Interview questions often test query writing, optimization and handling real-world scenarios like filtering, aggregation, and multi-table joins.
DML (Data Manipulation Language) commands are used to manage data within tables. The main commands are:
Query processing is the series of steps a database takes to execute an SQL query. It involves:
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. This involves dividing large tables into smaller, related tables and defining relationships between them to ensure consistency and avoid anomalies.
Denormalization is the process of combining normalized tables into larger tables for performance reasons. It is used when complex queries and joins slow down data retrieval, and the performance benefits outweigh the drawbacks of redundancy.
A query is a SQL statement used to retrieve, update, or manipulate data in a database. The most common type of query is a SELECT statement, which fetches data from one or more tables based on specified conditions.
View is a virtual table created by a SELECT query. It does not store data itself, but presents data from one or more tables in a structured way. Views simplify complex queries, improve readability, and enhance security by restricting access to specific rows or columns.
GROUP BY clause is used to arrange identical data into groups. It is typically used with aggregate functions (such as COUNT, SUM, AVG) to perform calculations on each group rather than on the entire dataset.
Aggregate functions perform calculations on a set of values and return a single value. Common aggregate functions include:
Subquery is a query nested within another query. It is often used in the WHERE clause to filter data based on the results of another query, making it easier to handle complex conditions.
In short, WHERE applies to individual rows, while HAVING applies to groups
Indexes are database objects that improve query performance by allowing faster retrieval of rows. They function like a book’s index, making it quicker to find specific data without scanning the entire table. However, indexes require additional storage and can slightly slow down data modification operations.
ORDER BY clause sorts the result set of a query in either ascending (default) or descending order, based on one or more columns. This helps present the data in a more meaningful or readable sequence.
A table is a structured collection of related data organized into rows and columns. Columns define the type of data stored, while rows contain individual records.
Cursor is a database object used to retrieve, manipulate, and traverse through rows in a result set one row at a time. Cursors are helpful when performing operations that must be processed sequentially rather than in a set-based manner.
SELECT statement retrieves data from one or more tables. It is the most commonly used command in SQL, allowing users to filter, sort, and display data based on specific criteria.
NULL represents a missing or unknown value. It is different from zero or an empty string. NULL values indicate that the data is not available or applicable.
UNION operator combines the result sets of two or more SELECT queries into a single result set, removing duplicate rows. The result sets must have the same number of columns and compatible data types for corresponding columns.
Example:
SELECT Name FROM Customers
UNION
SELECT Name FROM Employees;
UNION ALL is faster than UNION because it doesn’t perform the additional operation of eliminating duplicates.Example:
SELECT Name FROM Customers
UNION ALL
SELECT Name FROM Employees;
CASE statement is SQL’s way of implementing conditional logic in queries. It evaluates conditions and returns a value based on the first condition that evaluates to true. If no condition is met, it can return a default value using the ELSE clause.
Example:
SELECT ID,
CASE
WHEN Salary > 100000 THEN 'High'
WHEN Salary BETWEEN 50000 AND 100000 THEN 'Medium'
ELSE 'Low'
END AS SalaryLevel
FROM Employees;
Scalar functions operate on individual values and return a single value as a result. They are often used for formatting or converting data. Common examples include:
Example:
SELECT LEN('Example') AS StringLength;
COALESCE function returns the first non-NULL value from a list of expressions. It’s commonly used to provide default values or handle missing data gracefully.
Example:
SELECT COALESCE(NULL, NULL, 'Default Value') AS Result;
1. COUNT(): Counts the number of rows or non-NULL values in a column.
Example:
SELECT COUNT(*) FROM Orders;
2. SUM(): Adds up all numeric values in a column.
Example:
SELECT SUM(TotalAmount) FROM Orders;
NVL(Salary, 0) will replace NULL with 0.
- If the first argument is NOT NULL, returns the second argument.
- If the first argument is NULL, returns the third argument.
Example:
SELECT NVL(Salary, 0) AS AdjustedSalary FROM Employees; -- Replaces NULL with 0
SELECT NVL2(Salary, Salary, 0) AS AdjustedSalary FROM Employees; -- If Salary is NULL, returns 0; otherwise, returns Salary.
Example:
SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;
If two employees have the same salary, they get the same rank, but RANK() will skip a number for the next rank, while DENSE_RANK() will not.
Example:
SELECT Name, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM Employees;
CTE is a temporary result set defined within a query. It improves query readability and can be referenced multiple times.
Example:
WITH TopSalaries AS (
SELECT Name, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT * FROM TopSalaries WHERE Name LIKE 'A%';
Window functions allow you to perform calculations across a set of table rows that are related to the current row within a result set, without collapsing the result set into a single row. These functions can be used to compute running totals, moving averages, rank rows, etc.
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Salary) AS RunningTotal
FROM Employees;
Indexing allows the database to locate and access the rows corresponding to a query condition much faster than scanning the entire table. Instead of reading each row sequentially, the database uses the index to jump directly to the relevant data pages. This reduces the number of disk I/O operations and speeds up query execution, especially for large tables.
Example:
CREATE INDEX idx_lastname ON Employees(LastName);
SELECT * FROM Employees WHERE LastName = 'Smith';
The index on LastName lets the database quickly find all rows matching ‘Smith’ without scanning every record.
Advantages
Disadvantages:
1. GROUP BY: Aggregate rows to eliminate duplicates
SELECT Column1, MAX(Column2)
FROM TableName
GROUP BY Column1;
2. ROW_NUMBER(): Assign a unique number to each row and filter by that
WITH CTE AS (
SELECT Column1, Column2, ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY Column2) AS RowNum
FROM TableName
)
SELECT * FROM CTE WHERE RowNum = 1;
A correlated subquery is a subquery that references columns from the outer query. It is re-executed for each row processed by the outer query. This makes it more dynamic, but potentially less efficient.
Example:
SELECT Name,
(SELECT COUNT(*)
FROM Orders
WHERE Orders.CustomerID = Customers.CustomerID) AS OrderCount
FROM Customers;
EXCEPT operator is used to return rows from one query’s result set that are not present in another query’s result set. It effectively performs a set difference, showing only the data that is unique to the first query.
Example:
SELECT ProductID FROM ProductsSold
EXCEPT
SELECT ProductID FROM ProductsReturned;
Use Case:
Performance Considerations:
1. Bitmap Index:
2. B-tree Index:
3. Key Difference:
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
Explanation:
This query identifies the second-highest salary by selecting the maximum salary that is less than the overall highest salary. The subquery determines the top salary, while the outer query finds the next highest value.
SELECT *
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);
Explanation:
This query fetches details of employees whose salary exceeds the average salary. The subquery calculates the average salary, and the main query filters rows based on that result.
SELECT ColumnName, COUNT(*)
FROM TableName
GROUP BY ColumnName
HAVING COUNT(*) > 1;
Explanation:
The query uses GROUP BY to group identical values and HAVING COUNT(*) > 1 to identify values that appear more than once in the specified column.
SELECT *
FROM Employee
WHERE JoiningDate > DATE_SUB(CURDATE(), INTERVAL 30 DAY);
Explanation:
By comparing the JoiningDate to the current date minus 30 days, this query retrieves all employees who joined within the last month.
SELECT *
FROM Employee
ORDER BY Salary DESC
LIMIT 3;
Explanation:
The query sorts employees by salary in descending order and uses LIMIT 3 to return only the top three earners.
DELETE FROM Employee
WHERE EmployeeID NOT IN (
SELECT MIN(EmployeeID)
FROM Employee
GROUP BY Column1, Column2
);
Explanation:
This query retains only one row for each set of duplicates by keeping the row with the smallest EmployeeID. It identifies duplicates using GROUP BY and removes rows not matching the minimum ID.
SELECT *
FROM TableA
INNER JOIN TableB ON TableA.ID = TableB.ID;
Explanation:
An INNER JOIN is used to find rows present in both tables by matching a common column (in this case, ID).
SELECT *
FROM Employee
WHERE Name LIKE 'A%' AND Name LIKE '%A';
Explanation:
The query uses LIKE with wildcard characters to filter rows where the Name column starts and ends with the letter 'A'.
SELECT DepartmentID, COUNT(*) AS EmployeeCount
FROM Employee
GROUP BY DepartmentID;
Explanation:
By grouping employees by their DepartmentID and counting rows in each group, the query produces a list of departments along with the employee count.
SELECT *
FROM Employee
WHERE ManagerID IS NULL;
Explanation:
This query selects employees whose ManagerID column is NULL, indicating they don’t report to a manager.
WITH SalaryRank AS (
SELECT Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employee
)
SELECT Salary
FROM SalaryRank
WHERE Rank IN (3, 4);
Explanation:
This query uses the RANK() window function to rank the salaries in descending order. The outer query then selects the 3rd and 4th highest salaries by filtering for those ranks.
SELECT
MAX(CASE WHEN ColumnName = 'Condition1' THEN Value END) AS Column1,
MAX(CASE WHEN ColumnName = 'Condition2' THEN Value END) AS Column2
FROM TableName;
Explanation:
This query converts specific row values into columns using conditional aggregation with CASE. Each column’s value is determined based on a condition applied to rows.
SELECT *
FROM TableName
WHERE UpdatedAt >= NOW() - INTERVAL 1 HOUR;
Explanation:
By comparing the UpdatedAt timestamp to the current time minus one hour, the query retrieves rows updated in the last 60 minutes.
SELECT *
FROM Employee
WHERE DepartmentID IN (
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
HAVING COUNT(*) < 5
);
Explanation:
The subquery counts employees in each department, and the main query uses those results to find employees working in departments with fewer than 5 members.
SELECT CASE
WHEN EXISTS (SELECT * FROM TableName) THEN 'Has Records'
ELSE 'No Records'
END AS Status;
Explanation:
The query uses EXISTS to determine if any rows exist in the table, returning a status of 'Has Records' or 'No Records' based on the result.
SELECT e.EmployeeID, e.Salary
FROM Employee e
JOIN Employee m ON e.ManagerID = m.EmployeeID
WHERE e.Salary > m.Salary;
Explanation:
This query joins the Employee table with itself to compare employee salaries to their respective managers’ salaries, selecting those who earn more.
WITH RowNumbered AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM RowNumbered
WHERE RowNum % 2 = 0;
Explanation:
This query assigns a sequential number to each row using ROW_NUMBER(), then selects rows where the row number is even, effectively fetching alternating rows. The ORDER BY (SELECT NULL) is used to avoid any specific ordering and just apply a sequential numbering.
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
ORDER BY AVG(Salary) DESC
LIMIT 1;
Explanation:
Grouping by DepartmentID and ordering by the average salary in descending order, the query returns the department with the highest average.
WITH OrderedEmployees AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM OrderedEmployees
WHERE RowNum = n;
Explanation:
This query uses ROW_NUMBER() to generate a sequential number for each row. The outer query then retrieves the row where the number matches the desired nth position. The approach is portable across most databases.
SELECT *
FROM Employee
WHERE MONTH(JoiningDate) = MONTH(CURDATE());
Explanation: By comparing the month of JoiningDate to the current month, the query selects all employees who were hired in that month regardless of the year.