![]() |
VOOZH | about |
SQL interview questions for Data Analysts are designed to evaluate both SQL knowledge and problem-solving skills. They cover key topics such as filtering, joins, aggregation and query optimization, often based on real-world data scenarios. These questions are typically categorized into easy, medium and hard levels to assess different skill depths.
SQL (Structured Query Language) is a standard language for managing and manipulating relational databases. It allows analysts to query, update and manage data effectively, which is crucial for data analysis tasks.
SELECT * FROM sales;SELECT DISTINCT product_category FROM sales;Use the WHERE clause to filter records based on specific conditions.
SELECT * FROM sales WHERE amount > 1000;The GROUP BY clause is used to aggregate rows that have the same values in specified columns.
SELECT product_category, SUM(amount)
FROM sales
GROUP BY product_category;
HAVING is used to filter groups after aggregation, while WHERE filters rows before aggregation.
SELECT product_category,
SUM(amount) AS total_amount
FROM sales
GROUP BY product_category
HAVING SUM(amount) > 5000;
Use the ORDER BY clause to sort results in ascending or descending order.
SELECT * FROM sales ORDER BY amount DESC;SELECT product_name, SUM(amount)
FROM sales
GROUP BY product_name;
-- INNER JOIN
SELECT s.product_name, p.price
FROM sales s
INNER JOIN products p
ON s.product_id = p.product_id;
-- LEFT JOIN
SELECT s.product_name, p.price
FROM sales s
LEFT JOIN products p
ON s.product_id = p.product_id;
SELECT amount
FROM sales
ORDER BY amount DESC
LIMIT 5;
A subquery is a query within another query used to perform operations based on the results of the outer query.
SELECT product_name
FROM sales
WHERE amount > (SELECT AVG(amount) FROM sales);
A CROSS JOIN returns the Cartesian product of two tables, where each row in the first table is combined with each row in the second table.
SELECT *
FROM products
CROSS JOIN categories;
Use the AVG() function to calculate the average value.
SELECT AVG(amount)
FROM sales;
The COUNT() function returns the number of rows that match a specified condition.
SELECT COUNT(*)
FROM sales
WHERE product_category = 'Electronics';
SELECT MIN(amount) AS MinAmount,
MAX(amount) AS MaxAmount
FROM sales;
Use the UPDATE statement to modify existing records.
UPDATE sales
SET amount = amount * 1.1
WHERE product_name = 'Laptop';
The CASE statement allows for conditional logic in SQL queries.
SELECT product_name,
CASE
WHEN amount > 1000 THEN 'High'
ELSE 'Low'
END AS sales_category
FROM sales;
Use functions like IS NULL, IS NOT NULL or COALESCE() to handle NULL values.
SELECT COALESCE(discount, 0)
FROM sales;
19. What is the LIMIT clause used for? Provide an example.
The LIMIT clause restricts the number of rows returned by a query.
SELECT *
FROM sales
LIMIT 10;
Use JOIN to combine rows from two or more tables based on related columns.
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id;
A self-join is a join where a table is joined with itself. It is useful for hierarchical data.
SELECT e1.employee_name AS Employee,
e2.employee_name AS Manager
FROM employees e1
LEFT JOIN employees e2
ON e1.manager_id = e2.employee_id;
SELECT product_name FROM sales
UNION
SELECT product_name FROM returns ;
SELECT product_name FROM sales
UNION ALL
SELECT product_name FROM returns;
Use the GROUP BY clause with HAVING COUNT(*) > 1 to find duplicates.
SELECT product_name, COUNT(*)
FROM sales
GROUP BY product_name
HAVING COUNT(*) > 1;
Window functions perform calculations across a set of table rows related to the current row, such as running totals or rankings.
SELECT product_name, amount,
RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales;
Use the SUM() window function with an appropriate OVER clause.
SELECT order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
The EXISTS clause checks if a subquery returns any rows.
SELECT product_name
FROM sales
WHERE EXISTS (
SELECT *
FROM returns
WHERE returns.product_id = sales.product_id
);
WHERE: filters rows before aggregation
SELECT *
FROM sales
WHERE amount > 1000;
HAVING: filters groups after aggregation.
SELECT product_name, SUM(amount)
FROM sales
GROUP BY product_name
HAVING SUM(amount) > 5000;
SELECT customer_id, COUNT(order_id)
FROM orders
GROUP BY customer_id;
A TEMPORARY table is a table that exists temporarily during a session and is dropped automatically when the session ends.
CREATE TEMPORARY TABLE temp_sales AS
SELECT * FROM sales WHERE amount > 1000;
The ALTER TABLE statement modifies an existing table structure, such as adding or dropping columns.
ALTER TABLE sales
ADD COLUMN discount DECIMAL(10, 2);
Normalization is the process of organizing data to reduce redundancy and improve data integrity. It ensures that the database is efficient and maintains consistency.
Use the INSERT INTO ... VALUES statement with multiple values or a bulk loading utility.
Indexing improves the speed of data retrieval operations on a table by creating a data structure that allows quick lookups.
Use the CREATE INDEX statement to create an index on one or more columns.
CREATE INDEX idx_product_name
ON sales(product_name);
Common techniques include using indexes, optimizing queries, reducing the use of subqueries and ensuring efficient joins.
The EXPLAIN statement provides information about how a query is executed, including the order of operations and the use of indexes.
EXPLAIN
SELECT * FROM sales WHERE amount > 1000;
The COALESCE() function returns the first non-NULL value from a list of arguments.
SELECT COALESCE(discount, 0)
FROM sales;
Use a combination of SUM() and a window function to calculate the percentage.
SELECT product_name,
SUM(amount) AS total_sales,
(SUM(amount) / SUM(SUM(amount)) OVER ()) * 100 AS percentage
FROM sales
GROUP BY product_name;
A VIEW is a virtual table based on the result of a query. It simplifies complex queries and enhances security.
CREATE VIEW high_value_sales AS
SELECT * FROM sales WHERE amount > 1000;
The DROP TABLE statement deletes an entire table and its data from the database.
DROP TABLE old_sales;DELETE: removes rows from a table based on a condition and can be rolled back.
DELETE FROM sales
WHERE amount < 500;
TRUNCATE: removes all rows from a table and cannot be rolled back.
TRUNCATE TABLE sales;Finding the median requires sorting and using window functions.
WITH OrderedSales AS (
SELECT amount,
ROW_NUMBER() OVER (ORDER BY amount) AS rn,
COUNT(*) OVER () AS total_count
FROM sales
)
SELECT AVG(amount) AS median
FROM OrderedSales
WHERE rn IN ((total_count + 1)/2, (total_count + 2)/2);
The RANK() function assigns a rank to each row within a partition of the result set, with gaps in rank values if there are ties.
SELECT product_name, amount,
RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales;
You can join more than two tables by chaining JOIN operations.
SELECT orders.order_id, customers.customer_name, products.product_name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
JOIN products ON orders.product_id = products.product_id;
RANK() and DENSE_RANK()?| RANK() | DENSE_RANK() |
|---|---|
| Assigns rank with gaps when there are ties | Assigns rank without gaps |
| Same values → same rank, next rank is skipped | Same values → same rank, next rank continues normally |
| Example ranks: 1, 2, 2, 4 | Example ranks: 1, 2, 2, 3 |
| Example: SELECT name, RANK() OVER (ORDER BY score DESC) FROM students; | Example: SELECT name, DENSE_RANK() OVER (ORDER BY score DESC) FROM students; |
Use the ORDER BY clause with LIMIT to get the last N records.
SELECT *
FROM sales
ORDER BY sale_date DESC
LIMIT 10;
DATEPART() extracts a specific part (e.g., year, month) from a date.
SELECT DATEPART(year, sale_date) AS sale_year
FROM sales;
Use a LEFT JOIN or RIGHT JOIN to include rows from one table even if there are no matching rows in the other table.
SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.department_id;
Use date functions to calculate the difference between two dates.
SELECT DATEDIFF(day, start_date, end_date) AS date_difference
FROM projects;
CHAR and VARCHAR data types?| CHAR | VARCHAR |
|---|---|
| Fixed-length string | Variable-length string |
| Uses full specified length (pads with spaces if needed) | Uses only required space for actual data |
| Faster for fixed-size data | More space-efficient for variable data |