VOOZH about

URL: https://thenewstack.io/how-to-write-sql-queries/

⇱ How to Write SQL Queries - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-03-27 08:28:20
How to Write SQL Queries
sponsor-oracle,sponsored-post-contributed,
Data / Databases

How to Write SQL Queries

Learn how to use SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, OFFSET and FETCH to retrieve data with SQL.
Mar 27th, 2024 8:28am by Gerald Venzl
👁 Featued image for: How to Write SQL Queries
Featured image by Camylla Battani on Unsplash.
Oracle sponsored this post.
SQL is a declarative, English-like domain language for querying, analyzing and manipulating data. SQL originated from relational databases but has since been widely adopted elsewhere. SQL is considered a declarative language, meaning users declare what results they want and not how to get to these results (the latter is the approach of imperative programming languages, such as C, Java and Python). Because of this and the ability to read SQL statements almost as English-language sentences, SQL is generally seen as one of the best high-level declarative programming languages for analyzing data due to its easy-to-learn syntax. SQL has different language elements that can be divided at a high level between queries and data manipulation. SQL queries use the SELECT statement, while SQL used for data manipulation uses the INSERT, UPDATE, DELETE and MERGE statements. The data manipulation statements are collectively called Data Manipulation Language or DML. This article will break down the anatomy of SQL query language, while the second part of the series will describe DML.

Defining SQL Queries

SQL queries are probably the most common operations used in SQL, as they allow users to retrieve and analyze data from one or more tables. SQL query statements include the following elements:
  1. SELECT and FROM
  2. SELECT without FROM
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. ORDER BY
  8. OFFSET
  9. FETCH
  10. OFFSET and FETCH
The SELECT statement includes a couple of elements, but only the first two are required: SELECT and FROM. However, some databases including Oracle Database and MySQL make the FROM clause optional if the SELECT refers only to self-contained expressions, such as SELECT 1;SELECT sysdate; and SELECT my_function();. In these cases, the data is not derived from a table, hence FROM is not required. Optional components are illustrated by putting [ ] around them.
SELECT <expressions>
 FROM <table or sub-query>
 [ JOIN <to other table> ON <join condition> ]
 [ WHERE <predicates> ]
 [ GROUP BY <expressions>
 [ HAVING <predicates> ] ]
 [ ORDER BY <expressions> ]
 [ OFFSET ]
 [ FETCH ]
A common misconception is that these components execute in the same order as they appear in the query. This is not the case, as the SELECT component is processed after the HAVING clause. The following lists the order that which the clauses are processed and their purpose:
  1. FROM: Indicates which table(s) to retrieve data from. The FROM clause determines the working set of data that is being retrieved. This usually refers to a table but can also include a subquery (another SELECT query that acts as the input source for the current query).
  2. JOIN: Specifies the rules for joining multiple tables. The JOIN clause is part of the FROM clause and combines the data from multiple tables into one data set. It is one of the fundamental operators of the relational model to combine different relations into one set. The JOIN clause allows join conditions that ensure that only rows that logically belong together are joined (rows with matching primary key –> foreign key relationships). Multiple JOIN clauses can be specified to join multiple tables into the data set. Because the JOIN clause is part of the FROM clause, it cannot be specified without a preceding FROM statement in the query.
  3. WHERE: Filters the rows returned by the query. The WHERE clause filters the data set based on the provided predicates or filter conditions and discards all rows that do not match them. It narrows down the results, for example, to retrieve all countries from the continent of Europe instead of all countries in the world.
  4. GROUP BY: Aggregates (or groups) rows with common values in the specified column(s) into one row. The resulting row is sometimes referred to as a grouping set. Because rows with common values are aggregated into one row, there will only be as many rows as there are unique values. For values of columns not specified in GROUP BY, aggregate functions in the SELECT clause are required to aggregate these values per group.
  5. HAVING: Filters rows resulting from the GROUP BY clause. It is therefore a part of the GROUP BY and cannot be used without a preceding GROUP BY statement in the query.
  6. SELECT: Defines the list of column(s) and expression(s) to appear in the query result output. The SELECT clause computes any expressions and defines the list of columns to be returned, or projected, as the query result.
  7. ORDER BY: Identifies which column(s) to use to sort the resulting data and in which direction to sort them (ascending or descending). If the ORDER BY is omitted, the order of the rows returned by a SQL query is undefined.
  8. OFFSET: Specifies the number of rows to skip in the result set before returning the data.
  9. FETCH: Specifies the number of rows to return from the result.

Using SQL Queries

Now that you’re familiar with what the various SQL query clauses mean, you can start using them. You can follow along with these exercises using the data model in my GitHub repository.

SELECT and FROM

In its simplest form, a SQL query is comprised of a SELECT and a FROM clause:
SQL> SELECT *
 2* FROM regions;
 
REGION_ID	NAME
____________ ________________
AF 	Africa
AN 	Antarctica
AS 	Asia
EU 	Europe
NA 	North America
OC 	Oceania
SA 	South America
 
7 rows selected.
This query selects all rows and all columns (as indicated by the * after SELECT, which is short for “all columns”) from a table called regions. If you want to return a given list of columns, you can call them out specifically:
SQL> SELECT name
 2* FROM regions;
 
NAME
________________
Africa
Antarctica
Asia
Europe
North America
Oceania
South America
 
7 rows selected.

SELECT without FROM

The SELECT statement can also compute expressions, for example, 1+2. Technically, neither the constant 1 nor the constant 2 come from any table, but the ISO SQL standard nevertheless requires a FROM clause. Many databases have “dummy” tables to enable such queries, such as the dual table in Oracle Database.
SQL> SELECT 1+2
 2* FROM dual;
 
 1+2
______
 3
However, many databases, including Oracle Database, have loosened this restriction from the SQL standard and allow queries to omit the FROM clause in such cases:
SQL> SELECT 1+2;
 
 1+2
______
 3

JOIN

The relational model is all about normalizing data, that is putting independent data into separate tables and defining relationships between these tables. To recombine normalized data, you can use joins to join these tables back together. The following example has two tables: the previously queried regions table, and the new countries table. To write a query that joins both tables into one result, you use the JOIN clause. Without a JOIN clause, if you specify the two tables in the FROM clause, every row from the table regions will be multiplied by every row from the table countries. This is often referred to as the cross product and it’s a common mistake that SQL beginners make. For example:
SQL> SELECT r.name, c.name
 2* FROM regions r, countries c;
 
NAME 	NAME
_________ ___________________________________
Africa	Kosovo
Africa	Yemen
Africa	South Africa
Africa	Zambia
Africa	Zimbabwe
Africa	Andorra
Africa	United Arab Emirates
Africa	Afghanistan
Africa	Antigua and Barbuda
Africa	Albania
Africa	Armenia
Africa	Angola
Africa	Argentina
Africa	Austria
Africa	Australia
...
...
...
South America	Uzbekistan
South America	Vatican City
South America	Saint Vincent and the Grenadines
South America	Venezuela
South America	Vietnam
South America	Vanuatu
South America	Samoa
 
1,372 rows selected.
The output from this query is obviously incorrect. There are neither 1,372 countries, nor is Austria located in Africa. What we really want is to join all rows from the countries table with the regions table where the region_id is the same. This is generally referred to as the join condition and can be specified in the ON clause as part of the JOIN clause:
SQL> SELECT r.name, c.name
 2 FROM regions r
 3	JOIN countries c
 4*	ON (r.region_id=c.region_id);
 
NAME 	NAME
_________ ___________________________________
Africa	South Africa
Africa	Zambia
Africa	Zimbabwe
Africa	Angola
Africa	Burkina Faso
Africa	Burundi
Africa	Benin
...
...
...
South America	Ecuador
South America	Guyana
South America	Peru
South America	Paraguay
South America	Suriname
South America	Uruguay
South America	Venezuela
 
196 rows selected.
That’s more like it! There is one more thing to note: The above queries specified SELECT r.name, c.name and put the letters r and c next to the table names. These are table aliases and are required to tell the database which table column you want. If the statement just said SELECT name, name, it would be unclear whether the query refers to the regions table column name or the countries table column name.

WHERE

The WHERE clause filters the rows produced by the FROM clause. Until now, you always got all the rows that were in the tables. If you want to return only all countries in South America, this is where the WHERE clause comes in. It can be used to match all rows with the regions.name column value 'South America':
SQL> SELECT r.name, c.name
 2 FROM regions r
 3	JOIN countries c
 4 	ON (r.region_id=c.region_id)
 5* WHERE r.name = 'South America';
 
NAME 	NAME
________________ ____________
South America	Argentina
South America	Bolivia
South America	Brazil
South America	Chile
South America	Colombia
South America	Ecuador
South America	Guyana
South America	Peru
South America	Paraguay
South America	Suriname
South America	Uruguay
South America	Venezuela
 
12 rows selected.

GROUP BY

The GROUP BY clause is often a puzzle for SQL beginners. The clause is used to aggregate multiple rows into a group, essentially combining multiple rows into one row. When would such a thing ever be useful? Well, for example, the countries table contains a column called population but the regions table does not:
SQL> SELECT r.name, c.name, c.population
 2 FROM regions r
 3	JOIN countries c
 4 	ON (r.region_id=c.region_id)
 5* WHERE r.name = 'South America';
 
NAME 	NAME 	POPULATION
________________ ____________ _____________
South America	Argentina 	44694000
South America	Bolivia 	11306000
South America	Brazil 	208847000
South America	Chile 	17925000
South America	Colombia 	48169000
South America	Ecuador 	16291000
South America	Guyana 	741000
South America	Peru 	31331000
South America	Paraguay 	7026000
South America	Suriname 	598000
South America	Uruguay 	3369000
South America	Venezuela 	31689000
 
12 rows selected.
A common business question might be: “What is the total population of each region?” Given that the regions table does not have a column with that information, the answer can be provided only by calculating the sum of the population column for each country per region. So, you need a mechanism that takes the 196 rows of the countries table and puts them into seven groups or buckets based on their region (because there are seven rows in the regions table). However, the query cannot just put the 196 rows into seven rows; it needs to calculate the sum of the population per region based on the populations of the countries belonging to the region. This can be done by applying the SUM() aggregate function over the population column:
SQL> SELECT r.name, SUM(c.population)
 2 FROM regions r
 3	JOIN countries c
 4 	ON (r.region_id=c.region_id)
 5* GROUP BY r.name;
 
NAME 	SUM(C.POPULATION)
________________ ____________________
Africa 	1263685000
Asia 	4439011000
Europe 748985000
North America 	575767000
Oceania 37556000
South America 	421986000
 
6 rows selected.
This query shows something else interesting. Although there are seven regions in the regions table, the query produced six rows. This is because there is a region 'Antarctica', but there is no country with that region_id in the countries table. Hence, the JOIN clause filters that region out (because there is no matching region_id in the countries table as specified by the ON clause). A GROUP BY clause does not require any JOIN clause; you can create groups in just a single table. For example, “How many countries start with the same letter?” can also be answered via a GROUP BY. To do this, create as many groups per the unique first letter value for all rows by using the SUBSTR() function, and then count the rows that fall into that group or category:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3* GROUP BY SUBSTR(name,1,1);
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
K 	6
Y 	1
S 	26
Z 	2
A 	11
U 	7
B 	17
C 	17
D 	5
G 	11
E 	8
F 	3
M 	18
H 	3
I 	8
J 	3
N 	11
L 	9
O 	1
P 	9
Q 	1
R 	3
T 	12
V 	4
 
24 rows selected.

HAVING

The HAVING clause filters rows resulting from the GROUP BY clause based on the predicate(s) provided. For example, if you want to return only the regions that have a population of more than 500 million people, this cannot be specified in the WHERE clause because the WHERE clause is processed before the GROUP BY clause. Hence, the WHERE clause has no notion of the population of a region. This is where the HAVING clause comes in. From a logical perspective, it behaves the same as the WHERE clause, but it filters at a different processing stage:
SQL> SELECT r.name, SUM(c.population)
 2 FROM regions r
 3	JOIN countries c
 4 	ON (r.region_id=c.region_id)
 5 GROUP BY r.name
 6* HAVING SUM(c.population) > (500 * 1000 * 1000);
 
NAME 	SUM(C.POPULATION)
________________ ____________________
Africa 	1263685000
Asia 	4439011000
Europe 	748985000
North America 	575767000

ORDER BY

The ORDER BY clause sorts the resulting data. So far, the undefined sorting of the rows has worked out, except when it came to “countries per first letter.” The ORDER BY clause can be used to return the rows in alphabetical order:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3 GROUP BY SUBSTR(name,1,1)
 4* ORDER BY SUBSTR(name,1,1);
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
A 	11
B 	17
C 	17
D 	5
E 	8
F 	3
G 	11
H 	3
I 	8
J 	3
K 	6
L 	9
M 	18
N 	11
O 	1
P 	9
Q 	1
R 	3
S 	26
T 	12
U 	7
V 	4
Y 	1
Z 	2
 
24 rows selected.
By default, rows are sorted in ascending order, but you can reverse it with the DESC (DESCENDING) keyword:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3 GROUP BY SUBSTR(name,1,1)
 4* ORDER BY SUBSTR(name,1,1) DESC;
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
Z 	2
Y 	1
V 	4
U 	7
T 	12
S 	26
R 	3
Q 	1
P 	9
O 	1
N 	11
M 	18
L 	9
K 	6
J 	3
I 	8
H 	3
G 	11
F 	3
E 	8
D 	5
C 	17
B 	17
A 	11
 
24 rows selected.

OFFSET

The OFFSET clause specifies the number of rows to skip before starting to return data. This clause is shorthand for what would otherwise require analytic queries or subqueries. For example, asking “Give me all countries in South America ranked by square kilometers except the first three” can be answered with:
SQL> SELECT c.name, c.area_sq_km
 2 FROM countries c
 3	JOIN regions r
 4 	ON (c.region_id=r.region_id)
 5 WHERE r.name = 'South America'
 6 ORDER BY area_sq_km DESC
 7* OFFSET 3 ROWS;
 
NAME 	AREA_SQ_KM
____________ _____________
Colombia 	1138910
Bolivia 	1098581
Venezuela 	912050
Chile 	756102
Paraguay 	406752
Ecuador 	283561
Guyana 	214969
Uruguay 	176215
Suriname 	163820
 
9 rows selected.

FETCH

The FETCH clause specifies the number of rows to return from the result. Some databases call this the LIMIT clause. Like the OFFSET clause, this is also a shorthand and can be used to answer business questions like “What are the top three countries in terms of population?” This can be answered with:
SQL> SELECT name, population
 2 FROM countries
 3 ORDER BY population DESC
 4* FETCH FIRST 3 ROWS ONLY;
 
NAME 	POPULATION
________________ _____________
China 	1384689000
India 	1296834000
United States 	329256000
You may wonder what would happen if two rows tie on the third position; will both rows be returned? Or just the first? For these cases, the FETCH clause provides the ONLY and WITH TIES keywords. The above just used ONLY because it is unlikely that two countries have the same population. However, when ranking countries by letter, there is much more room for overlap. For example, in the countries per first letter example, when ranked by number of countries, it is clear that some letters have the same number:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3 GROUP BY SUBSTR(name,1,1)
 4* ORDER BY COUNT(*) DESC;
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
S 	26
M 	18
B 	17
C 	17
T 	12
A 	11
N 	11
G 	11
L 	9
P 	9
I 	8
E 	8
U 	7
K 	6
D 	5
V 	4
J 	3
H 	3
F 	3
R 	3
Z 	2
Q 	1
Y 	1
O 	1
 
24 rows selected.
If you run the same FETCH clause on this query, the letter C would be omitted from the results, although it has exactly the same number of countries as the letter B:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3 GROUP BY SUBSTR(name,1,1)
 4 ORDER BY COUNT(*) DESC
 5* FETCH FIRST 3 ROWS ONLY;
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
S 	26
M 	18
B 	17
This is where the WITH TIES keyword comes in, as it will include ties in the results:
SQL> SELECT SUBSTR(name,1,1), COUNT(*)
 2 FROM countries
 3 GROUP BY SUBSTR(name,1,1)
 4 ORDER BY COUNT(*) DESC
 5* FETCH FIRST 3 ROWS WITH TIES;
 
SUBSTR(NAME,1,1) 	COUNT(*)
___________________ ___________
S 	26
M 	18
B 	17
C 	17

OFFSET and FETCH

Combining the OFFSET and FETCH clauses allow another nice shorthand that would otherwise require an analytical query or subquery. Consider the question “What is the second smallest country on the planet in terms of square kilometers?” This can be answered by combining OFFSET to return results starting from the second row, and FETCH to fetch just the second row:
SQL> SELECT name, area_sq_km
 2 FROM countries
 3 ORDER BY area_sq_km
 4 OFFSET 1 ROW
 5* FETCH FIRST 1 ROW ONLY;
 
NAME 	AREA_SQ_KM
_________ _____________
Monaco 	2

What’s Next?

My second article in this series will break down the anatomy of SQL’s data manipulation language (DML). You can find the data model used in this article and part two in my GitHub repository for this exercise.
Oracle offers a wide range of technologies for building, testing, and maintaining applications in the cloud and in your data center. Find free tools and learning resources at oracle.com/developer
Learn More
The latest from Oracle
TRENDING STORIES
Gerald Venzl has worked as a developer, DBA, performance engineer, software architect, consultant and enterprise architect. He is still active as a developer in his free time and on open source projects. Gerald is currently VP, developer initiatives, Oracle Database.
Read more from Gerald Venzl
Oracle sponsored this post.
SHARE THIS STORY
TRENDING STORIES
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.