VOOZH about

URL: https://thenewstack.io/how-to-run-complex-queries-with-sql-in-vector-databases/

⇱ How To Run Complex Queries With SQL in Vector Databases - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-05-06 06:31:02
How To Run Complex Queries With SQL in Vector Databases
sponsor-myscale,sponsored-post-contributed,
AI / Data / Large Language Models

How To Run Complex Queries With SQL in Vector Databases

Learn how to use common table expressions, subqueries and joins to perform complex SQL and vector search queries in MyScaleDB.
May 6th, 2024 6:31am by Jianmei Zhang
👁 Featued image for: How To Run Complex Queries With SQL in Vector Databases
Featured image by Benjamin Ashton on Unsplash.
MyScale sponsored this post.

Vector search looks for similar vectors or data points in a data set based on their vector representations. Unlike proprietary vector databases such as Pinecone, Milvus, Qdrant and Weaviate, MyScale is built on the open source, SQL-compatible ClickHouse database.

SQL is an effective tool for managing relational databases. Combining SQL and vectors provides a powerful approach to tackling complex AI-related questions. Users can execute traditional SQL and vector queries on structured data and vector embeddings (data) to address complex queries and analyze high-dimensional data in a unified, efficient manner.

Advanced SQL Techniques for Complex Queries

Simple SQL queries are commands that perform straightforward data retrieval, usually from only one table at a time. Complex SQL queries go beyond standard requests by retrieving data from several tables and limiting the result set with multiple conditions.

A complex query could include features such as:

  • Common table expressions
  • Subqueries
  • Joining to many tables and using different join types

Common Table Expressions

A common table expression (CTE) is a name you give a subquery within your main query. The main reason to do this is to simplify your query, making it easier to read and debug. It can sometimes improve performance, which is another benefit, but it’s mostly about readability and simplification.

Consider a scenario in which you want to determine the average age of customers who bought a particular product. You have a table of customer data, including their name, age and the products they purchased.

Here’s an example query to perform this calculation using a CTE:

This CTE — a temporary named result set (subquery) that can be referenced within a single query — is named product_customers. It’s created using a SELECT statement that retrieves the name and age columns from the customer_data table for customers who purchased the product 'widget'.

Moving the subquery to the top of the query and giving it a name makes it easier to understand what the query does. If your subquery selects a sample embedding vector, you could name your subquery something like target_vector_embed. When you refer to this in the main query, you’ll see this name and know what it refers to.

This is also helpful if you have a long query and need the same logic in several places. You can define it at the top of the query and refer to it multiple times throughout your main query.

So consider using CTEs when having a subquery improves the readability of your query.

Subqueries

A subquery is a simple SQL command embedded within another query. By nesting queries, you can set up larger restrictions on the data included in the result set.

Subqueries can be used in several places within a query, but it’s easiest to start with the FROM statement. Here’s an example of a basic subquery:

I’ll break down what happens when you run the above query:

First, the database runs the “inner query” — the part between the parentheses. If you run this independently, it produces a result set just like any other query. However, after the inner query runs, the outer query runs using the results from the inner query as its underlying table:

Subqueries must have names, which are added after parentheses (the same way you would add an alias to a regular table). This query uses the name sub.

Using Subqueries in Conditional Logic

You can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON or CASE). The following query returns all the entries from the same date as the specified entry in the data set:

This query works because the result of the subquery is only one cell. Most conditional logic will work with subqueries containing one-cell results. However, IN is the only type of conditional logic that will work when the inner query contains multiple results:

Note that you should not include an alias when you write a subquery in a conditional statement. This is because the subquery is treated as an individual value (or set of values in the IN clause) rather than as a table.

Joining Tables

join produces a new table by combining columns from one or multiple tables by using values common to each. Different types of joins are:

  • INNER JOIN: Only matching rows are returned.
  • LEFT JOIN: Nonmatching rows from the left table and matching rows are returned.
  • RIGHT JOIN: Nonmatching rows from the right table and matching rows are returned.
  • FULL JOIN: Nonmatching rows from both tables and matching rows are returned.
  • CROSS JOIN: Produces the Cartesian product of whole tables, as “join keys” are not specified.

Using Complex SQL and Vector Queries in MyScale

SQL vector database MyScale includes several features that help with complex SQL and vector queries.

Common Table Expressions

MyScale supports CTE and substitutes the code defined in the WITH clause for the rest of the SELECT query. Named subqueries can be included in the current and child query context anywhere table objects are allowed.

Vector search is a search method that represents data as vectors. It is commonly used in applications such as image, video and text search. MyScale uses the distance() function to perform vector searches. It calculates the distance between a specified vector and all vector data in a specified column and returns the top candidates.

In some cases, if the specified vector is obtained from another table or the dimension of the specified vector is large and inconvenient to represent, you can use CTE or subquery.

Assume you have a vector table named photos that stores metadata information linked to your photo library’s images, with id, photo_id and photo_embed for the embedding vector.

The following example treats the result of a selection in CTE as a target vector to execute vector search:

Joins and Subqueries

There is limited support in MyScale for join, and using subquery is recommended as a workaround. In MyScale, the vector search is based on the vector index on a table with a vector column. Although the distance() function appears in the SELECT clause, its value is calculated during vector search on the table, not after join. The join result may not be the expected result.

The following are possible workarounds:

  • You can use the distance()...WHERE...ORDER BY...LIMIT query pattern in subqueries that utilize vector indexes and get expected results on vector tables.
  • You can also use subqueries in the WHERE clause to rewrite the join.

Assume you have another table, photo_meta, that stores information about the photo library’s images with photo_id, photo_author, year and title. The following example retrieves relevant photos taken in 2023 from a collection of images:

Here’s what happens when you run the above query:

First, MyScale executes vector search on the table photos to get the required column photo_id and the value of the distance() function for the top five relevant records:

Then, the join runs using the results from the vector table as its underlying table:

Because the vector search doesn’t consider the year photos were taken, the result may be incorrect. To get the correct result, rewrite the join query by using a subquery:

Improve Data Analysis

Advanced SQL techniques like CTEs, subqueries and joins can help you perform complex data analyses and manipulations with greater precision and efficiency. MyScale combines the power of SQL and vectors to provide a powerful approach to tackling complex AI-related questions. With MyScale, you can efficiently execute traditional SQL and vector queries on structured and vector data to address complex queries and analyze high-dimensional data in a unified and efficient manner.

If you are interested in learning more, please follow us on X (Twitter) or join our Discord community. Let’s build the future of data and AI together!

MyScale is an open-source SQL vector database that allows to effectively manage massive volumes of both structured and vector data for developing robust AI applications. It enables every developer to build production-grade GenAI applications with powerful and familiar SQL.
Learn More
TRENDING STORIES
Jianmei Zhang, senior software engineer at MyScale, is passionate about optimizing performance and ensuring efficient storage and retrieval of vector data. With a strong background in database design and development, Jianmei focuses on the development and performance optimization of vector...
Read more from Jianmei Zhang
MyScale sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: ClickHouse.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.