How To Run Complex Queries With SQL in Vector Databases

Vector search looks for similar vectors or data points in a data set based on their vector representations. Unlike specialized vector databases such as Pinecone, Milvus, Qdrant and Weaviate, MyScale is built on the open source, SQL-compatible ClickHouse database.

SQL is an effective tool for managing relational databases. Combining SQL and vectors provides a powerful approach to tackling complex AI-related questions. Users can run traditional SQL and vector queries over structured data and vector embeddings in one system, and so analyze high-dimensional data in a unified, efficient manner.

Advanced SQL Techniques for Complex Queries

Simple SQL queries are commands that perform straightforward data retrieval, usually from only one table at a time. Complex SQL queries go beyond standard requests by retrieving data from several tables and limiting the result set with multiple conditions.

A complex query could include features such as:

  • Common table expressions
  • Subqueries
  • Joining to many tables and using different join types

Common Table Expressions

A common table expression (CTE) is a name you give a subquery within your main query. The main reason to do this is to simplify your query, making it easier to read and debug. It can sometimes improve performance, which is another benefit, but it’s mostly about readability and simplification.

Consider a scenario in which you want to determine the average age of customers who bought a particular product. You have a table of customer data, including their name, age and the products they purchased.

Here’s an example query to perform this calculation using a CTE:
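A minimal sketch of such a query, run through Python's built-in sqlite3 module so that it is self-contained. The table layout is an assumption for illustration: customer_data is taken to have name, age and product columns, and the sample rows are made up.

```python
import sqlite3

# In-memory database with a small, made-up customer_data table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_data (name TEXT, age INTEGER, product TEXT);
    INSERT INTO customer_data VALUES
        ('Ana', 30, 'widget'),
        ('Ben', 40, 'widget'),
        ('Cho', 55, 'gadget');
""")

# The CTE product_customers is defined once in the WITH clause,
# then used like a table in the main query.
query = """
    WITH product_customers AS (
        SELECT name, age
        FROM customer_data
        WHERE product = 'widget'
    )
    SELECT AVG(age) AS average_age
    FROM product_customers
"""
average_age = conn.execute(query).fetchone()[0]
print(average_age)  # 35.0 for the sample rows above
```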

This CTE — a temporary named result set (subquery) that can be referenced within a single query — is named product_customers. It’s created using a SELECT statement that retrieves the name and age columns from the customer_data table for customers who purchased the product 'widget'.

Moving the subquery to the top of the query and giving it a name makes it easier to understand what the query does. If your subquery selects a sample embedding vector, you could name your subquery something like target_vector_embed. When you refer to this in the main query, you’ll see this name and know what it refers to.

This is also helpful if you have a long query and need the same logic in several places. You can define it at the top of the query and refer to it multiple times throughout your main query.

So consider using CTEs when having a subquery improves the readability of your query.

Subqueries

A subquery is a SQL query embedded within another query. By nesting queries, you can apply more elaborate restrictions to the data included in the result set.

Subqueries can be used in several places within a query, but it’s easiest to start with the FROM clause. Here’s an example of a basic subquery:
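A small sketch of a subquery in the FROM clause, again run through Python's sqlite3 module. The orders table, its columns and its rows are hypothetical; the alias sub follows the naming used in the explanation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, day TEXT);
    INSERT INTO orders VALUES
        (1, 'Ana', 20.0, 'Friday'),
        (2, 'Ben', 35.0, 'Friday'),
        (3, 'Ana', 15.0, 'Monday'),
        (4, 'Cho', 50.0, 'Friday');
""")

# Inner query (in parentheses): all Friday orders.
# Outer query: filters the inner result, which is aliased as "sub".
query = """
    SELECT sub.customer, sub.amount
    FROM (
        SELECT *
        FROM orders
        WHERE day = 'Friday'
    ) sub
    WHERE sub.amount > 30
    ORDER BY sub.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ben', 35.0), ('Cho', 50.0)]
```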

I’ll break down what happens when you run the above query:

First, the database runs the “inner query”, the part between the parentheses. If you run this independently, it produces a result set just like any other query. After the inner query runs, the outer query runs using the results from the inner query as its underlying table.

In most databases, subqueries in the FROM clause must have names, which are added after the closing parenthesis (the same way you would add an alias to a regular table). This query uses the name sub.

Using Subqueries in Conditional Logic

You can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON or CASE). The following query returns all the entries from the same date as the specified entry in the data set:
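As an illustration, the sketch below uses a hypothetical entries table (id, entry_date, note) and returns every entry that shares a date with the entry whose id is 3. It is run through Python's sqlite3 module; the names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER, entry_date TEXT, note TEXT);
    INSERT INTO entries VALUES
        (1, '2023-05-01', 'first'),
        (2, '2023-05-02', 'second'),
        (3, '2023-05-01', 'third');
""")

# The subquery returns a single cell (the date of entry 3),
# so it can be compared with "=" like an ordinary value.
query = """
    SELECT id, note
    FROM entries
    WHERE entry_date = (SELECT entry_date FROM entries WHERE id = 3)
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'first'), (3, 'third')]
```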

This query works because the result of the subquery is only one cell. Most conditional logic will work with subqueries containing one-cell results. However, IN is the most common operator that also works when the inner query returns multiple results:
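A sketch of the multi-row case, with a hypothetical entries table (id, author, entry_date): the inner query returns every date on which 'Ana' posted, and IN keeps all entries from any of those dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER, author TEXT, entry_date TEXT);
    INSERT INTO entries VALUES
        (1, 'Ana', '2023-05-01'),
        (2, 'Ben', '2023-05-01'),
        (3, 'Ana', '2023-05-03'),
        (4, 'Cho', '2023-05-02'),
        (5, 'Ben', '2023-05-03');
""")

# The inner query yields two dates, so "=" would not fit here; IN accepts the whole set.
# Note that the subquery has no alias: it is used as a set of values, not as a table.
query = """
    SELECT id, author
    FROM entries
    WHERE entry_date IN (
        SELECT entry_date FROM entries WHERE author = 'Ana'
    )
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ana'), (2, 'Ben'), (3, 'Ana'), (5, 'Ben')]
```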

Note that you should not include an alias when you write a subquery in a conditional statement. This is because the subquery is treated as an individual value (or set of values in the IN clause) rather than as a table.

Joining Tables

A join produces a new result set by combining columns from one or more tables using values common to each. The main join types are:

  • INNER JOIN: Only matching rows are returned.
  • LEFT JOIN: Nonmatching rows from the left table and matching rows are returned.
  • RIGHT JOIN: Nonmatching rows from the right table and matching rows are returned.
  • FULL JOIN: Nonmatching rows from both tables and matching rows are returned.
  • CROSS JOIN: Produces the Cartesian product of whole tables, as “join keys” are not specified.
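The difference between the join types is easiest to see on a tiny data set. The sketch below uses two hypothetical tables, authors and photos, in SQLite via Python's sqlite3 module. It covers INNER, LEFT and CROSS JOIN only, because RIGHT and FULL JOIN are not available in older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER, name TEXT);
    CREATE TABLE photos (photo_id INTEGER, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ana'), (2, 'Ben');
    -- Photo 12 points at author 3, who does not exist in authors.
    INSERT INTO photos VALUES (10, 1), (11, 1), (12, 3);
""")

# INNER JOIN: only photos with a matching author.
inner_rows = conn.execute("""
    SELECT p.photo_id, a.name
    FROM photos p
    INNER JOIN authors a ON a.author_id = p.author_id
    ORDER BY p.photo_id
""").fetchall()

# LEFT JOIN: every photo; the author name is NULL (None) when there is no match.
left_rows = conn.execute("""
    SELECT p.photo_id, a.name
    FROM photos p
    LEFT JOIN authors a ON a.author_id = p.author_id
    ORDER BY p.photo_id
""").fetchall()

# CROSS JOIN: every photo paired with every author (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM photos CROSS JOIN authors"
).fetchone()[0]

print(inner_rows)   # [(10, 'Ana'), (11, 'Ana')]
print(left_rows)    # [(10, 'Ana'), (11, 'Ana'), (12, None)]
print(cross_count)  # 6
```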

Using Complex SQL and Vector Queries in MyScale

SQL vector database MyScale includes several features that help with complex SQL and vector queries.

Common Table Expressions

MyScale supports CTEs and substitutes the code defined in the WITH clause wherever it is used in the rest of the SELECT query. Named subqueries can be included in the current and child query context anywhere table objects are allowed.

Vector search is a search method that represents data as vectors. It is commonly used in applications such as image, video and text search. MyScale uses the distance() function to perform vector searches. It calculates the distance between a specified vector and all vector data in a specified column and returns the top candidates.

In some cases, such as when the specified vector comes from another table or has too many dimensions to write out conveniently, you can use a CTE or a subquery to supply it.

Assume you have a vector table named photos that stores metadata information linked to your photo library’s images, with id, photo_id and photo_embed for the embedding vector.

The following example treats the result of a selection in CTE as a target vector to execute vector search:
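The pattern can be emulated outside MyScale. The sketch below is not MyScale syntax: it uses SQLite through Python's sqlite3 module, stores each embedding as JSON text, and registers a Python function named distance as a stand-in for MyScale's built-in distance() (Euclidean distance is an assumption here, and there is no vector index). The CTE name target_vector_embed and the sample rows are illustrative.

```python
import json
import math
import sqlite3


def distance(a_json, b_json):
    """Euclidean distance between two JSON-encoded vectors (stand-in, not MyScale's function)."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("distance", 2, distance)
conn.execute("CREATE TABLE photos (id INTEGER, photo_id TEXT, photo_embed TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        (1, "p1", json.dumps([0.0, 0.0])),
        (2, "p2", json.dumps([3.0, 4.0])),
        (3, "p3", json.dumps([1.0, 0.0])),
        (4, "p4", json.dumps([0.0, 2.0])),
    ],
)

# The CTE selects the embedding of photo 1; the main query uses it as the
# target vector and returns the three nearest photos.
query = """
    WITH target_vector_embed AS (
        SELECT photo_embed AS v FROM photos WHERE id = 1
    )
    SELECT p.id, p.photo_id, distance(p.photo_embed, t.v) AS dist
    FROM photos p, target_vector_embed t
    ORDER BY dist
    LIMIT 3
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'p1', 0.0), (3, 'p3', 1.0), (4, 'p4', 2.0)]
```

The target vector never has to be written out in the query text, which is the point of the pattern when embeddings have hundreds of dimensions.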

Joins and Subqueries

MyScale has limited support for joins, and using a subquery is the recommended workaround. In MyScale, vector search relies on the vector index of a table with a vector column. Although the distance() function appears in the SELECT clause, its value is calculated during the vector search on that table, not after the join, so the join result may not be what you expect.

The following are possible workarounds:

  • You can use the distance()...WHERE...ORDER BY...LIMIT query pattern in subqueries that utilize vector indexes and get expected results on vector tables.
  • You can also use subqueries in the WHERE clause to rewrite the join.

Assume you have another table, photo_meta, that stores information about the photo library’s images with photo_id, photo_author, year and title. The following example retrieves relevant photos taken in 2023 from a collection of images:

Here’s what happens when you run the above query:

First, MyScale executes the vector search on the photos table to get the required photo_id column and the value of the distance() function for the top five relevant records.

Then, the join runs using the results from the vector table as its underlying table:
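The two steps described above can be emulated in SQLite to see why the result comes up short. This is a sketch under loud assumptions: it is not MyScale syntax, the distance function is a Python stand-in registered with sqlite3, embeddings are stored as JSON text, and the eight photos and their years are made up. The order of evaluation (top five first, join and year filter afterward) is written out explicitly as a subquery.

```python
import json
import math
import sqlite3


def distance(a_json, b_json):
    """Euclidean distance between two JSON-encoded vectors (stand-in, not MyScale's function)."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("distance", 2, distance)
conn.executescript("""
    CREATE TABLE photos (id INTEGER, photo_id TEXT, photo_embed TEXT);
    CREATE TABLE photo_meta (photo_id TEXT, photo_author TEXT, year INTEGER, title TEXT);
""")
years = [2021, 2023, 2022, 2023, 2021, 2023, 2023, 2023]
for i, year in enumerate(years, start=1):
    # Photo i sits at distance i from the target, so p1 is nearest and p8 farthest.
    conn.execute("INSERT INTO photos VALUES (?, ?, ?)",
                 (i, f"p{i}", json.dumps([float(i), 0.0])))
    conn.execute("INSERT INTO photo_meta VALUES (?, ?, ?, ?)",
                 (f"p{i}", "someone", year, f"photo {i}"))

target = json.dumps([0.0, 0.0])

# Step 1: the vector search picks the five nearest photos, ignoring the year.
# Step 2: the join and the year filter run on those five rows only.
joined = conn.execute("""
    SELECT s.photo_id, m.year, s.dist
    FROM (
        SELECT photo_id, distance(photo_embed, :target) AS dist
        FROM photos
        ORDER BY dist
        LIMIT 5
    ) s
    JOIN photo_meta m ON m.photo_id = s.photo_id
    WHERE m.year = 2023
    ORDER BY s.dist
""", {"target": target}).fetchall()
print(joined)  # [('p2', 2023, 2.0), ('p4', 2023, 4.0)]
```

Only two rows come back, even though five photos from 2023 exist in the sample data, because three of them were never among the five nearest neighbors.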

Because the vector search doesn’t consider the year photos were taken, the result may be incorrect. To get the correct result, rewrite the join query by using a subquery:
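A sketch of the subquery rewrite, emulated in SQLite with the same caveats: this is not MyScale syntax, distance is a Python stand-in registered with sqlite3, and the data is made up. The year filter moves into a WHERE ... IN subquery so it is applied before the nearest five rows are chosen.

```python
import json
import math
import sqlite3


def distance(a_json, b_json):
    """Euclidean distance between two JSON-encoded vectors (stand-in, not MyScale's function)."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("distance", 2, distance)
conn.executescript("""
    CREATE TABLE photos (id INTEGER, photo_id TEXT, photo_embed TEXT);
    CREATE TABLE photo_meta (photo_id TEXT, photo_author TEXT, year INTEGER, title TEXT);
""")
years = [2021, 2023, 2022, 2023, 2021, 2023, 2023, 2023]
for i, year in enumerate(years, start=1):
    # Photo i sits at distance i from the target vector.
    conn.execute("INSERT INTO photos VALUES (?, ?, ?)",
                 (i, f"p{i}", json.dumps([float(i), 0.0])))
    conn.execute("INSERT INTO photo_meta VALUES (?, ?, ?, ?)",
                 (f"p{i}", "someone", year, f"photo {i}"))

# The distance() ... WHERE ... ORDER BY ... LIMIT pattern: restrict the
# candidates to 2023 photos first, then rank them by distance.
rows = conn.execute("""
    SELECT photo_id, distance(photo_embed, :target) AS dist
    FROM photos
    WHERE photo_id IN (SELECT photo_id FROM photo_meta WHERE year = 2023)
    ORDER BY dist
    LIMIT 5
""", {"target": json.dumps([0.0, 0.0])}).fetchall()
print([photo_id for photo_id, _ in rows])  # ['p2', 'p4', 'p6', 'p7', 'p8']
```

With the filter applied inside the search, all five results are from 2023, including photos that lie outside the unfiltered top five.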

Improve Data Analysis

Advanced SQL techniques like CTEs, subqueries and joins can help you perform complex data analyses and manipulations with greater precision and efficiency. By combining SQL and vectors, MyScale lets you run traditional SQL and vector queries over structured data and vector embeddings together, so you can answer complex AI-related questions and analyze high-dimensional data in one place.

If you are interested in learning more, please follow us on X (Twitter) or join our Discord community. Let’s build the future of data and AI together!

The post How To Run Complex Queries With SQL in Vector Databases appeared first on The New Stack.

Learn how to use common table expressions, subqueries and joins to perform complex SQL and vector search queries in MyScaleDB.
