Embeddings are at the heart of modern AI applications. An embedding is a high-dimensional vector that numerically represents the semantic meaning of content like text and images. The number of dimensions in an embedding vector typically ranges from a few hundred to over a thousand. This two-part series provides a thorough introduction to embeddings.
Unlike traditional databases, querying a vector database is based on similarity: you give an input vector and the database returns vectors that are most similar to it. This similarity is based on a mathematical operation, for example, the cosine distance. In addition to similarity, you may impose other restrictions, such as limiting the search to vectors with specific metadata fields.
Neither relational databases like PostgreSQL, nor document databases like MongoDB, nor graph databases like Neo4j are particularly well-suited to dealing with high-dimensional embeddings, and by extension, AI applications. That’s where vector databases come in.
Qdrant is an open-source vector database with strong performance benchmarks and scalability support. It supports advanced features like sparse vectors and hybrid search. You can use it either by self-hosting on your own servers or by using the company's paid cloud services.
This article gives a high-level overview of Qdrant. It provides the background for more hands-on guides. The next article in the series consists of practical examples that demonstrate how to use Qdrant.
Qdrant Core Concepts
1. Points
Relational databases, like PostgreSQL, store rows of data in tables. The fundamental unit of data is a row. Each item is represented by a row and information about that item is stored as columns of that row.
Similarly, in Qdrant, a vector database, the unit of data is a point. The name reflects the fact that vectors can be seen as points in a multidimensional embedding space. Points (analogous to rows) exist within collections (analogous to tables).
Each point consists of:
- An embedding vector.
- An ID. Each point has a unique ID, either a UUID or a 64-bit unsigned integer.
- An optional payload, which is a JSON object containing additional information about the item that the vector represents.
In the absence of columns, the payload helps to store additional information relevant to the vector. For example, if the vector is the embedding of a text, the payload might include things like the name of the author, the text itself, the publication link, and so on. The Qdrant documentation on payloads discusses the concept in greater detail.
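To make the three parts of a point concrete, here is a minimal plain-Python sketch. The `Point` dataclass is a hypothetical stand-in written for this article, not the class from the Qdrant client library, and the sample vector and payload values are invented for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Union

MAX_U64 = 2**64 - 1  # largest value of a 64-bit unsigned integer


@dataclass
class Point:
    """Conceptual model of a point: ID, embedding vector, optional payload."""

    id: Union[int, str]
    vector: list[float]
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if isinstance(self.id, bool):
            raise ValueError("ID must be an integer or a UUID string")
        if isinstance(self.id, int):
            # Integer IDs must fit in 64 unsigned bits.
            if not 0 <= self.id <= MAX_U64:
                raise ValueError("integer ID must be a 64-bit unsigned integer")
        else:
            # String IDs must parse as a UUID; uuid.UUID raises ValueError otherwise.
            uuid.UUID(self.id)


point = Point(
    id=1,
    vector=[0.12, -0.48, 0.33, 0.91],  # a real embedding has hundreds of dimensions
    payload={"author": "Jane Doe", "text": "Vector databases 101"},
)
```

The payload is left free-form on purpose: like a JSON object, it can carry whichever keys are useful for the application.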
In Python, you use the PointStruct class to construct points. To add new points to the database, there are three methods:
- upsert
- upload_collection
- upload_points
upsert is the most commonly used function. upload_collection and upload_points are used to add points in bulk to a collection. They automatically batch the data. Internally, both these methods invoke the upsert method. Note that for collections that use the cosine metric, Qdrant automatically normalizes vectors before storing them.
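The relationship between upserting and bulk uploading can be pictured with a toy in-memory store. This is a conceptual sketch only: `toy_upsert` and `toy_upload_points` are hypothetical helpers that mimic the behavior described above (insert-or-overwrite by ID, and batching on top of it); they are not the Qdrant client functions.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

Store = dict[int, dict[str, Any]]  # point ID -> {"vector": ..., "payload": ...}


def toy_upsert(store: Store, points: Iterable[dict[str, Any]]) -> None:
    """Insert each point, or overwrite the existing point with the same ID."""
    for p in points:
        store[p["id"]] = {"vector": p["vector"], "payload": p.get("payload", {})}


def batched(points: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield successive lists of at most `size` points."""
    it = iter(points)
    while batch := list(islice(it, size)):
        yield batch


def toy_upload_points(store: Store, points: Iterable[dict[str, Any]], batch_size: int = 2) -> int:
    """Bulk upload: split the input into batches and upsert each batch.

    Returns the number of batches that were sent.
    """
    n_batches = 0
    for batch in batched(points, batch_size):
        toy_upsert(store, batch)
        n_batches += 1
    return n_batches


store: Store = {}
points = [{"id": i, "vector": [float(i), 1.0]} for i in range(5)]
n_batches = toy_upload_points(store, points, batch_size=2)  # batches of 2, 2 and 1

# Upserting an existing ID replaces the stored point instead of adding a new one.
toy_upsert(store, [{"id": 0, "vector": [9.0, 9.0], "payload": {"note": "replaced"}}])
```

After these calls the store still holds five points, and the point with ID 0 carries the new vector and payload.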
Some important operations you can do with points are:
- Retrieve information on the point, using the retrieve method, similar to selecting rows in a relational database.
- Update the vector associated with a point using the update_vectors function.
- Update the payload using the set_payload and overwrite_payload functions.
- Delete points using the delete function.
For operations on points, such as the ones shown above, you first need to identify the point on which to perform the operation. There are two broad ways of identifying points:
- Using the IDs of the point.
- Using filter conditions on the payload. This essentially allows you to shortlist points by imposing conditions on the values of specific keys in the payload. There is a wide range of filtering conditions, as discussed in the Qdrant documentation on filtering.
2. Collections
A set of points is a collection. Collections are analogous to tables in relational databases. Collections, like tables, have to have unique names. While creating a collection, it is necessary to specify what type of vectors it is expected to contain:
- The size (dimensionality) of the vectors to be stored in that collection. A collection can only hold vectors of a pre-specified dimension. It will not accept vectors with a different dimension.
- The distance metric. All similarity searches on this collection are performed based on this metric. Ideally, you choose the same metric as was used to train the embedding model.
Table 1: Comparison of Qdrant with traditional databases
3. Distance Metric
The distance is a proxy for how similar vectors are to each other. Vectors that are very similar have a low distance between them. For each collection, you specify a metric based on which to calculate similarity between vectors.
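As a rough illustration of how such metrics behave, the sketch below computes four common ones (dot product, cosine similarity, Euclidean distance, and Manhattan distance) in plain Python. The function names and sample vectors are chosen for this example. Note that the first two are similarity scores, where higher means more similar, while the last two are distances, where lower means more similar.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: list[float], b: list[float]) -> float:
    """Sum of absolute per-component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.0, 1.0]  # perpendicular to a
c = [2.0, 0.0]  # same direction as a, twice as long

print(cosine_similarity(a, c))  # 1.0: same direction, length is ignored
print(cosine_similarity(a, b))  # 0.0: perpendicular vectors
print(dot(a, c))                # 2.0: the dot product does depend on length
print(euclidean(a, c))          # 1.0
print(manhattan(a, b))          # 2.0
```

The comparison of `a` and `c` shows why the choice matters: cosine similarity treats them as identical, while the dot product and the two distances do not.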
The choice of metric depends on the application at hand. Qdrant allows a few different metrics:
- Dot product - Distance.DOT
- Cosine similarity - Distance.COSINE
- Euclidean distance - Distance.EUCLID
- Manhattan distance - Distance.MANHATTAN
4. Multitenancy
In relational databases, it is common to create many tables. In the case of vector databases, having many collections negatively affects performance. The logic of splitting data into tables also doesn't apply here, because you just store vectors. So it is recommended to use just a single collection.
However, you might need to restrict access such that a user can only query their own vectors and not the vectors of other users. A common way to do this is by attaching an extra key, such as group_id, to the payload (discussed earlier) of each point.
When querying the collection, you specify that the results are to be restricted only to those vectors with the matching key value pair. This is called multitenancy .
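The idea can be sketched in plain Python as a filter followed by a similarity ranking. This is a conceptual model, not Qdrant client code: the `search` helper, the sample points, and the tenant names `user_a` and `user_b` are all made up for illustration, and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# One shared collection; each point records its tenant in the payload.
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"group_id": "user_a"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"group_id": "user_b"}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"group_id": "user_a"}},
    {"id": 4, "vector": [0.7, 0.7], "payload": {"group_id": "user_a"}},
]


def search(query: list[float], group_id: str, limit: int = 2) -> list[int]:
    """Return the IDs of the most similar points that belong to `group_id`."""
    # Step 1: keep only the points whose payload matches the tenant.
    candidates = [p for p in points if p["payload"].get("group_id") == group_id]
    # Step 2: rank the remaining points by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda p: cosine_similarity(query, p["vector"]),
        reverse=True,
    )
    return [p["id"] for p in ranked[:limit]]


print(search([1.0, 0.0], group_id="user_a"))  # [1, 4]
print(search([1.0, 0.0], group_id="user_b"))  # [2]
```

Point 2 is very close to the query, yet it never appears in the results for `user_a`, because the payload condition excludes it before any ranking takes place.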
5. Quantization
By default, many common embedding providers, such as OpenAI, produce embedding vectors in float32 (32-bit floating point numbers). Thus, Qdrant also uses float32 to represent vector components (the individual numbers that the vector consists of).
However, using 32-bit floats consumes a lot of space, especially for vectors with a high number of dimensions. For example, consider the embedding vectors of OpenAI's common text-embedding-ada-002 model, which have 1536 dimensions:
- Each 32-bit floating point number needs 4 bytes of storage.
- A vector with 1536 dimensions consists of 1536 numbers. In total, each vector needs 4 * 1536 bytes (= 6 kB).
- Additional overhead space is needed for similarity search computations. A rule of thumb is to multiply the vector size by 1.5. So, effectively, each vector needs 9 kB of space.
Practical datasets have millions of vectors. For example, the Arxiv.org Titles dataset, which consists of only the titles of all the papers on the arXiv, has 2.25 million vectors. Each vector corresponds to one document and each document corresponds to the title of one paper.
Using the text-embedding-ada-002 model and float32 components, the raw vectors of this dataset take 6 kB * 2.25 million, or about 14 GB. Including the 1.5x overhead for search computations, this becomes 9 kB * 2.25 million = 20+ GB of memory. Such high memory requirements are unrealistic for consumer devices, especially mobile. Such large vectors may also lead to slower computations.
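The per-vector and per-dataset figures can be reproduced with a few lines of Python. The constants come from the text above; the 1.5 overhead factor is the stated rule of thumb rather than an exact value, and the totals are reported in decimal gigabytes (10^9 bytes).

```python
DIMENSIONS = 1536          # text-embedding-ada-002
BYTES_PER_FLOAT32 = 4
OVERHEAD_FACTOR = 1.5      # rule of thumb for search overhead
NUM_VECTORS = 2_250_000    # Arxiv.org Titles dataset

raw_bytes_per_vector = DIMENSIONS * BYTES_PER_FLOAT32                # 6144 bytes (6 kB)
effective_bytes_per_vector = raw_bytes_per_vector * OVERHEAD_FACTOR  # 9216 bytes (9 kB)

raw_total_gb = raw_bytes_per_vector * NUM_VECTORS / 1e9
effective_total_gb = effective_bytes_per_vector * NUM_VECTORS / 1e9

print(f"raw vectors:   {raw_total_gb:.1f} GB")        # raw vectors:   13.8 GB
print(f"with overhead: {effective_total_gb:.1f} GB")  # with overhead: 20.7 GB
```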
Quantization is a way of reducing the size of individual vectors. Instead of representing vector components with 32-bit floating point numbers, the database uses smaller numbers such as:
- 8-bit integers: This is called Scalar Quantization. It reduces the size of each vector by a factor of 4.
- 1-bit booleans: This is called Binary Quantization. It reduces the memory consumption by a factor of 32.
When you enable quantization, Qdrant stores the original as well as the quantized vectors. It is also possible to keep the quantized vectors in memory and relegate the original vectors to disk.
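The size reductions can be seen in a small plain-Python sketch. It assumes a simple min/max scaling for the 8-bit case and a sign test for the 1-bit case; these are illustrative choices, and Qdrant's internal quantization schemes may differ in their details. All function names are hypothetical.

```python
import struct


def scalar_quantize(vector: list[float]) -> tuple[bytes, float, float]:
    """Map each component to one byte (0..255) using min/max scaling."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for a constant vector
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale


def scalar_dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original components."""
    return [lo + c * scale for c in codes]


def binary_quantize(vector: list[float]) -> bytes:
    """Keep one bit per component (1 if positive, else 0), packed into bytes."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (x > 0)
    return bits.to_bytes((len(vector) + 7) // 8, "big")


vector = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.25, -0.25, 0.75]

float32_size = len(struct.pack(f"<{len(vector)}f", *vector))  # 4 bytes per component
codes, lo, scale = scalar_quantize(vector)
binary = binary_quantize(vector)

print(float32_size, len(codes), len(binary))  # 32 8 1
restored = scalar_dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

For this 8-dimensional vector the float32 form takes 32 bytes, the scalar-quantized form 8 bytes (a factor of 4), and the binary form 1 byte (a factor of 32). The reconstruction from 8-bit codes is close to the original but not exact, which is the loss of precision that quantization trades for space.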
Reducing the size of vectors speeds up computations. However, what was a 32-bit number is now represented with fewer bits. So, the information is effectively downsampled and there is loss of accuracy.
Because the semantic meaning of the vector is made less precise, things like similarity search are less accurate. To deal with this, Qdrant has two approaches:
- Re-scoring: Perform the search using the quantized vectors and then refine the result using the original vectors.
- Oversampling: Use quantized vectors to fetch more than the desired number of results. For example, if the query requests 10 results, fetch 20 instead. Then re-score these 20 using their actual vectors and return the top 10.
To use quantization, enable it while creating the collection. Use the quantization_config parameter to specify the settings. The Qdrant documentation shows how to do this.
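The interplay of oversampling and re-scoring can be sketched in plain Python. This is a simplified model under stated assumptions: vectors are binarized by sign, candidates are shortlisted by Hamming distance, and the original vectors are scored with the dot product. The helper names and sample data are invented for this example and do not reflect Qdrant's internals.

```python
def binarize(vector: list[float]) -> tuple[bool, ...]:
    """1-bit quantization: keep only the sign of each component."""
    return tuple(x > 0 for x in vector)


def hamming(a: tuple[bool, ...], b: tuple[bool, ...]) -> int:
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


vectors = {
    "a": [0.1, 0.1, -0.1, 0.1],
    "b": [1.0, -0.05, -0.2, 0.5],
    "c": [-0.5, 0.2, 0.3, -0.1],
    "d": [0.2, -0.3, 0.4, 0.1],
}
quantized = {pid: binarize(v) for pid, v in vectors.items()}


def search(query: list[float], limit: int, oversampling: float = 2.0) -> list[str]:
    q_bits = binarize(query)
    # Oversampling: shortlist more candidates than requested, using cheap quantized vectors.
    n_candidates = int(limit * oversampling)
    candidates = sorted(quantized, key=lambda pid: hamming(q_bits, quantized[pid]))[:n_candidates]
    # Re-scoring: rank the shortlist again with the original vectors.
    rescored = sorted(candidates, key=lambda pid: dot(query, vectors[pid]), reverse=True)
    return rescored[:limit]


query = [0.9, 0.1, -0.1, 0.4]
print(search(query, limit=1, oversampling=1.0))  # ['a']
print(search(query, limit=1, oversampling=2.0))  # ['b']
```

Without oversampling, the quantized search returns `a`, whose sign pattern matches the query exactly. With an oversampling factor of 2, `b` also enters the shortlist, and re-scoring with the original vectors correctly ranks it first.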
6. Indexing Indexing is an essential part of any database. Relational databases typically create indexes over column values. Qdrant, being a vector database, has two types of indexes:
- Vector indexes: Dense vector indexes are based on the Hierarchical Navigable Small World (HNSW) algorithm. This makes it faster to find vectors similar to a given vector.
- Payload indexes: As the structure of a payload is similar to a JSON-based document database, payload indexes use similar techniques as those used to index document databases. Payload fields can be marked for indexing based on their datatype, such as integer, float, datetime, string, and so on. For string payloads, you can also choose to use a full-text index.
HNSW
Hierarchical Navigable Small World (HNSW) is an indexing algorithm for vector databases that represents vectors in a multi-layered graph structure. It starts from a top layer with fewer, broader connections and narrows down through layers, each with more detailed connections, to efficiently locate the target vector.
This method balances speed and accuracy by quickly zooming in on relevant clusters and then refining the search as it descends through the layers.
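The layered descent can be illustrated with a toy example. The sketch below is not an HNSW implementation: the three layers are built by hand over 16 points on a line, whereas real HNSW assigns points to layers randomly and chooses links heuristically. It only shows the search pattern of walking greedily within a layer and then dropping to the next, denser one. All names are hypothetical.

```python
import math

# 16 points on a line; IDs double as x-coordinates to keep the example readable.
points = {i: (float(i), 0.0) for i in range(16)}


def build_layer(ids) -> dict[int, list[int]]:
    """Link each node to its immediate neighbors within this layer."""
    ids = sorted(ids)
    links: dict[int, list[int]] = {i: [] for i in ids}
    for left, right in zip(ids, ids[1:]):
        links[left].append(right)
        links[right].append(left)
    return links


layers = [
    build_layer([0, 8]),           # top layer: few nodes, long-range links
    build_layer(range(0, 16, 4)),  # middle layer
    build_layer(range(16)),        # bottom layer: every point, short links
]


def greedy_search(query: tuple[float, float], entry: int = 0) -> list[int]:
    """Return the path of nodes visited while descending through the layers."""
    current = entry
    path = [current]
    for links in layers:
        while True:
            # Move to the neighbor closest to the query, if it improves on the current node.
            best = min(links[current], key=lambda n: math.dist(points[n], query))
            if math.dist(points[best], query) >= math.dist(points[current], query):
                break  # local optimum in this layer: drop to the next layer
            current = best
            path.append(current)
    return path


path = greedy_search((10.2, 0.0))
print(path)  # [0, 8, 12, 11, 10]
```

The search reaches point 10, the true nearest neighbor of the query, after visiting five nodes: one long jump in the top layer, one medium jump in the middle layer, and two short steps in the bottom layer.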
The paper on HNSW explains the concepts in greater detail.
Conclusion
This article provided a high-level overview of Qdrant, a powerful vector database designed for AI and machine learning applications. We explored the fundamental concepts, including points, collections, distance metrics, multitenancy, quantization, and indexing strategies like the HNSW algorithm. These concepts form the backbone of how Qdrant efficiently manages and retrieves high-dimensional embeddings.
For those looking to integrate Qdrant into their AI workflows, Airbyte offers seamless data integration solutions that work in tandem with Qdrant. Airbyte’s tools make it easier to connect Qdrant with other data sources, enabling end-to-end retrieval-augmented generation (RAG) workflows and enhancing your AI capabilities.
To dive deeper into practical implementations, check out these resources:
In the next article, we will provide hands-on instructions for installing Qdrant locally and performing basic operations with embeddings. Stay tuned to see how you can leverage Qdrant and Airbyte to supercharge your AI projects!