How we built vector search in the cloud.

Published On Dec 8, 2023

Rockset fully integrates similarity indexing into its search and analytics database, enabling engineers to scale AI applications to thousands of users.

In this talk, Chief Architect Tudor Bosman and engineer Daniel Latta-Lin share how they built a distributed similarity index using FAISS-IVF that is memory-efficient and supports immediate insertion and recall. They delve into the implementation details including how Rockset supports:

- Real-time updates: Rockset supports inserts, updates and deletes of vectors and metadata. It’s built on RocksDB, an open-source embedded storage engine designed for mutability. When a vector is inserted or modified, Rockset uses FAISS to compute the Voronoi cell it falls in, then adds or updates the closest centroid and the residual value in the search index. New data is reflected in searches within milliseconds.
- Hybrid search with SQL: Rockset stores and indexes vectors alongside text, JSON and time series data. It uses the search index and the similarity index in parallel. FAISS identifies the K nearest centroids to the target vector, and the search index then filters results by those centroids and by metadata terms in one pass, a technique known as single-stage filtering.
- Separation of indexing and search: With compute-compute separation, similarity indexing of vectors does not affect search performance. Ingestion and indexing run on different virtual instances (clusters) than search, for predictable performance as you scale.
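
The insert and query paths described in the first two points can be sketched in a few lines. This is a toy illustration only, not Rockset's implementation: plain NumPy stands in for FAISS, the centroids are assumed to be already trained, and the names `TinyIVF`, `upsert`, `search`, `nprobe` and `predicate` are hypothetical. It shows a vector being assigned to its nearest centroid and stored as a residual, and a query that probes only the nearest cells while applying a metadata filter in the same pass.

```python
import numpy as np


class TinyIVF:
    """Toy inverted-file index: centroid id -> {doc id: (residual, metadata)}."""

    def __init__(self, centroids):
        # Assumption: centroids were trained beforehand (e.g. with k-means).
        self.centroids = np.asarray(centroids, dtype=np.float32)
        self.postings = {i: {} for i in range(len(self.centroids))}
        self.cell_of = {}  # doc id -> centroid id it currently lives under

    def _nearest(self, vec, k=1):
        # Brute-force distance to every centroid; FAISS would do this step.
        dists = np.linalg.norm(self.centroids - vec, axis=1)
        return np.argsort(dists)[:k]

    def delete(self, doc_id):
        cell = self.cell_of.pop(doc_id, None)
        if cell is not None:
            del self.postings[cell][doc_id]

    def upsert(self, doc_id, vec, metadata):
        vec = np.asarray(vec, dtype=np.float32)
        self.delete(doc_id)  # an update may move the vector to another cell
        cell = int(self._nearest(vec)[0])
        residual = vec - self.centroids[cell]
        self.postings[cell][doc_id] = (residual, metadata)
        self.cell_of[doc_id] = cell

    def search(self, query, k, nprobe, predicate):
        query = np.asarray(query, dtype=np.float32)
        hits = []
        for cell in self._nearest(query, nprobe):
            cell = int(cell)
            for doc_id, (residual, meta) in self.postings[cell].items():
                if not predicate(meta):  # metadata filter in the same pass
                    continue
                vec = self.centroids[cell] + residual
                hits.append((float(np.linalg.norm(vec - query)), doc_id))
        hits.sort()
        return [doc_id for _, doc_id in hits[:k]]


def is_english(meta):
    return meta["lang"] == "en"


index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.upsert("a", [0.5, 0.2], {"lang": "en"})
index.upsert("b", [0.1, 0.1], {"lang": "fr"})
index.upsert("c", [9.0, 9.5], {"lang": "en"})

before = index.search([0.0, 0.0], k=2, nprobe=1, predicate=is_english)

# Updating "a" moves it to the other cell; the next query sees it immediately.
index.upsert("a", [9.9, 9.9], {"lang": "en"})
after = index.search([0.0, 0.0], k=2, nprobe=1, predicate=is_english)
wider = index.search([0.0, 0.0], k=2, nprobe=2, predicate=is_english)
```

Because the filter runs while the posting lists are scanned, no separate pre-filter or post-filter stage is needed, which is the idea behind single-stage filtering. Raising `nprobe` trades query cost for recall, since more cells are scanned.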

Video chapters:

0:00 - Introduction and Overview
1:24 - Housekeeping Notes and Presentation Introduction
1:56 - Introduction of Presenters
3:11 - Background of Presenters and Their Work at Rockset
3:47 - Overview of Vector Database Features in Rockset
6:12 - Integration of Vector Search with Database Features
7:45 - Transition to Detailed Discussion on Implementation
8:19 - Explanation of Vector Search and Its Importance
10:07 - Explanation of Vector Space and Similarity Search
11:14 - Challenges of Traditional Vector Search Techniques
12:20 - Introduction to Inverted File Indexing and Its Benefits
14:07 - Explanation of Postings List Iteration in Rockset
15:24 - Benefits of Inverted Index Method Over Graph Algorithms
17:09 - Discussion on Indexing Strategies and Optimization
18:53 - Benefits of Inverted Index for Large Scale Vector Data
20:13 - Integration of Centroid Mapping with Rockset's Architecture
22:07 - Building Index at Scale and Metadata Management
24:20 - Index Building Process and Integration with Existing Data
25:38 - Updates and Real-Time Index Consistency Management
27:33 - Querying the Built Index and Optimizing Search
29:26 - Maintenance of Real-Time Updates and Consistency
30:39 - Closing Discussion on Real-Time Updates and Indexing
32:33 - Metadata Filtering and Seamless Integration with SQL
34:17 - In-Line Filtering and Intersection with Posting Lists
35:29 - Summary of Key Points and Future Directions
36:04 - Plans for Bulk Training and Dynamic Probing List Expansion
37:49 - Q&A Session and Closing Remarks
38:53 - End of Presentation and Thank You Notes
