263 points by tosh 2 years ago | 52 comments
wskish 2 years ago
After working through several projects that utilized local hnswlib and different databases for text and vector persistence, I integrated hnswlib with sqlite to create an embedded vector search engine that can easily scale up to millions of embeddings. For self-hosted situations of under 10M embeddings and less than insane throughput I think this combo is hard to beat.
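A minimal sketch of the pattern described above, assuming only Python's built-in `sqlite3`: the text, the metadata and the vector itself are persisted in one SQLite table, and an in-memory index maps each rowid to its vector, so the rowid doubles as the label the index returns. To keep it dependency-free, the index here is an exact brute-force cosine search swapped in for hnswlib's HNSW graph. The class name `TinyVectorStore` and its methods are hypothetical and are not the hnsqlite API.

```python
import json
import math
import sqlite3
from array import array


class TinyVectorStore:
    """Text + metadata + vectors persisted in SQLite, searched via an in-memory index.

    The index is an exact brute-force cosine search standing in for an HNSW
    index (hypothetical illustration, not hnsqlite's real implementation).
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            " id INTEGER PRIMARY KEY, text TEXT, metadata TEXT, vector BLOB)"
        )
        self.index = {}  # rowid -> unit-length vector
        # Rebuild the in-memory index from vectors already persisted on disk.
        for rowid, blob in self.db.execute("SELECT id, vector FROM docs"):
            self.index[rowid] = list(array("f", blob))

    @staticmethod
    def _normalize(vec):
        # Assumes a non-zero vector; unit length makes dot product == cosine.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def add(self, text, vector, metadata=None):
        unit = self._normalize(vector)
        cur = self.db.execute(
            "INSERT INTO docs (text, metadata, vector) VALUES (?, ?, ?)",
            (text, json.dumps(metadata or {}), array("f", unit).tobytes()),
        )
        self.db.commit()
        self.index[cur.lastrowid] = unit  # the SQLite rowid is the index label
        return cur.lastrowid

    def search(self, query, k=3):
        q = self._normalize(query)
        scored = sorted(
            ((sum(a * b for a, b in zip(q, v)), rowid) for rowid, v in self.index.items()),
            reverse=True,
        )[:k]
        results = []
        for score, rowid in scored:
            # Resolve each label back to its text and metadata in SQLite.
            text, meta = self.db.execute(
                "SELECT text, metadata FROM docs WHERE id = ?", (rowid,)
            ).fetchone()
            results.append((rowid, score, text, json.loads(meta)))
        return results


store = TinyVectorStore()
store.add("cats purr", [1.0, 0.0, 0.0], {"topic": "pets"})
store.add("dogs bark", [0.9, 0.1, 0.0], {"topic": "pets"})
store.add("stocks fell", [0.0, 0.0, 1.0], {"topic": "finance"})
hits = store.search([1.0, 0.0, 0.1], k=2)
print([h[2] for h in hits])  # ['cats purr', 'dogs bark']
```

With hnswlib itself you would replace the dict with an `hnswlib.Index(space='cosine', dim=d)`, call `init_index(max_elements=...)`, add vectors with `add_items(vectors, ids)` passing the SQLite rowids as `ids`, and query with `knn_query(query, k=k)`; the returned labels are then resolved against SQLite exactly as in `search` above.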
Labo333 2 years ago
I'm really happy to see `hnswlib` as a Python dependency since I'm the one who implemented PyPI support: https://github.com/nmslib/hnswlib/pull/140
fzysingularity 2 years ago
wskish 2 years ago
hnswlib's implementation of HNSW is faster than faiss's. Faiss has other index methods that are faster in some cases, but they are also more complex.
wskish 2 years ago
nl 2 years ago
Since lots of people don't seem to understand how useful these embedding libraries are, here's an example. I built a thing that indexes bouldering and climbing competition videos, then builds an embedding of the climber's body position in each frame. I can then automatically match different climbers on the same problem.
It works pretty well. Since the body positions are 3D, it works reasonably well across camera angles.
The biggest problem is getting the embedding right. I simplified it a lot above: I actually need to embed the problem shape itself, because otherwise it matches too well and you get frames of people in identical positions but on different problems!
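A toy sketch of why embedding the problem alongside the pose helps, assuming (hypothetically) that each frame already has a pose vector and a separate vector describing the problem. The comment doesn't say how the real embedding is built; `frame_embedding`, `problem_weight` and the numbers below are made up for illustration. Both parts are unit-normalized, concatenated and re-normalized, so a dot product between two frames is their cosine similarity.

```python
import math


def unit(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def frame_embedding(pose_vec, problem_vec, problem_weight=1.0):
    """Concatenate a unit pose vector with a weighted unit problem vector."""
    pose = unit(pose_vec)
    problem = [problem_weight * x for x in unit(problem_vec)]
    return unit(pose + problem)  # list concatenation, then re-normalize


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors: two nearly identical poses and two different problems.
pose_a, pose_b = [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]
problem_1, problem_2 = [1.0, 0.0], [0.0, 1.0]

query = frame_embedding(pose_a, problem_1)
same_problem = frame_embedding(pose_b, problem_1)
other_problem = frame_embedding(pose_a, problem_2)

# Pose alone prefers the identical pose, even though it is on another problem...
print(cosine(unit(pose_a), unit(pose_a)) > cosine(unit(pose_a), unit(pose_b)))  # True
# ...while the joint embedding prefers the similar pose on the same problem.
print(cosine(query, same_problem) > cosine(query, other_problem))  # True
```

With `problem_weight=1.0` the joint cosine is simply the average of the pose and problem cosines; raising the weight makes a problem mismatch cost more relative to pose differences.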
antman 2 years ago
antman 2 years ago
For anyone else: you pass it directly in the metadata; see https://github.com/jiggy-ai/hnsqlite/blob/main/test/test_col...
leobg 2 years ago
wskish 2 years ago
leobg 2 years ago
gk1 2 years ago
chandureddyvari 2 years ago
4ft4 2 years ago
fzliu 2 years ago
mojoe 2 years ago
fzliu 2 years ago
You're right in that it's a bit heavyweight, so we're working to see how we can make pub/sub and other cluster components lighter and more efficient overall.
politician 2 years ago
ar9av 2 years ago
mshachkov 2 years ago