< [[2025-19]] | [[2025-22]] >
#til
### Bang for the Buck: Vector Search on Cloud CPUs
[arXiv Link](https://arxiv.org/abs/2505.07621)
New paper from Peter Boncz's lab at CWI on how to get the best price-performance for different vector search architectures on different cloud hardware.
Most of the interesting conclusions are in Sections 3.4 and 4, excerpts:
> Graviton3 gives the best “bang for the buck,” even over its successor, Graviton4
> Graviton3 excels in the following areas: a variety of SIMD capabilities, high read throughput, and low sequential memory latency at L3/DRAM. More importantly, it is cheap
> Zen4 is still a solid option for vector search on IVF indexes and full scans, especially in float32 and bfloat vectors. Zen3 has few negative points thanks to its low price and low latencies for L2/L3/DRAM access. However, it does not excel in any setting.
> Finally, the ***SPRs have the lowest score in QP$ and do not excel in any setting***. Despite SPR Z having the best QPS score for HNSW, its price brings down its QP$ score.
SPR is the [Intel Sapphire Rapids](https://chipsandcheese.com/p/a-peek-at-sapphire-rapids) chips. Interesting stuff for folks hosting inference!
### NVIDIA GPUDirect Storage (GDS)
Provides a way to read from a file handle on local NVMe (or NVMe over RDMA) storage directly into GPU memory, bypassing a bounce buffer in host memory.
NVIDIA provides [KvikIO](https://github.com/rapidsai/kvikio) as a C++ library to read/write fd's from/to GPU memory. See [these method doc comments](https://github.com/rapidsai/kvikio/blob/4d9f905aa3dbdad5f2e8e78f6305ef40d2791ba2/cpp/include/kvikio/file_handle.hpp#L193-L220).
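A minimal sketch of what reading a file straight into GPU memory looks like with KvikIO's `FileHandle::pread` (the method documented at the link above). This assumes a CUDA device, KvikIO installed, and a local file `data.bin`; the filename and buffer size are illustrative. Note KvikIO transparently falls back to POSIX I/O when the cuFile/GDS driver is unavailable:

```cpp
#include <cstdio>
#include <kvikio/file_handle.hpp>
#include <cuda_runtime.h>

int main() {
  constexpr std::size_t size = 1 << 20;  // 1 MiB, illustrative
  void* dev_buf = nullptr;
  cudaMalloc(&dev_buf, size);

  // Open the file for reading. "data.bin" is a placeholder path.
  kvikio::FileHandle fh("data.bin", "r");

  // Read `size` bytes from file offset 0 directly into GPU memory.
  // pread() is asynchronous and returns a std::future<std::size_t>
  // with the number of bytes read.
  auto fut = fh.pread(dev_buf, size, /*file_offset=*/0);
  std::size_t bytes_read = fut.get();
  std::printf("read %zu bytes into device memory\n", bytes_read);

  cudaFree(dev_buf);
  return 0;
}
```

`pread` also takes optional `task_size` and `gds_threshold` parameters for tuning how reads are split across KvikIO's thread pool and when GDS kicks in versus the POSIX fallback.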
OpenDAL is investigating enabling GDS support, tracking issue is here: [Issue](https://github.com/apache/opendal/issues/5090).
## backlog
- https://read.engineerscodex.com/p/how-cursor-indexes-codebases-fast
- https://storage.googleapis.com/public-technical-paper/INTELLECT_2_Technical_Report.pdf
- https://github.com/kylebarron/geo-index