Presentation
UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture
Description: Approximate Nearest Neighbor Search (ANNS) is a critical component of modern AI systems such as recommendation engines and retrieval-augmented large language models (RAG-LLMs). However, scaling ANNS to billion-entry datasets exposes critical inefficiencies: CPU-based solutions are bottlenecked by memory bandwidth, while GPU implementations underutilize hardware resources, leading to suboptimal performance and energy consumption. We introduce UpANNS, a novel framework that leverages a Processing-in-Memory (PIM) architecture to accelerate billion-scale ANNS. UpANNS integrates four key innovations: architecture-aware data placement that minimizes latency through workload balancing; dynamic resource management for optimal PIM utilization; co-occurrence-optimized encoding that reduces redundant computation; and an early-pruning strategy for efficient top-k selection. Evaluation on commercial UPMEM hardware demonstrates that UpANNS achieves 4.3x higher QPS than CPU-based Faiss while matching GPU performance with 2.3x greater energy efficiency. Its near-linear scalability keeps it practical for growing datasets, making it well suited to applications such as real-time LLM serving and large-scale retrieval systems.
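The early-pruning idea mentioned above can be illustrated with a generic sketch: maintain a bounded max-heap of the current k best candidates and abandon a candidate's distance computation as soon as its partial sum exceeds the current k-th best distance. This is a minimal, hypothetical illustration of the general technique, not UpANNS's PIM implementation; the function name and data layout are assumptions.

```python
import heapq

def topk_early_pruning(candidates, query, k):
    """Return the k candidates nearest to `query` (squared Euclidean),
    pruning a candidate as soon as its partial distance exceeds the
    current k-th best distance. Generic sketch, not the paper's code."""
    heap = []  # max-heap via negated distances: entries are (-dist, idx)
    for idx, vec in enumerate(candidates):
        # Prune threshold: current k-th best distance (inf until heap is full).
        bound = -heap[0][0] if len(heap) == k else float("inf")
        dist = 0.0
        for q, v in zip(query, vec):
            dist += (q - v) ** 2
            if dist > bound:
                break  # partial distance already too large: skip candidate
        else:
            # Loop finished without pruning: candidate may enter the top-k.
            if len(heap) < k:
                heapq.heappush(heap, (-dist, idx))
            else:
                heapq.heappushpop(heap, (-dist, idx))
    # Return (distance, index) pairs sorted by increasing distance.
    return sorted((-d, i) for d, i in heap)
```

Once the heap fills, most far-away candidates are rejected after only a few dimensions, which is why pruning cuts redundant computation in distance-dominated workloads.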
Event Type: Paper
Time: Wednesday, 19 November 2025, 10:30am - 10:52am CST
Location: 260-267
Architectures & Networks