New Funding and A New Foundation for Multimodal AI Data I’ve been building data tooling for ML/AI for almost two decades, starting as one of the original contributors to pandas. My cofounder
Tokens Per Second is NOT All You Need Goodhart’s Law: "When a measure becomes a target, it ceases to be a good measure." We're excited to have a guest post
Accelerating deep learning workflows with Lance Lance is a columnar data format that is easy and fast to version, query and train on. It’s designed to be used with images,
Custom Datasets for efficient LLM training using Lance Learn how to craft custom text datasets for LLM training and fine-tuning using Lance.
Hybrid search, New OpenAI Embedding Models, Multimodal RAG for Video Processing Highlights: Hybrid search with custom reranking (included in the LanceDB Python 0.6.0 release). Explore the potential of reranking to enhance retrieval quality and
Hybrid search: RAG for real-life production-grade applications by Mahesh Deshwal What is Hybrid Search, and what’s the need for it? With the increasing use of LLMs in RAG settings, there’s
LanceDB Community News — January 2024 We’re kicking off 2024 with a new LanceDB community newsletter to showcase all the updates in the LanceDB ecosystem, news, blogs, and important links.
Substrait Powered Filter Pushdown in Lance by Weston Pace Filter pushdown is one of the more fundamental optimizations in any data engineering pipeline. The premise is simple: the earlier you filter
LanceDB + Polars: A (near) perfect match. A spiritual successor to pandas, Polars is a new, blazing-fast DataFrame library for Python written in Rust. At LanceDB, we
Efficient RAG with Compression and Filtering by Kaushal Choudhary Why Contextual Compressors and Filters? RAG (Retrieval Augmented Generation) is a technique that helps add additional data sources to our existing LLM
Using column statistics to make Lance scans 30x faster by Will Jones In Lance v0.8.21, we introduced column statistics and statistics-based page pruning. This enhancement reduces the number of IO calls needed
Benchmarking LanceDB I came upon a blog post yesterday benchmarking LanceDB. The numbers looked very surprising to me, so I decided to do a quick investigation on