Chang She - LanceDB Blog

New Funding and A New Foundation for Multimodal AI Data

I’ve been building data tooling for ML/AI for almost two decades, beginning with being one of the original contributors to pandas. My cofounder

News 3 min read

Tokens Per Second is NOT All You Need

Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure. We're excited to have a guest post

Blog 3 min read

Accelerating deep learning workflows with Lance

Lance is a columnar data format that is easy and fast to version, query and train on. It’s designed to be used with images,

5 min read

Custom Datasets for efficient LLM training using Lance

Learn how to craft custom text datasets for LLM training and fine-tuning using Lance.

4 min read

Hybrid search, New OpenAI Embedding Models, Multimodal RAG for Video Processing

Highlights: Hybrid search with custom reranking (included in LanceDB Python version 0.6.0 release). Explore the potential of reranking to enhance retrieval quality and

Newsletter 2 min read

Hybrid search: RAG for real-life production-grade applications

by Mahesh Deshwal. What is Hybrid Search, and what’s the need for it? With the increasing usage of LLMs in RAG settings, there’s

Blog 6 min read

LanceDB Community News — January 2024

We’re kicking off 2024 with a new LanceDB community newsletter to showcase all the updates in the LanceDB ecosystem, news, blogs, and important links.

News 1 min read

Substrait Powered Filter Pushdown in Lance

by Weston Pace. Filter pushdown is one of the more fundamental optimizations in any data engineering pipeline. The premise is simple: the earlier you filter

Blog 5 min read

LanceDB + Polars

A (near) perfect match. A spiritual successor to pandas, Polars is a new blazing-fast DataFrame library for Python written in Rust. At LanceDB, we

Blog 6 min read

Efficient RAG with Compression and Filtering

by Kaushal Choudhary. Why Contextual Compressors and Filters? RAG (Retrieval Augmented Generation) is a technique that helps add additional data sources to our existing LLM

Blog 3 min read

Using column statistics to make Lance scans 30x faster

by Will Jones. In Lance v0.8.21, we introduced column statistics and statistics-based page pruning. This enhancement reduces the number of IO calls needed

Blog 6 min read

Benchmarking LanceDB

I came upon a blog post yesterday benchmarking LanceDB. The numbers looked very surprising to me, so I decided to do a quick investigation on

Blog 7 min read