Columnar File Readers in Depth: Column Shredding
Record shredding is a classic method used to transpose rows of potentially nested data into a flattened tree of buffers that can be written to…

Columnar File Readers in Depth: Compression Transparency
Conventional wisdom states that compression and random access do not go well together. However, there are many ways you can compress data, and some of…

A Practical Guide to Fine-tuning Embedding Models
This is a follow-up to the following report, which deals with improving retrievers by training and fine-tuning reranker models: A Practical Guide to Training…

A Practical Guide to Training Custom Rerankers
A report on reranking, training, & fine-tuning rerankers for retrieval. This report offers practical insights for improving a retriever by reranking results. We'll…

The Future of Open Source Table Formats: Apache Iceberg and Lance
As the scale of data continues to grow, open-source table formats have become essential for efficient data lake management. Apache Iceberg has emerged as a…

Lance File 2.1: Smaller and Simpler
Almost a year ago I announced we were going to be embarking on a journey to build a new 2.0 version of our file…

Designing a Table Format for ML Workloads
In recent years the concept of a table format has really taken off, with explosive growth in technologies like Iceberg, Delta, and Hudi. With so…

November Feature Roundup
Lance: On November 13th, we released Lance 0.19.2. This release includes several new features and improvements, including:
* Flexible handling of data for inserts…

Columnar File Readers in Depth: Backpressure
Streaming data applications can be tricky. When you can read data faster than you can process it, bad things tend to happen. The…

Lance v0.16.1 Feature Roundup
In Lance v0.16.1 we introduced several new features, implemented by a combination of LanceDB engineers and community contributors. Lance is an OSS project…

The case for random access I/O
One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support…

Lance v0.15.0
Lance 0.15.0 introduces several experimental features, marking major milestones on important projects in our library. It exposes the first public APIs for full-text…