The Future of Open Source Table Formats: Apache Iceberg and Lance
As the scale of data continues to grow, open-source table formats have become essential for efficient data lake management. Apache Iceberg has emerged as a

Lance File 2.1: Smaller and Simpler
Almost a year ago I announced we were going to be embarking on a journey to build a new 2.0 version of our file

Designing a Table Format for ML Workloads
In recent years the concept of a table format has really taken off, with explosive growth in technologies like Iceberg, Delta, and Hudi. With so

Columnar File Readers in Depth: Backpressure
Streaming data applications can be tricky. When you can read data faster than you can process the data then bad things tend to happen. The

The case for random access I/O
One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support

Columnar File Readers in Depth: APIs and Fusion
The API used to read files has evolved over time, from simple "full table" reads to batch reads and eventually to iterative "

Lance v2 is now in Beta
We've been talking for a while about a new iteration of our file format. We're pleased to announce that the new

Columnar File Readers in Depth: Scheduling vs Decoding
We've been working on readers / writers for our recently announced Lance v2 file format and are posting in-depth articles about writing a high

Columnar File Readers in Depth: Parallelism without Row Groups
Recently, I shared our plans for a new file format, Lance v2. As I'm creating a file reader for this new format I

Lance v2: A columnar container format for modern data
Why a new format? Lance was invented because readers and writers for existing column formats did not handle AI/ML workloads efficiently. Lance v1 solved