Weston Pace - LanceDB Blog

Columnar File Readers in Depth: Repetition & Definition Levels

Repetition and definition levels are a method of converting structural arrays into a set of buffers. The approach was made popular in Parquet and is

Engineering 12 min read

Columnar File Readers in Depth: Column Shredding

Record shredding is a classic method used to transpose rows of potentially nested data into a flattened tree of buffers that can be written to

Engineering 13 min read

Columnar File Readers in Depth: Compression Transparency

Conventional wisdom states that compression and random access do not go well together. However, there are many ways you can compress data, and some of

Engineering 9 min read

The Future of Open Source Table Formats: Apache Iceberg and Lance

As the scale of data continues to grow, open-source table formats have become essential for efficient data lake management. Apache Iceberg has emerged as a

Engineering 5 min read

Lance File 2.1: Smaller and Simpler

Almost a year ago I announced we were going to be embarking on a journey to build a new 2.0 version of our file

Engineering 10 min read

Designing a Table Format for ML Workloads

In recent years the concept of a table format has really taken off, with explosive growth in technologies like Iceberg, Delta, and Hudi. With so

Engineering 14 min read

Columnar File Readers in Depth: Backpressure

Streaming data applications can be tricky. When you can read data faster than you can process it, bad things tend to happen. The

Engineering 6 min read

The case for random access I/O

One of the reasons we started the Lance file format and have been investigating new encodings is that we wanted a format with better support

Engineering 10 min read

Columnar File Readers in Depth: APIs and Fusion

The API used to read files has evolved over time, from simple "full table" reads to batch reads and eventually to iterative "

Engineering 11 min read

Lance v2 is now in Beta

We've been talking for a while about a new iteration of our file format. We're pleased to announce that the new

Engineering 5 min read

Columnar File Readers in Depth: Scheduling vs Decoding

We've been working on readers / writers for our recently announced Lance v2 file format and are posting in-depth articles about writing a high

Engineering 9 min read

Columnar File Readers in Depth: Parallelism without Row Groups

Recently, I shared our plans for a new file format, Lance v2. As I'm creating a file reader for this new format I

Engineering 7 min read