Scalable Computer Vision with LanceDB & Voxel51

Scalable Computer Vision with LanceDB & Voxel51

2 min read

Importance of dataset analysis in CV

Dataset analysis is an essential step in any computer vision project. It allows us to understand the data that we are working with, identify any potential problems, and make sure that the data is as representative as possible of the real world. This is especially important once the model architecture is fixed due to deployment constraints, as the best way to improve performance is via iteration on datasets.

Voxel51 is a set of open-source tools for computer vision. The FiftyOne query language allows you to explore and analyze your data in a variety of ways. You can use the query language to find specific images, objects, or features.

With this integration with LanceDB, Voxel51 can leverage its fast, robust and setup-free vector search capabilities.

LanceDB: The serverless vectorDB

LanceDB is an open-source, serverless, and a multi-modal vector database.

  • It is built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
  • It support for vector similarity search, full-text search and SQL.
  • Native support for Javascript/Typescript

To enable the lancedb backend in voxel51, simply pass backend="lancedb" in supported operations

import fiftyone.brain as fob

fob.compute_similarity(
    ...
    backend="lancedb",
)

Example use cases

Sort dataset by similarity

Get similar images to a given id

import fiftyone as fo
import fiftyone.brain as fob

fob.compute_similarity(
    ...
    backend="lancedb",
)
id = dataset.first().id
view = dataset.sort_by_similarity(id, k=30)

session = fo.launch_app(view)
session.wait()

Search by Text queries

query = "a photo of a dog"
view = dataset.sort_by_similarity(query, k=10, brain_key="lancedpb_index")

The integration allows various customizations that can be found in the docs.

LanceDB interface integrates into the Python data ecosystem

LanceDB is compatible with python data ecosystem and can be used with pandas, numpy, and arrow.

lancedb_index = fob.compute_similarity(…)
table = lancedb_index.table

# Integration with Python data ecosystem
df = table.to_pandas() # get the table as a pandas dataframe
pa = table.to_arrow() # get the table as an arrow table

More use cases and examples using voxel51 with LanceDB backend are available in the integration docs.

Visit LanceDB repo and docs to see more examples and integration.