
Similarity search for quality control
Once you find one problematic annotation, similarity search is a powerful way to find related errors. Click a mislabeled sample, and the most similar images are retrieved so you can check whether they share the same systematic labeling problem.
FiftyOne’s similarity search turns “finding more like this” from a tedious manual task into near-instant discovery. Once a dataset is indexed, visually similar samples can be retrieved through point-and-click in the App or through programmatic queries.
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
# Load dataset
dataset = foz.load_zoo_dataset("quickstart")
# Index images by similarity
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
)
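# Score how likely each ground truth annotation is to be wrong
# (populates the "mistakenness" field used below)
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")
# Sort by most likely to contain annotation mistakes
mistake_view = dataset.sort_by("mistakenness", reverse=True)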
# Query with the top-ranked sample and find the 10 most similar images
# (first() keeps the sort order; take(1) would pick a random sample)
query_id = mistake_view.first().id
similar_view = dataset.sort_by_similarity(query_id, k=10, brain_key="img_sim")
# Launch App to view similar samples and for point-and-click similarity search
session = fo.launch_app(dataset)
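To make the idea behind the query concrete: a similarity index of this kind keeps one embedding vector per image and answers a query by ranking the other vectors by closeness. The sketch below illustrates that in plain Python with hand-made 3-dimensional vectors. The sample IDs, the vectors, and the helpers `cosine_similarity` and `most_similar` are hypothetical and not part of FiftyOne's API; cosine similarity is assumed as the metric, and the query itself is left out of the results.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(embeddings, query_id, k):
    """Return the IDs of the k samples closest to the query, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), sample_id)
        for sample_id, vec in embeddings.items()
        if sample_id != query_id  # skip the query sample itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [sample_id for _, sample_id in scored[:k]]


# Toy embeddings, made up for illustration only
embeddings = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "dog_1": [0.1, 0.9, 0.2],
    "car_1": [0.0, 0.1, 0.9],
}

neighbors = most_similar(embeddings, "cat_1", k=2)
print(neighbors)
```

A brute-force scan like this is fine for a handful of vectors. A dedicated index or vector database does the same ranking at a scale where scanning every vector per query becomes too slow.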
Key features include instant visual search in the App, object-level similarity indexing for detection patches, and swappable backends, so you can move from the default sklearn index to Qdrant, Pinecone, or another vector database for production use.
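As a rough sketch of the last two features, `fob.compute_similarity()` accepts a `patches_field` argument to index objects rather than whole images, and a `backend` argument to choose where the index lives. The helper `similarity_index_kwargs` below is hypothetical, as is its brain key naming scheme. It only assembles the arguments, and the FiftyOne calls appear as comments because they assume an installed FiftyOne and, for Qdrant, a reachable server with the `qdrant-client` package.

```python
def similarity_index_kwargs(patches_field=None, backend="sklearn"):
    """Build keyword arguments for fob.compute_similarity() (hypothetical helper)."""
    kwargs = {
        "model": "clip-vit-base32-torch",
        "backend": backend,
    }
    if patches_field is not None:
        # Index each object in this label field instead of whole images
        kwargs["patches_field"] = patches_field
        kwargs["brain_key"] = f"{patches_field}_sim_{backend}"
    else:
        kwargs["brain_key"] = f"img_sim_{backend}"
    return kwargs


# Same indexing request, two different backends
prototype = similarity_index_kwargs(patches_field="ground_truth")
production = similarity_index_kwargs(patches_field="ground_truth", backend="qdrant")

# With FiftyOne installed, these could be used roughly as follows:
#   fob.compute_similarity(dataset, **prototype)
#   patches = dataset.to_patches("ground_truth")
#   similar = patches.sort_by_similarity(
#       patches.first().id, k=10, brain_key=prototype["brain_key"]
#   )
```

Keeping the backend as a single argument means the query code stays the same when the index moves from local prototyping to a hosted vector database.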
