Improving annotation quality with machine learning


Similarity search for quality control

Once you find one problematic annotation, similarity search is a powerful tool for finding other related errors. Click on a mislabeled sample and the most similar images are retrieved immediately, so you can check whether they share the same systematic labeling problem.

FiftyOne’s similarity search turns “finding more like this” from a tedious manual task into instant discovery. Once a dataset is indexed, visually similar samples can be retrieved instantly, either by pointing and clicking in the App or through programmatic queries.
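
To make the "index once, query instantly" idea concrete, here is a minimal conceptual sketch in plain Python, assuming each sample has already been turned into an embedding vector. The embeddings and the names `build_index` and `most_similar` are hypothetical and made up for illustration; this brute-force cosine-similarity search is not FiftyOne's implementation, which delegates to a similarity backend.

```python
import math

# Hypothetical toy embeddings (sample id -> feature vector), made up for illustration.
# In practice these would come from a model such as CLIP.
EMBEDDINGS = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.85, 0.15, 0.05],
    "img_003": [0.1, 0.9, 0.2],
    "img_004": [0.88, 0.05, 0.1],
    "img_005": [0.0, 0.2, 0.95],
}


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_index(embeddings):
    """'Indexing': precompute unit-length vectors once, up front."""
    return {sid: normalize(vec) for sid, vec in embeddings.items()}


def most_similar(index, query_id, k):
    """Return the k samples most similar to query_id, excluding the query itself."""
    q = index[query_id]
    scores = [
        (sum(a * b for a, b in zip(q, vec)), sid)
        for sid, vec in index.items()
        if sid != query_id
    ]
    scores.sort(reverse=True)  # highest cosine similarity first
    return [sid for _, sid in scores[:k]]


index = build_index(EMBEDDINGS)
print(most_similar(index, "img_001", k=2))
```

If `img_001` were the mislabeled sample, its nearest neighbors (`img_002` and `img_004` in this toy data) are the first candidates to inspect for the same labeling problem.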

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Load the quickstart dataset (includes "ground_truth" and "predictions" fields)
dataset = foz.load_zoo_dataset("quickstart")

# Index images by similarity using CLIP embeddings
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim"
)

# Score each sample's likelihood of containing annotation mistakes;
# this populates the "mistakenness" field used below
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")

# Sort by most likely to contain annotation mistakes
mistake_view = dataset.sort_by("mistakenness", reverse=True)

# Use the most suspicious sample as the query and find the 10 most similar images
query_id = mistake_view.first().id
similar_view = dataset.sort_by_similarity(query_id, k=10, brain_key="img_sim")

# Launch the App on the similar samples; point-and-click similarity search is also available there
session = fo.launch_app(similar_view)
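
Sorting by "mistakenness" ranks samples by how likely their annotations are wrong. FiftyOne computes its own score; the sketch below is only a simplified, hypothetical heuristic for single-label classification, assuming each sample has a ground-truth label, a predicted label, and a prediction confidence. The idea it illustrates: a confident prediction that disagrees with the annotation is the most suspicious case.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    ground_truth: str  # annotator-provided label
    predicted: str     # model's predicted label
    confidence: float  # model confidence in `predicted`, in [0, 1]


def mistakenness(sample):
    """Hypothetical heuristic: confident disagreement gives a high score.

    Disagreements map to [0.5, 1.0]; agreements map to [0.0, 0.5],
    where a low-confidence agreement is mildly suspicious.
    """
    if sample.predicted != sample.ground_truth:
        return (1.0 + sample.confidence) / 2.0
    return (1.0 - sample.confidence) / 2.0


samples = [
    Sample("a", "cat", "cat", 0.95),
    Sample("b", "cat", "dog", 0.90),
    Sample("c", "dog", "dog", 0.40),
    Sample("d", "dog", "cat", 0.55),
]

# Similar in spirit to sort_by("mistakenness", reverse=True)
ranked = sorted(samples, key=mistakenness, reverse=True)
print([s.sample_id for s in ranked])
```

Here sample `b` comes first because the model is 90% confident it is a dog while the annotation says cat; that top-ranked sample is a natural query for similarity search.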

Key features include instant visual search through the App interface, object-level similarity indexing for detection patches, and pluggable backends that let you switch from the default sklearn backend to Qdrant, Pinecone, or other vector databases for production use.
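
Switching backends works because query code only depends on a common interface, not on where the vectors live. The sketch below illustrates that design with hypothetical names (`VectorBackend`, `BruteForceBackend`, `SimilarityIndex`); these are not FiftyOne classes. In FiftyOne itself the backend is chosen through the `backend` argument of `compute_similarity`. Only an in-memory exact-search backend is shown here, standing in for a local backend such as sklearn.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple


class VectorBackend(Protocol):
    """Hypothetical interface that every similarity backend implements."""

    def add(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...

    def query(self, vector: Sequence[float], k: int) -> List[Tuple[str, float]]: ...


class BruteForceBackend:
    """In-memory exact search; a stand-in for a local backend."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, ids, vectors):
        for sid, vec in zip(ids, vectors):
            self._vectors[sid] = list(vec)

    def query(self, vector, k):
        # Rank stored vectors by Euclidean distance (smaller = more similar)
        dists = [(math.dist(vector, vec), sid) for sid, vec in self._vectors.items()]
        dists.sort()
        return [(sid, d) for d, sid in dists[:k]]


class SimilarityIndex:
    """Query code depends only on the VectorBackend interface."""

    def __init__(self, backend: VectorBackend) -> None:
        self.backend = backend

    def index(self, embeddings: Dict[str, List[float]]) -> None:
        self.backend.add(list(embeddings.keys()), list(embeddings.values()))

    def most_similar(self, vector: Sequence[float], k: int) -> List[str]:
        return [sid for sid, _ in self.backend.query(vector, k)]


idx = SimilarityIndex(BruteForceBackend())
idx.index({"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]})
print(idx.most_similar([0.9, 0.1], k=2))
```

Moving to a production vector database would mean writing another class with the same `add` and `query` methods that wraps that database's client; `SimilarityIndex` and any code calling `most_similar` would stay unchanged.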
