PYLER has implemented a video vector embedding pipeline that assigns a unique digital fingerprint to each moment of video and integrates multimodal inputs (visual, audio, text, metadata) into a unified representation. These embeddings are stored in a PostgreSQL-based vector database using pgvector, complemented by SingleStore to provide high throughput, enabling fast similarity search and context matching across millions of videos.
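To make the idea of a unified representation and similarity search concrete, here is a minimal sketch in plain Python. It is not PYLER's implementation: the fusion rule (a weighted average of L2-normalized per-modality vectors), the function names, the segment IDs, and the toy 3-dimensional vectors are all assumptions made for illustration. The article does not describe how the modalities are actually combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_modalities(embeddings, weights=None):
    """Combine per-modality vectors into one unit-length vector.

    Assumption: every modality has already been projected to the same
    dimension. The weighted-average rule is illustrative only.
    """
    weights = weights or {}
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for name, vec in embeddings.items():
        if len(vec) != dim:
            raise ValueError(f"modality {name!r} has dimension {len(vec)}, expected {dim}")
        w = weights.get(name, 1.0)
        for i, x in enumerate(l2_normalize(vec)):
            fused[i] += w * x
    return l2_normalize(fused)


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, index, k):
    """Brute-force nearest neighbours over a dict of segment_id -> vector."""
    scored = sorted((cosine_distance(query, vec), seg_id) for seg_id, vec in index.items())
    return [(seg_id, dist) for dist, seg_id in scored[:k]]


# Hypothetical video segments with toy visual and audio embeddings.
index = {
    "video_a@00:12": fuse_modalities({"visual": [1, 0, 0], "audio": [1, 0, 0]}),
    "video_b@03:40": fuse_modalities({"visual": [0, 1, 0], "audio": [0, 1, 0]}),
    "video_c@01:05": fuse_modalities({"visual": [1, 1, 0], "audio": [1, 0, 0]}),
}

query = l2_normalize([1.0, 0.1, 0.0])
results = top_k(query, index, k=2)
for seg_id, dist in results:
    print(seg_id, round(dist, 4))
```

In a pgvector-backed store, the brute-force `top_k` loop would typically be replaced by a SQL query that orders rows by pgvector's cosine-distance operator `<=>` and applies a `LIMIT`, optionally backed by an approximate index. The table and column names for such a query would depend on PYLER's schema, which the article does not give.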
NVIDIA DGX systems with the Blackwell architecture and the high-bandwidth NVIDIA NVLink interconnect provide the compute foundation for this training and inference pipeline. By using NVIDIA Mission Control to orchestrate complex training workloads and NVIDIA NeMo Curator to automate data curation and filtering, PYLER increased video preprocessing throughput by 4x compared to its previous in-house pipeline. This combination of hardware and software lets PYLER handle video analysis and acquisition across large datasets far more efficiently than before.
The move to DGX B200 also fundamentally changed PYLER’s development speed. The increased computational density improved hyperparameter search capacity by a factor of 5, and the team reduced model training iteration cycles from three months to one month. The Blackwell-based system also increased multimodal model training speed by 3x compared to previous generations, allowing PYLER to develop and deploy models faster.
By extending the NVIDIA Blueprint for Video Search and Summarization (VSS) and using NVIDIA NV-Embed to accelerate embedding generation, the system efficiently handles video analysis and retrieval across large video datasets. PYLER’s model doesn’t just detect “cars.” It understands a scene such as “people get frustrated while driving in the rain,” which enables precise situational positioning and safety audits that were previously not possible at scale.
