Milestone Systems unveiled its upcoming Vision Language Model (VLM) to artificial intelligence developers around the world at a hackathon aimed at powering video intelligence for urban environments. Fifty-five developers were given early access to the VLM, which was trained on over 75,000 hours of curated video data, and finalists were announced after a competitive event.
Product expansion
VLM was built using NVIDIA Cosmos-Reason and Hafnia’s domain-specific data library and is designed to interpret urban visuals, language, symbols, and events. Milestone is also introducing a generative AI plugin for its XProtect video management software, with a focus on improving traffic management in complex urban environments. This plugin converts video feeds into actionable, detailed written reports, summaries, and live alerts.
Hackathon challenge
Participants from 15 countries were tasked with using the VLM via API to build integrations with third-party applications, with the goal of improving smart cities. The challenge focused on solutions that use video streams in a more efficient and privacy-friendly manner. Analysts currently face the challenge of extracting meaningful insights from vast amounts of video footage, and Milestone aims to tackle this problem through automation and artificial intelligence.
Winning solution
“We wanted to create something relevant and useful in real life. What I liked most about the hackathon was how easy it was to get started. The API and documentation made it easy to quickly create demos, brainstorm more ideas, and try them out,” said hackathon winner Thomas Kreutz.
Kreutz’s entry “Ask The City” uses Hafnia’s VLM API to transform live city camera footage into instant, privacy-friendly responses to user queries. Users select a location on the map, ask questions in natural language, and the system returns answers generated from the most recent video frames. Mr. Kreutz received 5,000 euros and an NVIDIA Jetson AGX Thor developer kit.
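The flow described above (pick a location, ask a question, answer from the latest frames) can be sketched roughly as follows. This is an illustrative sketch only, not Kreutz's implementation and not Hafnia's actual API: the `Camera` class, `nearest_camera`, `ask_the_city`, and the `fake_vlm` stand-in are all hypothetical names, and a real system would replace `fake_vlm` with a call to the VLM service.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class Camera:
    """Hypothetical record for a city camera; frames are ordered oldest to newest."""
    camera_id: str
    lat: float
    lon: float
    recent_frames: list = field(default_factory=list)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_camera(cameras, lat, lon):
    """Return the camera closest to the location the user picked on the map."""
    return min(cameras, key=lambda c: haversine_km(lat, lon, c.lat, c.lon))


def ask_the_city(cameras, lat, lon, question, query_vlm, max_frames=3):
    """Send the newest frames from the nearest camera, plus the question, to a VLM."""
    camera = nearest_camera(cameras, lat, lon)
    frames = camera.recent_frames[-max_frames:] if max_frames > 0 else []
    return query_vlm(frames, question)


def fake_vlm(frames, question):
    """Stand-in for a real VLM call (assumption: the real service returns text)."""
    return f"{len(frames)} frame(s) analysed for: {question}"


if __name__ == "__main__":
    cameras = [
        Camera("cph", 55.6761, 12.5683, ["f1", "f2", "f3", "f4"]),
        Camera("aar", 56.1629, 10.2039, ["g1"]),
    ]
    print(ask_the_city(cameras, 55.68, 12.57, "Is the road congested?", fake_vlm))
```

Passing the VLM call in as a function keeps the location lookup and frame selection testable without network access. Only text comes back to the user, which mirrors the privacy-friendly framing of the entry.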
“Technology is evolving faster than ever, and no one company has all the answers. The winners will be those that can bring together the best technology, the best minds, and the best ideas. The essence of Milestone’s open platform is to foster innovation. We’re not just building video management software, we’re building an ecosystem for our partners and customers to innovate on top of it,” said Sebastian Delner, vice president of technology partnerships and open platforms at Milestone Systems.
Finalist projects
Second place went to the Blaize team for an emergency response tool that uses edge AI to understand emergency situations and suggest response plans in near real-time. Citilog’s SmartMap secured third place by integrating incident detection with weather and traffic overlays to improve operational decision-making. Other finalists included New Zealand developer Rawinder Singh with RevoFlow, a no-code workflow builder for video analytics; Singh also won the Audience Award.
Further shortlisted entries ranged from video analytics workflow tools to AI-powered platforms that provide accelerated video summarization, advanced search, and semantic metadata extraction.
Industry ecosystem
During the event, industry speakers from NVIDIA, AWS, Dell, and Intel discussed the broad potential of artificial intelligence in video technology, reinforcing the growing focus on open platforms and collaborative development. The hackathon demonstrated how domain-specific AI systems can add value to city infrastructure through automation and managed data privacy.
Data sourcing
Milestone’s VLM is trained using data from Hafnia, with a focus on compliance and anonymization to address ethical concerns in video analytics. This approach combines computer vision and natural language processing to transform visual data into structured output without identifying individuals.
“We are very impressed with the innovative integrations of all the finalists, but Thomas Kreutz and Ask The City pulled it off brilliantly. The success of the hackathon bodes well for the future use of our platform and data library for training computer vision models based on compliantly sourced, curated, extensively annotated and anonymized real-world data,” said Roland Hauwd, Community Lead at Hafnia.
