Exclusive: How generative AI is transforming video conferencing



While virtual meetings have become mainstream, technology has not been able to replicate the social experience of face-to-face interactions. At the same time, generative artificial intelligence (GenAI) technology has advanced significantly, offering solutions to many of the problems that have traditionally plagued hybrid conferencing.

GenAI should make virtual meetings more productive and engaging, mimicking real-life experiences. But for this to happen, these capabilities must be available in real time, with minimal latency, and at an affordable price. This means that some of these new AI capabilities must be available on connected endpoints.

Fortunately, solution providers are quickly integrating generative AI into leading video conferencing platforms and computers, making real-time optimization, virtual expansion, and automated meeting management a reality. These developments pave the way for the unified communications industry to significantly enhance the hybrid and virtual meeting experience for its customers.

Virtual replication
GenAI can significantly improve the video, audio, and text experience of virtual meetings. In hybrid meetings involving both in-person and remote participants, AI-powered intelligent video processing can zoom in on the active speaker instead of broadcasting a static shot of the entire room, recreating the experience of an in-person meeting for remote participants.
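The speaker-framing step described above can be sketched in miniature. The function below is a hypothetical illustration: it assumes an upstream detector has already produced a speaker bounding box in pixels, and only computes the padded, aspect-correct crop window around it.

```python
def speaker_crop(frame_w, frame_h, box, pad=0.4, aspect=16 / 9):
    """Compute a padded crop rectangle around a detected speaker.

    `box` is (x, y, w, h) from an upstream face/speaker detector
    (assumed here; any detector producing pixel boxes would do). The
    box is expanded by `pad`, widened to the target aspect ratio, and
    clamped to the frame bounds.
    """
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # speaker centre
    cw, ch = w * (1 + pad), h * (1 + pad)  # padded size
    # Grow one dimension so the crop matches the output aspect ratio.
    if cw / ch < aspect:
        cw = ch * aspect
    else:
        ch = cw / aspect
    # Clamp to the frame, shifting the window back inside if needed.
    cw, ch = min(cw, frame_w), min(ch, frame_h)
    left = min(max(cx - cw / 2, 0), frame_w - cw)
    top = min(max(cy - ch / 2, 0), frame_h - ch)
    return int(left), int(top), int(cw), int(ch)
```

For example, framing a 120x160 px face at (600, 200) in a 1080p frame yields a 16:9 window centered on the speaker. The detector itself, and smoothing the window between frames, are out of scope for this sketch.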

Neural Radiance Fields (NeRF) or similar technology can help create an immersive experience, rendering a compelling view for each remote participant and dynamically changing the viewing angle at each endpoint. AI can also transform the room feed into a consistent gallery view that shows all participants at a uniform size, posture, and style. Additionally, if there is a whiteboard in the meeting room, AI can automatically detect it, convert handwritten text into an editable format, and create a personal copy for note-taking.
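The whiteboard-detection idea can likewise be reduced to a toy sketch. The snippet below is only a stand-in for the AI step described above: it locates the bright region of a grayscale frame with simple thresholding, whereas a real system would use a learned detector plus perspective correction and handwriting recognition.

```python
import numpy as np

def whiteboard_region(gray, thresh=200):
    """Return the bounding box (x0, y0, x1, y1) of the brightest region
    of a grayscale frame, or None if nothing exceeds the threshold.

    A crude placeholder for whiteboard detection: real pipelines would
    detect a quadrilateral and warp it to a rectangle before OCR.
    """
    ys, xs = np.nonzero(gray >= thresh)   # pixels bright enough to be "board"
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
```

The returned box would then be cropped and passed to a handwriting-recognition model to produce the editable text mentioned above.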

GenAI can also provide a voice and text assistant to every meeting participant, whether virtual or in-person, to maximize productivity. Such an assistant can transcribe audio to generate meeting summaries, track action items assigned to their respective owners, and suggest relevant responses on the fly. For multilingual teams, it can ease language barriers by providing instant voice translation.
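To make the assistant's action-item tracking concrete, here is a deliberately simple sketch. A production assistant would use a language model; the regex below is a hypothetical stand-in that only illustrates the shape of the task: transcript lines in, owner-to-tasks mapping out.

```python
import re

# Matches commitments of the form "Name: I'll <task>" / "Name: I will <task>".
# Purely illustrative; an LLM would handle far more phrasings.
COMMIT = re.compile(r"^(?P<owner>\w+):\s*(?:I'?ll|I will)\s+(?P<task>.+?)\.?$",
                    re.IGNORECASE)

def action_items(transcript_lines):
    """Map each speaker to the tasks they committed to in the transcript."""
    items = {}
    for line in transcript_lines:
        m = COMMIT.match(line)
        if m:
            items.setdefault(m["owner"], []).append(m["task"])
    return items
```

Feeding it three transcript lines where Alice makes two commitments and Bob makes none yields a dictionary with Alice's two tasks, which the assistant could then surface in the meeting summary.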

Despite its virtually limitless potential, GenAI as it exists today is constrained by the technology that delivers it. To harness its true power, the cloud-based services available by default are not enough.

Scalable future
For GenAI to reach its full potential in video conferencing, systems must be able to perform GenAI processing on the endpoint itself, whether a personal computer or a conferencing gateway device, without routing data back to the cloud.

One important aspect of conferencing systems is scalability, and here it is important to distinguish between cases that favor centralized processing and cases that require edge processing.

There are three main cases where processing at a central point is advantageous:

• Timesharing – when a feature requires only light processing that a fraction of the central machine's capacity can easily handle, such as alerting when a participant enters a room or unmutes a microphone. The central machine can serve many endpoints simultaneously, and time-slicing between them has no noticeable effect.

• Resource sharing – when a feature involves processing that is common to all endpoints, such as searching a shared database. In these cases, the shared processing can be performed once and reused across many or all endpoints.

• Information sharing – when all participants need to see the same information, such as a shared whiteboard without per-participant personal comments.

Most of the features described so far do not fall into these three cases. To build a scalable video conferencing system that makes these capabilities available to all participants, AI processing must be distributed downstream, with the various nodes equipped with appropriate AI computing power.

This provides multiple benefits:

• Latency – In virtual meetings, instant results are essential for smooth interactions, including real-time translation, content creation, and video coordination. Running generative AI on edge devices reduces latency, ensuring fluid, lag-free discussion and a seamless user experience.

• Cost – The monthly subscription cost of cloud-based generative AI tools can be daunting for many organizations. With so many tools serving different user needs, such as chat, search, and image and video creation, costs can quickly reach hundreds of dollars per user per month, straining budgets. Moving generative AI onto the user's personal computer or conferencing device makes the user the owner of the tool, with no monthly subscriptions or long-term contracts, providing a more economically viable solution.

• Bandwidth and connectivity – Virtual meetings often suffer from a lack of bandwidth, especially when participants have limited internet connectivity, such as when traveling or in remote locations. Edge-based generative AI can filter out irrelevant information locally, ensuring only relevant and important data is sent and enabling uninterrupted, productive meetings.

• Environmental impact – The impact of cloud-based AI processing should not be underestimated: it consumes large amounts of energy and generates significant emissions. Researchers from Carnegie Mellon University and Hugging Face measured the carbon footprint of various machine learning tasks and found that tasks involving the generation of new content, such as text generation, summarization, image captioning, and image generation, are the most energy-intensive. According to their findings, the most energy-intensive AI models, such as Stability AI's Stable Diffusion XL, produce nearly 1,600 grams of CO2 per session, roughly the same environmental impact as driving four miles in a gasoline-powered car.
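The four-mile comparison can be sanity-checked with quick arithmetic. The 400 g of CO2 per mile used below is an assumed round figure for an average gasoline car (close to the EPA's commonly cited estimate); the per-session figure is the one reported in the study above.

```python
# Rough check of the driving-equivalence claim above.
GRAMS_PER_IMAGE_SESSION = 1600  # Stable Diffusion XL, per the cited study
GRAMS_PER_CAR_MILE = 400        # assumed average for a gasoline car

equivalent_miles = GRAMS_PER_IMAGE_SESSION / GRAMS_PER_CAR_MILE
print(equivalent_miles)  # → 4.0
```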

Edge devices offer a more sustainable option for generative AI, contributing to a greener approach to AI conferencing by lowering power consumption, minimizing cooling requirements, and reducing carbon emissions.

Adding AI
In the not-too-distant future, AV integrators and designers will be able to install GenAI-ready video conferencing systems that deliver the performance, reliability, and security benefits of edge processing along with the benefits of GenAI.

These future video conferencing systems, which process AI directly on edge devices, will be closed-loop systems capable of handling work that is currently done in the cloud. Processing AI on devices such as laptops, room systems, and cameras helps meetings run smoothly and affordably while keeping AI-generated content, such as automated summaries and dynamic presentations, more secure.

Hailo offers purpose-built AI processors that run AI models efficiently at price points suited to a variety of edge devices. The company is currently working with conferencing manufacturers to integrate its AI processors into the hardware that will power the video conferencing systems of the future.

Avi Baum is chief technology officer at Hailo, an AI-focused, Israel-based chipmaker that has developed a purpose-built AI processor enabling data-center-class performance on edge devices. Baum has over 17 years of experience in systems engineering, signal processing, algorithms, and telecommunications, with the past 10 years focused on wireless communications technology.


