Most of Google's updates to Gemini don't stand out to me. Google has not yet significantly improved its hallucination rate, and its ability to summarize news and the weather leaves a lot to be desired. However, a recent update that added video analysis to Gemini caught my eye as a tool I might use regularly.
Gemini's video analysis builds on its existing AI feature for summarizing YouTube videos. I put the tool through a test run to see how capable it is and how I might use it in my daily life.
How well does Gemini video analysis work?
For testing, I chose different videos from my camera roll and asked Gemini different questions each time. Gemini analyzes a video differently depending on what you ask, so I asked the questions most relevant to each video.
Test 1: Object recognition
Across several prompts, Gemini correctly identified the type of duck in my video and even worked out where the video was filmed, thanks to a sign in the background.
The sign only displayed a business name, but Gemini was able to locate where the video was recorded to within 100 meters. That said, the clues in the video (business name, mandarin duck, canal) would have led a human to the correct answer within minutes.
Ducks swimming in the canal
Test 2: Location recognition
I was very impressed with Google's ability to work out where my video was filmed, but there were plenty of clues to help with that. The next test used a video of the eruption of the Kilauea volcano in Hawaii, filmed in May. Gemini correctly identified the volcano but was unable to determine the date (the video was filmed on May 26th).
Active volcanic eruption at night
Test 2 (continued): Location recognition
As with Gemini's other analytical features, you need to ask the right questions to get the right answer. This video showed a small parade at Karneval in Cologne last year. Gemini was confused.
When I asked where the video was filmed, it couldn't answer, but with further prompting it managed to identify the country. Interestingly, its response revealed that it knew the video showed a Karneval parade, yet it could not identify the city.
I retested Gemini using a video from Karneval's main parade, which contained a large number of visual cues. Despite the number of street signs, shopfronts, and Karneval costumes shown in the video, Gemini was unable to identify that it was filmed in Cologne.
City Street parade marching band
Test 3: Audio recognition
I was personally most interested in Gemini's audio recognition. It's convenient to identify a song that is currently playing, but picking out songs in the background of old videos is even more useful to me. Unfortunately, Gemini's results here were uneven at best. Here are some of them:
- It mistakenly identified a 22-second recording of “Solid Rock” by Dire Straits as Haim's “Inking Alone.”
- It mistakenly identified a 15-second recording of “Surfing with the Alien” by Joe Satriani as “Can't Stop” by the Red Hot Chili Peppers.
- It correctly identified a 57-second recording of “Like a Rolling Stone” by Bob Dylan. It also identified the song from an 11-second recording.
- It incorrectly identified an 11-second recording of “Wildflowers” by Tom Petty as “You Belong to Me” by the Duprees.
I tested Gemini further with videos of different lengths. Accuracy was positively correlated with the length of the recording, but what surprised me was how far off the wrong answers were.
I highly recommend comparing the tracks above to hear how different Gemini's guesses are from reality. Honestly, how does Tom Petty sound like the Duprees to Gemini?
Test 4: Explain what happens in the video
One of the more practical uses of Gemini is to explain what happens in a video if you don't have time to watch it yourself. I used one of my favorite videos, a friend's clip of a cat fight. Gemini had an interesting take on this clip.
In the clip, you can clearly see the black-and-white cat attack and then drive away the black cat. Gemini, however, concluded that the cats began fighting (notably using the passive voice about who the attacker was) and that the black cat drove away the black-and-white cat.
Gemini's take here is misleading and would leave a user with a completely wrong understanding of the situation.
However, follow-up questions led Gemini to correctly identify the attacker in the video. This was a harmless interaction between cats, but it's a great example of how Gemini can mislead users. What if you used Gemini to analyze a video of people fighting?
Overview of AI Videos in the Gemini App
Gemini's video analysis is as unreliable as other AI services
The first test I ran with Gemini's video analysis was the eruption of the Kilauea volcano. This impressed me, but in most of the subsequent tests, Gemini failed to deliver. It needs hard data like signs to accurately identify a location, and its song recognition is poor compared with Google's song search tool (which is also included in the Gemini app).
The most interesting test turned out to be Gemini's analysis of the cat fight. It was able to analyze the video properly after multiple prompts, but that took longer than watching the video myself. In conclusion, I will stick to watching and analyzing videos myself and shelve Gemini again.
