I created AI ASMR food videos using Google's Veo 3

Google's Veo 3 AI video model is a league above its competitors for one important reason: sound. You can prompt not only what is displayed on the screen, but also what you hear.

The first Veo model, built by Google's DeepMind lab, debuted in May 2024, and each new generation has added more features. Compared to its competitors, it was always strong on motion accuracy and understanding physics, but adding sound was a game changer.

You can use it to prompt short commercials, scenes from a movie you're writing, or music videos. But what I've seen more than anything else is ASMR: the kind of gentle tapping, whispering and ambient sounds that cause some people to feel tingling.

To see how far this can go, I created a series of ASMR food prompts. Each is designed to produce a video with matching sounds centered on the food.

Prompting Veo 3 in the Gemini app

Veo 3 is now available in the Gemini app. Just select the video option when starting a new prompt, type what you want, and you'll get an 8-second clip.

Gemini isn't necessarily the best way to access Veo 3 (I'd recommend Freepik, Fal, Higgsfield, or Google Flow), but it is easy to use and gets the job done.

An important benefit of using Gemini directly is that it automatically interprets and enhances your prompt. So when you ask for “a cool ASMR video featuring lasagna”, that's what you get.

You can also use what's called a structured prompt to be more specific: label each moment with a timestamp and a description of the scene. However, unless you need precise control, a simple paragraph (also known as a story prompt) is usually more effective.
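To make the idea of a structured prompt concrete, here is a minimal Python sketch that renders a list of timestamped moments as plain text you could paste into Gemini. The `[00s-03s]` label format, the `structured_prompt` helper, and the example beats are all assumptions for illustration; neither Gemini nor Veo 3 is known to require this exact layout. The 8-second limit comes from the clip length mentioned above.

```python
CLIP_SECONDS = 8  # clip length the Gemini app returns, per the article


def structured_prompt(moments, clip_seconds=CLIP_SECONDS):
    """Render (start, end, description) tuples as timestamped prompt lines.

    The "[00s-03s] description" layout is a hypothetical convention,
    not a format required by Gemini or Veo 3.
    """
    lines = []
    for start, end, description in moments:
        # Reject beats that run backwards or past the end of the clip.
        if not 0 <= start < end <= clip_seconds:
            raise ValueError(f"bad time range: {start}-{end}")
        lines.append(f"[{start:02d}s-{end:02d}s] {description}")
    return "\n".join(lines)


# Hypothetical beats for a soda-over-ice clip.
moments = [
    (0, 3, "Close-up of an empty glass; ice cubes clink as they drop in."),
    (3, 6, "Soda is poured slowly over the ice; the fizz builds."),
    (6, 8, "Bubbles settle and the ice crackles quietly. No music."),
]

prompt = structured_prompt(moments)
print(prompt)
```

Each beat carries both what is seen and what is heard, so the sound description stays attached to the moment it belongs to.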

Creating a prompt

The first task in any AI project is to think about the prompt. Models are getting better at interpreting intent, but it's still better to be specific if you know what you want.

I wanted an ASMR food video, so I started off with a test: “ASMR food video with sound”.

The result? It worked. It essentially gave me a lasagna clip. I then refined the approach: I specified particular types of food, added sound descriptions, and even tried a structured prompt for a soda poured over ice.

Most of the time, a story prompt is all you need. Describe what you want to see, how the video should flow, and how it should sound.

1. Lasagna

Google Veo 3 Lasagne Video - YouTube

The first prompt, “ASMR Food Video With Sound,” produced a stunning clip of a man sliding his fork into a slice of lasagna. You hear a squish as the fork goes in, and then another sound as it hits the plate. It made me wish Veo 3 had an “extend clip” button.

I hadn't prompted anything else, so I had no way to determine what food would appear, how it would sound, or even whether the sound would work. This is why it's important to be specific when prompting AI models, even through a chatbot like Gemini.

2. Cooking and food

Google Veo 3 Cooking Video – YouTube

Next, I got more specific. Using a longer, story-style prompt, I asked Veo 3 to generate a close-up of a chef preparing and eating a satisfying meal in a bright kitchen.

I asked for slow-motion visuals of the ingredients being chopped, the sizzle of butter melting in the pan, and a crunch as the chef takes a bite.

I also added this line: “Emphasis on audio quality: Clean and layered ASMR soundscape without music”. It tells the model not only which sounds I want, but also the style of the sound and what I don't want to hear.
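As a rough sketch of how a story prompt like this can be assembled, the Python snippet below joins a scene, the visual flow, a list of sounds, and the audio-quality line into one paragraph. The `story_prompt` helper and the wording of the scene, flow, and sounds are my own paraphrase for illustration, not the exact prompt used for this clip; only the audio-quality line is quoted from the article.

```python
def story_prompt(scene, flow, sounds, audio_note=None):
    """Join the pieces of a story prompt into a single paragraph.

    Hypothetical helper: it only builds a string and calls no Gemini
    or Veo API.
    """
    parts = [scene.strip(), flow.strip()]
    if sounds:
        # List the sounds explicitly so they are not left to chance.
        parts.append("Sounds: " + ", ".join(sounds) + ".")
    if audio_note:
        # State the audio style and what should NOT be heard.
        parts.append(audio_note.strip())
    return " ".join(parts)


prompt = story_prompt(
    scene="Close-up of a chef preparing and eating a satisfying meal in a bright kitchen.",
    flow="Slow-motion shots of ingredients being chopped.",
    sounds=[
        "butter sizzling as it melts in the pan",
        "a crunch as the chef takes a bite",
    ],
    audio_note="Emphasis on audio quality: Clean and layered ASMR soundscape without music.",
)
print(prompt)
```

Keeping the audio note as its own argument makes it easy to reuse the same exclusion (no music) across different food prompts.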

3. Popcorn pop

Google Veo 3 Popcorn Video – YouTube

For the final prompt, I started with an image. I used Midjourney V7 to create a picture of a woman looking at rainbow popcorn and gave it to Gemini with the prompt “ASMR Food”.

Visually, the result was impressive, but for some reason the woman says in a voiceover, “This is tasty, this rainbow popcorn.” That's on me: I didn't specify whether she should speak or what she should say.

The simple fix: put the speech you want in quotes. For example, I could have prompted her to say, “I love watching popcorn pop,” and emphasized the word “pop”. I could also have specified that she was talking on camera, and Veo 3 would have synced the speech with matching lip movements.
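The fix can be sketched as a small Python helper that appends a quoted line of dialogue to an existing prompt. The `with_dialogue` function, its parameters, and the exact phrasing it produces are assumptions for illustration; the article only establishes that the desired speech should appear in quotes and that you can say whether the speaker is on camera.

```python
def with_dialogue(base_prompt, speaker, line, on_camera=True, emphasize=None):
    """Append a quoted line of speech to a prompt.

    Hypothetical helper: the sentence wording is an assumption, not a
    syntax that Gemini or Veo 3 requires.
    """
    if '"' in line:
        raise ValueError("line must not contain double quotes")
    # Say where the voice comes from so it is not left to the model.
    delivery = "says to the camera" if on_camera else "says in a voiceover"
    sentence = f'{speaker} {delivery}: "{line}"'
    if emphasize:
        sentence += f', emphasizing the word "{emphasize}"'
    # Strip trailing periods or spaces so the two parts join cleanly.
    return f"{base_prompt.rstrip('. ')}. {sentence}."


prompt = with_dialogue(
    "ASMR food",
    speaker="The woman",
    line="I love watching popcorn pop",
    emphasize="pop",
)
print(prompt)
# ASMR food. The woman says to the camera: "I love watching popcorn pop", emphasizing the word "pop".
```

Passing `on_camera=False` switches the wording to a voiceover, which makes the choice explicit either way.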

Conclusion

Overall, Veo 3 delivers impressive results, especially when producing high-quality sound that accurately reflects the visuals. There are a few quirks to navigate, such as unintended narration and slightly sturdy lasagna, but these are easily addressed with more specific prompts.