Try Apple's lightning-fast video captioning model from your browser



A few months ago, Apple released FastVLM, a visual language model (VLM) that offers near-instant, high-resolution image processing. Now, if you have an Apple silicon-powered Mac, you can take it for a spin. Here's how.

When we first covered FastVLM, we explained that it leverages MLX, Apple's open ML framework designed specifically for Apple silicon, to deliver video captioning up to 85 times faster than comparable models, with a model more than three times smaller.

Since then, Apple has continued working on the project, which is now available on Hugging Face as well as GitHub. On Hugging Face, you can load the lightweight FastVLM-0.5B model right in your browser and try it out yourself.

Depending on your hardware, it may take a little while to load. It took a few minutes on my 16GB M2 Pro MacBook Pro, but once it was loaded, the model began accurately describing my appearance, the room behind me, my expressions, and the objects in view.

In the lower left corner, you can adjust the prompt the model considers as it updates the caption live, or choose from a few suggestions, such as:

  • Describe what you see in one sentence.
  • What is the color of my shirt?
  • Identify any text or written content that appears.
  • What emotions or actions are being depicted?
  • Name the object I am holding in my hand.

If you want to take things further, try feeding video into the tool with a virtual camera app and having it describe multiple scenes in detail until it struggles to keep up with what's going on. That's not a real-world use case, of course, but it highlights just how fast and accurate the model can be.

What's particularly interesting about this experiment is that it runs locally in the browser, meaning your data never leaves the device, and it can even run offline. That makes it a great fit for wearables and assistive technologies, where lightweight models and low latency are paramount to unlocking better use cases.

It's worth noting that the demo runs the lightweight 0.5-billion-parameter model, while the FastVLM family also includes larger and more capable variants with 1.5 billion and 7 billion parameters. The larger models would likely offer better accuracy, but running them directly in the browser may not be feasible.

Have you tried it? Share your thoughts in the comments.


FTC: We use income-earning auto affiliate links. More.



