
The video AI space is fascinating, and the possibilities are enormous. Vendors and customers alike envision a world in which intelligent machines provide unparalleled insight and free humans from repetitive, tedious tasks. But when evaluating vendor claims for AI and machine learning (ML) products, what questions should you ask to keep your feet on the ground and determine what a solution can and cannot do? In short, how do you separate the genuinely good from the hype?
Here are six key questions you should ask video AI vendors to assess their true capabilities:
- Are you using image processing or ML?
- Does your ML training data include environments like ours? (initial training)
- Will your ML learn from the environment over time? (continuous training)
- Will your ML run on-edge (cameras), on-premises (local servers), hybrid (split AI), or in the cloud?
- How can we protect ourselves from excessive false alarms?
- What is the hardware cost required to deploy the solution?
Now let’s dig deeper into each question.
1. Do you use image processing or machine learning?
For decades, image processing technology has worked by taking an image as input, analyzing it pixel by pixel, and producing a result as output. This approach is limited by the scope of the algorithms themselves. For example, to detect a cat, the programmer must anticipate every possible combination of pixels that represents “cat”. If the algorithm doesn’t account for a half-hidden cat, it won’t be able to detect one.
Machine learning relies on pattern recognition. You give the machine a set of examples (a training set) and it “learns” to detect the pattern. In the cat example, instead of hand-coding pixel rules to detect cats, the ML system is fed thousands of cat pictures and learns what cats look like. Studies have shown this to be a more robust and accurate strategy for cat detection and other visual recognition tasks.
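The contrast can be sketched in a few lines of toy Python. The “feature” here is just a single brightness value, and the labels, thresholds, and training values are all hypothetical; real systems learn from millions of pixels, not one number, but the structural difference is the same: the rule is authored, the classifier is fitted from examples.

```python
# Toy contrast: a hand-coded image-processing rule vs. a learned classifier.
# All feature values, labels, and thresholds below are hypothetical.

def rule_based_detect(brightness: float) -> bool:
    # Image processing: the programmer must anticipate every case in advance.
    return 0.4 <= brightness <= 0.6

def train_centroids(examples):
    # "Learning": compute the mean feature value per class from labeled examples.
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def ml_detect(brightness, centroids):
    # Predict the class whose learned centroid is nearest to the input.
    return min(centroids, key=lambda label: abs(centroids[label] - brightness))

training_set = [(0.45, "cat"), (0.50, "cat"), (0.55, "cat"),
                (0.10, "background"), (0.15, "background")]
centroids = train_centroids(training_set)
print(ml_detect(0.48, centroids))  # → cat
```

To handle a new case, the rule-based detector needs new code; the learned detector just needs new training examples.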
Our recommendation: Given the remarkable recent advances in machine learning, avoid vendors that rely on traditional image processing and work only with vendors that use ML.
2. Does your ML training data include environments like ours? (initial training)
Results from ML systems are only as good as the data used to train them. The GIGO principle (garbage in, garbage out) really applies here. Therefore, vendors should have robust training data with relevant environment examples to create ML models.
For example, if you’re trying to identify fire and smoke outdoors, a model trained on indoor fire and smoke won’t help. Similarly, a model trained on woodland fires and smoke won’t reliably detect them in an open field. Many ML predictions fail not because of flaws in the core model, but because the model was never properly trained for its intended use case.
Our recommendation: Make sure your vendor can confirm that environments like yours are represented in their training data, and that they aren’t simply using general-purpose, off-the-shelf ML models.
3. Will your ML learn from the environment over time? (continuous training)
Life would be easier if you could focus all your energy on training your ML model correctly and move on. Unfortunately, as is often the case, the “set and forget” approach doesn’t work in ML.
“ML model drift” is like digital entropy. Remember entropy from high school physics (sorry for the flashback!): anything left unattended becomes more chaotic over time. Similarly, ML models that are not regularly retrained tend to “drift”, their predictive ability degrading as the real-world environment around them slowly changes. Perhaps a new type of camera is installed, presenting scenes from an entirely new perspective. Or the seasons change and shifting ambient light adds color tinges that prevent objects from being identified correctly.
Whatever the reason, the only way to ensure the continued accuracy of your ML models, and thus the quality of your detections, is to establish a regular, systematic process to keep them up to date. This is similar to the industrial practice of “kaizen”, or continuous improvement. It includes continuously monitoring detections, tracking false positives and false negatives, and retraining the model daily. If the vendor doesn’t do this, or does it only once a quarter or once a year, you’ll end up with a flood of false alerts, or worse, silently miss critical issues.
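A minimal sketch of such a monitoring loop, assuming human-reviewed detections and hypothetical precision/recall targets (real pipelines track far more, but the trigger logic is the essential part):

```python
# Minimal drift-monitoring sketch: score a daily batch of human-reviewed
# detections and flag the model for retraining when quality slips.
# The 0.95 targets and the review data are hypothetical.

def evaluate(reviewed):
    """reviewed: list of (predicted_positive, actually_positive) booleans."""
    tp = sum(1 for p, a in reviewed if p and a)
    fp = sum(1 for p, a in reviewed if p and not a)      # false alarms
    fn = sum(1 for p, a in reviewed if not p and a)      # missed detections
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def needs_retraining(reviewed, min_precision=0.95, min_recall=0.95):
    precision, recall = evaluate(reviewed)
    return precision < min_precision or recall < min_recall

# Today's reviewed batch: 90 true alarms, 8 false alarms, 2 misses.
todays_reviews = [(True, True)] * 90 + [(True, False)] * 8 + [(False, True)] * 2
print(needs_retraining(todays_reviews))  # → True (precision ≈ 0.92 < 0.95)
```

Run daily, a check like this turns “drift” from a silent failure into an explicit retraining trigger.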
Our recommendation: Make sure the vendor improves their models daily based on the false positives and false negatives observed in your environment.
4. Will your ML run on-edge (cameras), on-premises (local servers), hybrid (split AI), or in the cloud?
There is no single “right answer” here. ML “smarts” can reside on the camera itself or be offloaded to on-premises servers. Alternatively, ML processing can be performed in the cloud. Finally, you can “split” ML workloads between on-premises and the cloud.
If video analytics will be applied to only one camera, an “AI camera” is a good choice: one capital expenditure and you’re ready to go. Edge/camera-based ML is growing in popularity as camera chipsets become more powerful. Unfortunately, ML models become outdated quickly, and updating camera firmware is cumbersome. As research progresses and new models come out, keeping these cameras up to date becomes nearly impossible.
On-camera analytics also breaks down when multiple cameras are involved; even a basic feature such as people counting stops working properly. Imagine two cameras with overlapping fields of view: people standing in the overlap get counted twice. Server-based (on-premises) analytics are more flexible, can support more complex ML models, and can coordinate analytics across multiple cameras. However, on-premises GPUs and servers cost money (and space!) and can quickly deplete your budget. Add the need for ongoing maintenance, and it’s no wonder on-premises analytics are deployed only in limited fashion.
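The double-counting problem, and how a coordinating server can correct it, can be sketched as follows. The positions, camera layout, and merge radius are all hypothetical; the detections are assumed to be already projected into a shared floor-plan coordinate frame.

```python
# Hypothetical sketch: per-camera people counting double-counts in overlap
# zones; a server seeing all cameras can merge nearby detections.
# Coordinates are (x, y) metres in a shared floor-plan frame.

def naive_count(per_camera_detections):
    # Per-camera analytics: each camera counts independently, then we sum.
    return sum(len(dets) for dets in per_camera_detections)

def deduplicated_count(per_camera_detections, radius=0.5):
    # Server-side: treat detections closer than `radius` metres as one person.
    merged = []
    for dets in per_camera_detections:
        for x, y in dets:
            if not any((x - mx) ** 2 + (y - my) ** 2 <= radius ** 2
                       for mx, my in merged):
                merged.append((x, y))
    return len(merged)

cam_a = [(1.0, 1.0), (4.0, 2.0)]           # camera A sees two people
cam_b = [(4.1, 2.1), (7.0, 3.0)]           # camera B re-detects the person near (4, 2)
print(naive_count([cam_a, cam_b]))         # → 4 (double-counts the overlap)
print(deduplicated_count([cam_a, cam_b]))  # → 3 (correct)
```

A lone camera can never perform this merge; it requires a vantage point that sees all the streams at once.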
Cloud ML implementations offer greater flexibility by allowing you to easily incorporate the latest advances while completely avoiding on-premises complexity. The downside is that videos have to be uploaded to the cloud for analysis, taxing low-bandwidth environments.
The best of both worlds is a hybrid model, such as split AI, that divides ML analytics between an on-premises server and the cloud. This reduces the upload bandwidth required to analyze video in the cloud while eliminating the need for expensive on-premises GPUs, offsetting the shortcomings of both pure approaches.
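A back-of-the-envelope sketch of the bandwidth argument: if a lightweight on-premises model forwards only candidate frames for full cloud analysis, upload volume drops by the filter’s pass rate. Every number below is an illustrative assumption, not a measurement.

```python
# Illustrative bandwidth arithmetic for a split-AI design.
# All figures are assumptions: adjust for your own cameras and filter.

FRAMES_PER_DAY = 24 * 60 * 60 * 5   # one camera streaming at 5 fps
FRAME_KB = 50                       # compressed frame size in kilobytes
CANDIDATE_RATE = 0.02               # on-prem filter forwards ~2% of frames

full_upload_gb = FRAMES_PER_DAY * FRAME_KB / 1e6    # cloud-only design
split_upload_gb = full_upload_gb * CANDIDATE_RATE   # split-AI design

print(f"cloud-only upload: {full_upload_gb:.1f} GB/day")   # → 21.6 GB/day
print(f"split-AI upload:   {split_upload_gb:.3f} GB/day")  # → 0.432 GB/day
```

Under these assumptions, the hybrid cuts upload volume by 50x per camera, which is what makes cloud analysis viable on low-bandwidth links.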
Our recommendation: Buy a single AI camera if you need a one-camera, one-time solution. Choose an on-premises solution if you have multiple cameras and sufficient hardware budget, or a cloud solution if you have plenty of bandwidth. Choose a split-AI hybrid solution to optimize both on-premises cost/space and bandwidth.
5. How can we protect ourselves from excessive false alarms?
Every alarm system can produce false alarms. If a vendor does not improve their ML models daily (see #3), the models will drift toward gradually increasing false-alarm rates. How do you deal with this issue?
Good vendors, such as some weapons-detection companies, pair their systems with control centers that verify alarms before issuing them. Even better, general-purpose screening services like Screener+ can work with any AI model to ensure that 99% of false alarms are never delivered. Combined with a continuous-improvement process for the ML models, the system becomes smarter and produces fewer false alarms over time. Only when false alarms are kept down cost-effectively has an ML deployment truly achieved its objectives.
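Screener+ is a proprietary service, so the sketch below is only a generic illustration of the screening idea, with hypothetical confidence thresholds: high-confidence detections are delivered immediately, borderline ones are routed to human review, and low-confidence ones are suppressed.

```python
# Generic alarm-screening sketch (not any vendor's actual implementation).
# The 0.90 and 0.50 thresholds are hypothetical tuning parameters.

def screen(detections, deliver_above=0.90, review_above=0.50):
    delivered, review_queue, suppressed = [], [], []
    for event, confidence in detections:
        if confidence >= deliver_above:
            delivered.append(event)      # high confidence: alert immediately
        elif confidence >= review_above:
            review_queue.append(event)   # borderline: a human confirms first
        else:
            suppressed.append(event)     # likely false alarm: never delivered
    return delivered, review_queue, suppressed

raw = [("smoke", 0.97), ("smoke", 0.62), ("smoke", 0.31)]
delivered, review, suppressed = screen(raw)
print(len(delivered), len(review), len(suppressed))  # → 1 1 1
```

The review queue is also a free source of labeled false positives and false negatives, which feeds the daily retraining loop from question #3.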
Our recommendation: Ask vendors to explain how they handle spikes in false alerts. The cost of dealing with false alerts falls on you, not the vendor.
6. What is the hardware cost required to deploy the solution?
An AI/ML setup with unlimited access to CPU/GPU resources would be ideal. But resources come at a cost, and most companies must weigh the trade-off between the technology that’s right for them and the cost of deploying it. It is therefore important to check whether hardware is already included in the price of the solution you are evaluating, and to ensure the full price is stated explicitly. Otherwise, you will have to purchase, configure, maintain, and upgrade (potentially expensive) servers or other devices yourself to keep your ML services running.
Our recommendation: Consider the “fully loaded” cost, which includes not only the vendor’s license fee but also the purchase, installation, configuration, and ongoing maintenance of any required hardware.
These six checklist items are by no means exhaustive, but they should give you enough background to make an informed decision about which video AI solution best fits your needs and environment.