For those of you old enough to have had it in rotation on a turntable or a CD player (let alone on eight-track, reel-to-reel, or cassette), The Who’s Who’s Next is a great album. One of the best lyrics comes in the final song, Won’t Get Fooled Again: “Meet the new boss… same as the old boss…”
Well, video analytics is kind of like that (ok, maybe not, but bear with us). It is the “all new” IoT data source that everyone is going crazy about all over again.
From a technology perspective, there is little surprise as to why:
- Video analytics requires lots of hardware (CPU/GPU/storage) and consumes lots of cloud services.
- It creates opportunities for software developers to create highly specialized products to focus on specific use cases, industries or domains.
- It was tailor-made for an edge-to-enterprise (or edge-to-cloud) architecture.
- It leverages advanced AI and machine learning algorithms that are best built and trained in the cloud and then deployed locally at the edge.
From a business and use case perspective, the problems really haven’t changed much over the past decade, yet the capabilities are now there to do so much more across a range of industries and domains:
- Retail Loss Prevention, Safety, Marketing and Merchandising solutions
- Industrial Safety and Security solutions (“worker down,” “sleepy operator,” buddy system compliance)
- General Security solutions (suspicious objects, lingering or loitering, unauthorized access)
- Manufacturing Quality solutions (visually inspect for defects or needed rework)
- Identity solutions (facial recognition and beyond)
In all cases, video analytics, like all IoT solutions, is enabling context to be derived from data. And in real-world solutions, computers can monitor dozens or even hundreds of simultaneous video streams. They don’t blink, and they don’t look away.
What kind of context does video analytics derive? That depends on the use case, but in all scenarios it creates actionable events derived from the data. Has somebody jumped a fence? Perimeter violations are easily addressed. Has somebody wiped out the razor blade shelf at the local pharmacy? Maybe the actor intends to pay for all the product he/she has picked up. Maybe not. At least a loss prevention professional can be made aware instantly of the situation before a would-be thief has the chance to leave the store.
Recently, CTP had the opportunity to showcase video analytics in action at Hannover Messe in Germany, the largest industrial/manufacturing trade show in the world, with over 230,000 annual visitors and exhibitors. Our showcase was developed in partnership with Intel and featured in Google’s booth in the Digital Factory hall.
Our primary goal was to show we could leverage cloud-based machine learning to train analytical models that could then be run locally at the edge (simulating a factory floor and control center).
The models we built and showcased were designed to look for three things:
- Smoke detection
- Fire detection
- Legacy analog gauge monitoring
Since having real fire and smoke on the expo floor would have been frowned upon, we simulated them using a very real-looking LED lamp and a misting humidifier. What actually was real, however, was our ability to monitor an analog gauge, and this seemingly mundane capability resonated broadly with expo attendees.
Many of those folks have legacy industrial systems in place, some of them decades old, that are simply too risky to change or too inaccessible to feasibly retrofit and instrument with modern sensors. And in very practical terms, it is simply too difficult for a human to watch and monitor these gauges. However, video analytics can do so continuously, without distractions, or needing to sleep, or take bathroom and coffee breaks.
Training the video analytics to monitor a gauge and watch for anomalies, such as “red line” events, proved to be a remarkably useful capability. Furthermore, it was possible for us to “derive” actual numerical data (temperature, RPM, pressure, etc.) by converting the needle’s position in the video image into a numeric value, much as an analog-to-digital converter would. How fast is that motor running? Is it 15% past critical? Has the pressure risen or dropped to unsafe levels? These questions and more can all be answered without modern sensors. Pretty useful stuff indeed!
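As a rough illustration of that angle-to-value conversion, here is a minimal Python sketch. It assumes the needle angle has already been extracted from the frame (for example with line detection in OpenCV); the gauge range, angle limits, and threshold values are hypothetical and would be calibrated against the real gauge.

```python
def gauge_reading(needle_angle_deg,
                  min_angle=-45.0, max_angle=225.0,
                  min_value=0.0, max_value=100.0):
    """Map a detected needle angle to a numeric reading by linear interpolation.

    The angle limits and value range are illustrative; a real gauge would be
    calibrated from a few reference frames with the needle at known positions.
    """
    span = (needle_angle_deg - min_angle) / (max_angle - min_angle)
    return min_value + span * (max_value - min_value)


def classify_reading(value, yellow=70.0, red=90.0):
    """Turn the derived value into the green/yellow/red states used in the demo."""
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


# Example: a needle detected at 200 degrees on a 0-100 PSI gauge.
value = gauge_reading(200.0)
print(value, classify_reading(value))   # roughly 90.7 PSI, classified as "red"
```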
Building the Demo
Let us take a look at the architecture and solutions components involved in our demo.
The key to making use of machine learning for the video analytics in this demo, without a team of data scientists, was an approach called transfer learning, which starts with a pre-trained model and uses it to extract image features for training a new graph. The pre-trained model for this solution is called Inception-v3, a deep convolutional neural network. It was trained for the 2012 ImageNet Large Scale Visual Recognition Challenge, with 1,000 classes of images such as “Zebra,” “Dalmatian,” and “Dishwasher.” From this, we were able to create specific classifications for smoke, fire and gauge readings in our demo.
Creating a model from scratch can take months, even with highly skilled analysts working together. With transfer learning, a new model can be built in hours, often starting from only a few dozen images and requiring nothing more than everyday software-development skills in languages such as Python or C++.
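For readers who want a feel for what that looks like in code, below is a minimal transfer-learning sketch in Python using a Keras-style TensorFlow API. It is not the exact pipeline we ran on Cloud ML Engine; the class list, directory layout, and training parameters are assumptions for illustration only.

```python
import tensorflow as tf

IMG_SIZE = (299, 299)      # Inception-v3's native input size
NUM_CLASSES = 5            # e.g. smoke, fire, gauge green/yellow/red (illustrative)

# Pre-trained Inception-v3 used as a frozen feature extractor (transfer learning).
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=IMG_SIZE + (3,))
base.trainable = False

# A small new classification head trained on our own labeled frames.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# "training_images/" is a hypothetical folder with one subfolder per class,
# holding the few dozen labeled frames mentioned above.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "training_images/", image_size=IMG_SIZE, batch_size=16)
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.inception_v3.preprocess_input(x), y))

model.fit(train_ds, epochs=10)
```

Because only the small classification head is trained, this runs in minutes to hours rather than the weeks a full network would take.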
Utilizing public cloud resources is a perfect fit for training models, as the processing of image sets can be distributed across parallel resources such as GPUs or TPUs, which are released when processing is complete. We used the Google Cloud Platform (GCP) and Cloud ML Engine to build and train the graph, which was then exported and deployed to run at the edge, in this case a simulated factory floor. Cloud ML Engine utilizes TensorFlow, an open source machine learning technology also developed by Google, and we ran the same TensorFlow Serving technology for the edge processing of the video stream.
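The hand-off between cloud training and edge serving revolves around TensorFlow’s SavedModel format, which is what TensorFlow Serving loads. A hedged sketch of that step follows; the export path and model name are placeholders, and the serving command is shown only as an indicative comment.

```python
import tensorflow as tf

# Export the trained model in SavedModel format for TensorFlow Serving.
# The path and model name are placeholders for illustration.
EXPORT_DIR = "exported_models/gauge_monitor/1"   # "1" is the model version folder
tf.saved_model.save(model, EXPORT_DIR)

# On the edge device, TensorFlow Serving can then load it with something like:
#   docker run -p 8501:8501 \
#     -v $(pwd)/exported_models/gauge_monitor:/models/gauge_monitor \
#     -e MODEL_NAME=gauge_monitor tensorflow/serving
```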
Sampling the video stream from an IP camera pointed at the demo box at 3 frames per second, we send each frame to the TensorFlow Serving process running locally to detect smoke, fire, and the state of the analog gauge (green, yellow, or red). The detections feed two outputs (a sketch of the sampling loop itself follows this list):
- A real-time dashboard showing the current state and processing timeline. It would be monitored by someone on the factory floor, and it also raises alerts when problems are detected.
- A near real-time stream of alerts sent through an IoT gateway to GCP via Cloud IoT Core over MQTT. These alerts are pushed to a dashboard hosted in the cloud, as well as to Stackdriver Monitoring, where further threshold-based alerting can be performed.
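Here is a minimal sketch of that frame-sampling loop, assuming the exported model is exposed through TensorFlow Serving’s REST API on the edge device. The camera URL, model name, class labels, and preprocessing details are placeholders rather than the exact code used in the demo.

```python
import time
import cv2
import requests

CAMERA_URL = "rtsp://192.0.2.10/stream"            # placeholder IP camera URL
PREDICT_URL = "http://localhost:8501/v1/models/gauge_monitor:predict"
LABELS = ["smoke", "fire", "gauge_green", "gauge_yellow", "gauge_red"]  # illustrative

cap = cv2.VideoCapture(CAMERA_URL)
SAMPLE_INTERVAL = 1.0 / 3.0                        # roughly 3 frames per second

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize to the model's input size and scale pixels the same way as at
    # training time (color-channel details omitted in this sketch).
    resized = (cv2.resize(frame, (299, 299)) / 127.5 - 1.0).tolist()
    resp = requests.post(PREDICT_URL, json={"instances": [resized]})
    scores = resp.json()["predictions"][0]
    top = max(range(len(scores)), key=lambda i: scores[i])
    print(f"detected: {LABELS[top]} ({scores[top]:.2f})")
    time.sleep(SAMPLE_INTERVAL)
```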
In addition to the video analytics and alerting, Cloud IoT Core can also be used to push updates to IoT devices deployed around the globe. This allows the TensorFlow graph to be updated whenever new categories of detection are added to a retrained model (Figure 2).
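The alert stream mentioned above reaches GCP over Cloud IoT Core’s MQTT bridge. Below is a rough sketch of publishing a single alert from the edge device using the paho-mqtt client; the project, region, registry, device IDs, and key file are placeholders, and token refresh and error handling are omitted.

```python
import datetime
import json
import ssl

import jwt                      # PyJWT, used to sign the Cloud IoT Core device token
import paho.mqtt.client as mqtt

# Placeholders; a real deployment reads these from device provisioning data.
PROJECT, REGION = "my-gcp-project", "europe-west1"
REGISTRY, DEVICE = "factory-demo", "edge-box-01"
PRIVATE_KEY_FILE = "rsa_private.pem"


def device_jwt():
    """Cloud IoT Core authenticates devices with a short-lived signed JWT."""
    now = datetime.datetime.utcnow()
    claims = {"iat": now, "exp": now + datetime.timedelta(minutes=60), "aud": PROJECT}
    with open(PRIVATE_KEY_FILE) as f:
        return jwt.encode(claims, f.read(), algorithm="RS256")


client = mqtt.Client(client_id=(
    f"projects/{PROJECT}/locations/{REGION}"
    f"/registries/{REGISTRY}/devices/{DEVICE}"))
client.username_pw_set(username="unused", password=device_jwt())
client.tls_set(tls_version=ssl.PROTOCOL_TLSv1_2)
client.connect("mqtt.googleapis.com", 8883)
client.loop_start()

# Publish a detection alert to the device's telemetry topic.
alert = {"event": "gauge_red", "value": 90.7,
         "ts": datetime.datetime.utcnow().isoformat()}
client.publish(f"/devices/{DEVICE}/events", json.dumps(alert), qos=1)
```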
What Is Old Is New Again
If you are a regular reader of The Doppler, you are probably well aware of our position that IoT solutions have existed for a long time under different names. You may also be aware of CTP’s heritage of building cloud-native IoT solutions. Video analytics is an exciting “new” space that the leading cloud vendors (our partners Microsoft Azure, Google Cloud, AWS) are all building new features and capabilities around.
Furthermore, Hewlett Packard Enterprise offers a suite of intelligent edge products, combining high-performing hardware and software that let IoT and video analytics workloads be deployed close to your production systems.
As the cloud and IoT professional services arm of HPE, CTP is uniquely positioned to straddle the intelligent edge and the enterprise cloud worlds, while delivering real-world, outcome-based video analytics that meet your needs. Clearly what is old is new again.
Watch the full demo here.
Contributors to this article include Jason Parsons.