At the Google Cloud Next conference, Google introduced Vertex AI Vision, a new computer vision platform that simplifies building analytics on top of live camera streams and videos. Currently in preview, Vertex AI Vision is an extension of AutoML Vision that can train models to perform image classification and object detection.
Vertex AI Vision provides a canvas to build end-to-end machine learning pipelines covering the entire spectrum of computer vision inference and analytics. It targets business decision-makers and analysts who want to build analytics based on computer vision without dealing with complex code. Vertex AI Vision also has an SDK for developers to extend the functionality and embed the output in web and mobile applications.
Enterprises have already invested in dozens of surveillance cameras and CCTVs that constantly generate video streams. At the same time, many pre-trained models can perform sophisticated image classification, object recognition, and image segmentation. But connecting the dots between the data sources (cameras) and the ML models to derive insights and intelligent analytics demands advanced skills: customers need to hire skilled ML engineers to build inference pipelines that produce actionable insights.
Vertex AI Vision addresses this challenge by providing a no-code environment that does the heavy lifting. Users can easily connect remote streaming inputs from existing cameras to ML models to perform inference. Output from the video streams and models is stored in a Vision Warehouse, from which metadata can be extracted. The same output can also be stored in a BigQuery table, making it easy to query and analyze the data. It is also possible to watch the stream output in real time to validate and monitor the accuracy of the inference pipeline.
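Once annotations land in BigQuery, they can be analyzed with ordinary queries or scripts. The sketch below illustrates the idea in plain Python: the record shape (`ts`, `stream`, `person_count`) is a hypothetical stand-in for whatever schema the pipeline writes, not the platform's documented format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-frame annotation records, shaped like rows an
# inference pipeline might write out (field names are assumptions
# for illustration only).
annotations = [
    {"ts": "2022-10-11T09:00:05", "stream": "lobby-cam", "person_count": 3},
    {"ts": "2022-10-11T09:00:35", "stream": "lobby-cam", "person_count": 5},
    {"ts": "2022-10-11T09:01:10", "stream": "lobby-cam", "person_count": 4},
]

def peak_occupancy_per_minute(rows):
    """Reduce per-frame counts to the peak count in each minute bucket."""
    peaks = defaultdict(int)
    for row in rows:
        minute = datetime.fromisoformat(row["ts"]).strftime("%Y-%m-%dT%H:%M")
        peaks[minute] = max(peaks[minute], row["person_count"])
    return dict(peaks)

print(peak_occupancy_per_minute(annotations))
# {'2022-10-11T09:00': 5, '2022-10-11T09:01': 4}
```

In practice the same aggregation would be expressed as a GROUP BY query against the BigQuery table rather than client-side Python.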
Vertex AI Vision has multiple pre-trained models that can be quickly integrated into the pipeline. The occupancy analytics model lets users count people or vehicles within zones or across lines they define in video frames. The person blur model protects the privacy of people appearing in input videos by masking or blurring them in the output. The person/vehicle detector model detects and counts people or vehicles in video frames. The motion filter model reduces computation by trimming long videos down to shorter segments that contain a motion event.
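The intuition behind a motion filter can be sketched with simple frame differencing: keep only the stretches of video where consecutive frames change noticeably. The sketch below is an illustration of the general technique, not Vertex AI Vision's actual implementation; frames are flattened lists of pixel intensities, and the threshold is an arbitrary assumed value.

```python
def motion_segments(frames, threshold=10.0):
    """Return (start, end) frame-index pairs for segments where consecutive
    frames differ by more than `threshold` (mean absolute pixel delta).
    A real motion filter would operate on decoded video frames."""
    def diff(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    segments, start = [], None
    for i in range(1, len(frames)):
        moving = diff(frames[i - 1], frames[i]) > threshold
        if moving and start is None:
            start = i - 1                      # motion event begins
        elif not moving and start is not None:
            segments.append((start, i - 1))    # motion event ends
            start = None
    if start is not None:                      # motion runs to the last frame
        segments.append((start, len(frames) - 1))
    return segments

static, bright = [0] * 4, [50] * 4
print(motion_segments([static, static, bright, static, static]))
# [(1, 3)] -- only the frames around the change are kept
```

Downstream models then run only on the returned segments, which is where the compute savings come from.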
Apart from the pre-trained models, customers can import existing models trained within the Vertex AI platform. This extends the functionality by mixing and matching various models.
The new platform is based on Google’s responsible AI principles of fairness, safety, privacy and security, inclusiveness, and transparency. Google claims that Vertex AI Vision will cost only one-tenth as much as current offerings. While the service is in preview, pricing details have not been disclosed. The service is available only in the us-central1 region.
In its current form, Vertex AI Vision is not integrated with Anthos and cannot run in a hybrid mode within the data center or at the edge. Customers are expected to ingest video streams into Google Cloud to run the inference pipeline. Industry verticals such as healthcare and automotive that demand high throughput and low latency cannot take advantage of Vertex AI Vision. Google should consider supporting deployment of Vertex AI Vision applications at the edge, with the output stored in a local warehouse.
Google’s Vertex AI Vision competes with no-code/low-code platforms such as Amazon SageMaker Jumpstart and Azure ML Designer. With the rise of large language models and advances in natural language processing based on transformers, expect to see the no-code development platforms extended to support conversational AI.