Put Your Computer Vision Models In “The Matrix” With Synthetic Data
As artificial intelligence charges forward on many fronts, computer vision continues to be one of the most critical methods, if not the most critical, of connecting the real and digital worlds. Computer vision has moved well beyond niche implementations and now has mass-market appeal across industries and applications. Despite its usefulness, computer vision is hamstrung by real-world data that is messy, full of gaps, and often very personal. Surprised? Don’t be. Even with the overwhelming volume of image and video content created every day, much of it may be unusable because of missing data, mislabeling, and customer privacy concerns.
Enter synthetic data for computer vision. Synthetic data itself is a broad category (which my colleague Jeremy Vale and I will be describing and mapping in an upcoming report) with a growing number of use cases in many industries. Computer vision is one of the most advanced areas of application for synthetic data, and the list of use cases keeps widening. Think your enterprise doesn’t have a place for synthetic data? Well, if there’s any place where your business process interacts with real people or assets, it might be time to reconsider.
Synthetic Data Sharpens The Focus Of Computer Vision
A large number of image and video data sets are publicly available for training machine-learning models, so what is the appeal of synthetic data? For enterprises that are working on more niche use cases, have complex and evolving data labeling requirements, or are trying to innovate into totally new lines of business, these public data sets will likely be sorely incomplete and inefficient to work with. Instead, companies are leveraging tools that let them programmatically generate and customize image and video data that meet the needs of the challenge they are trying to address. Some of these use cases include:
- Preventive maintenance. Your company needs to predict when a train coupling is likely to fail, and the only method is visual inspection. How will a computer vision model know when that coupling might lose integrity? It can learn to, if it is trained on a synthetic data set generated to show a wide variety of failure scenarios for that coupling. The synthetic data set can be generated using one of the many tools available and verified by employees with the relevant technical knowledge.
- Driver safety. Self-driving cars were a significant application for synthetic data over the past decade. Even as it becomes clear that most of us will have to keep our hands on the wheel, synthetic data offers a host of additional applications in and around vehicles. For example, onboard driver monitoring is becoming a consumer and regulatory requirement in many markets. Collecting real data for this can be prohibitively expensive and prone to errors, and the resulting data sets cannot easily be adjusted afterward. Synthetic data tools let companies define their needs and take into account all known user scenarios.
- Active customer engagement. Enterprises want to better engage with their customers and build relationships, which often requires an understanding of customers’ reactions and emotions. Training models that can understand and make decisions based on human facial data has significant and obvious privacy and security implications, particularly in markets where governments have started to step in to regulate digital privacy (e.g., the EU with its GDPR). Synthetic data offers a way to train such models without relying on images of real customers.
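To make “programmatically generate and customize” a little more concrete, here is a deliberately tiny sketch in Python. It is not how any particular vendor or game engine works; the function names (`make_sample`, `make_dataset`), the 32×32 image size, and the idea of drawing a “crack” as a dark line across a bright rectangle are all assumptions made up for illustration. The point it shows is that when you generate the scene yourself, the label (where the part is, whether it is defective) comes for free and is correct by construction.

```python
import random


def make_sample(rng, size=32):
    """Return (image, label) for one hypothetical synthetic 'part'.

    The image is a size x size grid of grayscale values (0-255).
    """
    # Dark background.
    image = [[0 for _ in range(size)] for _ in range(size)]

    # Randomize the part's size, position, and brightness so the
    # data set covers many different scenarios.
    w = rng.randint(8, 16)
    h = rng.randint(8, 16)
    x0 = rng.randint(0, size - w)
    y0 = rng.randint(0, size - h)
    brightness = rng.randint(120, 255)
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            image[y][x] = brightness

    # Roughly half of the samples get a "crack": one dark row
    # cutting across the part (an illustrative stand-in for a defect).
    defective = rng.random() < 0.5
    if defective:
        crack_y = rng.randint(y0 + 1, y0 + h - 2)
        for x in range(x0, x0 + w):
            image[crack_y][x] = 0

    # The label is known exactly because we placed everything ourselves.
    label = {"bbox": (x0, y0, w, h), "defective": defective}
    return image, label


def make_dataset(n, seed=0):
    """Generate n labeled samples; the seed makes the set reproducible."""
    rng = random.Random(seed)
    return [make_sample(rng) for _ in range(n)]


if __name__ == "__main__":
    dataset = make_dataset(100)
    n_defective = sum(label["defective"] for _, label in dataset)
    print(f"{len(dataset)} samples, {n_defective} labeled defective")
```

A real pipeline would swap the rectangle for rendered 3D scenes and the dark row for realistic failure modes, but the pattern is the same: randomize the scenario, render it, and write out the label alongside it.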
Build A Multiverse To Train Models With Synthetic Data
It turns out that creating a synthetic universe doesn’t have to be that hard. One of the most accessible techniques for enterprises getting started in creating synthetic data for computer vision is to use popular commercial game engines like Unity or Unreal. These platforms allow for the quick generation of highly customizable landscapes and interactions with high graphical fidelity. Critically, for building computer vision models, they also offer easy and flexible routes to labeling and tagging the data for training. For enterprises going into more complex and niche use cases (e.g., requiring thermal or X-ray data), there is a burgeoning landscape of vendors providing their own offerings built with specialized engines (such as Sky Engine AI or Datagen). There is an opportunity today in nearly every industry to take advantage of the expanding capabilities of computer vision to optimize business models and gain competitive advantage, and synthetic data offers a path to open computer vision’s eyes for your enterprise.
Have more questions? Please schedule a call with me via Forrester inquiry.