AI Needs Synthetic Data To Build A Real Future
It’s a hazy Saturday morning in Southern California when a struggling actor gets a call. “They want me to do what?” he asks his agent incredulously. “OK, tell them I’ll be there.”
He spends the next 24 hours doing everything he can to stay awake, per the instructions of his agent. Finally, the time comes, and he arrives on location with bleary eyes. After some brief introductions, he walks out onto the set for his big moment in the limelight: The cameras start rolling, and he promptly falls asleep in a prop car — just as he’d been instructed.
This is hardly the actor’s big break. In fact, the only viewers of this deft performance will be a lean team of data scientists.
The gig did not come from a major Hollywood studio, but rather from an auto manufacturer that put out a multimillion-dollar request for quotation (RFQ) to gather images of drivers falling asleep at the wheel. The carmaker is collecting this data to advance a burgeoning use case in computer vision: driver monitoring systems (DMS), the automatic in-cabin detection of distracted or drowsy driving. It’s a slow, expensive process, but hey, they need the data to feed their models.
This real (albeit dramatized) example comes from a company that believed this was the only way to get training data for the computer vision component of its DMS. Many ML methods, computer vision in particular, require a wealth of curated, annotated, and representative data to build accurate prediction models. So the car company paid actors spanning demographic groups to take part in this seemingly bizarre setup. When it came to model building, however, the data from the actors didn’t cut it. The carmaker’s Plan B was to partner with a synthetic data company to programmatically generate a data set of computer-rendered images of cars and humans. The result was a much larger training data set of high-quality images with frame-perfect annotations.
Computer vision is just one of the current use cases for synthetic data. While synthetic data is no panacea, it has the potential to supercharge existing AI initiatives and unlock others that have historically been hampered by data challenges that are too costly or even impossible to overcome. It offers a host of other benefits, too, including mitigating privacy concerns and reducing the governance challenges often associated with sensitive information. For example, synthetic data vendors in the healthcare space generate fake patient data with statistical properties similar to those of real populations of interest. This enables healthcare organizations and researchers to work ethically within regulations like HIPAA and the requirements of their own internal review boards, and to share data more readily.
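To make "statistically similar" a little more concrete, here is a minimal Python sketch of the idea, under loud assumptions. It fits a simple two-variable Gaussian to a handful of made-up (age, systolic blood pressure) pairs and then samples brand-new records with roughly the same means, spread, and correlation. The toy values, the function names fit_bivariate_gaussian and sample_synthetic, and the choice of a Gaussian model are all illustrative; this is not how any particular vendor works, and on its own it offers no formal privacy guarantee.

```python
import math
import random
import statistics

# Toy, invented (age, systolic blood pressure) pairs. Not real patient data.
real_rows = [(34, 118), (51, 131), (47, 126), (62, 142),
             (29, 112), (58, 137), (40, 121), (70, 150)]


def fit_bivariate_gaussian(rows):
    """Estimate means, variances, and covariance of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return mx, my, sxx, syy, sxy


def sample_synthetic(params, count, rng):
    """Draw new records from the fitted Gaussian (hypothetical helper)."""
    mx, my, sxx, syy, sxy = params
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, 0.0))
    rows = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return rows


params = fit_bivariate_gaussian(real_rows)
synthetic_rows = sample_synthetic(params, 5, random.Random(0))
```

None of the synthetic rows is a copy of an original record, yet a large enough sample reproduces the summary statistics of the toy population. Production tools typically handle many more columns, mixed data types, and explicit privacy checks, which this sketch leaves out.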
Now’s the time to get started on your synthetic data journey. Buckle up, and read our full report to put yourself in the driver’s seat for your most important AI initiatives.