AI Needs Synthetic Data To Build A Real Future

Rowan Curran, Principal Analyst

Sep 7 2022

It’s a hazy Saturday morning in Southern California when a struggling actor gets a call. “They want me to do what?” he asks his agent incredulously. “OK, tell them I’ll be there.”

He spends the next 24 hours doing everything he can to stay awake, per the instructions of his agent. Finally, the time comes, and he arrives on location with bleary eyes. After some brief introductions, he walks out onto the set for his big moment in the limelight: The cameras start rolling, and he promptly falls asleep in a prop car — just as he’d been instructed.

This is hardly the actor’s big break. In fact, the only viewers of this deft performance will be a lean team of data scientists.

The gig did not come from a major Hollywood studio, but rather from an auto manufacturer that put out a multimillion-dollar RFQ to gather images of drivers falling asleep at the wheel. The carmaker is collecting this data to advance a burgeoning use case in computer vision: driver monitoring systems (DMS), which automatically detect distracted or drowsy driving from inside the cabin. It’s a slow, expensive process, but hey, they need the data to feed their models.

This real (albeit dramatized) example comes from a company that believed this was the only way to get training data for the computer vision component of its DMS. Many ML methods, computer vision in particular, require a wealth of curated, annotated, and representative data to build accurate prediction models. Thus, the car company paid actors spanning demographic groups to participate in this seemingly bizarre setup to collect that data. When it came to model building, however, the data from the actors didn’t cut it. The carmaker’s Plan B was to partner with a synthetic data company to programmatically generate a data set of computer-rendered synthetic images of cars and humans. This gave the company a much larger training data set of high-quality images with frame-perfect annotations to help its client.

Computer vision is just one of the current use cases for synthetic data. While synthetic data is no panacea, it has the potential to supercharge existing AI initiatives and unlock others that have historically been hampered by data challenges that are too costly or even impossible to overcome. It offers a host of other benefits, too, including mitigating privacy concerns and reducing the governance challenges often associated with sensitive information. For example, synthetic data vendors in the healthcare space generate fake patient data with statistical properties similar to those of real populations of interest. This enables healthcare organizations and researchers to work ethically within regulations like HIPAA and the requirements of their own internal review boards, and to share data more readily.

Now’s the time to get started on your synthetic data journey. Buckle up, and read our full report to put yourself in the driver’s seat for your most important AI initiatives.
