Data Strategy: Synthetic Data and Other Tech for AI’s Next Phase
- by 7wData
You’ve finally gotten your enterprise’s machine learning and artificial intelligence into production, and your top executives are expecting results. Just one question: Do you have enough quality data to train those algorithms?
Now that enterprises are plowing ahead with these initiatives, sourcing data for the always-hungry algorithms will be a constant item on the to-do list. There can be obstacles to gaining access to needed data. There’s a limited amount of data that can be collected and cleaned by your own enterprise. New and existing privacy rules can limit data collection and storage. And some events are so new that there’s little, if any, data available to train an algorithm, such as a pandemic that leads to a supply chain crisis.
One solution to all these challenges is synthetic data. The topic will be among many covered by Forrester at its Data Strategy & Insights event, December 6 and 7, as organizations lean into the next era of machine learning and other artificial intelligence in the enterprise. Forrester analyst Rowan Curran will be among the presenters of a session on synthetic data, “The Value of Tilting at Windmills: Synthetic Data in AI and Beyond,” at the event. Curran spoke with InformationWeek about the upcoming session and the promise of synthetic data.
Synthetic Data: What is it?
According to Forrester, synthetic data is training data of any type (structured, transactional, image, audio, or other types) that duplicates, mimics, or extrapolates from the real world but maintains no direct link to the real world, particularly for scenarios where real-world data is unavailable, unusable, or strictly regulated.
“This is something that I think will become super interesting and a very important part of the AI landscape moving forward,” Curran says. He offers a couple of use cases to explain the potential of synthetic data.
For instance, one use case of synthetic data was designed to help automakers collect computer vision data about what sleepy drivers look like. The data was needed for driver monitoring systems, which may become a regulatory requirement in Europe and the US. A company has two options for collecting that data. In Plan A, the company would hire actors from multiple demographic groups to feign fatigue, distractedness, and sleepiness, explains Curran. But this is an expensive and time-consuming process, and organizations typically need lots of data quickly. Plan B called for partnering with a synthetic data company to simulate images of people looking tired, fatigued, sleepy, or distracted. This process yielded a much larger training set of quality images.
Curran explains that other applications of synthetic data could help, say, the human resources organization in a large multi-national company.