What does synthetic data mean for business?
Synthetic data is often a cheaper, faster way to obtain large volumes of data than traditional data collection and curation methods. That gives it the potential to accelerate data-driven transformation across industries by serving as a foundation for training machine-learning and AI models. Those models in turn enable new products, services, and ways of working, which could finally deliver on the promise of “big data” that got us all so excited a few years back.
Synthetic data is already being used in many industries. Amazon used synthetic data about speech patterns, syntax, and semantics to improve multilingual speech recognition in its Alexa virtual assistant. The UK’s National Health Service (NHS) has converted real-world data on patient admissions for accident and emergency (A&E) treatment into a statistically similar but anonymized open dataset, to help NHS care organizations better understand and meet the needs of patients and healthcare providers. Synthetic health data has also been used by Alphabet and US health insurer Anthem to improve insurance fraud detection.
However, this is still relatively early-stage technology, and as with any other machine-generated information, the output is only as good as the inputs and algorithms. Anomalies and outliers in the source data can be amplified or lost altogether; either outcome makes the synthetic dataset less representative of the real data it is meant to replace. Synthetic datasets might also accidentally retain personally identifiable information from the source, which could violate people’s privacy and expose organizations using the data to legal action.
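To make the point about lost outliers concrete, here is a minimal Python sketch. It assumes a deliberately naive, hypothetical "synthesizer" that summarizes one numeric column as a single normal distribution, and the data values are invented for illustration; real synthetic-data tools use far richer models. It computes how likely the fitted model is to reproduce the one genuine outlier, and how much of its probability mass lands near the value that every other real row shares.

```python
from math import erfc, sqrt
from statistics import NormalDist, fmean, pstdev

# Hypothetical source column: 99 typical values plus one genuine outlier.
real = [50.0] * 99 + [500.0]

# Naive synthesizer (an assumption for illustration): fit one normal distribution.
model = NormalDist(mu=fmean(real), sigma=pstdev(real))

# Upper-tail probability that a synthetic row is at least as large as the outlier.
# erfc is used instead of 1 - cdf to avoid floating-point cancellation far in the tail.
z = (500.0 - model.mean) / model.stdev
p_outlier = 0.5 * erfc(z / sqrt(2))

# Expected number of outlier-like rows in a synthetic table the same size as the source.
expected_outliers = p_outlier * len(real)

# The outlier also inflates sigma, so only a small share of synthetic rows
# fall near 50, even though 99% of the real rows are exactly 50.
share_near_50 = model.cdf(51.0) - model.cdf(49.0)

print(f"fitted mean={model.mean:.1f}, sigma={model.stdev:.1f}")
print(f"expected outlier-like synthetic rows: {expected_outliers:.2e}")
print(f"share of synthetic rows within 49-51: {share_near_50:.3f}")
```

Computing probabilities directly rather than drawing random samples keeps the result deterministic: the outlier is effectively never regenerated, while the ordinary rows are smeared across a much wider range than the real data ever covered.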
Generative AI is known to “hallucinate” incorrect information: when it fails to account for anomalies in the data it has learned from, it can draw conclusions that seem statistically likely but are not supported by the actual data. Any synthetic dataset created from those hallucinations inherits the errors. Some fear that, because of this phenomenon, the proliferation of synthetic data could over time introduce feedback loops, with models trained on AI-generated data producing still more AI-generated data, that make AI-generated information progressively less reliable.
Realizing the value of synthetic data will require robust human due diligence. Following the guidance in PwC’s “Responsible AI” toolkit can help.