Data Quality Is Now The Primary Factor Limiting GenAI Adoption
When generative artificial intelligence (genAI) burst into prominence with the release of ChatGPT in 2022, technically savvy business users quickly began experimenting. At that time, available tools addressed only a narrow set of use cases, and trustworthiness was low. With few ready-to-use genAI apps on the market, a lack of know-how was a primary impediment to pursuing genAI solutions, especially for more specialized business cases.
The core technology for text-based genAI is the large language model (LLM). The complexity and resources required to create LLMs put them out of reach for most companies. Today, however, businesses looking to implement genAI use cases can choose from a selection of existing LLMs to use and customize, and the barrier to entry has fallen within reach of almost any mature technology team. For companies that lack the skills or the ambition to work directly with LLMs, software platforms across almost every business function now offer out-of-the-box genAI features for daily use, with little to no specialized skills required.
If 2023 was the year of genAI experimentation, 2024 is shaping up to be the year that genAI launches into production solutions serving customers and customer-facing roles. The ability to summarize vast quantities of unstructured data and generate creative content (including text, images, and even video!) is intriguing to revenue officers and the broader executive team. Forrester’s September 2023 Artificial Intelligence Pulse Survey showed that 70% of enterprise-level companies* are already using genAI and another 20% are exploring its use.
The Common Denominator Is The Quality Of The Data
A lot can go wrong between the user’s request, the interpretation of that request, the generation of a response, and the delivery of that response back to the user. Regardless of which technical path your business pursues, the primary limiting factor you’ll face today is your own data quality. The old adage “garbage in, garbage out” applies even more forcefully to genAI. GenAI places unprecedented strain on your data governance capabilities for several reasons:
- GenAI consumes data at a new level of speed, scale, and complexity. Data and operations teams managing traditional business use cases focus on curating and cleansing defined data sets. GenAI consumes structured and unstructured data alike, enabling insight generation at a speed and scale never before seen, including from data types that most businesses do not actively manage.
- GenAI uses data to generate insights unpredictably. Measurement and analytics teams are accustomed to controlling the gateways through which end users query available data. Insights are delivered through reports and dashboards, each offering a curated experience of limited scope. GenAI grants access to a vast repository of data and makes intuitive leaps to support user queries. Data management teams can no longer predict which data must be cleansed to deliver accurate insights, because they no longer control which questions are being asked.
- Security, privacy, and consent require new processes that don’t exist today. Managing data security and privacy in traditional business use cases relies on controlling the source data: Data that violates compliance standards is purged, and access to specific data sets is restricted to approved users. GenAI models do not rely on active queries of source data to fulfill requests. Once training data has been ingested, data teams can no longer easily control which users can access which data elements. Security and compliance depend on knowing each end user’s appropriate level of access, and with no current standard linking genAI models back to their source data, this creates new levels of uncertainty and risk (see the sketch after this list). In the AI Pulse Survey mentioned above, data privacy and security concerns are seen as the greatest single barrier to genAI adoption among B2B enterprises.
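To make the access-control point above concrete, here is a minimal, hypothetical Python sketch; the document structure, role names, and keyword matching are illustrative assumptions, not a reference implementation. When source data is only retrieved at query time, entitlements can still be enforced before any text reaches the model. Once data has been absorbed into training, no equivalent per-user filter exists.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_roles: set[str]  # access metadata carried alongside the source data

def retrieve_for_user(query: str, user_roles: set[str], corpus: list[Document]) -> list[Document]:
    """Return only the documents this user is entitled to see.

    The filter runs before any text reaches the model; once data has been
    ingested into training, there is no equivalent per-user switch.
    """
    matches = [d for d in corpus if query.lower() in d.text.lower()]  # toy keyword match
    return [d for d in matches if d.allowed_roles & user_roles]      # entitlement check

# Example: a sales rep should not see documents restricted to HR.
corpus = [
    Document("d1", "Quarterly pipeline summary", {"sales", "exec"}),
    Document("d2", "Employee compensation summary", {"hr"}),
]
context_for_sales = retrieve_for_user("summary", {"sales"}, corpus)  # returns only d1
```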
The Data Challenges For GenAI Demand A Different Approach To Data Quality
Managing data quality for genAI use cases demands a different set of skills from operations teams, and it requires retraining those teams to manage new concepts. At a high level, this mindset shift has a few key themes:
- Operations teams must be more closely aligned with technology resources. This partnership of technical skill and business insight is critical to generating trusted responses from any genAI tool.
- Data stewards must expand the domain expertise they provide, taking on a new role as the arbiters of accurate insight generation.
- Data management must move from the cleansing and control of discrete data sets to the ongoing, active curation of conversations, both prompt and response (a rough sketch follows this list).
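As a rough illustration of what curating conversations might look like in practice, the hypothetical Python sketch below records each prompt/response pair and attaches review flags for data stewards to triage. The record structure and flag rules are assumptions made for illustration only, not a prescribed implementation.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConversationRecord:
    prompt: str
    response: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    flags: list[str] = field(default_factory=list)

def curate(record: ConversationRecord) -> ConversationRecord:
    """Attach review flags to a prompt/response pair so data stewards can triage it."""
    text = record.prompt + " " + record.response
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):             # pattern resembling a U.S. SSN
        record.flags.append("possible-pii")                    # route to privacy review
    if "as an ai language model" in record.response.lower():   # boilerplate deflection
        record.flags.append("low-quality-response")            # route to quality review
    return record

# Example: build a running log of curated conversations.
log = [curate(ConversationRecord("What is our refund policy?", "Refunds are issued within 30 days."))]
```

However the flags are generated, the point is the same: the conversation itself, not just the underlying data set, becomes the asset that data teams curate.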
Each of these themes, along with several others, will be explored more deeply in our upcoming presentation at Forrester’s B2B Summit North America in Austin, Texas, on May 5–8, 2024. Our presentation, “Is Data Quality Getting In The Way Of GenAI Trust?”, will be one of several sessions focused on helping attendees prepare their businesses for the massive changes that genAI is already driving. Attendees can also schedule one-on-one discussions with us, and Forrester Decisions clients can request guidance sessions with either of us on these or similar topics at any time.
*Correction: An earlier version of this blog incorrectly described this metric as 70% of B2B companies.