If your inbox (or social media feed) looks anything like ours, you’ve likely been seeing announcement after announcement of vendor integrations with OpenAI’s ChatGPT or other generative AI technologies. The customer service technology world is particularly saturated with these announcements; it seems like we can’t go a day without seeing another one. This is not a huge surprise, of course — generative AI is super hot right now, and many want to be first to the party.

Now, we love a good party as much as anyone. But friends don’t let friends deploy risky AI!

So here’s a PSA from your friendly neighborhood Forrester analyst on how to party (or, evaluate generative AI solutions) responsibly!

Don’t Believe Everything You Read On The Internet

Many vendor press releases are claiming integration with ChatGPT, likely hoping to surf the wave of generative AI hype. OpenAI has not yet made its ChatGPT API available for others to use, however, and last we heard, OpenAI has not even begun working through its mile-long wait list. It is far more likely that vendors have integrated with a different large language model (LLM), provided by OpenAI or otherwise, and are crafting these announcements carefully to imply a closer connection (and capability) to ChatGPT than they actually have.

Customer service vendors have been leveraging LLMs for several years, though usually not in a generative capacity. LLMs, including OpenAI’s GPT-3, allow for higher performance on many natural language processing (NLP) tasks such as language modeling, sentiment analysis, named entity recognition, or text classification. Leveraging LLMs to improve the accuracy of solutions such as conversation analytics or chatbots is a viable approach, but it doesn’t mean that the software will have generative text capabilities. The vendors publishing these announcements are likely relying on buyers not knowing the difference.

It’s also important to consider that many of the high-profile LLMs (like GPT-3) are not trained on contact center use cases and are unlikely to be ready to go out of the box. Some vendors, however, have done the work in the past couple of years to create their own LLMs that are tuned specifically to customer service scenarios and are likely to perform much better in them.

So What Does That Mean For You?

  • Ask the hard questions. If you’re on the receiving end of a pitch implying equivalent capability to ChatGPT, push the vendor for details. What model(s) or API(s) are you actually using (e.g., text-davinci-003, BERT, GPT-J)? What specific NLP tasks is your product designed to handle, and how does the LLM fit into that process? How long have you been leveraging LLMs, and what benefits have your customers seen? If they are claiming that their models are generating unique content, you might also ask: What controls have you put in place to ensure that the content is accurate and appropriate, and what measures are in place to catch inappropriate content? How much control will your clients have over the output, or is that entirely governed by the model? How can the LLM be fine-tuned or customized to handle domain-specific use cases?
  • More parameters aren’t necessarily better. Many of the LLMs we’re all reading about have more than 100 billion parameters, and that’s a lot. For one-off training tasks (such as training a base natural-language model or generating training data offline for use by smaller models), leveraging the extremely large LLMs (ELLLMs? just kidding) makes perfect sense. For online use cases, however, using models of this size can be cost-prohibitive and, frankly, overkill. A smaller (1–10-billion-parameter) LLM that is domain-specific can outperform larger models at much lower cost.
  • Watch for undifferentiated bolt-ons. The generative AI models we’ve all been reading about are still very new, and while we are certainly seeing rapid rounds of experimentation and innovation, building valuable products takes time (and research, and validation, and testing). We expect to see a lot of gimmicky ChatGPT dupes in the coming months. When evaluating vendor solutions claiming integration with these high-profile generative models, ask yourself: Is what the vendor is showing uniquely theirs? Or could any other vendor in its category plug in the same API and instantly tell the same story? If any competitor could, we’d pass.
  • Avoid consumer-facing applications. At least for now. We’ve written before about some of the risks of deploying generative AI tools, including their tendency to produce coherent nonsense. We do not yet feel confident that there are sufficient enterprise controls in place to protect against the reputational risk of customer-facing solutions generating incorrect or inappropriate content. Using generative AI to support your internal conversational designers or speech analysts? Love it. Using it to generate chatbot responses on the fly? Not yet.
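
To put a rough number on the “more parameters aren’t necessarily better” point, here is a back-of-the-envelope sketch in Python. It leans on a common rule of thumb (an assumption here, not a vendor figure) that a dense transformer spends roughly 2 floating-point operations per parameter for each token it generates. The two model sizes are hypothetical, picked only to fall inside the ranges mentioned above, and real costs also depend on hardware, batching, and pricing that this ignores.

```python
# Rule-of-thumb assumption: a dense transformer's forward pass costs
# roughly 2 FLOPs per parameter for each generated token.
FLOPS_PER_PARAM_PER_TOKEN = 2


def flops_per_token(num_parameters: float) -> float:
    """Approximate forward-pass FLOPs needed to produce one token."""
    return FLOPS_PER_PARAM_PER_TOKEN * num_parameters


def relative_compute(large_params: float, small_params: float) -> float:
    """How many times more compute per token the larger model needs."""
    return flops_per_token(large_params) / flops_per_token(small_params)


# Hypothetical model sizes, chosen only to match the ranges in the text.
LARGE_MODEL_PARAMS = 175e9  # a model with more than 100 billion parameters
SMALL_MODEL_PARAMS = 7e9    # a model in the 1-10 billion parameter range

ratio = relative_compute(LARGE_MODEL_PARAMS, SMALL_MODEL_PARAMS)
print(f"Large model needs about {ratio:.0f}x the compute per token")
```

Under these assumptions, the larger model needs on the order of 25 times the compute for every token it produces, and an online use case pays that multiplier on every customer interaction. Treat it as a conversation starter for vendor pricing discussions, not as a cost model.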

But we’re not total wet blankets. These advancements in AI are truly revolutionary. We’re witnessing a pivotal moment in AI unfold right before our very eyes. Enterprises absolutely can and should experiment with this technology. But just know that that’s what you’re doing: experimenting.

Any Forrester clients looking for guidance on where generative AI might fit within their broader customer service strategy (or for some support with evaluating vendor solutions), please get in touch.


Thanks to Rowan Curran for reviewing this blog post.