Announcing Our Inaugural AI Foundation Models For Language Forrester Wave™ — 21 Criteria To Consider, Beyond Benchmarks
AI researchers and scientists love their benchmarks and model parameter counts, but perhaps not as much as tech journalists love to report on them. Every week, there is news about a model beating a benchmark or exceeding parameter counts. While these two factors are important, they are not the only ones to focus on when choosing a provider of foundation models, especially within an enterprise AI context. We went much deeper … much, much deeper.
In our inaugural evaluation, The Forrester Wave™: AI Foundation Models For Language, Q2 2024, we break down large language model (LLM) offerings, strategy, and market presence across 21 criteria for 10 providers of foundation models for language: Amazon Web Services, Anthropic, Cohere, Databricks, Google, IBM, Microsoft, Mistral AI, NVIDIA, and, of course, OpenAI. Sure, you want to choose a great model, but it is also important to understand the vendor’s vision, innovation, roadmap, pricing transparency, market adoption, and market momentum. While a well-performing model is important to a successful use case, enterprise buyers must look for AI foundation models for language (AI-FML) providers that:
- Have a strategy geared toward serving enterprise needs. Enterprises are especially skeptical of the staying power of startups and, sometimes, the commitment of larger tech companies. It is important for enterprises to critically examine the model vendor’s strategy to make sure that it aligns with the needs of enterprises and that the company has staying power in the market. To evaluate a vendor’s strategy, we looked at the vendor’s vision, innovation, roadmap, partner ecosystem, pricing flexibility, and supporting services.
- Offer a rich set of tools to configure, test, and govern model usage. APIs and SDKs are essential, but they are bare-bones functionality suited to the techiest of techs. Enterprises demand more tooling that empowers a broader AI team to configure models, perform prompt engineering, test models, set governance rules, deploy production use cases, and more. To evaluate a vendor’s tooling, we looked at enterprise governance and security, enterprise application development, model alignment, and deployment model management.
- Have rising benchmarks. AI researchers have conceived dozens of benchmarks to measure model performance across a variety of tasks. What’s important is that each of a vendor’s subsequent models keeps performing better. To evaluate a vendor’s core model capabilities, we used MMLU (Massive Multitask Language Understanding) for human language tasks and HumanEval for TuringBot (coding) use cases because they are the benchmarks most cited by AI researchers and vendors alike.
- Can scale at low latency. Users of AI chatbots are all too familiar with jittery output that often takes several seconds to complete. This might be acceptable for some human-model interaction use cases, but for automated uses, such as generating real-time custom product descriptions for an e-commerce application, a few seconds is unacceptable. A model’s serving architecture must return results quickly, scale as the number of users and queries increases, and be fault-tolerant. To evaluate a vendor’s architectural performance, we looked at the vendor’s operational resilience and scalability.
The Absurd Rate Of Innovation Makes Choosing An LLM Challenging
Compounding the complexity of choosing an AI foundation model is the fact that enterprises are likely to need more than one. In addition to the commercial model providers in our evaluation, there is a vibrant community of open-weight models, such as Meta’s Llama models, that many enterprises are already using. There is also an emerging market for domain-specific models in fields such as healthcare, law, and many more. The absurd rate of innovation in this market can make the choice between startups and tech giants even more confusing. This is why we recommend taking a balanced, longer-term view of how foundation models will fit the needs of a complex enterprise, beyond benchmarks.
We Give You An Independent View Of The Model Market
Whatever you do, please don’t ask #AI to help you choose. It will not be unbiased! Instead, Forrester clients can schedule a guidance session or inquiry with us to discuss vendor selection and overall trends in this #AI foundation models market.