AIOps and observability can deliver significant value in business-critical use cases. To successfully build the machine-learning models powering those use cases, you need a strong data foundation. The applications of AI/ML to incident remediation, enhanced alert recommendations, and proactive risk mitigation aren’t useful if the models are not trained on fresh, relevant data. If it takes too long for the models to learn the environment, or if the organization does not trust the outputs provided by these models, their value will go unrealized. Worse, their poor performance could lead to more security or operational risks.

Tools that use ML models in dynamic environments must be continuously fed new data so they can adapt to changing conditions and goals. They need to learn as much as possible about a new environment before they can safely provide service and ultimately deliver value. Once established, however, they deliver vital information that can either be acted on automatically or used to enhance a human-driven corrective or proactive action.

The AI/ML Model Dilemma

Enterprises using AIOps tools today struggle to make their models work as soon as they are deployed, rather than waiting (sometimes months) to accumulate data and train the model. The cause is a lack of access to live production data in pre-production environments, which means models can’t be trained on the most recent and relevant data. This creates a classic chicken-and-egg situation: we need a model to be trained on data, but we need to put the model into production to collect some of that data.

Synthetic Data Can Bridge The Gap From Sensitive Production Data To Safe Test Data

One way to shorten the training cycle without violating policies or regulations, while also building trust and transparency in AI/ML-driven solutions such as AIOps and observability, is to use generative AI techniques: in this case, synthetic data generation with generative adversarial networks and large language models. These techniques can generate de-identified data to expedite training, modify synthetic data to test “what if” scenarios, and expand and augment smaller data sets to make them large enough to train various types of models. This helps refine the models to make them more accurate. Perhaps more importantly, it builds confidence that the AI application will deliver accurate and timely results.
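To make the augmentation and “what if” ideas concrete, here is a minimal Python sketch. It is not a generative adversarial network or a large language model; as a deliberately simple stand-in, it fits an independent Gaussian to each metric in a small seed data set and samples new rows from it. The field names (`cpu_pct`, `latency_ms`), the helper names, and the `what_if` multiplier are all hypothetical, chosen only for illustration.

```python
import random
import statistics


def fit_gaussian(rows, fields):
    """Estimate a per-field (mean, standard deviation) from the seed rows."""
    return {
        f: (statistics.mean(r[f] for r in rows),
            statistics.stdev(r[f] for r in rows))
        for f in fields
    }


def synthesize(params, n, seed=0, what_if=None):
    """Sample n synthetic rows; what_if maps a field to a scale factor."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    what_if = what_if or {}
    rows = []
    for _ in range(n):
        row = {}
        for field, (mu, sigma) in params.items():
            value = rng.gauss(mu, sigma)
            value *= what_if.get(field, 1.0)   # apply the scenario, if any
            row[field] = max(0.0, value)       # metrics here are non-negative
        rows.append(row)
    return rows


# Hypothetical seed data: four telemetry samples, far too few to train on.
real_rows = [
    {"cpu_pct": 41.0, "latency_ms": 120.0},
    {"cpu_pct": 55.0, "latency_ms": 180.0},
    {"cpu_pct": 47.0, "latency_ms": 150.0},
    {"cpu_pct": 62.0, "latency_ms": 210.0},
]

params = fit_gaussian(real_rows, ["cpu_pct", "latency_ms"])
baseline = synthesize(params, n=1000, seed=42)
# "What if latency tripled?" Same random draws, scaled latency only.
slow_network = synthesize(params, n=1000, seed=42, what_if={"latency_ms": 3.0})
```

Treating each metric as independent discards the correlations between them, which is exactly what richer generative models are meant to capture. The sketch only shows the shape of the workflow: fit on a small sample, expand it, then perturb it to build a scenario.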

Building Confidence And Driving Adoption

Shortening the training cycle while improving confidence in the systems will drive even faster adoption of the capabilities already available in AI/ML-driven platforms such as AIOps. The data gap in AIOps and observability could also be addressed with additional techniques such as automated labeling, anonymization, or masking. Time to value can also become a more discrete calculation: testing the models with deliberate variability gives a closer approximation of how long it will take a model to be ready for production. Neither adoption nor value realization will occur, however, if these systems aren’t trusted and secured.
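As one illustration of the masking mentioned above, the following Python sketch replaces sensitive fields in a telemetry event with keyed-hash tokens and scrubs IPv4-looking strings from free text. The event layout, the field names (`host`, `user`, `message`), the key, and the 12-character token length are assumptions made for this example, not a reference to any particular AIOps product.

```python
import hashlib
import hmac
import re

# Matches dotted-quad strings; intentionally loose, for illustration only.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(value, key):
    """Map a value to a stable token using HMAC-SHA256 with a secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # shortened for readability (an illustrative choice)


def mask_event(event, key, sensitive_fields=("host", "user")):
    """Return a copy of the event with sensitive fields tokenized."""
    masked = dict(event)  # shallow copy; the original event is left untouched
    for field in sensitive_fields:
        if field in masked:
            masked[field] = pseudonymize(str(masked[field]), key)
    if "message" in masked:
        masked["message"] = IPV4.sub("[ip]", masked["message"])
    return masked


# Hypothetical event and key; a real key would come from a secrets store.
key = b"demo-key-not-for-production"
event = {
    "host": "db-prod-01",
    "user": "alice",
    "message": "timeout connecting to 10.2.3.4",
    "cpu_pct": 91.5,
}
masked = mask_event(event, key)
```

Because the same input and key always produce the same token, events from one host can still be grouped after masking, which preserves the patterns a model needs to learn. Note that this is pseudonymization rather than full anonymization: anyone holding the key can recompute the tokens for known values.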

Answers To The Dilemma Are On The Horizon

In the coming months, we will be working to better understand the dilemmas that organizations face when training the AI/ML models that support IT operations. We will be working with vendors and enterprises alike to clarify our views on these challenges. If you are facing these challenges or have overcome them, please reach out to either Rowan or me, as we’d really like to capture your experiences as part of our research. You can also reach out to our research assistant Audrey Lynch to schedule a conversation. We look forward to speaking with you as part of this research.

Join The Conversation

We invite you to reach out through social media if you want to provide general feedback. If you prefer a more formal or private discussion, email us to set up a meeting. Follow Carlos Casanova or Rowan Curran to keep up with our research and continue the discussion.