CIOs are struggling to move generative AI projects from experiment to production, falling short of last year’s goals.
Two-thirds of C-suite leaders thought their organization’s progress with generative AI was not up to par in 2023, according to a Boston Consulting Group poll.
Generative AI experiments often fail. Even when pilots do go well, CIOs find solutions are hard to scale, citing a lack of clarity on success metrics (73%), cost concerns (68%) and the fast-evolving technology landscape (64%), according to an Everest Group survey published last month.
Each enterprise has its own risk frameworks, priorities and policies that determine whether an experiment can move on to broader testing. Even if those aspects are all aligned, experiments can stall due to lackluster results in initial trials. But not every failed experiment should be written off as a loss.
Zillow, an early adopter, began experimenting with various generative AI tools last year. As part of that process, the company wanted to ensure pilots checked certain boxes before moving to broader deployment.
“If you take the code assistant tools, for example, we have very specific expectations of how that needs to work within our Zillow ecosystem,” Lakshmi Dixit, VP of tech engineering and operations, told CIO Dive.
While evaluating tools, the real estate listing company also weighed impacts on user experience, productivity gains and which business problem each tool would solve.
“We had success in some things that we tried, and we didn’t see success in other things that we heard others had better success with,” Dixit said. “It’s hard to say the specifics, but I think it’s okay to make mistakes. You just want to learn from it and pivot quickly.”
In the tech industry, failing fast is a core tenet: move quickly to cut losses and course-correct rather than letting issues fester. With generative AI’s trough of disillusionment on the horizon, CIOs shouldn’t lower their expectations for the technology. Instead, tech leaders should quickly adapt plans when experiments stall, whether that means tweaking specific aspects of the trial or pulling the plug altogether.
“This is going to be a transformative technology that unlocks new business models, but there’s going to be a lot of failed attempts to do that along the way,” Brian Jackson, research director in the CIO practice at Info-Tech Research Group, said. “Don’t lower your expectations, because then you’re willing to accept iterative improvements when really you’re trying to hit the transformative home run.”
Reacting to experiment results
Failed generative AI experiments are probably more common than most enterprises let on.
“Anybody who says this is not the case is probably not being completely honest about it,” Ankur Sinha, CTO at digital financial services company Remitly, told CIO Dive.
“One of the things we’ve pushed really hard on is, as we tried different generative AI solutions, either our own build or leveraging vendors, we made sure that we understood what the measure of success was going to look like,” Sinha said.
Clearly defining metrics of success is key to any project’s planning process. If businesses are looking for efficiency or productivity, nailing down what that actually means will help CIOs identify the value a tool or solution will deliver.
Sinha’s teams are currently experimenting with AI-powered coding companions and test-generation tools. One of the key success metrics the teams track is the code change rate, which measures how often IT pros have to manually change generated code before it can go into production.
“If the change rate is high, we’re not getting as much benefit and if the percentage of code that makes it into production is low, then it’s not worth it,” Sinha said. He described the current change rate as moderate.
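The article doesn’t specify how Remitly computes these figures, but the two numbers Sinha cites, change rate and the share of generated code that reaches production, boil down to simple ratios. Below is a minimal, hypothetical Python sketch; the data model and function names are illustrative assumptions, not Remitly’s actual tooling.

```python
# Hypothetical sketch of the two metrics described in the article.
# All names and fields here are illustrative, not Remitly's tooling.
from dataclasses import dataclass


@dataclass
class GeneratedChange:
    """One AI-generated code suggestion and what happened to it."""
    accepted: bool          # did it ultimately ship to production?
    manually_edited: bool   # did an engineer have to rework it first?


def change_rate(changes: list[GeneratedChange]) -> float:
    """Share of shipped suggestions that needed manual rework."""
    shipped = [c for c in changes if c.accepted]
    if not shipped:
        return 0.0
    return sum(c.manually_edited for c in shipped) / len(shipped)


def production_rate(changes: list[GeneratedChange]) -> float:
    """Share of all suggestions that made it into production at all."""
    if not changes:
        return 0.0
    return sum(c.accepted for c in changes) / len(changes)


# Per Sinha's framing, a high change rate or a low production rate
# would both signal the tool is not delivering enough benefit.
sample = [
    GeneratedChange(accepted=True, manually_edited=False),
    GeneratedChange(accepted=True, manually_edited=True),
    GeneratedChange(accepted=False, manually_edited=False),
]
print(f"change rate: {change_rate(sample):.0%}, "
      f"production rate: {production_rate(sample):.0%}")
```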
Generated code is reviewed by automated quality gates and human reviewers. These mitigation techniques also help the business shield itself from other associated risks, such as introducing security vulnerabilities with AI code.
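The article doesn’t detail what those quality gates check. As a generic illustration of the pattern, the sketch below runs a few common pre-review checks (unit tests, linting, secret scanning) and blocks the change if any fail; the specific tools and commands are assumptions, not Remitly’s pipeline.

```python
# Hypothetical quality-gate sketch: the checks and tools below are
# illustrative assumptions, not a description of Remitly's pipeline.
import subprocess


def run_gate(name: str, command: list[str]) -> bool:
    """Run one check; a non-zero exit code (or a missing tool) fails the gate."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        passed = result.returncode == 0
    except FileNotFoundError:
        passed = False
    print(f"[{'PASS' if passed else 'FAIL'}] {name}")
    return passed


def quality_gates_pass() -> bool:
    """Every gate must pass before the change is handed to a human reviewer."""
    gates = [
        ("unit tests", ["pytest", "-q"]),
        ("lint", ["ruff", "check", "."]),
        ("secret scan", ["gitleaks", "detect"]),
    ]
    results = [run_gate(name, cmd) for name, cmd in gates]
    return all(results)


if __name__ == "__main__":
    if not quality_gates_pass():
        raise SystemExit("AI-generated change blocked before human review.")
```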
While AI-powered code generation at Remitly hasn’t moved past the experimentation phase, it has opened the door for other, lower-risk use cases, such as test generation.
“It’s been a particularly positive use case for us because you’re writing tests for your code and your tests don’t necessarily go into production,” Sinha said.
Some generative AI experiments are bound to fail, especially when clear ROI metrics or guardrails are missing. Others pave the way for more immediate gains, and some failed experiments will hold more potential for success as the technology matures.
It’s up to businesses to pick a winning concept amid a sea of possibilities.
“We have over 250 ideas that have been submitted, and we’re not doing 250 projects,” Sal Companieh, chief digital and information officer at Cushman & Wakefield, told CIO Dive.
The real estate company embarked on its AI+ initiative in November, aiming to embed AI across the transaction lifecycle to boost productivity and assist employees.
“I would say less than 10% [of those submitted ideas] are actually getting action,” Companieh said. “That’s very deliberate because we want to drive focus.”
Correction: This piece has been updated to reflect that Remitly reviews all AI-generated code.