Anthropic, OpenAI beef up coding capabilities for their LLMs

Dive Brief:

Generative AI startup Anthropic released an updated version of its general-purpose large language model, Claude 2, on Tuesday. While previous versions of Claude lagged behind “state-of-the-art” models on coding tasks, Claude 2 shows improvements, according to the company’s research.
Anthropic placed “special emphasis” on improving the model’s ability to act as a coding assistant. The updated model scored 71.2% on Codex HumanEval, a Python coding test, up from 56% with Claude 1.3, the company said.
OpenAI rolled out a tool last week that allows ChatGPT Plus users to analyze data, create charts, edit files and perform math with a prompt. The company first likened the plugin, called code interpreter, to a “very eager junior programmer working at the speed of your fingertips” when it unveiled plugins in March.

Dive Insight:

Technology vendors are updating their general-purpose large language models, and building add-ons for them, to improve coding capabilities and meet enterprises’ need for ethical guidelines, explainability and data privacy guardrails.

With each update or plugin, Anthropic and OpenAI are essentially crowdsourcing use cases for their products.

OpenAI encouraged users to share their most interesting outputs and use cases for the tool in a newly created forum on its Discord server. Use cases included making GIFs, converting natural language prompts to C++ code and analyzing spreadsheets. Users also described cases where the tool failed to carry out prompts.

Code interpreter was first teased in March when the company launched ChatGPT plugins. The experimental ChatGPT model works with a Python interpreter to handle uploads and downloads. Plus users can access code interpreter in beta on the website and can analyze uploaded personal files.

Ethan Mollick, an associate professor at the Wharton School of the University of Pennsylvania, praised the tool in a blog post Friday. “You don’t have to code, because it does all the work for you,” Mollick said.

While the tool has spawned a crop of use cases in online communities, it is still a work in progress, much like Anthropic’s Claude. One user shared an example of code interpreter struggling to combine two images when prompted. Based on the examples users shared in the forum, the tool appeared to excel at converting data sets into graphs.

Even in use cases where tools typically succeed, they can’t always be trusted.

“Claude models still confabulate – getting facts wrong, hallucinating details, and filling in gaps in knowledge with fabrication,” according to Anthropic’s research. “This means they should not be used on their own in high stakes situations where an incorrect answer would cause harm.”

Claude is already readily accessible to workers through third-party apps and services, such as Slack.

The focus on coding capabilities — which, in turn, positions generative AI tools closer to enterprises — is coming at a time when the general public’s appetite is diminishing.

In the U.S., ChatGPT’s desktop and mobile web traffic declined 10.3% from May to June, according to a report published last week by internet analytics firm Similarweb. Visitors worldwide also spent less time on ChatGPT’s website, with time spent dropping 8.5%.

While public fanfare surrounding ChatGPT and its competitors, including Claude, wavers, tech leaders are still inundated with waves of AI products hitting the market. Out of more than 2,000 categories, AI tools make up the three fastest-growing software products on software marketplace and review website G2 this year.