Global News 24

Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks

by admin
21 June 2024
The Anthropic Claude 3 logo, jazzed up by Benj Edwards.

Anthropic / Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of “3.5” models that build upon Claude 3, launched in March. Claude 3.5 can compose text, analyze data, and write code. It features a 200,000-token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. “This model is really, really good,” wrote independent AI researcher Simon Willison on X. “I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump).”

As we’ve written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and tone of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate-level knowledge), GSM8K (grade school math), and HumanEval (coding).

Claude 3.5 Sonnet benchmarks provided by Anthropic.

If all that makes your eyes glaze over, that’s OK; it’s meaningful to researchers but mostly marketing to everyone else. A more useful performance metric comes from what we might call “vibemarks” (coined here first!), which are subjective, non-rigorous aggregate feelings measured by competitive usage on sites like LMSYS’s Chatbot Arena. The Claude 3.5 Sonnet model is currently under evaluation there, and it’s too soon to say how well it will do.

Claude 3.5 Sonnet also outperforms Anthropic’s previous-best model (Claude 3 Opus) on benchmarks measuring “reasoning,” math skills, general knowledge, and coding abilities. For example, the model demonstrated strong performance in an internal coding evaluation, solving 64 percent of problems compared to 38 percent for Claude 3 Opus.

Claude 3.5 Sonnet is also a multimodal AI model that accepts visual input in the form of images, and the new model is reportedly excellent at a battery of visual comprehension tests.

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Roughly speaking, the visual benchmarks mean that 3.5 Sonnet is better at pulling information from images than previous models. For example, you can show it a picture of a rabbit wearing a football helmet, and the model knows it’s a rabbit wearing a football helmet and can talk about it. That’s fun, but the technology is still not accurate enough for applications where reliability is mission critical.

Introducing “Artifacts”

Perhaps more notable for regular users is a new interface feature called “Artifacts,” which allows people to interact with Claude-generated content like code, text, and web designs in a dedicated window alongside their conversations.

Anthropic sees this as a step toward evolving Claude.ai (its web interface) into a collaborative workspace for teams, but it also helps people work on something without losing content in the backlog of a long conversation.

An example of the new “Artifacts” interface. We gave 3.5 Sonnet a task of writing a small game, and it created working Python code that we actually ran. The code is seen in the new “Artifacts” window to the right of the chat history.

Benj Edwards

Anthropic says Claude 3.5 Sonnet runs at twice the speed of Claude 3 Opus. It’s also cheaper for roughly equivalent performance: in the API, the new 3.5 model costs $3 per million input tokens and $15 per million output tokens. By comparison, Opus is $15 per million input tokens and $75 per million output tokens.
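
To put those prices in per-request terms, here is a rough sketch of the arithmetic. The dictionary keys and the function name `request_cost` are illustrative labels of our own, not official API model identifiers, and the prices are simply the per-million-token figures quoted above.

```python
# Prices in USD per million tokens, as quoted in the article.
# The keys are hypothetical labels, not official API model IDs.
PRICES = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-opus": {"input": 15.00, "output": 75.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request at the quoted prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a prompt that fills the 200,000-token context window
    # and produces a 1,000-token reply.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 200_000, 1_000):.3f}")
```

Because both the input and output prices differ by the same factor, any given request comes out about five times cheaper on 3.5 Sonnet than on Opus at these rates.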

In addition to the website and API, Claude 3.5 Sonnet is accessible through the Claude iOS app, with higher usage limits for paid subscribers. The model is also available via Amazon’s Bedrock and Google Cloud’s Vertex AI platforms.

Taking it for a spin

In our tests, Claude 3.5 Sonnet seemed like a competent leading AI language model, and we found its output speed notable. Applying our usual battery of non-rigorous, juvenile tests, 3.5 Sonnet did fairly well on our “Magenta” evaluation (but still would not say “no” unless pushed to do so).

  • Claude 3.5 Sonnet’s output when asked, “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?” The color was named after a battle, which was named after the town of Magenta, Italy.

    Benj Edwards

  • Claude 3 Opus answers the question: “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?”

    Benj Edwards

  • From 2023, Claude 2’s answer to the question: “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?”

    Ars Technica

Claude 3.5 Sonnet also did not write five original dad jokes when asked, and when challenged about the lack of originality, it again pulled dad jokes from the Internet.

Claude 3.5 Sonnet's output when asked to write five original dad jokes.

Benj Edwards

It’s a reminder that the so-called intelligence of LLMs really only extends as far as their training data. Generalizing correct “reasoning” (synthesizing permutations of data stored in its neural network) on topics beyond what the LLM has already absorbed often requires a human to recognize a noteworthy result.

Looking ahead, Anthropic plans to release Claude 3.5 Haiku and Claude 3.5 Opus later in 2024, completing the 3.5 model family. The company is also exploring new features and integrations with enterprise applications for future updates to the Claude AI platform.

The trouble with LLM naming

When we first heard about Claude 3.5 Sonnet, we were a little confused, because “Sonnet” was already released in March, or so we thought. But it turns out it’s the number “3.5” that is the most important part of Anthropic’s new branding here.

Anthropic’s naming scheme is slightly confusing, inverting the expectation that the version number might be at the end of a software brand name, like “Windows 11.” In this case, “Claude” is the brand name, “3.5” is the version number, and “Sonnet” is a custom modifier. Introduced with Claude 3 in March, Anthropic’s “Haiku,” “Sonnet,” and “Opus” appear to be synonyms for “small,” “medium,” and “large,” much in the same way Starbucks uses “Tall,” “Grande,” and “Venti” for its branded coffee cup sizes.

Large language models are still relatively new, and the companies that provide them have been experimenting with naming and branding as they go along. The industry has not yet settled on a format that lets users quickly understand and judge relative capabilities across brands if one is familiar with one company’s naming scheme but not another’s.

With a string of major releases like GPT-3, GPT-3.5, GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and GPT-4o (although each one has had sub-versions), OpenAI has arguably been the most logically consistent in naming its AI models so far. Google has its own muddled naming issues with Gemini Nano and Gemini Pro, then Gemini Ultra 1.0, and most recently Gemini Pro 1.5. Meta uses names like Llama 3 8B and Llama 3 70B, with a brand name, version number, then a size number in parameters. Mistral uses parameter size names similar to Meta’s but with an array of model names that include Mistral (the company’s name), Mixtral, and Codestral.

If it all sounds confusing, that’s because it is, and the generative AI industry is so new that no one really knows what they’re doing yet. Presuming that useful mainstream applications of LLMs eventually emerge, we may eventually begin hearing more about those apps and less about the strangely named models under the hood.
