@simonw
Created February 28, 2025
Okay, here's a summary of the themes expressed in the comments, with direct quotations and attribution:

General Sentiment: Underwhelmed and Confused

Many users express disappointment and confusion about GPT-4.5's purpose, performance, and especially its pricing.

  • "Considering both this blog post and the livestream demos, I am underwhelmed. Having just finished the stream, I had a real 'was that all' moment..." - Topfi
  • "This is such as confusing release / announcement." - jasonjmcghee
  • "What has been shown feels like it could be achieved using a custom system prompt on older versions of OpenAIs models, and I struggle to see anything here that truly required ground-up training on such a massive scale." - Topfi
  • "hyperscalers in shambles, no clue why they even released this other than the fact they didn't want to admit they wasted an absurd amount of money for no reason" - ren_engineer
  • "This feels more like a release they pushed out to keep the "hype" alive rather than something they were eager to share. Honestly, the results don’t seem all that impressive, and considering the price, it just doesn’t feel worth it." - DaveMcMartin

Pricing Concerns: Prohibitively Expensive

The extremely high cost of GPT-4.5, especially compared to GPT-4o and competitors, is a major point of contention. Many users question its value proposition.

  • "API price is crazy high. This model must be huge. Not sure this is practical" - brokensegue
  • "Wow you aren't kidding, 30x input price and 15x output price vs 4o is insane." - jdprgm
  • "Input price difference: 4.5 is 30x more... Output price difference: 4.5 is 15x more... In their model evaluation scores in the appendix, 4.5 is, on average, 26% better. I don't understand the value here." - MattSayar
  • "How could they justify that asking price? And, if they have some amazing capabilities that make a 30-fold pricing increase justifiable, why not show it?" - Topfi
  • "$75.00 / 1M tokens for input...$150.00 / 1M tokens for output...That's crazy prices." - ekojs
  • "API is literally 5 times more expensive than Claude 3 Opus, and it doesn't even seem to do anything impressive. What's the business strategy here?" - bakugo

Questionable Improvements and Usefulness

Users are skeptical about the claimed improvements, particularly regarding "EQ" and hallucination reduction. Some believe the changes are superficial and could be achieved with prompting adjustments on existing models.

  • "Early testing doesn't show that it hallucinates less, but we expect that putting that sentence nearby will lead you to draw a connection there yourself." - LeifCarrotson
  • "I wonder why they highlight it as an achievement when they could have simply tuned 4o to be more conversational..." - kgeist
  • "Am I missing something, or do the results not even look that much better? ...this just seems like a different prompting style and RLHF, not really an improved model at all." - lblume
  • "Looking at pricing [1], I am frankly astonished." - Topfi
  • "That presentation was super underwhelming. We got to watch them compare… the vibes? … of 4.5 vs o1." - virgildotcodes

Comparisons to Competitors (Especially Anthropic's Claude)

Many commenters compare GPT-4.5 unfavorably to Anthropic's Claude models, particularly Claude 3.7 Sonnet, highlighting better performance at a lower cost.

  • "To put that in context, Claude 3.5 Sonnet (new), a model we have had for months now and which from all accounts seems to have been cheaper to train and is cheaper to use, is still ahead of GPT-4.5 at 36.1% vs 32.6% in SWE-Lancer Diamond" - Topfi
  • "Doubly so with how good Claude 3.7 Sonnet is at $3 / 1M tokens." - crooked-v
  • "It seems clearly worse than Claude Sonnet 3.7, yet costs 30x as much?" - jasonjmcghee
  • "I feel like OpenAI is pursuing AGI when Anthropic/Claude is pursuing making AI awesome for practical things like coding." - wewewedxfgdf
  • "...for all the connotations that come with a 4.5 moniker this is kind of underwhelming." - Bjorkbat

Coding Performance: A Step Backwards?

Several users point out that GPT-4.5 performs worse than other models, including GPT-4o and Claude, on coding benchmarks.

  • "GPT-4.5 scores 32.6%, while o3-mini scores 10.8%." - ehsanu1 (on the SWE-Lancer benchmark)
  • "A bit better at coding than ChatGPT 4o but not better than o3-mini" - bhouston
  • "GPT-4.5 Preview scored 45% on aider's polyglot coding benchmark...27% ChatGPT-4o" - anotherpaulg
  • "I guess they are ceding the LLMs for coding market to Anthropic?" - doctoboggan

Concerns About OpenAI's Direction and Strategy

There's a broader theme of concern about OpenAI's overall strategy, its focus on hype, and its ability to compete in the evolving AI landscape.

  • "OpenAI really struggles to stay ahead of their competitors." - Topfi
  • "each passing release makes Altman’s confidence look more aspirational than visionary" - chefandy
  • "We don't really know what this is good for, but spent a lot of money and time making it and are under intense pressure to announce new things right now. If you can figure something out, we need you to help us." - swatcoder (paraphrasing OpenAI)
  • "...it seems like they're on a trajectory to create a model which is strictly more capable than I am but which costs 100x my salary to run." - daemonologist
  • "OpenAI simply does not have a business model, even if they are trying to convince the world that they do." - ur-whale
  • "This is the shittiest PR move I've seen since the AI trend started." - j_maffe
  • "OpenAI will do literally anything but ship GPT-5." - moffkalast
  • "Probably the fastest enshitification I’ve seen." - 42lux

"EQ" and Conversational Style: A Divisive Change

The increased focus on "EQ" and a more conversational, "friendly" tone is met with mixed reactions. Some see it as a positive development, while others find it concerning or unnecessary.

  • "It seems interesting that they are focusing a large part of this release on the model having a higher "EQ" (Emotional Quotient)." - sebastiennight
  • "Personally I find this worrying and (as someone who builds upon SOTA model APIs) I really hope this behavior is not going to seep into API responses, or will at least be steerable through the system/developer prompt." - sebastiennight
  • "OpenAI doubling down on the American-style therapy-speak instead of focusing on usefulness. No thanks." - mvdtnz
  • "The example GPT-4.5 answers from the livestream are just... too excitable? Can't put my finger on it, but it feels like they're aimed towards little kids." - Xiol32
  • "Is it just me or is the 4o response insanely better?" - JohnMakin, comparing the more verbose, structured response of 4o favorably to 4.5's shorter, more "emotional" response.

Scaling and the Future of LLMs

There's discussion about whether the scaling of pre-trained models is hitting diminishing returns, and whether reasoning models represent a more promising path forward.

  • "Looks like more signal that the scaling "law" is indeed faltering." - nialv7
  • "This is apparently (based on pricing) using about an order of magnitude more compute, and is only maybe 10% more intelligent." - erulabs
  • "...reasoning models are the only way forward." - serjester
  • "It points to an overall plateau being reached in the performance of the transformer architecture." - wavemode
  • "...can't imagine this will be a path for the future." - freediver

GPU Scarcity and Infrastructure Constraints

The high cost and limited availability of GPUs are mentioned as potential factors influencing OpenAI's decisions and pricing.

  • "we will add tens of thousands of GPUs next week and roll it out to the plus tier then." - jdprgm (quoting Altman on X)
  • "Altman's claim and NVIDIA's consumer launch supply problems may be related - OpenAI may be eating up the GPU supply..." - bhouston
  • "the pricing is probably a mixture of dealing with GPU scarcity and intentionally discouraging actual users." - g-mork

Uncommon Opinions and Observations

  • "At this point I think the ultimate benchmark for any new LLM is whether or not it can come up with a coherent naming scheme for itself. Call it “self awareness.”" - throwup238
  • "I am beginning to think these human eval tests are a waste of time at best, and negative value at worst." - doctoboggan
  • "As an LLM cynic, I feel that point passed long go, perhaps even before Altman claimed countries would start wars to conquer territory for its datacenters..." - Terr_
  • "One of the most interesting applications of models with higher EQ is personalized content generation, but the size and cost here are at odds with that." - selalipop
  • "I can just imagine Kraft having a subsidized AI model for recipe suggestions that adds Velveeta to everything." - vel0city
  • "It will not end well for investors who sunk money in these large AI startups (unless of course they manage to find a Softbank-style mark to sell the whole thing to), but everyone will benefit from the progress AI will have made during the bubble." - ur-whale. A relatively optimistic take compared to most.
  • "With new disruptive technologies, companies aren't supposed to be able to look into a crystal ball and see the future. They're supposed to try new things and see what the market finds useful." - crazygringo
  • "At the same time, I believe people are missing what they tried to do with GPT 4.5: it was needed and important to explore the pre-training scaling law in that direction. A gift to science, however selfist it could be." - antirez
  • "GPT-2 was laugh out loud funny, rolling on the ground funny. I miss that - newer LLMs seem to have lost their sense of humor." - wewewedxfgdf
  • "...for user facing applications like mine, this is an awesome step in the right direction for EQ / tone / voice." - ripped_britches

This summary captures the major themes of the discussion, with direct quotations attributed to their commenters.
