simonw/gpt-4o.md

Created February 27, 2025 23:57
Self-awareness and Naming Conventions

Several users highlighted humor and cynicism around the naming conventions for large language models (LLMs). "throwup238" proposed that the ultimate benchmark for any new LLM should be whether it can coherently name itself, terming this "self-awareness." "lenerdenator" joked about the seemingly random naming schemes, likening them to a Programming 101 beginner who just gives a variable any old name.

Comparison with Other Models

There was extensive debate over the effectiveness of various models, particularly comparing OpenAI's GPT-4.5 to Anthropic's Claude 3.7. "bhouston" shared performance metrics, noting that Anthropic's model came out ahead on coding tasks. However, "logicchains" pointed out that the cost of using Claude may be prohibitive for personal projects.

Pricing and Market Strategy

Pricing discussions were central, with GPT-4.5 in particular criticized for its "insane" costs. "zaptrem" and others compared it to more affordable alternatives, questioning how OpenAI justifies such pricing when the performance improvements are marginal. "mchusma" and "Topfi" expressed confusion over such a costly release, suggesting it might be a marketing strategy or a placeholder release.

Model Performance and Expectations

Many participants found the latest models underwhelming relative to expectations. "freediver" and "jnd0" noted performance limitations, with "jnd0" questioning whether "this will be a path for the future." "freediver" found the model not much more intelligent than simpler models on certain tasks, adding to concerns about diminishing returns from scaling.

The Role and Ethics of AI Models

Discussion also turned to the ethical implications and role of AI models. "doctoboggan" raised concerns about the reliance on human evaluators for model improvements, preferring models that are more "correct and capable" rather than simply more personable or likable. "sebastiennight" warned that as more emphasis is placed on emotional intelligence (EQ), models might soon pose as "a friendly person" rather than a helpful assistant.

Uncommon Opinions

  • "antirez" speculated that OpenAI's exploration of pre-training scaling for GPT-4.5 could be akin to scientific research, even if self-serving: "A gift to science."
  • "wewewedxfgdf" nostalgically remarked that earlier models like GPT-2 provided "laugh-out-loud" humor which seems absent in the newer iterations post-reinforcement learning adjustments.
  • Despite mainstream skepticism, "highfrequency" saw potential in non-reasoning models that subtly adjust their tone to be more enjoyable to read, rather than delivering dry information.

This variety of perspectives underscores not only the complexities in evaluating AI advancements but also the diverse expectations from these technologies as they continue to evolve.
