Several users highlighted the humor and cynicism in the naming conventions for large language models (LLMs). "throwup238" proposed that the ultimate benchmark for any new LLM should be whether it can coherently name itself, terming this “self-awareness.” "lenerdenator" joked about the seemingly random naming schemes, likening them to the Programming 101 advice to "just give the variable any old name."
There was extensive debate over the effectiveness of various models, particularly comparing OpenAI's GPT-4.5 with Anthropic's Claude 3.7. "bhouston" shared performance metrics showing that Anthropic's model outperformed on coding tasks. However, "logicchains" pointed out that Claude's cost may make it impractical for personal projects.
Pricing discussions were central, with GPT-4.5 in particular criticized for its "insane" costs. "zaptrem" and others compared it to more affordable alternatives, questioning how OpenAI justifies such pricing when performance improvements are marginal. "mchusma" and "Topfi" expressed confusion over such a costly release, suggesting it might be a marketing strategy or a placeholder.
Many participants found the latest models underwhelming relative to their expected capabilities. "freediver" and "jnd0" noted performance limitations, speculating that "this will be a path for the future." "freediver" also found the model not much more intelligent than simpler models on certain tasks, adding to concerns about diminishing returns from scaling.
An interesting discussion unfolded around the ethical implications and role of AI. "doctoboggan" raised concerns about the reliance on human evaluators for model improvements, preferring models that are more "correct and capable" rather than simply more personable or likable. "sebastiennight" warned that models may soon pose as "a friendly person" rather than a helpful assistant, as more emphasis is placed on emotional intelligence (EQ).
- "antirez" speculated that OpenAI's exploration of pre-training scaling for GPT-4.5 could be akin to scientific research, even if self-serving: "A gift to science."
- "wewewedxfgdf" nostalgically remarked that earlier models like GPT-2 provided "laugh-out-loud" humor which seems absent in the newer iterations post-reinforcement learning adjustments.
- Despite mainstream skepticism, "highfrequency" saw potential in non-reasoning models that can subtly tweak their tone to be more enjoyable, rather than delivering dry information.
This variety of perspectives underscores not only the complexity of evaluating AI advancements but also the diverse expectations placed on these technologies as they continue to evolve.