Can't keep up with the exponential progress of AI and LLM?
Fret not. We got you!
This is a minimal working app that goes thru all top Tweets and Reddits and summarizes LLM/GenAI news and what people are talking about. And send you a roundup daily.
You can think of it like some kind of generated AI newsletter.
-
A way to allow user to configure data source
- Data fetch cutoff
- Scrape Tweets from the given Twitter handles
-
Summarization pipeline config
-
Tweets limit - Number of Tweets to run summaries on. 0 for no limit. (Prevents very long and expensive A/B testing)
-
Set language models
- Model triplet: a model identifier (string) in this format:
{model family}-{version}-[optional variant]-[optional tag]-[optional base/chat]
. Case insensitive. - Examples: gpt-4o, claude-3-5-sonnet-20240620, gemma-2-27b-it, meta-llama-3-70b, phi-3-mini-128k-instruct
- Model triplet: a model identifier (string) in this format:
-
System prompt
Example system prompts for summarizing Tweets:
-
Advanced
Make a technical summary of the top 3-5 major themes in the following CONTENT, bolding **important key terms** and **facts** and **URLs**, and [linking to source](source_url), in Markdown format. Give very specific examples, drawn from real conversations and activities. Ensure high information density. Use the format of **bold high level topic**: [2 sentence description each]. Ignore and do not summarize any topics to do with AGI timelines, doomerism, e/acc, AI safety, politics and regulation, debugging libraries, or minor system outages. Example style of the final output I want (do not copy the content): **Claude 3.5 Sonnet Release by Anthropic** - **Performance**: [@alexalbert__](https://twitter.com/alexalbert__/status/1803804677701869748) noted Claude 3.5 Sonnet outperforms competitor models on key evaluations, at **twice the speed** of Claude 3 Opus and **one-fifth the cost**. It shows marked improvement in grasping nuance, humor, and complex instructions. [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790676988920098) highlighted it now outperforms GPT-4o on several benchmarks like **GPQA, MMLU, and HumanEval**. - **Artifacts Feature**: [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790681971859473) introduced Artifacts, allowing users to generate docs, code, diagrams, graphics, or games that appear next to the chat for real-time iteration. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804686501507418) noted he's stopped using most simple chart, diagram, and visualization software due to this. - **Coding Capabilities**: In Anthropic's internal pull request eval, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804682412007850) shared Claude 3.5 Sonnet passed **64% of test cases vs 38% for Claude 3 Opus**. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804689538171351) quoted an engineer saying it fixed a bug in an open source library they were using. **AI Benchmarks and Evaluations** - **New Open LLM Leaderboard released**: [@ClementDelangue](https://twitter.com/ClementDelangue/status/1805989925080219927) noted the new Open LLM Leaderboard evaluates **all major open LLMs*8, with **Qwen 72B as the top model**. Previous evaluations have become too easy for recent models, indicating AI builders may have focused too much on main evaluations at the expense of model performance on others. - **Evals Enabling Fine-Tuning**: [@HamelHusain](https://twitter.com/HamelHusain/status/1803914267210772812) shared a slide from @emilsedgh on how evals set up for fine-tuning, creating a flywheel effect. - **Benchmark Saturation Concerns**: Some expressed concerns about benchmarks becoming saturated or less useful, such as [@polynoamial](https://twitter.com/polynoamial/status/1803812369237528825) on GSM8K and [@_arohan_](https://twitter.com/_arohan_/status/1803968038515150967) on HumanEval for coding. **AI Models and Architectures** - **Memory and caching tricks**: [@NoamShazeer](https://x.com/NoamShazeer/status/1803790708358410380) explain how Character.ai serves **20% of Google Search Traffic** for LLM inference, while reducing serving costs by a factor of 33 (compared to late 2022), estimating that **leading commercial APIs would cost at least 13.5X more**. - **Transformer Dominance**: [@KevinAFischer](https://twitter.com/KevinAFischer/status/1804214242297680256) argued transformers will **continue to scale and dominate**, drawing parallels to silicon processors. He advised against working on alternative architectures in academia. - **Eliminating matrix multiplication in LLMs**: [@rohanpaul_ai](https://twitter.com/rohanpaul_ai/status/1806108390231331260) shared a paper on **'Scalable MatMul-free Language Modeling'** which eliminates expensive matrix multiplications while maintaining strong performance at billion-parameter scales. Memory consumption can be reduced by more than 10× compared to unoptimized models. - **Importance of Architecture**: [@aidan_clark](https://twitter.com/_aidan_clark_/status/1804014969689903240) emphasized the importance of architecture work to enable current progress, **countering views that only scaling matters**. **Other Notable Updates and Discussions** - **Distillation Discussion**: [@giffmana](https://twitter.com/giffmana/status/1806402283649036605) and [@jeremyphoward](https://twitter.com/jeremyphoward/status/1806446889006666110) discussed the importance of distillation and the **"curse of the capacity gap"** in training smaller high-performing models. - **DeepSeek-Coder-V2 Browser Coding**: [@deepseek_ai](https://twitter.com/deepseek_ai/status/1804171764626526606) showcased DeepSeek-Coder-V2's ability to develop mini-games and websites directly in the browser. - **Challenges Productionizing LLMs**: [@svpino](https://twitter.com/svpino/status/1803765665335038354) noted companies pausing LLM efforts due to challenges in scaling past demos. However, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804691522035741) shared that Anthropic engineers now use Claude to save hours on coding tasks. - **Mixture of Agents Beats GPT-4**: [@corbtt](https://twitter.com/corbtt/status/1803813970018791845) introduced a Mixture of Agents (MoA) model that beats GPT-4 while being 25x cheaper. It generates initial completions, reflects, and produces a final output. Note that my example links to sources inline, in context of the details of the link, instead of just adding [source] or [link] at the end like a lazy person.
-
Basic
You are a program that summarizes technical Tweets. Summarize step by step: - First, note down high level news items and discussion topics across the entire corpus. - Then flesh out a detailed, concise summary for each, with specific details on numbers, dates, launches, upcoming dates, important facts and strong opinions. - You can follow up each summary with additional disagreements and context where necessary. Use Markdown formatting for your summaries, **bold** and *italics* and [Link](sources) inline where possible or relevant, in particular calling out the @handle of the source or conversation participants. Topics to ignore: AGI timelines, doomerism, e/acc, politics
-
-
Messages
Postfix instructions to the model to summarize. You can use subsequent messages to require checks on output (e.g. formatting, anti hallucination).
-
-
Run pipeline
- Creates a new run.
- When the background job run, it will pull those configurations.
- See the job status and configuration.
-
Results
- Ater a successful run, show Markdown text.