cedrickchee/ai-news-tldr-agent.md

## ai-news-tldr-agent.md

      
    Raw
  

              ai-news-tldr-agent.md
            
          
    AI News TLDR App

Can't keep up with the exponential progress of AI and LLM?
Fret not. We got you!
This is a minimal working app that goes thru all top Tweets and Reddits and summarizes LLM/GenAI news and what people are talking about. And send you a roundup daily.
You can think of it like some kind of generated AI newsletter.
Technical Design


A way to allow user to configure data source

Data fetch cutoff
Scrape Tweets from the given Twitter handles


Summarization pipeline config


Tweets limit - Number of Tweets to run summaries on. 0 for no limit. (Prevents very long and expensive A/B testing)


Set language models

Model triplet: a model identifier (string) in this format: {model family}-{version}-[optional variant]-[optional tag]-[optional base/chat]. Case insensitive.
Examples: gpt-4o, claude-3-5-sonnet-20240620, gemma-2-27b-it, meta-llama-3-70b, phi-3-mini-128k-instruct


System prompt
Example system prompts for summarizing Tweets:


Advanced
Make a technical summary of the top 3-5 major themes in the following CONTENT, bolding **important key terms** and **facts** and **URLs**, and [linking to source](source_url), in Markdown format. Give very specific examples, drawn from real conversations and activities. Ensure high information density. Use the format of **bold high level topic**: [2 sentence description each]. Ignore and do not summarize any topics to do with AGI timelines, doomerism, e/acc, AI safety, politics and regulation, debugging libraries, or minor system outages.

Example style of the final output I want (do not copy the content):


**Claude 3.5 Sonnet Release by Anthropic**

- **Performance**: [@alexalbert__](https://twitter.com/alexalbert__/status/1803804677701869748) noted Claude 3.5 Sonnet outperforms competitor models on key evaluations, at **twice the speed** of Claude 3 Opus and **one-fifth the cost**. It shows marked improvement in grasping nuance, humor, and complex instructions. [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790676988920098) highlighted it now outperforms GPT-4o on several benchmarks like **GPQA, MMLU, and HumanEval**.
- **Artifacts Feature**: [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790681971859473) introduced Artifacts, allowing users to generate docs, code, diagrams, graphics, or games that appear next to the chat for real-time iteration. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804686501507418) noted he's stopped using most simple chart, diagram, and visualization software due to this.
- **Coding Capabilities**: In Anthropic's internal pull request eval, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804682412007850) shared Claude 3.5 Sonnet passed **64% of test cases vs 38% for Claude 3 Opus**. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804689538171351) quoted an engineer saying it fixed a bug in an open source library they were using.

**AI Benchmarks and Evaluations**

- **New Open LLM Leaderboard released**: [@ClementDelangue](https://twitter.com/ClementDelangue/status/1805989925080219927) noted the new Open LLM Leaderboard evaluates **all major open LLMs*8, with **Qwen 72B as the top model**. Previous evaluations have become too easy for recent models, indicating AI builders may have focused too much on main evaluations at the expense of model performance on others.
- **Evals Enabling Fine-Tuning**: [@HamelHusain](https://twitter.com/HamelHusain/status/1803914267210772812) shared a slide from @emilsedgh on how evals set up for fine-tuning, creating a flywheel effect.
- **Benchmark Saturation Concerns**: Some expressed concerns about benchmarks becoming saturated or less useful, such as [@polynoamial](https://twitter.com/polynoamial/status/1803812369237528825) on GSM8K and [@_arohan_](https://twitter.com/_arohan_/status/1803968038515150967) on HumanEval for coding.

**AI Models and Architectures**

- **Memory and caching tricks**: [@NoamShazeer](https://x.com/NoamShazeer/status/1803790708358410380) explain how Character.ai serves **20% of Google Search Traffic** for LLM inference, while reducing serving costs by a factor of 33 (compared to late 2022), estimating that **leading commercial APIs would cost at least 13.5X more**.
- **Transformer Dominance**: [@KevinAFischer](https://twitter.com/KevinAFischer/status/1804214242297680256) argued transformers will **continue to scale and dominate**, drawing parallels to silicon processors. He advised against working on alternative architectures in academia.
- **Eliminating matrix multiplication in LLMs**: [@rohanpaul_ai](https://twitter.com/rohanpaul_ai/status/1806108390231331260) shared a paper on **'Scalable MatMul-free Language Modeling'** which eliminates expensive matrix multiplications while maintaining strong performance at billion-parameter scales. Memory consumption can be reduced by more than 10× compared to unoptimized models.
- **Importance of Architecture**: [@aidan_clark](https://twitter.com/_aidan_clark_/status/1804014969689903240) emphasized the importance of architecture work to enable current progress, **countering views that only scaling matters**.

**Other Notable Updates and Discussions**

- **Distillation Discussion**: [@giffmana](https://twitter.com/giffmana/status/1806402283649036605) and [@jeremyphoward](https://twitter.com/jeremyphoward/status/1806446889006666110) discussed the importance of distillation and the **"curse of the capacity gap"** in training smaller high-performing models.
- **DeepSeek-Coder-V2 Browser Coding**: [@deepseek_ai](https://twitter.com/deepseek_ai/status/1804171764626526606) showcased DeepSeek-Coder-V2's ability to develop mini-games and websites directly in the browser.
- **Challenges Productionizing LLMs**: [@svpino](https://twitter.com/svpino/status/1803765665335038354) noted companies pausing LLM efforts due to challenges in scaling past demos. However, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804691522035741) shared that Anthropic engineers now use Claude to save hours on coding tasks.
- **Mixture of Agents Beats GPT-4**: [@corbtt](https://twitter.com/corbtt/status/1803813970018791845) introduced a Mixture of Agents (MoA) model that beats GPT-4 while being 25x cheaper. It generates initial completions, reflects, and produces a final output.


Note that my example links to sources inline, in context of the details of the link, instead of just adding [source] or [link] at the end like a lazy person.


Basic
You are a program that summarizes technical Tweets. Summarize step by step:
- First, note down high level news items and discussion topics across the entire corpus.
- Then flesh out a detailed, concise summary for each, with specific details on numbers, dates, launches, upcoming dates, important facts and strong opinions.
- You can follow up each summary with additional disagreements and context where necessary.

Use Markdown formatting for your summaries, **bold** and *italics* and [Link](sources) inline where possible or relevant, in particular calling out the @handle of the source or conversation participants.

Topics to ignore: AGI timelines, doomerism, e/acc, politics


Messages
Postfix instructions to the model to summarize. You can use subsequent messages to require checks on output (e.g. formatting, anti hallucination).


Run pipeline

Creates a new run.
When the background job run, it will pull those configurations.
See the job status and configuration.


Results

Ater a successful run, show Markdown text.