Skip to content

Instantly share code, notes, and snippets.

@cedrickchee
Last active July 7, 2024 13:14
Show Gist options
  • Save cedrickchee/ee017f280761577e0e807843e124625a to your computer and use it in GitHub Desktop.
Save cedrickchee/ee017f280761577e0e807843e124625a to your computer and use it in GitHub Desktop.
AI News TLDR App [Ideas, WIP]

AI News TLDR App

Can't keep up with the exponential progress of AI and LLM?

Fret not. We got you!

This is a minimal working app that goes thru all top Tweets and Reddits and summarizes LLM/GenAI news and what people are talking about. And send you a roundup daily.

You can think of it like some kind of generated AI newsletter.

Technical Design

  • A way to allow user to configure data source

    • Data fetch cutoff
    • Scrape Tweets from the given Twitter handles
  • Summarization pipeline config

    • Tweets limit - Number of Tweets to run summaries on. 0 for no limit. (Prevents very long and expensive A/B testing)

    • Set language models

      • Model triplet: a model identifier (string) in this format: {model family}-{version}-[optional variant]-[optional tag]-[optional base/chat]. Case insensitive.
      • Examples: gpt-4o, claude-3-5-sonnet-20240620, gemma-2-27b-it, meta-llama-3-70b, phi-3-mini-128k-instruct
    • System prompt

      Example system prompts for summarizing Tweets:

      • Advanced

        Make a technical summary of the top 3-5 major themes in the following CONTENT, bolding **important key terms** and **facts** and **URLs**, and [linking to source](source_url), in Markdown format. Give very specific examples, drawn from real conversations and activities. Ensure high information density. Use the format of **bold high level topic**: [2 sentence description each]. Ignore and do not summarize any topics to do with AGI timelines, doomerism, e/acc, AI safety, politics and regulation, debugging libraries, or minor system outages.
        
        Example style of the final output I want (do not copy the content):
        
        
        **Claude 3.5 Sonnet Release by Anthropic**
        
        - **Performance**: [@alexalbert__](https://twitter.com/alexalbert__/status/1803804677701869748) noted Claude 3.5 Sonnet outperforms competitor models on key evaluations, at **twice the speed** of Claude 3 Opus and **one-fifth the cost**. It shows marked improvement in grasping nuance, humor, and complex instructions. [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790676988920098) highlighted it now outperforms GPT-4o on several benchmarks like **GPQA, MMLU, and HumanEval**.
        - **Artifacts Feature**: [@AnthropicAI](https://twitter.com/AnthropicAI/status/1803790681971859473) introduced Artifacts, allowing users to generate docs, code, diagrams, graphics, or games that appear next to the chat for real-time iteration. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804686501507418) noted he's stopped using most simple chart, diagram, and visualization software due to this.
        - **Coding Capabilities**: In Anthropic's internal pull request eval, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804682412007850) shared Claude 3.5 Sonnet passed **64% of test cases vs 38% for Claude 3 Opus**. [@alexalbert__](https://twitter.com/alexalbert__/status/1803804689538171351) quoted an engineer saying it fixed a bug in an open source library they were using.
        
        **AI Benchmarks and Evaluations**
        
        - **New Open LLM Leaderboard released**: [@ClementDelangue](https://twitter.com/ClementDelangue/status/1805989925080219927) noted the new Open LLM Leaderboard evaluates **all major open LLMs*8, with **Qwen 72B as the top model**. Previous evaluations have become too easy for recent models, indicating AI builders may have focused too much on main evaluations at the expense of model performance on others.
        - **Evals Enabling Fine-Tuning**: [@HamelHusain](https://twitter.com/HamelHusain/status/1803914267210772812) shared a slide from @emilsedgh on how evals set up for fine-tuning, creating a flywheel effect.
        - **Benchmark Saturation Concerns**: Some expressed concerns about benchmarks becoming saturated or less useful, such as [@polynoamial](https://twitter.com/polynoamial/status/1803812369237528825) on GSM8K and [@_arohan_](https://twitter.com/_arohan_/status/1803968038515150967) on HumanEval for coding.
        
        **AI Models and Architectures**
        
        - **Memory and caching tricks**: [@NoamShazeer](https://x.com/NoamShazeer/status/1803790708358410380) explain how Character.ai serves **20% of Google Search Traffic** for LLM inference, while reducing serving costs by a factor of 33 (compared to late 2022), estimating that **leading commercial APIs would cost at least 13.5X more**.
        - **Transformer Dominance**: [@KevinAFischer](https://twitter.com/KevinAFischer/status/1804214242297680256) argued transformers will **continue to scale and dominate**, drawing parallels to silicon processors. He advised against working on alternative architectures in academia.
        - **Eliminating matrix multiplication in LLMs**: [@rohanpaul_ai](https://twitter.com/rohanpaul_ai/status/1806108390231331260) shared a paper on **'Scalable MatMul-free Language Modeling'** which eliminates expensive matrix multiplications while maintaining strong performance at billion-parameter scales. Memory consumption can be reduced by more than 10× compared to unoptimized models.
        - **Importance of Architecture**: [@aidan_clark](https://twitter.com/_aidan_clark_/status/1804014969689903240) emphasized the importance of architecture work to enable current progress, **countering views that only scaling matters**.
        
        **Other Notable Updates and Discussions**
        
        - **Distillation Discussion**: [@giffmana](https://twitter.com/giffmana/status/1806402283649036605) and [@jeremyphoward](https://twitter.com/jeremyphoward/status/1806446889006666110) discussed the importance of distillation and the **"curse of the capacity gap"** in training smaller high-performing models.
        - **DeepSeek-Coder-V2 Browser Coding**: [@deepseek_ai](https://twitter.com/deepseek_ai/status/1804171764626526606) showcased DeepSeek-Coder-V2's ability to develop mini-games and websites directly in the browser.
        - **Challenges Productionizing LLMs**: [@svpino](https://twitter.com/svpino/status/1803765665335038354) noted companies pausing LLM efforts due to challenges in scaling past demos. However, [@alexalbert__](https://twitter.com/alexalbert__/status/1803804691522035741) shared that Anthropic engineers now use Claude to save hours on coding tasks.
        - **Mixture of Agents Beats GPT-4**: [@corbtt](https://twitter.com/corbtt/status/1803813970018791845) introduced a Mixture of Agents (MoA) model that beats GPT-4 while being 25x cheaper. It generates initial completions, reflects, and produces a final output.
        
        
        Note that my example links to sources inline, in context of the details of the link, instead of just adding [source] or [link] at the end like a lazy person.
        
      • Basic

        You are a program that summarizes technical Tweets. Summarize step by step:
        - First, note down high level news items and discussion topics across the entire corpus.
        - Then flesh out a detailed, concise summary for each, with specific details on numbers, dates, launches, upcoming dates, important facts and strong opinions.
        - You can follow up each summary with additional disagreements and context where necessary.
        
        Use Markdown formatting for your summaries, **bold** and *italics* and [Link](sources) inline where possible or relevant, in particular calling out the @handle of the source or conversation participants.
        
        Topics to ignore: AGI timelines, doomerism, e/acc, politics
        
    • Messages

      Postfix instructions to the model to summarize. You can use subsequent messages to require checks on output (e.g. formatting, anti hallucination).

  • Run pipeline

    • Creates a new run.
    • When the background job run, it will pull those configurations.
    • See the job status and configuration.
  • Results

    • Ater a successful run, show Markdown text.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment