392 @daily-paper-discussion messages and their arXiv links as of 01/2026
| discussion_date | link | paper_title |
|---|---|---|
| 2026-01-20 | https://arxiv.org/abs/2512.24601 | Recursive Language Models | |
| 2025-11-11 | https://arxiv.org/abs/2504.16828 | Process Reward Models That Think | |
| 2025-11-06 | https://arxiv.org/abs/2510.09596 | BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards | |
| 2025-11-05 | https://arxiv.org/abs/2510.25976 | Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer | |
| 2025-11-04 | https://arxiv.org/abs/2509.17196 | Evolution of Concepts in Language Model Pre-Training | |
| 2025-10-29 | https://arxiv.org/abs/2510.23691 | Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents | |
| 2025-10-28 | https://arxiv.org/abs/2510.21614 | Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine | |
| 2025-10-27 | https://arxiv.org/abs/2510.15511 | Language Models are Injective and Hence Invertible | |
| 2025-10-20 | https://arxiv.org/abs/2510.14901 | Reasoning with Sampling: Your Base Model is Smarter Than You Think | |
| 2025-10-16 | https://arxiv.org/abs/2510.01171 | Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity | |
| 2025-10-15 | https://arxiv.org/abs/1802.06070 | Diversity is All You Need: Learning Skills without a Reward Function | |
| 2025-10-14 | https://arxiv.org/abs/2510.01279 | TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture | |
| 2025-10-13 | https://arxiv.org/abs/2509.22818 | Can Large Language Models Develop Gambling Addiction? | |
| 2025-10-09 | https://arxiv.org/abs/2510.04871v1 | Less is More: Recursive Reasoning with Tiny Networks | |
| 2025-10-01 | https://arxiv.org/abs/2506.09047 | Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs | |
| 2025-09-30 | https://arxiv.org/abs/2507.06203v1 | A Survey on Latent Reasoning | |
| 2025-09-24 | https://arxiv.org/abs/2509.19249 | Reinforcement Learning on Pre-Training Data | |
| 2025-09-22 | https://arxiv.org/abs/2509.14252 | LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures | |
| 2025-09-18 | https://arxiv.org/abs/2505.13763 | Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations | |
| 2025-09-17 | https://arxiv.org/abs/2509.05276 | SpikingBrain: Spiking Brain-inspired Large Models | |
| 2025-09-12 | https://arxiv.org/abs/2509.08519 | HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning | |
| 2025-09-11 | https://arxiv.org/abs/2508.13948 | Prompt Orchestration Markup Language | |
| 2025-09-10 | https://arxiv.org/abs/2509.02722 | Planning with Reasoning using Vision Language World Model | |
| 2025-09-04 | https://arxiv.org/abs/2507.19703 | The wall confronting large language models | |
| 2025-09-02 | https://arxiv.org/abs/2508.21038 | On the Theoretical Limitations of Embedding-Based Retrieval | |
| 2025-08-27 | https://arxiv.org/abs/2506.08343 | Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | |
| 2025-08-26 | https://arxiv.org/abs/2506.02867 | Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning | |
| 2025-08-25 | https://arxiv.org/abs/2508.10390 | Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts | |
| 2025-08-21 | https://arxiv.org/abs/2411.00986 | Taking AI Welfare Seriously | |
| 2025-08-20 | https://arxiv.org/abs/2508.06492 | Effective Training Data Synthesis for Improving MLLM Chart Understanding | |
| 2025-08-07 | https://arxiv.org/abs/2506.21734 | Hierarchical Reasoning Model | |
| 2025-08-06 | https://arxiv.org/abs/2402.15391 | Genie: Generative Interactive Environments | |
| 2025-08-06 | https://arxiv.org/abs/2404.10179 | Scaling Instructable Agents Across Many Simulated Worlds | |
| 2025-07-11 | https://arxiv.org/abs/2504.10612 | Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling | |
| 2025-07-10 | https://arxiv.org/abs/2505.17117 | From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning | |
| 2025-07-04 | https://arxiv.org/abs/2406.06484 | Parallelizing Linear Transformers with the Delta Rule over Sequence Length | |
| 2025-07-03 | https://arxiv.org/abs/2503.14456 | RWKV-7 "Goose" with Expressive Dynamic State Evolution | |
| 2025-06-27 | https://arxiv.org/abs/2505.06708 | Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | |
| 2025-06-25 | https://arxiv.org/abs/2506.10947 | Spurious Rewards: Rethinking Training Signals in RLVR | |
| 2025-06-19 | https://arxiv.org/abs/2506.09985 | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | |
| 2025-06-18 | https://arxiv.org/abs/2407.04117 | Predictive Coding Networks and Inference Learning: Tutorial and Survey | |
| 2025-06-12 | https://arxiv.org/abs/2506.01622 | General agents contain world models | |
| 2025-06-06 | https://arxiv.org/abs/2505.12540 | Harnessing the Universal Geometry of Embeddings | |
| 2025-06-03 | https://arxiv.org/abs/2409.12517 | Scaling FP8 training to trillion-token LLMs | |
| 2025-05-14 | https://arxiv.org/abs/2302.04761 | Toolformer: Language Models Can Teach Themselves to Use Tools | |
| 2025-05-13 | https://arxiv.org/abs/2305.13673 | Physics of Language Models: Part 1, Learning Hierarchical Language Structures | |
| 2025-04-30 | https://arxiv.org/abs/2504.15376 | Towards Understanding Camera Motions in Any Video | |
| 2025-04-23 | https://arxiv.org/abs/2409.20325 | Old Optimizer, New Norm: An Anthology | |
| 2025-04-23 | https://arxiv.org/abs/2412.10925 | Video Representation Learning with Joint-Embedding Predictive Architectures | |
| 2025-04-17 | https://arxiv.org/abs/2402.03300 | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | |
| 2025-04-17 | https://arxiv.org/abs/2403.05525 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | |
| 2025-04-10 | https://arxiv.org/abs/2401.14196 | DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | |
| 2025-04-09 | https://arxiv.org/abs/2401.06066 | DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | |
| 2025-04-08 | https://arxiv.org/abs/2401.02954 | DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | |
| 2025-04-02 | https://arxiv.org/abs/2503.22230 | Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback | |
| 2025-03-27 | https://arxiv.org/abs/2503.19551 | Scaling Laws of Synthetic Data for Language Models | |
| 2025-03-26 | https://arxiv.org/abs/2503.00735 | LADDER: Self-Improving LLMs Through Recursive Problem Decomposition | |
| 2025-03-25 | https://arxiv.org/abs/2503.14607 | Can Large Vision Language Models Read Maps Like a Human? | |
| 2025-03-20 | https://arxiv.org/abs/2503.14378 | Impossible Videos | |
| 2025-03-20 | https://arxiv.org/abs/2503.14478 | Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | |
| 2025-03-18 | https://arxiv.org/abs/2503.10965 | Auditing language models for hidden objectives | |
| 2025-03-11 | https://arxiv.org/abs/2503.04130 | STORM: Token-Efficient Long Video Understanding for Multimodal LLMs | |
| 2025-03-07 | https://arxiv.org/abs/2503.00865 | Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | |
| 2025-03-06 | https://arxiv.org/abs/2412.06771 | Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | |
| 2025-02-27 | https://arxiv.org/abs/2410.16179 | MagicPIG: LSH Sampling for Efficient LLM Generation | |
| 2025-02-26 | https://arxiv.org/abs/2502.03387 | LIMO: Less is More for Reasoning | |
| 2025-02-25 | https://arxiv.org/abs/2502.14786 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | |
| 2025-02-22 | https://arxiv.org/abs/2305.18290 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | |
| 2025-02-20 | https://arxiv.org/abs/2502.11089 | Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | |
| 2025-02-19 | https://arxiv.org/abs/2502.09696 | ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | |
| 2025-02-19 | https://arxiv.org/abs/2502.12150 | Idiosyncrasies in Large Language Models | |
| 2025-02-12 | https://arxiv.org/abs/2402.10588 | Do Llamas Work in English? On the Latent Language of Multilingual Transformers | |
| 2025-02-11 | https://arxiv.org/abs/2111.00396v3 | Efficiently Modeling Long Sequences with Structured State Spaces | |
| 2025-02-04 | https://arxiv.org/abs/2501.18837 | Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming | |
| 2025-01-31 | https://arxiv.org/abs/2501.17161 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | |
| 2025-01-30 | https://arxiv.org/abs/2501.12370 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | |
| 2025-01-25 | https://arxiv.org/abs/2501.12326 | UI-TARS: Pioneering Automated GUI Interaction with Native Agents | |
| 2025-01-24 | https://arxiv.org/abs/2501.13011 | MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking | |
| 2025-01-18 | https://arxiv.org/abs/2501.06425 | Tensor Product Attention Is All You Need | |
| 2025-01-17 | https://arxiv.org/abs/2501.08313 | MiniMax-01: Scaling Foundation Models with Lightning Attention | |
| 2025-01-16 | https://arxiv.org/abs/2501.00663 | Titans: Learning to Memorize at Test Time | |
| 2025-01-14 | https://arxiv.org/abs/2501.05874 | VideoRAG: Retrieval-Augmented Generation over Video Corpus | |
| 2025-01-09 | https://arxiv.org/abs/2501.01423v1 | Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | |
| 2025-01-08 | https://arxiv.org/abs/2412.06769 | Training Large Language Models to Reason in a Continuous Latent Space | |
| 2025-01-08 | https://arxiv.org/abs/2412.19437 | DeepSeek-V3 Technical Report | |
| 2024-12-20 | https://arxiv.org/abs/2412.08905 | Phi-4 Technical Report | |
| 2024-12-13 | https://arxiv.org/abs/2411.19865 | Reverse Thinking Makes LLMs Stronger Reasoners | |
| 2024-12-12 | https://arxiv.org/abs/2412.06966 | Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research | |
| 2024-12-11 | https://arxiv.org/abs/2412.04468 | NVILA: Efficient Frontier Visual Language Models | |
| 2024-12-05 | https://arxiv.org/abs/2407.08608 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | |
| 2024-12-04 | https://arxiv.org/abs/2411.10440 | LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | |
| 2024-12-03 | https://arxiv.org/abs/2411.07191 | The Super Weight in Large Language Models | |
| 2024-12-03 | https://arxiv.org/abs/2411.14405 | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | |
| 2024-11-29 | https://arxiv.org/abs/2411.17690 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | |
| 2024-11-27 | https://arxiv.org/abs/2411.14402 | Multimodal Autoregressive Pre-training of Large Vision Encoders | |
| 2024-11-26 | https://arxiv.org/abs/2406.02061 | Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | |
| 2024-11-20 | https://arxiv.org/abs/2406.19370 | Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space | |
| 2024-11-19 | https://arxiv.org/abs/2411.04996 | Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | |
| 2024-11-18 | https://arxiv.org/abs/2411.09009 | Cut Your Losses in Large-Vocabulary Language Models | |
| 2024-11-15 | https://arxiv.org/abs/2411.07279 | The Surprising Effectiveness of Test-Time Training for Few-Shot Learning | |
| 2024-11-13 | https://arxiv.org/abs/2411.04330 | Scaling Laws for Precision | |
| 2024-11-13 | https://arxiv.org/abs/2411.02853 | ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate | |
| 2024-11-13 | https://arxiv.org/abs/1707.06347 | Proximal Policy Optimization Algorithms | |
| 2024-11-13 | https://arxiv.org/abs/2410.00907 | Addition is All You Need for Energy-efficient Language Models | |
| 2024-11-09 | https://arxiv.org/abs/2411.02385 | How Far is Video Generation from World Model: A Physical Law Perspective | |
| 2024-11-08 | https://arxiv.org/abs/2411.02355 | "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | |
| 2024-11-07 | https://arxiv.org/abs/2407.12831 | Truth is Universal: Robust Detection of Lies in LLMs | |
| 2024-11-06 | https://arxiv.org/abs/2410.22071 | Distinguishing Ignorance from Error in LLM Hallucinations | |
| 2024-11-05 | https://arxiv.org/abs/2410.23179 | Does equivariance matter at scale? | |
| 2024-11-01 | https://arxiv.org/abs/2410.16090 | Analysing the Residual Stream of Language Models Under Knowledge Conflicts | |
| 2024-10-31 | https://arxiv.org/abs/2410.11081 | Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models | |
| 2024-10-29 | https://arxiv.org/abs/2410.16270 | Reflection-Bench: Evaluating Epistemic Agency in Large Language Models | |
| 2024-10-25 | https://arxiv.org/abs/2108.08481 | Neural Operator: Learning Maps Between Function Spaces | |
| 2024-10-24 | https://arxiv.org/abs/2410.08146 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | |
| 2024-10-23 | https://arxiv.org/abs/2410.06205 | Round and Round We Go! What makes Rotary Positional Encodings useful? | |
| 2024-10-18 | https://arxiv.org/abs/2405.15071 | Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization | |
| 2024-10-11 | https://arxiv.org/abs/2410.01131 | nGPT: Normalized Transformer with Representation Learning on the Hypersphere | |
| 2024-10-10 | https://arxiv.org/abs/2410.01912 | A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation | |
| 2024-10-09 | https://arxiv.org/abs/2410.04717 | $\textbf{Only-IF}$:Revealing the Decisive Effect of Instruction Diversity on Generalization | |
| 2024-10-09 | https://arxiv.org/abs/2410.05258 | Differential Transformer | |
| 2024-10-08 | https://arxiv.org/abs/2410.02757 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | |
| 2024-10-04 | https://arxiv.org/abs/2408.07199 | Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents | |
| 2024-10-04 | https://arxiv.org/abs/2409.19951 | Law of the Weakest Link: Cross Capabilities of Large Language Models | |
| 2024-10-03 | https://arxiv.org/abs/2409.17481 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | |
| 2024-10-01 | https://arxiv.org/abs/2409.18869 | Emu3: Next-Token Prediction is All You Need | |
| 2024-09-27 | https://arxiv.org/abs/2211.14275 | Solving math word problems with process- and outcome-based feedback | |
| 2024-09-27 | https://arxiv.org/abs/2407.01449 | ColPali: Efficient Document Retrieval with Vision Language Models | |
| 2024-09-26 | https://arxiv.org/abs/2409.13373 | LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench | |
| 2024-09-25 | https://arxiv.org/abs/2409.14677 | Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections | |
| 2024-09-11 | https://arxiv.org/abs/2409.04431 | Theory, Analysis, and Best Practices for Sigmoid Self-Attention | |
| 2024-09-05 | https://arxiv.org/abs/2409.00558 | Compositional 3D-aware Video Generation with LLM Director | |
| 2024-09-04 | https://arxiv.org/abs/2408.16725 | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | |
| 2024-08-30 | https://arxiv.org/abs/2408.11475 | TrackGo: A Flexible and Efficient Method for Controllable Video Generation | |
| 2024-08-28 | https://arxiv.org/abs/2408.13934 | Learning to Move Like Professional Counter-Strike Players | |
| 2024-08-27 | https://arxiv.org/abs/2408.12637 | Building and better understanding vision-language models: insights and future directions | |
| 2024-08-24 | https://arxiv.org/abs/2408.08210 | Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | |
| 2024-08-23 | https://arxiv.org/abs/1912.01603 | Dream to Control: Learning Behaviors by Latent Imagination | |
| 2024-08-23 | https://arxiv.org/abs/2010.02193 | Mastering Atari with Discrete World Models | |
| 2024-08-23 | https://arxiv.org/abs/2301.04104 | Mastering Diverse Domains through World Models | |
| 2024-08-22 | https://arxiv.org/abs/2106.08295 | A White Paper on Neural Network Quantization | |
| 2024-08-22 | https://arxiv.org/abs/2211.10438 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | |
| 2024-08-21 | https://arxiv.org/abs/2406.19470 | Changing Answer Order Can Decrease MMLU Accuracy | |
| 2024-08-20 | https://arxiv.org/abs/2403.19159 | Disentangling Length from Quality in Direct Preference Optimization | |
| 2024-08-17 | https://arxiv.org/abs/2404.01413 | Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | |
| 2024-08-16 | https://arxiv.org/abs/2406.19108 | Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction | |
| 2024-08-16 | https://arxiv.org/abs/2402.14740 | Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | |
| 2024-08-14 | https://arxiv.org/abs/2407.02446v1 | Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | |
| 2024-08-13 | https://arxiv.org/abs/2406.03476 | Does your data spark joy? Performance gains from domain upsampling at the end of training | |
| 2024-08-08 | https://arxiv.org/abs/2407.01502 | AI Agents That Matter | |
| 2024-08-08 | https://arxiv.org/abs/2408.00118 | Gemma 2: Improving Open Language Models at a Practical Size | |
| 2024-08-02 | https://arxiv.org/abs/2004.07780 | Shortcut Learning in Deep Neural Networks | |
| 2024-07-24 | https://arxiv.org/abs/2406.19999 | The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models | |
| 2024-07-23 | https://arxiv.org/abs/2407.06581 | Vision language models are blind: Failing to translate detailed visual features into words | |
| 2024-07-23 | https://arxiv.org/abs/2407.04622 | On scalable oversight with weak LLMs judging strong LLMs | |
| 2024-07-19 | https://arxiv.org/abs/2406.06469 | Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning | |
| 2024-07-18 | https://arxiv.org/abs/2407.04620 | Learning to (Learn at Test Time): RNNs with Expressive Hidden States | |
| 2024-07-18 | https://arxiv.org/abs/2407.10671 | Qwen2 Technical Report | |
| 2024-07-17 | https://arxiv.org/abs/2402.01817 | LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | |
| 2024-07-15 | https://arxiv.org/abs/2406.13236 | Data Contamination Can Cross Language Barriers | |
| 2024-07-12 | https://arxiv.org/abs/2407.07726 | PaliGemma: A versatile 3B VLM for transfer | |
| 2024-07-11 | https://arxiv.org/abs/2407.03618 | BM25S: Orders of magnitude faster lexical search via eager sparse scoring | |
| 2024-07-09 | https://arxiv.org/abs/2407.04172 | ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild | |
| 2024-07-04 | https://arxiv.org/abs/2407.02371 | OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation | |
| 2024-07-03 | https://arxiv.org/abs/2404.16130 | From Local to Global: A Graph RAG Approach to Query-Focused Summarization | |
| 2024-06-27 | https://arxiv.org/abs/2406.10162 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | |
| 2024-06-21 | https://arxiv.org/abs/2406.09406 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | |
| 2024-06-21 | https://arxiv.org/abs/2406.07394 | Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | |
| 2024-06-21 | https://arxiv.org/abs/2304.08485 | Visual Instruction Tuning | |
| 2024-06-21 | https://arxiv.org/abs/2310.03744 | Improved Baselines with Visual Instruction Tuning | |
| 2024-06-19 | https://arxiv.org/abs/2311.08516 | LLMs cannot find reasoning errors, but can correct them given the error location | |
| 2024-06-14 | https://arxiv.org/abs/2406.04093 | Scaling and evaluating sparse autoencoders | |
| 2024-06-14 | https://arxiv.org/abs/2406.06525 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | |
| 2024-06-14 | https://arxiv.org/abs/2406.07550 | An Image is Worth 32 Tokens for Reconstruction and Generation | |
| 2024-06-14 | https://arxiv.org/abs/2406.08478 | What If We Recaption Billions of Web Images with LLaMA-3? | |
| 2024-06-08 | https://arxiv.org/abs/2305.09636 | SoundStorm: Efficient Parallel Audio Generation | |
| 2024-06-08 | https://arxiv.org/abs/2405.21075 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | |
| 2024-06-08 | https://arxiv.org/abs/2306.13549 | A Survey on Multimodal Large Language Models | |
| 2024-06-08 | https://arxiv.org/abs/2405.14860 | Not All Language Model Features Are One-Dimensionally Linear | |
| 2024-06-08 | https://arxiv.org/abs/2310.06114 | Learning Interactive Real-World Simulators | |
| 2024-05-31 | https://arxiv.org/abs/2107.03312 | SoundStream: An End-to-End Neural Audio Codec | |
| 2024-05-31 | https://arxiv.org/abs/2209.03143 | AudioLM: a Language Modeling Approach to Audio Generation | |
| 2024-05-24 | https://arxiv.org/abs/2404.11568 | On the Scalability of GNNs for Molecular Graphs | |
| 2024-05-24 | https://arxiv.org/abs/2405.02246 | What matters when building vision-language models? | |
| 2024-05-24 | https://arxiv.org/abs/2405.10626 | Dynamic data sampler for cross-language transfer learning in large language models | |
| 2024-05-24 | https://arxiv.org/abs/2312.08566 | Learning adaptive planning representations with natural language guidance | |
| 2024-05-24 | https://arxiv.org/abs/2405.11473 | FIFO-Diffusion: Generating Infinite Videos from Text without Training | |
| 2024-05-24 | https://arxiv.org/abs/2405.07987 | The Platonic Representation Hypothesis | |
| 2024-05-23 | https://arxiv.org/abs/2306.12925 | AudioPaLM: A Large Language Model That Can Speak and Listen | |
| 2024-05-17 | https://arxiv.org/abs/2405.05904 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | |
| 2024-05-16 | https://arxiv.org/abs/2403.06098 | VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | |
| 2024-05-13 | https://arxiv.org/abs/2311.12786 | Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks | |
| 2024-05-10 | https://arxiv.org/abs/2404.19756 | KAN: Kolmogorov-Arnold Networks | |
| 2024-05-10 | https://arxiv.org/abs/2405.04517 | xLSTM: Extended Long Short-Term Memory | |
| 2024-05-10 | https://arxiv.org/abs/2302.00487 | A Comprehensive Survey of Continual Learning: Theory, Method and Application | |
| 2024-05-10 | https://arxiv.org/abs/2405.00332 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic | |
| 2024-05-03 | https://arxiv.org/abs/2312.13558 | The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | |
| 2024-05-02 | https://arxiv.org/abs/2404.19737 | Better & Faster Large Language Models via Multi-token Prediction | |
| 2024-05-01 | https://arxiv.org/abs/2309.13638 | Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve | |
| 2024-04-26 | https://arxiv.org/abs/2404.03592 | ReFT: Representation Finetuning for Language Models | |
| 2024-04-26 | https://arxiv.org/abs/2404.13208 | The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | |
| 2024-04-26 | https://arxiv.org/abs/2404.07143 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | |
| 2024-04-26 | https://arxiv.org/abs/2401.13660 | MambaByte: Token-free Selective State Space Model | |
| 2024-04-26 | https://arxiv.org/abs/2404.14219 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | |
| 2024-04-26 | https://arxiv.org/abs/2404.14047 | An empirical study of LLaMA3 quantization: from LLMs to MLLMs | |
| 2024-04-19 | https://arxiv.org/abs/2404.08634 | When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models | |
| 2024-04-19 | https://arxiv.org/abs/2404.03683 | Stream of Search (SoS): Learning to Search in Language | |
| 2024-04-19 | https://arxiv.org/abs/2403.17844 | Mechanistic Design and Scaling of Hybrid Architectures | |
| 2024-04-19 | https://arxiv.org/abs/2404.09656 | Learn Your Reference Model for Real Good Alignment | |
| 2024-04-19 | https://arxiv.org/abs/2404.09937 | Compression Represents Intelligence Linearly | |
| 2024-04-19 | https://arxiv.org/abs/2404.07979 | LLoCO: Learning Long Contexts Offline | |
| 2024-04-19 | https://arxiv.org/abs/2404.09956 | Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization | |
| 2024-04-12 | https://arxiv.org/abs/2402.19427 | Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | |
| 2024-04-11 | https://arxiv.org/abs/2404.05966 | THOUGHTSCULPT: Reasoning with Intermediate Revision and Search | |
| 2024-04-10 | https://arxiv.org/abs/2404.05595 | UniFL: Improve Latent Diffusion Model via Unified Feedback Learning | |
| 2024-04-10 | https://arxiv.org/abs/2404.05666 | YaART: Yet Another ART Rendering Technology | |
| 2024-04-09 | https://arxiv.org/abs/2402.05120 | More Agents Is All You Need | |
| 2024-04-09 | https://arxiv.org/abs/2403.02419 | Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems | |
| 2024-04-07 | https://arxiv.org/abs/2403.12881 | Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | |
| 2024-04-05 | https://arxiv.org/abs/2403.19887 | Jamba: A Hybrid Transformer-Mamba Language Model | |
| 2024-04-05 | https://arxiv.org/abs/2404.00399 | Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code | |
| 2024-04-05 | https://arxiv.org/abs/2403.15371 | Can large language models explore in-context? | |
| 2024-04-05 | https://arxiv.org/abs/2404.01744 | Octopus v2: On-device language model for super agent | |
| 2024-03-29 | https://arxiv.org/abs/2403.10616 | DiPaCo: Distributed Path Composition | |
| 2024-03-29 | https://arxiv.org/abs/2403.14773 | StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | |
| 2024-03-29 | https://arxiv.org/abs/2403.15042 | LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | |
| 2024-03-29 | https://arxiv.org/abs/2403.15377 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| 2024-03-29 | https://arxiv.org/abs/2403.11901 | Larimar: Large Language Models with Episodic Memory Control | |
| 2024-03-29 | https://arxiv.org/abs/2403.17297 | InternLM2 Technical Report | |
| 2024-03-22 | https://arxiv.org/abs/2403.13187 | Evolutionary Optimization of Model Merging Recipes | |
| 2024-03-21 | https://arxiv.org/abs/2403.04642 | Teaching Large Language Models to Reason with Reinforcement Learning | |
| 2024-03-20 | https://arxiv.org/abs/2310.04799 | Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages | |
| 2024-03-19 | https://arxiv.org/abs/2403.06963 | The pitfalls of next-token prediction | |
| 2024-03-19 | https://arxiv.org/abs/2403.09629 | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | |
| 2024-03-15 | https://arxiv.org/abs/2403.03507 | GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | |
| 2024-03-15 | https://arxiv.org/abs/2402.11753 | ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | |
| 2024-03-15 | https://arxiv.org/abs/2403.03163 | Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering | |
| 2024-03-15 | https://arxiv.org/abs/2402.19450 | Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | |
| 2024-03-15 | https://arxiv.org/abs/2403.04652 | Yi: Open Foundation Models by 01.AI | |
| 2024-03-08 | https://arxiv.org/abs/2104.09864 | RoFormer: Enhanced Transformer with Rotary Position Embedding | |
| 2024-03-08 | https://arxiv.org/abs/2306.15595 | Extending Context Window of Large Language Models via Positional Interpolation | |
| 2024-03-08 | https://arxiv.org/abs/2402.13753 | LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | |
| 2024-03-06 | https://arxiv.org/abs/2402.19155 | Beyond Language Models: Byte Models are Digital World Simulators | |
| 2024-03-06 | https://arxiv.org/abs/2402.17764 | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | |
| 2024-03-05 | https://arxiv.org/abs/2402.08268 | World Model on Million-Length Video And Language With Blockwise RingAttention | |
| 2024-03-01 | https://arxiv.org/abs/2310.01889 | Ring Attention with Blockwise Transformers for Near-Infinite Context | |
| 2024-03-01 | https://arxiv.org/abs/2311.09431 | Striped Attention: Faster Ring Attention for Causal Transformers | |
| 2024-03-01 | https://arxiv.org/abs/2402.17177 | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | |
| 2024-02-26 | https://arxiv.org/abs/2402.08609 | Mixtures of Experts Unlock Parameter Scaling for Deep RL | |
| 2024-02-23 | https://arxiv.org/abs/2402.10200 | Chain-of-Thought Reasoning Without Prompting | |
| 2024-02-15 | https://arxiv.org/abs/2306.00637 | Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models | |
| 2024-02-14 | https://arxiv.org/abs/2402.05929 | An Interactive Agent Foundation Model | |
| 2024-02-12 | https://arxiv.org/abs/2402.04252 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | |
| 2024-02-09 | https://arxiv.org/abs/2402.04494 | Amortized Planning with Large-Scale Transformers: A Case Study on Chess | |
| 2024-02-08 | https://arxiv.org/abs/2402.03620 | Self-Discover: Large Language Models Self-Compose Reasoning Structures | |
| 2024-02-07 | https://arxiv.org/abs/2401.08967 | ReFT: Reasoning with Reinforced Fine-Tuning | |
| 2024-02-06 | https://arxiv.org/abs/2402.01391 | StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | |
| 2024-02-05 | https://arxiv.org/abs/2402.00742 | Transforming and Combining Rewards for Aligning Large Language Models | |
| 2024-02-02 | https://arxiv.org/abs/2311.16567 | MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices | |
| 2024-01-30 | https://arxiv.org/abs/2310.17567 | Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models | |
| 2024-01-29 | https://arxiv.org/abs/2401.15077 | EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | |
| 2024-01-25 | https://arxiv.org/abs/2401.11605 | Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | |
| 2024-01-24 | https://arxiv.org/abs/2401.12945 | Lumiere: A Space-Time Diffusion Model for Video Generation | |
| 2024-01-19 | https://arxiv.org/abs/2401.10020 | Self-Rewarding Language Models | |
| 2024-01-17 | https://arxiv.org/abs/2401.03065 | CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | |
| 2024-01-16 | https://arxiv.org/abs/2401.05566 | Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | |
| 2024-01-15 | https://arxiv.org/abs/2312.11865 | Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach | |
| 2024-01-15 | https://arxiv.org/abs/2308.00352 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | |
| 2024-01-11 | https://arxiv.org/abs/2311.12983 | GAIA: a benchmark for General AI Assistants | |
| 2024-01-10 | https://arxiv.org/abs/2401.04088 | Mixtral of Experts | |
| 2024-01-05 | https://arxiv.org/abs/2312.08361 | Distributed Inference and Fine-tuning of Large Language Models Over The Internet | |
| 2024-01-04 | https://arxiv.org/abs/2401.01335 | Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | |
| 2024-01-03 | https://arxiv.org/abs/2312.00886 | Nash Learning from Human Feedback | |
| 2023-12-21 | https://arxiv.org/abs/2312.11444 | An In-depth Look at Gemini's Language Abilities | |
| 2023-12-21 | https://arxiv.org/abs/2312.12456 | PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | |
| 2023-12-13 | https://arxiv.org/abs/2312.04884 | UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models | |
| 2023-12-04 | https://arxiv.org/abs/2310.01783 | Can large language models provide useful feedback on research papers? A large-scale empirical analysis | |
| 2023-12-01 | https://arxiv.org/abs/2311.16933 | SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models | |
| 2023-11-30 | https://arxiv.org/abs/2311.14737 | Positional Description Matters for Transformers Arithmetic | |
| 2023-11-21 | https://arxiv.org/abs/2311.00871 | Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models | |
| 2023-11-20 | https://arxiv.org/abs/2311.04850 | Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | |
| 2023-11-16 | https://arxiv.org/abs/2311.05997 | JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | |
| 2023-11-15 | https://arxiv.org/abs/2306.14824 | Kosmos-2: Grounding Multimodal Large Language Models to the World | |
| 2023-11-14 | https://arxiv.org/abs/2310.04378 | Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference | |
| 2023-11-13 | https://arxiv.org/abs/2306.05284 | Simple and Controllable Music Generation | |
| 2023-11-10 | https://arxiv.org/abs/2309.17421 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | |
| 2023-11-08 | https://arxiv.org/abs/2310.17680v1 | CodeFusion: A Pre-trained Diffusion Model for Code Generation | |
| 2023-11-07 | https://arxiv.org/abs/2310.03214 | FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | |
| 2023-11-06 | https://arxiv.org/abs/2310.11511 | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | |
| 2023-10-31 | https://arxiv.org/abs/2207.07051 | Language models show human-like content effects on reasoning tasks | |
| 2023-10-30 | https://arxiv.org/abs/2310.08560 | MemGPT: Towards LLMs as Operating Systems | |
| 2023-10-25 | https://arxiv.org/abs/2310.13548 | Towards Understanding Sycophancy in Language Models | |
| 2023-10-13 | https://arxiv.org/abs/2212.11281 | Language models are better than humans at next-token prediction | |
| 2023-10-13 | https://arxiv.org/abs/2212.04037 | Demystifying Prompts in Language Models via Perplexity Estimation | |
| 2023-09-28 | https://arxiv.org/abs/2210.15097 | Contrastive Decoding: Open-ended Text Generation as Optimization | |
| 2023-09-28 | https://arxiv.org/abs/2309.09117 | Contrastive Decoding Improves Reasoning in Large Language Models | |
| 2023-09-26 | https://arxiv.org/abs/2309.12499 | CodePlan: Repository-level Coding using LLMs and Planning | |
| 2023-09-21 | https://arxiv.org/abs/2309.10668 | Language Modeling Is Compression | |
| 2023-09-14 | https://arxiv.org/abs/2308.11432 | A Survey on Large Language Model based Autonomous Agents | |
| 2023-09-14 | https://arxiv.org/abs/2309.05463 | Textbooks Are All You Need II: phi-1.5 technical report | |
| 2023-09-05 | https://arxiv.org/abs/2307.03172 | Lost in the Middle: How Language Models Use Long Contexts | |
| 2023-08-29 | https://arxiv.org/abs/2306.08568 | WizardCoder: Empowering Code Large Language Models with Evol-Instruct | |
| 2023-08-22 | https://arxiv.org/abs/2308.09687 | Graph of Thoughts: Solving Elaborate Problems with Large Language Models | |
| 2023-08-18 | https://arxiv.org/abs/2305.11206 | LIMA: Less Is More for Alignment | |
| 2023-08-17 | https://arxiv.org/abs/2308.07317 | Platypus: Quick, Cheap, and Powerful Refinement of LLMs | |
| 2023-08-15 | https://arxiv.org/abs/2305.16635 | Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing | |
| 2023-08-10 | https://arxiv.org/abs/2306.11644 | Textbooks Are All You Need | |
| 2023-08-09 | https://arxiv.org/abs/2308.03296 | Studying Large Language Model Generalization with Influence Functions | |
| 2023-08-08 | https://arxiv.org/abs/2308.01399 | Learning to Model the World with Language | |
| 2023-08-03 | https://arxiv.org/abs/2307.12856 | A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis | |
| 2023-08-01 | https://arxiv.org/abs/2307.07924 | ChatDev: Communicative Agents for Software Development | |
| 2023-07-31 | https://arxiv.org/abs/1907.04164 | Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model | |
| 2023-07-28 | https://arxiv.org/abs/1412.6980 | Adam: A Method for Stochastic Optimization | |
| 2023-07-27 | https://arxiv.org/abs/2212.09251 | Discovering Language Model Behaviors with Model-Written Evaluations | |
| 2023-07-26 | https://arxiv.org/abs/1812.06162 | An Empirical Model of Large-Batch Training | |
| 2023-07-21 | https://arxiv.org/abs/2205.05638 | Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | |
| 2023-07-17 | https://arxiv.org/abs/2210.17323 | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | |
| 2023-07-14 | https://arxiv.org/abs/2306.03078 | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | |
| 2023-07-12 | https://arxiv.org/abs/2212.09720 | The case for 4-bit precision: k-bit Inference Scaling Laws | |
| 2023-07-11 | https://arxiv.org/abs/2106.09685 | LoRA: Low-Rank Adaptation of Large Language Models | |
| 2023-07-10 | https://arxiv.org/abs/2208.07339 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
| 2023-07-07 | https://arxiv.org/abs/2110.02861 | 8-bit Optimizers via Block-wise Quantization | |
| 2023-07-05 | https://arxiv.org/abs/1710.03740 | Mixed Precision Training | |
| 2023-07-05 | https://arxiv.org/abs/2209.05433 | FP8 Formats for Deep Learning | |
| 2023-07-04 | https://arxiv.org/abs/2306.12456 | Pushing the Limits of Machine Design: Automated CPU Design with AI | |
| 2023-07-03 | https://arxiv.org/abs/2306.16388 | Towards Measuring the Representation of Subjective Global Opinions in Language Models | |
| 2023-06-30 | https://arxiv.org/abs/2201.11990 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | |
| 2023-06-29 | https://arxiv.org/abs/2304.11477 | LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | |
| 2023-06-29 | https://arxiv.org/abs/2306.14325 | The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs | |
| 2023-06-28 | https://arxiv.org/abs/2304.08467 | Learning to Compress Prompts with Gist Tokens | |
| 2023-06-22 | https://arxiv.org/abs/2306.04563v1 | ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models | |
| 2023-06-21 | https://arxiv.org/abs/2306.07906 | WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences | |
| 2023-06-20 | https://arxiv.org/abs/2305.18654 | Faith and Fate: Limits of Transformers on Compositionality | |
| 2023-06-19 | https://arxiv.org/abs/2001.08361 | Scaling Laws for Neural Language Models | |
| 2023-06-16 | https://arxiv.org/abs/2306.07899v1 | Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks | |
| 2023-06-16 | https://arxiv.org/abs/2306.07906v1 | WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences | |
| 2023-06-15 | https://arxiv.org/abs/2306.05425 | MIMIC-IT: Multi-Modal In-Context Instruction Tuning | |
| 2023-06-14 | https://arxiv.org/abs/2306.04751 | How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources | |
| 2023-06-13 | https://arxiv.org/abs/2305.17126 | Large Language Models as Tool Makers | |
| 2023-06-12 | https://arxiv.org/abs/2306.03341v2 | Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | |
| 2023-06-09 | https://arxiv.org/abs/2305.17926 | Large Language Models are not Fair Evaluators | |
| 2023-06-09 | https://arxiv.org/abs/2305.00118 | Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 | |
| 2023-06-08 | https://arxiv.org/abs/2306.00323 | Thought Cloning: Learning to Think while Acting by Imitating Human Thinking | |
| 2023-06-06 | https://arxiv.org/abs/2305.20050 | Let's Verify Step by Step | |
| 2023-06-06 | https://arxiv.org/abs/2306.01694 | Evaluating Language Models for Mathematics through Interactions | |
| 2023-05-31 | https://arxiv.org/abs/2305.15334v1 | Gorilla: Large Language Model Connected with Massive APIs | |
| 2023-05-30 | https://arxiv.org/abs/2305.16291 | Voyager: An Open-Ended Embodied Agent with Large Language Models | |
| 2023-05-29 | https://arxiv.org/abs/2305.15717 | The False Promise of Imitating Proprietary LLMs | |
| 2023-05-26 | https://arxiv.org/abs/2305.15324v1 | Model evaluation for extreme risks | |
| 2023-05-24 | https://arxiv.org/abs/2305.10601 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | |
| 2023-05-21 | https://arxiv.org/abs/2305.11169 | Emergent Representations of Program Semantics in Language Models Trained on Programs | |
| 2023-05-19 | https://arxiv.org/abs/2303.11341 | What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring | |
| 2023-05-19 | https://arxiv.org/abs/2305.08746v1 | Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability | |
| 2023-05-17 | https://arxiv.org/abs/2305.00833 | Learning to Reason and Memorize with Self-Notes | |
| 2023-05-15 | https://arxiv.org/abs/2304.03442 | Generative Agents: Interactive Simulacra of Human Behavior | |
| 2023-05-12 | https://arxiv.org/abs/2305.04388 | Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | |
| 2023-05-11 | https://arxiv.org/abs/2304.09848 | Evaluating Verifiability in Generative Search Engines | |
| 2023-05-09 | https://arxiv.org/abs/2302.12173 | Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | |
| 2023-05-04 | https://arxiv.org/abs/2304.15004 | Are Emergent Abilities of Large Language Models a Mirage? | |
| 2023-05-03 | https://arxiv.org/abs/2209.00626 | The Alignment Problem from a Deep Learning Perspective | |
| 2023-05-01 | https://arxiv.org/abs/2304.12210 | A Cookbook of Self-Supervised Learning | |
| 2023-04-21 | https://arxiv.org/abs/2304.07193 | DINOv2: Learning Robust Visual Features without Supervision | |
| 2023-04-19 | https://arxiv.org/abs/2301.12597 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| 2023-04-19 | https://arxiv.org/abs/2304.08466 | Synthetic Data from Diffusion Models Improves ImageNet Classification | |
| 2023-04-18 | https://arxiv.org/abs/2010.11929 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | |
| 2023-04-17 | https://arxiv.org/abs/1812.11118 | Reconciling modern machine learning practice and the bias-variance trade-off | |
| 2023-04-17 | https://arxiv.org/abs/2304.06035 | Choose Your Weapon: Survival Strategies for Depressed AI Academics | |
| 2023-04-14 | https://arxiv.org/abs/2303.18223v4 | A Survey of Large Language Models | |
| 2023-04-07 | https://arxiv.org/abs/2304.00186 | Subject-driven Text-to-Image Generation via Apprenticeship Learning | |
| 2023-04-07 | https://arxiv.org/abs/2303.17651 | Self-Refine: Iterative Refinement with Self-Feedback | |
| 2023-04-03 | https://arxiv.org/abs/2202.07785 | Predictability and Surprise in Large Generative Models | |
| 2023-03-28 | https://arxiv.org/abs/2303.14177 | Scaling Expert Language Models with Unsupervised Domain Discovery | |
| 2023-03-23 | https://arxiv.org/abs/2303.12712 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | |
| 2023-03-22 | https://arxiv.org/abs/2112.00861 | A General Language Assistant as a Laboratory for Alignment | |
| 2023-03-16 | https://arxiv.org/abs/2212.10560 | Self-Instruct: Aligning Language Models with Self-Generated Instructions | |
| 2023-03-13 | https://arxiv.org/abs/2302.08582 | Pretraining Language Models with Human Preferences | |
| 2023-03-10 | https://arxiv.org/abs/2206.05802 | Self-critiquing models for assisting human evaluators | |
| 2023-03-06 | https://arxiv.org/abs/2303.03323 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | |
| 2023-03-06 | https://arxiv.org/abs/2303.03378 | PaLM-E: An Embodied Multimodal Language Model | |
| 2023-03-06 | https://arxiv.org/abs/2302.13971 | LLaMA: Open and Efficient Foundation Language Models | |
| 2023-03-01 | https://arxiv.org/abs/2109.10862 | Recursively Summarizing Books with Human Feedback | |
| 2023-02-27 | https://arxiv.org/abs/2009.01325 | Learning to summarize from human feedback | |
| 2023-02-24 | https://arxiv.org/abs/2112.09332 | WebGPT: Browser-assisted question-answering with human feedback | |
| 2023-02-22 | https://arxiv.org/abs/2210.10760 | Scaling Laws for Reward Model Overoptimization | |
| 2023-02-20 | https://arxiv.org/abs/2211.15006 | Fine-tuning language models to find agreement among humans with diverse preferences | |
| 2023-02-19 | https://arxiv.org/abs/2302.07027 | AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models | |
| 2023-02-18 | https://arxiv.org/abs/2110.02642 | Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy | |
| 2023-02-13 | https://arxiv.org/abs/2202.03286 | Red Teaming Language Models with Language Models | |
| 2023-02-06 | https://arxiv.org/abs/2301.12810 | Crawling the Internal Knowledge-Base of Language Models | |
| 2023-01-26 | https://arxiv.org/abs/1909.08593v2 | Fine-Tuning Language Models from Human Preferences | |
| 2023-01-16 | https://arxiv.org/abs/2211.09066 | Teaching Algorithmic Reasoning via In-context Learning | |
| 2022-12-22 | https://arxiv.org/abs/2207.07611 | Position Prediction as an Effective Pretraining Strategy | |
| 2022-12-05 | https://arxiv.org/abs/1707.04585v1 | The Reversible Residual Network: Backpropagation Without Storing Activations |