
@k-nearest-neighbor
Created February 1, 2026 23:27
392 @daily-paper-discussion messages and their arxiv links as of 01/2026
discussion_date link paper_title
2026-01-20 https://arxiv.org/abs/2512.24601 Recursive Language Models
2025-11-11 https://arxiv.org/abs/2504.16828 Process Reward Models That Think
2025-11-06 https://arxiv.org/abs/2510.09596 BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards
2025-11-05 https://arxiv.org/abs/2510.25976 Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
2025-11-04 https://arxiv.org/abs/2509.17196 Evolution of Concepts in Language Model Pre-Training
2025-10-29 https://arxiv.org/abs/2510.23691 Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
2025-10-28 https://arxiv.org/abs/2510.21614 Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
2025-10-27 https://arxiv.org/abs/2510.15511 Language Models are Injective and Hence Invertible
2025-10-20 https://arxiv.org/abs/2510.14901 Reasoning with Sampling: Your Base Model is Smarter Than You Think
2025-10-16 https://arxiv.org/abs/2510.01171 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
2025-10-15 https://arxiv.org/abs/1802.06070 Diversity is All You Need: Learning Skills without a Reward Function
2025-10-14 https://arxiv.org/abs/2510.01279 TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
2025-10-13 https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?
2025-10-09 https://arxiv.org/abs/2510.04871v1 Less is More: Recursive Reasoning with Tiny Networks
2025-10-01 https://arxiv.org/abs/2506.09047 Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
2025-09-30 https://arxiv.org/abs/2507.06203v1 A Survey on Latent Reasoning
2025-09-24 https://arxiv.org/abs/2509.19249 Reinforcement Learning on Pre-Training Data
2025-09-22 https://arxiv.org/abs/2509.14252 LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
2025-09-18 https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
2025-09-17 https://arxiv.org/abs/2509.05276 SpikingBrain: Spiking Brain-inspired Large Models
2025-09-12 https://arxiv.org/abs/2509.08519 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
2025-09-11 https://arxiv.org/abs/2508.13948 Prompt Orchestration Markup Language
2025-09-10 https://arxiv.org/abs/2509.02722 Planning with Reasoning using Vision Language World Model
2025-09-04 https://arxiv.org/abs/2507.19703 The wall confronting large language models
2025-09-02 https://arxiv.org/abs/2508.21038 On the Theoretical Limitations of Embedding-Based Retrieval
2025-08-27 https://arxiv.org/abs/2506.08343 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
2025-08-26 https://arxiv.org/abs/2506.02867 Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
2025-08-25 https://arxiv.org/abs/2508.10390 Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
2025-08-21 https://arxiv.org/abs/2411.00986 Taking AI Welfare Seriously
2025-08-20 https://arxiv.org/abs/2508.06492 Effective Training Data Synthesis for Improving MLLM Chart Understanding
2025-08-07 https://arxiv.org/abs/2506.21734 Hierarchical Reasoning Model
2025-08-06 https://arxiv.org/abs/2402.15391 Genie: Generative Interactive Environments
2025-08-06 https://arxiv.org/abs/2404.10179 Scaling Instructable Agents Across Many Simulated Worlds
2025-07-11 https://arxiv.org/abs/2504.10612 Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
2025-07-10 https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
2025-07-04 https://arxiv.org/abs/2406.06484 Parallelizing Linear Transformers with the Delta Rule over Sequence Length
2025-07-03 https://arxiv.org/abs/2503.14456 RWKV-7 "Goose" with Expressive Dynamic State Evolution
2025-06-27 https://arxiv.org/abs/2505.06708 Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
2025-06-25 https://arxiv.org/abs/2506.10947 Spurious Rewards: Rethinking Training Signals in RLVR
2025-06-19 https://arxiv.org/abs/2506.09985 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
2025-06-18 https://arxiv.org/abs/2407.04117 Predictive Coding Networks and Inference Learning: Tutorial and Survey
2025-06-12 https://arxiv.org/abs/2506.01622 General agents contain world models
2025-06-06 https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings
2025-06-03 https://arxiv.org/abs/2409.12517 Scaling FP8 training to trillion-token LLMs
2025-05-14 https://arxiv.org/abs/2302.04761 Toolformer: Language Models Can Teach Themselves to Use Tools
2025-05-13 https://arxiv.org/abs/2305.13673 Physics of Language Models: Part 1, Learning Hierarchical Language Structures
2025-04-30 https://arxiv.org/abs/2504.15376 Towards Understanding Camera Motions in Any Video
2025-04-23 https://arxiv.org/abs/2409.20325 Old Optimizer, New Norm: An Anthology
2025-04-23 https://arxiv.org/abs/2412.10925 Video Representation Learning with Joint-Embedding Predictive Architectures
2025-04-17 https://arxiv.org/abs/2402.03300 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
2025-04-17 https://arxiv.org/abs/2403.05525 DeepSeek-VL: Towards Real-World Vision-Language Understanding
2025-04-10 https://arxiv.org/abs/2401.14196 DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
2025-04-09 https://arxiv.org/abs/2401.06066 DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
2025-04-08 https://arxiv.org/abs/2401.02954 DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
2025-04-02 https://arxiv.org/abs/2503.22230 Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
2025-03-27 https://arxiv.org/abs/2503.19551 Scaling Laws of Synthetic Data for Language Models
2025-03-26 https://arxiv.org/abs/2503.00735 LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
2025-03-25 https://arxiv.org/abs/2503.14607 Can Large Vision Language Models Read Maps Like a Human?
2025-03-20 https://arxiv.org/abs/2503.14378 Impossible Videos
2025-03-20 https://arxiv.org/abs/2503.14478 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
2025-03-18 https://arxiv.org/abs/2503.10965 Auditing language models for hidden objectives
2025-03-11 https://arxiv.org/abs/2503.04130 STORM: Token-Efficient Long Video Understanding for Multimodal LLMs
2025-03-07 https://arxiv.org/abs/2503.00865 Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
2025-03-06 https://arxiv.org/abs/2412.06771 Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
2025-02-27 https://arxiv.org/abs/2410.16179 MagicPIG: LSH Sampling for Efficient LLM Generation
2025-02-26 https://arxiv.org/abs/2502.03387 LIMO: Less is More for Reasoning
2025-02-25 https://arxiv.org/abs/2502.14786 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
2025-02-22 https://arxiv.org/abs/2305.18290 Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2025-02-20 https://arxiv.org/abs/2502.11089 Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
2025-02-19 https://arxiv.org/abs/2502.09696 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
2025-02-19 https://arxiv.org/abs/2502.12150 Idiosyncrasies in Large Language Models
2025-02-12 https://arxiv.org/abs/2402.10588 Do Llamas Work in English? On the Latent Language of Multilingual Transformers
2025-02-11 https://arxiv.org/abs/2111.00396v3 Efficiently Modeling Long Sequences with Structured State Spaces
2025-02-04 https://arxiv.org/abs/2501.18837 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
2025-01-31 https://arxiv.org/abs/2501.17161 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
2025-01-30 https://arxiv.org/abs/2501.12370 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
2025-01-25 https://arxiv.org/abs/2501.12326 UI-TARS: Pioneering Automated GUI Interaction with Native Agents
2025-01-24 https://arxiv.org/abs/2501.13011 MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
2025-01-18 https://arxiv.org/abs/2501.06425 Tensor Product Attention Is All You Need
2025-01-17 https://arxiv.org/abs/2501.08313 MiniMax-01: Scaling Foundation Models with Lightning Attention
2025-01-16 https://arxiv.org/abs/2501.00663 Titans: Learning to Memorize at Test Time
2025-01-14 https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus
2025-01-09 https://arxiv.org/abs/2501.01423v1 Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
2025-01-08 https://arxiv.org/abs/2412.06769 Training Large Language Models to Reason in a Continuous Latent Space
2025-01-08 https://arxiv.org/abs/2412.19437 DeepSeek-V3 Technical Report
2024-12-20 https://arxiv.org/abs/2412.08905 Phi-4 Technical Report
2024-12-13 https://arxiv.org/abs/2411.19865 Reverse Thinking Makes LLMs Stronger Reasoners
2024-12-12 https://arxiv.org/abs/2412.06966 Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
2024-12-11 https://arxiv.org/abs/2412.04468 NVILA: Efficient Frontier Visual Language Models
2024-12-05 https://arxiv.org/abs/2407.08608 FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
2024-12-04 https://arxiv.org/abs/2411.10440 LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
2024-12-03 https://arxiv.org/abs/2411.07191 The Super Weight in Large Language Models
2024-12-03 https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
2024-11-29 https://arxiv.org/abs/2411.17690 Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
2024-11-27 https://arxiv.org/abs/2411.14402 Multimodal Autoregressive Pre-training of Large Vision Encoders
2024-11-26 https://arxiv.org/abs/2406.02061 Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
2024-11-20 https://arxiv.org/abs/2406.19370 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
2024-11-19 https://arxiv.org/abs/2411.04996 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
2024-11-18 https://arxiv.org/abs/2411.09009 Cut Your Losses in Large-Vocabulary Language Models
2024-11-15 https://arxiv.org/abs/2411.07279 The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
2024-11-13 https://arxiv.org/abs/2411.04330 Scaling Laws for Precision
2024-11-13 https://arxiv.org/abs/2411.02853 ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate
2024-11-13 https://arxiv.org/abs/1707.06347 Proximal Policy Optimization Algorithms
2024-11-13 https://arxiv.org/abs/2410.00907 Addition is All You Need for Energy-efficient Language Models
2024-11-09 https://arxiv.org/abs/2411.02385 How Far is Video Generation from World Model: A Physical Law Perspective
2024-11-08 https://arxiv.org/abs/2411.02355 "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
2024-11-07 https://arxiv.org/abs/2407.12831 Truth is Universal: Robust Detection of Lies in LLMs
2024-11-06 https://arxiv.org/abs/2410.22071 Distinguishing Ignorance from Error in LLM Hallucinations
2024-11-05 https://arxiv.org/abs/2410.23179 Does equivariance matter at scale?
2024-11-01 https://arxiv.org/abs/2410.16090 Analysing the Residual Stream of Language Models Under Knowledge Conflicts
2024-10-31 https://arxiv.org/abs/2410.11081 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
2024-10-29 https://arxiv.org/abs/2410.16270 Reflection-Bench: Evaluating Epistemic Agency in Large Language Models
2024-10-25 https://arxiv.org/abs/2108.08481 Neural Operator: Learning Maps Between Function Spaces
2024-10-24 https://arxiv.org/abs/2410.08146 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
2024-10-23 https://arxiv.org/abs/2410.06205 Round and Round We Go! What makes Rotary Positional Encodings useful?
2024-10-18 https://arxiv.org/abs/2405.15071 Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
2024-10-11 https://arxiv.org/abs/2410.01131 nGPT: Normalized Transformer with Representation Learning on the Hypersphere
2024-10-10 https://arxiv.org/abs/2410.01912 A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
2024-10-09 https://arxiv.org/abs/2410.04717 Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization
2024-10-09 https://arxiv.org/abs/2410.05258 Differential Transformer
2024-10-08 https://arxiv.org/abs/2410.02757 Loong: Generating Minute-level Long Videos with Autoregressive Language Models
2024-10-04 https://arxiv.org/abs/2408.07199 Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
2024-10-04 https://arxiv.org/abs/2409.19951 Law of the Weakest Link: Cross Capabilities of Large Language Models
2024-10-03 https://arxiv.org/abs/2409.17481 MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
2024-10-01 https://arxiv.org/abs/2409.18869 Emu3: Next-Token Prediction is All You Need
2024-09-27 https://arxiv.org/abs/2211.14275 Solving math word problems with process- and outcome-based feedback
2024-09-27 https://arxiv.org/abs/2407.01449 ColPali: Efficient Document Retrieval with Vision Language Models
2024-09-26 https://arxiv.org/abs/2409.13373 LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
2024-09-25 https://arxiv.org/abs/2409.14677 Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
2024-09-11 https://arxiv.org/abs/2409.04431 Theory, Analysis, and Best Practices for Sigmoid Self-Attention
2024-09-05 https://arxiv.org/abs/2409.00558 Compositional 3D-aware Video Generation with LLM Director
2024-09-04 https://arxiv.org/abs/2408.16725 Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
2024-08-30 https://arxiv.org/abs/2408.11475 TrackGo: A Flexible and Efficient Method for Controllable Video Generation
2024-08-28 https://arxiv.org/abs/2408.13934 Learning to Move Like Professional Counter-Strike Players
2024-08-27 https://arxiv.org/abs/2408.12637 Building and better understanding vision-language models: insights and future directions
2024-08-24 https://arxiv.org/abs/2408.08210 Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models
2024-08-23 https://arxiv.org/abs/1912.01603 Dream to Control: Learning Behaviors by Latent Imagination
2024-08-23 https://arxiv.org/abs/2010.02193 Mastering Atari with Discrete World Models
2024-08-23 https://arxiv.org/abs/2301.04104 Mastering Diverse Domains through World Models
2024-08-22 https://arxiv.org/abs/2106.08295 A White Paper on Neural Network Quantization
2024-08-22 https://arxiv.org/abs/2211.10438 SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
2024-08-21 https://arxiv.org/abs/2406.19470 Changing Answer Order Can Decrease MMLU Accuracy
2024-08-20 https://arxiv.org/abs/2403.19159 Disentangling Length from Quality in Direct Preference Optimization
2024-08-17 https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
2024-08-16 https://arxiv.org/abs/2406.19108 Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
2024-08-16 https://arxiv.org/abs/2402.14740 Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
2024-08-14 https://arxiv.org/abs/2407.02446v1 Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling
2024-08-13 https://arxiv.org/abs/2406.03476 Does your data spark joy? Performance gains from domain upsampling at the end of training
2024-08-08 https://arxiv.org/abs/2407.01502 AI Agents That Matter
2024-08-08 https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size
2024-08-02 https://arxiv.org/abs/2004.07780 Shortcut Learning in Deep Neural Networks
2024-07-24 https://arxiv.org/abs/2406.19999 The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
2024-07-23 https://arxiv.org/abs/2407.06581 Vision language models are blind: Failing to translate detailed visual features into words
2024-07-23 https://arxiv.org/abs/2407.04622 On scalable oversight with weak LLMs judging strong LLMs
2024-07-19 https://arxiv.org/abs/2406.06469 Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
2024-07-18 https://arxiv.org/abs/2407.04620 Learning to (Learn at Test Time): RNNs with Expressive Hidden States
2024-07-18 https://arxiv.org/abs/2407.10671 Qwen2 Technical Report
2024-07-17 https://arxiv.org/abs/2402.01817 LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
2024-07-15 https://arxiv.org/abs/2406.13236 Data Contamination Can Cross Language Barriers
2024-07-12 https://arxiv.org/abs/2407.07726 PaliGemma: A versatile 3B VLM for transfer
2024-07-11 https://arxiv.org/abs/2407.03618 BM25S: Orders of magnitude faster lexical search via eager sparse scoring
2024-07-09 https://arxiv.org/abs/2407.04172 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
2024-07-04 https://arxiv.org/abs/2407.02371 OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
2024-07-03 https://arxiv.org/abs/2404.16130 From Local to Global: A Graph RAG Approach to Query-Focused Summarization
2024-06-27 https://arxiv.org/abs/2406.10162 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
2024-06-21 https://arxiv.org/abs/2406.09406 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
2024-06-21 https://arxiv.org/abs/2406.07394 Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
2024-06-21 https://arxiv.org/abs/2304.08485 Visual Instruction Tuning
2024-06-21 https://arxiv.org/abs/2310.03744 Improved Baselines with Visual Instruction Tuning
2024-06-19 https://arxiv.org/abs/2311.08516 LLMs cannot find reasoning errors, but can correct them given the error location
2024-06-14 https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders
2024-06-14 https://arxiv.org/abs/2406.06525 Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
2024-06-14 https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation
2024-06-14 https://arxiv.org/abs/2406.08478 What If We Recaption Billions of Web Images with LLaMA-3?
2024-06-08 https://arxiv.org/abs/2305.09636 SoundStorm: Efficient Parallel Audio Generation
2024-06-08 https://arxiv.org/abs/2405.21075 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
2024-06-08 https://arxiv.org/abs/2306.13549 A Survey on Multimodal Large Language Models
2024-06-08 https://arxiv.org/abs/2405.14860 Not All Language Model Features Are One-Dimensionally Linear
2024-06-08 https://arxiv.org/abs/2310.06114 Learning Interactive Real-World Simulators
2024-05-31 https://arxiv.org/abs/2107.03312 SoundStream: An End-to-End Neural Audio Codec
2024-05-31 https://arxiv.org/abs/2209.03143 AudioLM: a Language Modeling Approach to Audio Generation
2024-05-24 https://arxiv.org/abs/2404.11568 On the Scalability of GNNs for Molecular Graphs
2024-05-24 https://arxiv.org/abs/2405.02246 What matters when building vision-language models?
2024-05-24 https://arxiv.org/abs/2405.10626 Dynamic data sampler for cross-language transfer learning in large language models
2024-05-24 https://arxiv.org/abs/2312.08566 Learning adaptive planning representations with natural language guidance
2024-05-24 https://arxiv.org/abs/2405.11473 FIFO-Diffusion: Generating Infinite Videos from Text without Training
2024-05-24 https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis
2024-05-23 https://arxiv.org/abs/2306.12925 AudioPaLM: A Large Language Model That Can Speak and Listen
2024-05-17 https://arxiv.org/abs/2405.05904 Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
2024-05-16 https://arxiv.org/abs/2403.06098 VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
2024-05-13 https://arxiv.org/abs/2311.12786 Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
2024-05-10 https://arxiv.org/abs/2404.19756 KAN: Kolmogorov-Arnold Networks
2024-05-10 https://arxiv.org/abs/2405.04517 xLSTM: Extended Long Short-Term Memory
2024-05-10 https://arxiv.org/abs/2302.00487 A Comprehensive Survey of Continual Learning: Theory, Method and Application
2024-05-10 https://arxiv.org/abs/2405.00332 A Careful Examination of Large Language Model Performance on Grade School Arithmetic
2024-05-03 https://arxiv.org/abs/2312.13558 The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
2024-05-02 https://arxiv.org/abs/2404.19737 Better & Faster Large Language Models via Multi-token Prediction
2024-05-01 https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
2024-04-26 https://arxiv.org/abs/2404.03592 ReFT: Representation Finetuning for Language Models
2024-04-26 https://arxiv.org/abs/2404.13208 The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
2024-04-26 https://arxiv.org/abs/2404.07143 Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
2024-04-26 https://arxiv.org/abs/2401.13660 MambaByte: Token-free Selective State Space Model
2024-04-26 https://arxiv.org/abs/2404.14219 Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
2024-04-26 https://arxiv.org/abs/2404.14047 An empirical study of LLaMA3 quantization: from LLMs to MLLMs
2024-04-19 https://arxiv.org/abs/2404.08634 When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models
2024-04-19 https://arxiv.org/abs/2404.03683 Stream of Search (SoS): Learning to Search in Language
2024-04-19 https://arxiv.org/abs/2403.17844 Mechanistic Design and Scaling of Hybrid Architectures
2024-04-19 https://arxiv.org/abs/2404.09656 Learn Your Reference Model for Real Good Alignment
2024-04-19 https://arxiv.org/abs/2404.09937 Compression Represents Intelligence Linearly
2024-04-19 https://arxiv.org/abs/2404.07979 LLoCO: Learning Long Contexts Offline
2024-04-19 https://arxiv.org/abs/2404.09956 Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
2024-04-12 https://arxiv.org/abs/2402.19427 Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
2024-04-11 https://arxiv.org/abs/2404.05966 THOUGHTSCULPT: Reasoning with Intermediate Revision and Search
2024-04-10 https://arxiv.org/abs/2404.05595 UniFL: Improve Latent Diffusion Model via Unified Feedback Learning
2024-04-10 https://arxiv.org/abs/2404.05666 YaART: Yet Another ART Rendering Technology
2024-04-09 https://arxiv.org/abs/2402.05120 More Agents Is All You Need
2024-04-09 https://arxiv.org/abs/2403.02419 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
2024-04-07 https://arxiv.org/abs/2403.12881 Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
2024-04-05 https://arxiv.org/abs/2403.19887 Jamba: A Hybrid Transformer-Mamba Language Model
2024-04-05 https://arxiv.org/abs/2404.00399 Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
2024-04-05 https://arxiv.org/abs/2403.15371 Can large language models explore in-context?
2024-04-05 https://arxiv.org/abs/2404.01744 Octopus v2: On-device language model for super agent
2024-03-29 https://arxiv.org/abs/2403.10616 DiPaCo: Distributed Path Composition
2024-03-29 https://arxiv.org/abs/2403.14773 StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
2024-03-29 https://arxiv.org/abs/2403.15042 LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
2024-03-29 https://arxiv.org/abs/2403.15377 InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
2024-03-29 https://arxiv.org/abs/2403.11901 Larimar: Large Language Models with Episodic Memory Control
2024-03-29 https://arxiv.org/abs/2403.17297 InternLM2 Technical Report
2024-03-22 https://arxiv.org/abs/2403.13187 Evolutionary Optimization of Model Merging Recipes
2024-03-21 https://arxiv.org/abs/2403.04642 Teaching Large Language Models to Reason with Reinforcement Learning
2024-03-20 https://arxiv.org/abs/2310.04799 Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
2024-03-19 https://arxiv.org/abs/2403.06963 The pitfalls of next-token prediction
2024-03-19 https://arxiv.org/abs/2403.09629 Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
2024-03-15 https://arxiv.org/abs/2403.03507 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
2024-03-15 https://arxiv.org/abs/2402.11753 ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
2024-03-15 https://arxiv.org/abs/2403.03163 Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
2024-03-15 https://arxiv.org/abs/2402.19450 Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
2024-03-15 https://arxiv.org/abs/2403.04652 Yi: Open Foundation Models by 01.AI
2024-03-08 https://arxiv.org/abs/2104.09864 RoFormer: Enhanced Transformer with Rotary Position Embedding
2024-03-08 https://arxiv.org/abs/2306.15595 Extending Context Window of Large Language Models via Positional Interpolation
2024-03-08 https://arxiv.org/abs/2402.13753 LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
2024-03-06 https://arxiv.org/abs/2402.19155 Beyond Language Models: Byte Models are Digital World Simulators
2024-03-06 https://arxiv.org/abs/2402.17764 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
2024-03-05 https://arxiv.org/abs/2402.08268 World Model on Million-Length Video And Language With Blockwise RingAttention
2024-03-01 https://arxiv.org/abs/2310.01889 Ring Attention with Blockwise Transformers for Near-Infinite Context
2024-03-01 https://arxiv.org/abs/2311.09431 Striped Attention: Faster Ring Attention for Causal Transformers
2024-03-01 https://arxiv.org/abs/2402.17177 Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
2024-02-26 https://arxiv.org/abs/2402.08609 Mixtures of Experts Unlock Parameter Scaling for Deep RL
2024-02-23 https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting
2024-02-15 https://arxiv.org/abs/2306.00637 Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
2024-02-14 https://arxiv.org/abs/2402.05929 An Interactive Agent Foundation Model
2024-02-12 https://arxiv.org/abs/2402.04252 EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
2024-02-09 https://arxiv.org/abs/2402.04494 Amortized Planning with Large-Scale Transformers: A Case Study on Chess
2024-02-08 https://arxiv.org/abs/2402.03620 Self-Discover: Large Language Models Self-Compose Reasoning Structures
2024-02-07 https://arxiv.org/abs/2401.08967 ReFT: Reasoning with Reinforced Fine-Tuning
2024-02-06 https://arxiv.org/abs/2402.01391 StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
2024-02-05 https://arxiv.org/abs/2402.00742 Transforming and Combining Rewards for Aligning Large Language Models
2024-02-02 https://arxiv.org/abs/2311.16567 MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
2024-01-30 https://arxiv.org/abs/2310.17567 Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
2024-01-29 https://arxiv.org/abs/2401.15077 EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
2024-01-25 https://arxiv.org/abs/2401.11605 Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
2024-01-24 https://arxiv.org/abs/2401.12945 Lumiere: A Space-Time Diffusion Model for Video Generation
2024-01-19 https://arxiv.org/abs/2401.10020 Self-Rewarding Language Models
2024-01-17 https://arxiv.org/abs/2401.03065 CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
2024-01-16 https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
2024-01-15 https://arxiv.org/abs/2312.11865 Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
2024-01-15 https://arxiv.org/abs/2308.00352 MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
2024-01-11 https://arxiv.org/abs/2311.12983 GAIA: a benchmark for General AI Assistants
2024-01-10 https://arxiv.org/abs/2401.04088 Mixtral of Experts
2024-01-05 https://arxiv.org/abs/2312.08361 Distributed Inference and Fine-tuning of Large Language Models Over The Internet
2024-01-04 https://arxiv.org/abs/2401.01335 Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
2024-01-03 https://arxiv.org/abs/2312.00886 Nash Learning from Human Feedback
2023-12-21 https://arxiv.org/abs/2312.11444 An In-depth Look at Gemini's Language Abilities
2023-12-21 https://arxiv.org/abs/2312.12456 PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
2023-12-13 https://arxiv.org/abs/2312.04884 UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
2023-12-04 https://arxiv.org/abs/2310.01783 Can large language models provide useful feedback on research papers? A large-scale empirical analysis
2023-12-01 https://arxiv.org/abs/2311.16933 SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
2023-11-30 https://arxiv.org/abs/2311.14737 Positional Description Matters for Transformers Arithmetic
2023-11-21 https://arxiv.org/abs/2311.00871 Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
2023-11-20 https://arxiv.org/abs/2311.04850 Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
2023-11-16 https://arxiv.org/abs/2311.05997 JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
2023-11-15 https://arxiv.org/abs/2306.14824 Kosmos-2: Grounding Multimodal Large Language Models to the World
2023-11-14 https://arxiv.org/abs/2310.04378 Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
2023-11-13 https://arxiv.org/abs/2306.05284 Simple and Controllable Music Generation
2023-11-10 https://arxiv.org/abs/2309.17421 The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
2023-11-08 https://arxiv.org/abs/2310.17680v1 CodeFusion: A Pre-trained Diffusion Model for Code Generation
2023-11-07 https://arxiv.org/abs/2310.03214 FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
2023-11-06 https://arxiv.org/abs/2310.11511 Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
2023-10-31 https://arxiv.org/abs/2207.07051 Language models show human-like content effects on reasoning tasks
2023-10-30 https://arxiv.org/abs/2310.08560 MemGPT: Towards LLMs as Operating Systems
2023-10-25 https://arxiv.org/abs/2310.13548 Towards Understanding Sycophancy in Language Models
2023-10-13 https://arxiv.org/abs/2212.11281 Language models are better than humans at next-token prediction
2023-10-13 https://arxiv.org/abs/2212.04037 Demystifying Prompts in Language Models via Perplexity Estimation
2023-09-28 https://arxiv.org/abs/2210.15097 Contrastive Decoding: Open-ended Text Generation as Optimization
2023-09-28 https://arxiv.org/abs/2309.09117 Contrastive Decoding Improves Reasoning in Large Language Models
2023-09-26 https://arxiv.org/abs/2309.12499 CodePlan: Repository-level Coding using LLMs and Planning
2023-09-21 https://arxiv.org/abs/2309.10668 Language Modeling Is Compression
2023-09-14 https://arxiv.org/abs/2308.11432 A Survey on Large Language Model based Autonomous Agents
2023-09-14 https://arxiv.org/abs/2309.05463 Textbooks Are All You Need II: phi-1.5 technical report
2023-09-05 https://arxiv.org/abs/2307.03172 Lost in the Middle: How Language Models Use Long Contexts
2023-08-29 https://arxiv.org/abs/2306.08568 WizardCoder: Empowering Code Large Language Models with Evol-Instruct
2023-08-22 https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models
2023-08-18 https://arxiv.org/abs/2305.11206 LIMA: Less Is More for Alignment
2023-08-17 https://arxiv.org/abs/2308.07317 Platypus: Quick, Cheap, and Powerful Refinement of LLMs
2023-08-15 https://arxiv.org/abs/2305.16635 Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
2023-08-10 https://arxiv.org/abs/2306.11644 Textbooks Are All You Need
2023-08-09 https://arxiv.org/abs/2308.03296 Studying Large Language Model Generalization with Influence Functions
2023-08-08 https://arxiv.org/abs/2308.01399 Learning to Model the World with Language
2023-08-03 https://arxiv.org/abs/2307.12856 A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
2023-08-01 https://arxiv.org/abs/2307.07924 ChatDev: Communicative Agents for Software Development
2023-07-31 https://arxiv.org/abs/1907.04164 Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
2023-07-28 https://arxiv.org/abs/1412.6980 Adam: A Method for Stochastic Optimization
2023-07-27 https://arxiv.org/abs/2212.09251 Discovering Language Model Behaviors with Model-Written Evaluations
2023-07-26 https://arxiv.org/abs/1812.06162 An Empirical Model of Large-Batch Training
2023-07-21 https://arxiv.org/abs/2205.05638 Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
2023-07-17 https://arxiv.org/abs/2210.17323 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
2023-07-14 https://arxiv.org/abs/2306.03078 SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
2023-07-12 https://arxiv.org/abs/2212.09720 The case for 4-bit precision: k-bit Inference Scaling Laws
2023-07-11 https://arxiv.org/abs/2106.09685 LoRA: Low-Rank Adaptation of Large Language Models
2023-07-10 https://arxiv.org/abs/2208.07339 LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
2023-07-07 https://arxiv.org/abs/2110.02861 8-bit Optimizers via Block-wise Quantization
2023-07-05 https://arxiv.org/abs/1710.03740 Mixed Precision Training
2023-07-05 https://arxiv.org/abs/2209.05433 FP8 Formats for Deep Learning
2023-07-04 https://arxiv.org/abs/2306.12456 Pushing the Limits of Machine Design: Automated CPU Design with AI
2023-07-03 https://arxiv.org/abs/2306.16388 Towards Measuring the Representation of Subjective Global Opinions in Language Models
2023-06-30 https://arxiv.org/abs/2201.11990 Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
2023-06-29 https://arxiv.org/abs/2304.11477 LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
2023-06-29 https://arxiv.org/abs/2306.14325 The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs
2023-06-28 https://arxiv.org/abs/2304.08467 Learning to Compress Prompts with Gist Tokens
2023-06-22 https://arxiv.org/abs/2306.04563v1 ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
2023-06-21 https://arxiv.org/abs/2306.07906 WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
2023-06-20 https://arxiv.org/abs/2305.18654 Faith and Fate: Limits of Transformers on Compositionality
2023-06-19 https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models
2023-06-16 https://arxiv.org/abs/2306.07899v1 Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
2023-06-16 https://arxiv.org/abs/2306.07906v1 WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
2023-06-15 https://arxiv.org/abs/2306.05425 MIMIC-IT: Multi-Modal In-Context Instruction Tuning
2023-06-14 https://arxiv.org/abs/2306.04751 How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
2023-06-13 https://arxiv.org/abs/2305.17126 Large Language Models as Tool Makers
2023-06-12 https://arxiv.org/abs/2306.03341v2 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
2023-06-09 https://arxiv.org/abs/2305.17926 Large Language Models are not Fair Evaluators
2023-06-09 https://arxiv.org/abs/2305.00118 Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
2023-06-08 https://arxiv.org/abs/2306.00323 Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
2023-06-06 https://arxiv.org/abs/2305.20050 Let's Verify Step by Step
2023-06-06 https://arxiv.org/abs/2306.01694 Evaluating Language Models for Mathematics through Interactions
2023-05-31 https://arxiv.org/abs/2305.15334v1 Gorilla: Large Language Model Connected with Massive APIs
2023-05-30 https://arxiv.org/abs/2305.16291 Voyager: An Open-Ended Embodied Agent with Large Language Models
2023-05-29 https://arxiv.org/abs/2305.15717 The False Promise of Imitating Proprietary LLMs
2023-05-26 https://arxiv.org/abs/2305.15324v1 Model evaluation for extreme risks
2023-05-24 https://arxiv.org/abs/2305.10601 Tree of Thoughts: Deliberate Problem Solving with Large Language Models
2023-05-21 https://arxiv.org/abs/2305.11169 Emergent Representations of Program Semantics in Language Models Trained on Programs
2023-05-19 https://arxiv.org/abs/2303.11341 What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
2023-05-19 https://arxiv.org/abs/2305.08746v1 Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
2023-05-17 https://arxiv.org/abs/2305.00833 Learning to Reason and Memorize with Self-Notes
2023-05-15 https://arxiv.org/abs/2304.03442 Generative Agents: Interactive Simulacra of Human Behavior
2023-05-12 https://arxiv.org/abs/2305.04388 Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
2023-05-11 https://arxiv.org/abs/2304.09848 Evaluating Verifiability in Generative Search Engines
2023-05-09 https://arxiv.org/abs/2302.12173 Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
2023-05-04 https://arxiv.org/abs/2304.15004 Are Emergent Abilities of Large Language Models a Mirage?
2023-05-03 https://arxiv.org/abs/2209.00626 The Alignment Problem from a Deep Learning Perspective
2023-05-01 https://arxiv.org/abs/2304.12210 A Cookbook of Self-Supervised Learning
2023-04-21 https://arxiv.org/abs/2304.07193 DINOv2: Learning Robust Visual Features without Supervision
2023-04-19 https://arxiv.org/abs/2301.12597 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
2023-04-19 https://arxiv.org/abs/2304.08466 Synthetic Data from Diffusion Models Improves ImageNet Classification
2023-04-18 https://arxiv.org/abs/2010.11929 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2023-04-17 https://arxiv.org/abs/1812.11118 Reconciling modern machine learning practice and the bias-variance trade-off
2023-04-17 https://arxiv.org/abs/2304.06035 Choose Your Weapon: Survival Strategies for Depressed AI Academics
2023-04-14 https://arxiv.org/abs/2303.18223v4 A Survey of Large Language Models
2023-04-07 https://arxiv.org/abs/2304.00186 Subject-driven Text-to-Image Generation via Apprenticeship Learning
2023-04-07 https://arxiv.org/abs/2303.17651 Self-Refine: Iterative Refinement with Self-Feedback
2023-04-03 https://arxiv.org/abs/2202.07785 Predictability and Surprise in Large Generative Models
2023-03-28 https://arxiv.org/abs/2303.14177 Scaling Expert Language Models with Unsupervised Domain Discovery
2023-03-23 https://arxiv.org/abs/2303.12712 Sparks of Artificial General Intelligence: Early experiments with GPT-4
2023-03-22 https://arxiv.org/abs/2112.00861 A General Language Assistant as a Laboratory for Alignment
2023-03-16 https://arxiv.org/abs/2212.10560 Self-Instruct: Aligning Language Models with Self-Generated Instructions
2023-03-13 https://arxiv.org/abs/2302.08582 Pretraining Language Models with Human Preferences
2023-03-10 https://arxiv.org/abs/2206.05802 Self-critiquing models for assisting human evaluators
2023-03-06 https://arxiv.org/abs/2303.03323 CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
2023-03-06 https://arxiv.org/abs/2303.03378 PaLM-E: An Embodied Multimodal Language Model
2023-03-06 https://arxiv.org/abs/2302.13971 LLaMA: Open and Efficient Foundation Language Models
2023-03-01 https://arxiv.org/abs/2109.10862 Recursively Summarizing Books with Human Feedback
2023-02-27 https://arxiv.org/abs/2009.01325 Learning to summarize from human feedback
2023-02-24 https://arxiv.org/abs/2112.09332 WebGPT: Browser-assisted question-answering with human feedback
2023-02-22 https://arxiv.org/abs/2210.10760 Scaling Laws for Reward Model Overoptimization
2023-02-20 https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences
2023-02-19 https://arxiv.org/abs/2302.07027 AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
2023-02-18 https://arxiv.org/abs/2110.02642 Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy
2023-02-13 https://arxiv.org/abs/2202.03286 Red Teaming Language Models with Language Models
2023-02-06 https://arxiv.org/abs/2301.12810 Crawling the Internal Knowledge-Base of Language Models
2023-01-26 https://arxiv.org/abs/1909.08593v2 Fine-Tuning Language Models from Human Preferences
2023-01-16 https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning
2022-12-22 https://arxiv.org/abs/2207.07611 Position Prediction as an Effective Pretraining Strategy
2022-12-05 https://arxiv.org/abs/1707.04585v1 The Reversible Residual Network: Backpropagation Without Storing Activations