Updated 2024-02-10
Cedille: A large autoregressive French language model
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
Query2doc: Query Expansion with Large Language Models
The Internal State of an LLM Knows When It's Lying
Structured information extraction from complex scientific text with fine-tuned large language models
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Large Language Models Encode Clinical Knowledge
PoET: A generative model of protein families as sequences-of-sequences
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Modeling Protein Using Large-scale Pretrain Language Model
A Watermark for Large Language Models
GPT is becoming a Turing machine: Here are some ways to program it
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Large Language Models are Zero-Shot Reasoners
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
How is ChatGPT's behavior changing over time?
Meta-Transformer: A Unified Framework for Multimodal Learning
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Getting More out of Large Language Models for Proofs
Teaching Small Language Models to Reason
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Learning to Retrieve In-Context Examples for Large Language Models
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Context-Aware Abbreviation Expansion Using Large Language Models
Focused Transformer: Contrastive Training for Context Scaling
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Long-range Language Modeling with Self-retrieval
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
Towards Generalist Biomedical AI
Shortcut Learning of Large Language Models in Natural Language Understanding
Quantifying Memorization Across Neural Language Models
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Copy Is All You Need
Automatic Chain of Thought Prompting in Large Language Models
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Evaluating the Text-to-SQL Capabilities of Large Language Models
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Are Emergent Abilities of Large Language Models a Mirage?
Enhancing Network Management Using Code Generated by Large Language Models
Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
ThinkSum: Probabilistic reasoning over sets using large language models
On the Tool Manipulation Capability of Open-source Large Language Models
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
WavJourney: Compositional Audio Creation with Large Language Models
ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course
Secrets of RLHF in Large Language Models Part I: PPO
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
Challenges and Applications of Large Language Models
SPOT: Knowledge-Enhanced Language Representations for Information Extraction
Kosmos-2: Grounding Multimodal Large Language Models to the World
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
SKILL: Structured Knowledge Infusion for Large Language Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Understanding Social Reasoning in Language Models with Language Models
The Science of Detecting LLM-Generated Texts
CausalLM is not optimal for in-context learning
Questioning the Survey Responses of Large Language Models
Extending Context Window of Large Language Models via Positional Interpolation
ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing
Probing Factually Grounded Content Transfer with Factual Ablation
Teach LLMs to Personalize -- An Approach inspired by Writing Education
Pre-Trained Large Language Models for Industrial Control
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Self-Alignment with Instruction Backtranslation
Guiding Pretraining in Reinforcement Learning with Large Language Models
Large Language Models are Zero-Shot Rankers for Recommender Systems
Model evaluation for extreme risks
Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
A Simple and Effective Pruning Approach for Large Language Models
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
PromptChainer: Chaining Large Language Model Prompts through Visual Programming
PIPPA: A Partially Synthetic Conversational Dataset
Let's Verify Step by Step
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
Large Language Models Are Reasoning Teachers
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Connecting Neural Response measurements & Computational Models of language: a non-comprehensive guide
Accelerating LLM Inference with Staged Speculative Decoding
Large Language Models for Supply Chain Optimization
Do Large Language Models know what humans know?
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
Faithful Chain-of-Thought Reasoning
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Superposition of many models into one
Learning to Model the World with Language
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Unifying Large Language Models and Knowledge Graphs: A Roadmap
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
QLoRA: Efficient Finetuning of Quantized LLMs
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Co-Writing with Opinionated Language Models Affects Users' Views
Language models show human-like content effects on reasoning
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
OpenAGI: When LLM Meets Domain Experts
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Beyond Generating Code: Evaluating GPT on a Data Visualization Course
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
Studying Large Language Model Generalization with Influence Functions
Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
From Sparse to Soft Mixtures of Experts
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
Large Language Model Guided Tree-of-Thought
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
When Geometric Deep Learning Meets Pretrained Protein Language Models
Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
Language models are weak learners
How Many Demonstrations Do You Need for In-context Learning?
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Gorilla: Large Language Model Connected with Massive APIs
Automatic Generation of Programming Exercises and Code Explanations using Large Language Models
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
WebArena: A Realistic Web Environment for Building Autonomous Agents
Language Models can Solve Computer Tasks
ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
Invariant Language Modeling
Solving Quantitative Reasoning Problems with Language Models
Personality Traits in Large Language Models
Prompting Large Language Models with Speech Recognition Abilities
Selective Annotation Makes Language Models Better Few-Shot Learners
Using Captum to Explain Generative Language Models
Fine-Tuning Language Models with Just Forward Passes
In-context Autoencoder for Context Compression in a Large Language Model
Entity Projection via Machine Translation for Cross-Lingual NER
OctoPack: Instruction Tuning Code Large Language Models
AlpaGasus: Training A Better Alpaca with Fewer Data
Large Language Models Are Human-Level Prompt Engineers
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach
Large Language Models Can Self-Improve
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
More Agents Is All You Need
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Teaching Algorithmic Reasoning via In-context Learning
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Automatic Evaluation of Attribution by Large Language Models
Generative Agents: Interactive Simulacra of Human Behavior
ALERT: Adapting Language Models to Reasoning Tasks
How does the pre-training objective affect what large language models learn about linguistic properties?
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
FLIRT: Feedback Loop In-context Red Teaming
News Summarization and Evaluation in the Era of GPT-3
Galactica: A Large Language Model for Science
Towards Reasoning in Large Language Models: A Survey
Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
Shepherd: A Critic for Language Model Generation
Emergent autonomous scientific research capabilities of large language models
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Social Simulacra: Creating Populated Prototypes for Social Computing Systems
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Universal and Transferable Adversarial Attacks on Aligned Language Models
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Complexity-Based Prompting for Multi-Step Reasoning
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Scaling TransNormer to 175 Billion Parameters
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Learning ASR pathways: A sparse multilingual ASR model
Stay on topic with Classifier-Free Guidance
Constitutional AI: Harmlessness from AI Feedback
Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
Teaching Arithmetic to Small Transformers
Demystifying GPT Self-Repair for Code Generation
Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
Link-Context Learning for Multimodal LLMs
Large Language Models Perform Diagnostic Reasoning
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
AgentBench: Evaluating LLMs as Agents
Simple synthetic data reduces sycophancy in large language models
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
Re-visiting Automated Topic Model Evaluation with Large Language Models
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Adaptive Test Generation Using a Large Language Model
Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
PaLM: Scaling Language Modeling with Pathways
Teaching Large Language Models to Self-Debug
Building Cooperative Embodied Agents Modularly with Large Language Models
Urdu text in natural scene images: a new dataset and preliminary text detection
LIMA: Less Is More for Alignment
Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs
GPT-NER: Named Entity Recognition via Large Language Models
Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
Code as Policies: Language Model Programs for Embodied Control
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models
Inspecting and Editing Knowledge Representations in Language Models
TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Med-Flamingo: a Multimodal Medical Few-shot Learner
Jigsaw: Large Language Models meet Program Synthesis
Large Language Models Struggle to Learn Long-Tail Knowledge
Llama 2: Open Foundation and Fine-Tuned Chat Models
Textbooks Are All You Need
Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Three Bricks to Consolidate Watermarks for Large Language Models
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
One-shot Machine Teaching: Cost Very Few Examples to Converge Faster
Theory of Mind May Have Spontaneously Emerged in Large Language Models
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Language Is Not All You Need: Aligning Perception with Language Models
Mind's Eye: Grounded Language Model Reasoning through Simulation
StarCoder: may the source be with you!
Self-Critique Prompting with Large Language Models for Inductive Instructions
PaLM 2 Technical Report
Repository-Level Prompt Generation for Large Language Models of Code
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Measuring and Narrowing the Compositionality Gap in Language Models
Differentially Private Fine-tuning of Language Models
A Latent Space Theory for Emergent Abilities in Large Language Models
Reflexion: Language Agents with Verbal Reinforcement Learning
Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
LEACE: Perfect linear concept erasure in closed form
Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models
Voyager: An Open-Ended Embodied Agent with Large Language Models
FinGPT: Open-Source Financial Large Language Models
Block Belief Propagation for Parameter Learning in Markov Random Fields
Lost in the Middle: How Language Models Use Long Contexts
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
The Hydra Effect: Emergent Self-repair in Language Model Computations
Educational data augmentation in physics education research using ChatGPT
PolyLM: An Open Source Polyglot Large Language Model
Towards Expert-Level Medical Question Answering with Large Language Models
Is GPT-4 a Good Data Analyst?
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
ReAct: Synergizing Reasoning and Acting in Language Models
Augmenting Language Models with Long-Term Memory
BloombergGPT: A Large Language Model for Finance
A Systematic Evaluation of Large Language Models of Code
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Robot Task Planning and Situation Handling in Open Worlds
Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
Emergent Abilities of Large Language Models
Can Large Language Models design a Robot?
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
DarkBERT: A Language Model for the Dark Side of the Internet
Measuring Faithfulness in Chain-of-Thought Reasoning
Retentive Network: A Successor to Transformer for Large Language Models
Dissociating language and thought in large language models: a cognitive perspective
Large Language Models are Better Reasoners with Self-Verification
Can large language models reason about medical questions?
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
ARB: Advanced Reasoning Benchmark for Large Language Models
Rethinking with Retrieval: Faithful Large Language Model Inference
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Large Language Models as Corporate Lobbyists
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Talking About Large Language Models
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Large Language Models Can Be Easily Distracted by Irrelevant Context
Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
OpenICL: An Open-Source Framework for In-context Learning
Emergence of Maps in the Memories of Blind Navigation Agents
PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Learning to Reason and Memorize with Self-Notes
ChemCrow: Augmenting large-language models with chemistry tools
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Learning to Compress Prompts with Gist Tokens
Unlimiformer: Long-Range Transformers with Unlimited Length Input
StructGPT: A General Framework for Large Language Model to Reason over Structured Data
ChatGPT: Applications, Opportunities, and Threats
Memory Augmented Large Language Models are Computationally Universal
PaLM-E: An Embodied Multimodal Language Model
M2T: Masking Transformers Twice for Faster Decoding
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Auditing large language models: a three-layered approach
Language models in molecular discovery
Offsite-Tuning: Transfer Learning without Full Model
MusicLM: Generating Music From Text
Context-faithful Prompting for Large Language Models
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
GPTutor: a ChatGPT-powered programming tool for code explanation
Larger language models do in-context learning differently
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Multimodal Chain-of-Thought Reasoning in Language Models
Recitation-Augmented Language Models
Hyena Hierarchy: Towards Larger Convolutional Language Models
Eight Things to Know about Large Language Models
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
A Survey on Model Compression for Large Language Models
Active Retrieval Augmented Generation
Toolformer: Language Models Can Teach Themselves to Use Tools
Evaluating Verifiability in Generative Search Engines
Augmented Language Models: a Survey
Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
Giraffe: Adventures in Expanding Context Lengths in LLMs
LLM As DBA
Scaling Transformer to 1M tokens and beyond with RMT
TidyBot: Personalized Robot Assistance with Large Language Models
Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
Active Prompting with Chain-of-Thought for Large Language Models
A Categorical Archive of ChatGPT Failures
Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
Better Language Models of Code through Self-Improvement
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
The Capacity for Moral Self-Correction in Large Language Models
Poisoning Language Models During Instruction Tuning
Prompt2Model: Generating Deployable Models from Natural Language Instructions
Data Selection for Language Models via Importance Resampling
Enabling Conversational Interaction with Mobile UI using Large Language Models
Evidence of Meaning in Language Models Trained on Programs
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Symbol tuning improves in-context learning in language models
REPLUG: Retrieval-Augmented Black-Box Language Models
Why do Nearest Neighbor Language Models Work?
Prismer: A Vision-Language Model with An Ensemble of Experts
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
CALYPSO: LLMs as Dungeon Masters' Assistants
Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
Code Llama: Open Foundation Models for Code
Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models
Faithful to Whom? Questioning Interpretability Measures in NLP
Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
How Good Are Large Language Models at Out-of-Distribution Detection?
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Can Large Language Models Find And Fix Vulnerable Software?
Large Language Models for Software Engineering: A Systematic Literature Review
Informed Named Entity Recognition Decoding for Generative Language Models
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
Better Zero-Shot Reasoning with Role-Play Prompting
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
A Survey on Large Language Model based Autonomous Agents
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Evaluating ChatGPT and GPT-4 for Visual Programming
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Cabrita: closing the gap for foreign languages
GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
ProAgent: Building Proactive Cooperative AI with Large Language Models
Instruction Position Matters in Sequence Generation with Large Language Models
Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Large Language Model as Autonomous Decision Maker
Large Language Models as Superpositions of Cultural Perspectives
Activation Addition: Steering Language Models Without Optimization
Enhancing Recommender Systems with Large Language Model Reasoning Graphs
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
An Empirical Study on Challenging Math Problem Solving with GPT-4
Forward-Backward Reasoning in Large Language Models for Verification
Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI
Dynamic Planning with a LLM
"Guinea Pig Trials" Utilizing GPT: A Novel Smart Agent-Based Modeling Approach for Studying Firm Competition and Collusion
Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
Bridging the Gap: Deciphering Tabular Data Using Large Language Model
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Prompting Is Programming: A Query Language for Large Language Models
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Knowledge Graph Prompting for Multi-Document Question Answering
GPT detectors are biased against non-native English writers
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models
IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Can Language Models Learn to Listen?
Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
Towards an Understanding of Large Language Models in Software Engineering Tasks
YaRN: Efficient Context Window Extension of Large Language Models
An Examination of the Compositionality of Large Generative Vision-Language Models
Company Similarity using Large Language Models
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs
Instruction Tuning for Large Language Models: A Survey
Language to Rewards for Robotic Skill Synthesis
Is There Any Social Principle for LLM-Based Agents?
A Study on Robustness and Reliability of Large Language Model Code Generation
Leveraging Large Language Models for Pre-trained Recommender Systems
Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models
LLaSM: Large Language and Speech Model
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue
FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Pretraining on the Test Set Is All You Need
The AI Revolution in Education: Will AI Replace or Assist Teachers in Higher Education?
Reinforced Self-Training (ReST) for Language Modeling
Fast Inference from Transformers via Speculative Decoding
LoRA: Low-Rank Adaptation of Large Language Models
Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
AI Deception: A Survey of Examples, Risks, and Potential Solutions
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Blockwise Parallel Decoding for Deep Autoregressive Models
Assigning AI: Seven Approaches for Students, with Prompts
Conformal Prediction with Large Language Models for Multi-Choice Question Answering
Attention: Marginal Probability is All You Need?
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
XGen-7B Technical Report
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Can Programming Languages Boost Each Other via Instruction Tuning?
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Efficient RLHF: Reducing the Memory Usage of PPO
Universal Self-adaptive Prompting
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
One Wide Feedforward is All You Need
Better Zero-Shot Reasoning with Self-Adaptive Prompting
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
SoTaNa: The Open-Source Software Development Assistant
GPT Can Solve Mathematical Problems Without a Calculator
Physically Grounded Vision-Language Models for Robotic Manipulation
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
FLM-101B: An Open LLM and How to Train It with $100K Budget
LaMDA: Language Models for Dialog Applications
LMDX: Language Model-based Document Information Extraction and Localization
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Do Multilingual Language Models Think Better in English?
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Textbooks Are All You Need II: phi-1.5 technical report
Replacing softmax with ReLU in Vision Transformers
Investigating Answerability of LLMs for Long-Form Question Answering
Vector Search with OpenAI Embeddings: Lucene Is All You Need
The Rise and Potential of Large Language Model Based Agents: A Survey
Cure the headache of Transformers via Collinear Constrained Attention
Uncovering mesa-optimization algorithms in Transformers
Large Language Models for Compiler Optimization
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Chain-of-Verification Reduces Hallucination in Large Language Models
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Compositional Foundation Models for Hierarchical Planning
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
Sparse Autoencoders Find Highly Interpretable Features in Language Models
DreamLLM: Synergistic Multimodal Comprehension and Creation
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Improving Language Models with Advantage-based Offline Policy Gradients
Improving Factuality and Reasoning in Language Models through Multiagent Debate
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Boolformer: Symbolic Regression of Logic Functions with Transformers
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
TP-Aware Dequantization
LASER: LLM Agent with State-Space Exploration for Web Navigation
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Baichuan 2: Open Large-scale Language Models
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Efficient Benchmarking (of Language Models)
Context is Environment
Analyzing Transformer Dynamics as Movement through Embedding Space
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
RMT: Retentive Networks Meet Vision Transformers
Stack-and-Delay: a new codebook pattern for music generation
Neurons in Large Language Models: Dead, N-gram, Positional
Large Language Model for Science: A Study on P vs. NP
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Data Augmentation for Spoken Language Understanding via Pretrained Language Models
Petals: Collaborative Inference and Fine-tuning of Large Models
Scaling Laws for Sparsely-Connected Foundation Models
Kosmos-2.5: A Multimodal Literate Model
PDFTriage: Question Answering over Long, Structured Documents
Statistical Rejection Sampling Improves Preference Optimization
Stabilizing RLHF through Advantage Model and Selective Rehearsal
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Leveraging Contextual Information for Effective Entity Salience Detection
NExT-GPT: Any-to-Any Multimodal LLM
Are Emergent Abilities in Large Language Models just In-Context Learning?
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Large-Scale Automatic Audiobook Creation
Recovering from Privacy-Preserving Masking with Large Language Models
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
RAIN: Your Language Models Can Align Themselves without Finetuning
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Hypothesis Search: Inductive Reasoning with Language Models
Agents: An Open-source Framework for Autonomous Language Agents
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Gated recurrent neural networks discover attention
Contrastive Decoding Improves Reasoning in Large Language Models
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Adapting Large Language Models via Reading Comprehension
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
MindAgent: Emergent Gaming Interaction
Graph Neural Prompting with Large Language Models
Sparks of Artificial General Intelligence: Early experiments with GPT-4
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Efficient Post-training Quantization with FP8 Formats
Taken out of context: On measuring situational awareness in LLMs
Jointly Training Large Autoregressive Multimodal Models
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Curriculum Learning with Adam: The Devil Is in the Wrong Details
OWL: A Large Language Model for IT Operations
Faith and Fate: Limits of Transformers on Compositionality
CodePlan: Repository-level Coding using LLMs and Planning
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Efficient Memory Management for Large Language Model Serving with PagedAttention
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
SCREWS: A Modular Framework for Reasoning with Revisions
Transformer models: an introduction and catalog
Small-scale proxies for large-scale Transformer training instabilities
Effective Long-Context Scaling of Foundation Models
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Qwen Technical Report
Attention Approximates Sparse Distributed Memory
Calibrating LLM-Based Evaluator
Ambiguity-Aware In-Context Learning with Large Language Models
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Vision Transformers Need Registers
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Language Modeling Is Compression
MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Aligning Large Multimodal Models with Factually Augmented RLHF
Large Language Models as Optimizers
SlimPajama-DC: Understanding Data Combinations for LLM Training
Finite Scalar Quantization: VQ-VAE Made Simple
Physics of Language Models: Part 3.2, Knowledge Manipulation
Efficient Streaming Language Models with Attention Sinks
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
LLM-grounded Video Diffusion Models
Enable Language Models to Implicitly Learn Self-Improvement From Data
Emergent Analogical Reasoning in Large Language Models
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Large Language Models Cannot Self-Correct Reasoning Yet
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Language Models Represent Space and Time
Retrieval meets Long Context Large Language Models
Borges and AI
Can large language models provide useful feedback on research papers? A large-scale empirical analysis
Ring Attention with Blockwise Transformers for Near-Infinite Context
Can Language Models be Instructed to Protect Personal Information?
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Who's Harry Potter? Approximate Unlearning in LLMs
Low-Resource Languages Jailbreak GPT-4
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
EcoAssistant: Using LLM Assistant More Affordably and Accurately
How FaR Are Large Language Models From Agents with Theory-of-Mind?
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
HeaP: Hierarchical Policies for Web Actions using LLMs
A Long Way to Go: Investigating Length Correlations in RLHF
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Think before you speak: Training Language Models With Pause Tokens
Mistral 7B
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
Large Language Models can Learn Rules
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Large Language Models Are Zero-Shot Time Series Forecasters
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Learning Interactive Real-World Simulators
FireAct: Toward Language Agent Fine-tuning
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Text Embeddings Reveal (Almost) As Much As Text
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
Lemur: Harmonizing Natural Language and Code for Language Agents
LangNav: Language as a Perceptual Representation for Navigation
The LAMBADA dataset: Word prediction requiring a broad discourse context
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Toward Joint Language Modeling for Speech Units and Text
MemGPT: Towards LLMs as Operating Systems
A Zero-Shot Language Agent for Computer Control with Structured Reflection
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
The Consensus Game: Language Model Generation via Equilibrium Search
Table-GPT: Table-tuned GPT for Diverse Table Tasks
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation
Deep Learning Scaling is Predictable, Empirically
MLQA: Evaluating Cross-lingual Extractive Question Answering
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Intersectional Bias in Hate Speech and Abusive Language Datasets
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning
AI Ethics Issues in Real World: Evidence from AI Incident Database
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Measuring Mathematical Problem Solving With the MATH Dataset
Can Machines Learn Morality? The Delphi Experiment
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
AndroidEnv: A Reinforcement Learning Platform for Android
Demoting Racial Bias in Hate Speech Detection
Social Bias Frames: Reasoning about Social and Power Implications of Language
Characterising Bias in Compressed Models
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Towards Robust Toxic Content Classification
The Challenge of Value Alignment: from Fairer Algorithms to AI Safety
Towards Continual Knowledge Learning of Language Models
The Pushshift Reddit Dataset
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus
One Epoch Is All You Need
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
NewsQA: A Machine Comprehension Dataset
AmbiPun: Generating Humorous Puns with Ambiguous Context
Deal or No Deal? End-to-End Learning for Negotiation Dialogues
Competition-Level Code Generation with AlphaCode
STaR: Bootstrapping Reasoning With Reasoning
Efficient Neural Architecture Search via Parameter Sharing
Recursively Summarizing Books with Human Feedback
Habitat: A Platform for Embodied AI Research
Generate & Rank: A Multi-task Framework for Math Word Problems
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Mitigating Statistical Bias within Differentially Private Synthetic Data
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
TruthfulQA: Measuring How Models Mimic Human Falsehoods
An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Controlling Style in Generated Dialogue
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Societal Biases in Language Generation: Progress and Challenges
Counterfactual Fairness in Text Classification through Robustness
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Deep Double Descent: Where Bigger Models and More Data Hurt
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
InCoder: A Generative Model for Code Infilling and Synthesis
Back to the Future: On Potential Histories in NLP
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Sharp Minima Can Generalize For Deep Nets
Self-attention Does Not Need $O(n^2)$ Memory
Measuring the Carbon Intensity of AI in Cloud Instances
SocialIQA: Commonsense Reasoning about Social Interactions
Generating Long Sequences with Sparse Transformers
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
QAmeleon: Multilingual QA with Only 5 Examples
CTRL: A Conditional Transformer Language Model for Controllable Generation
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Is neural language acquisition similar to natural? A chronological probing study
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Bag of Tricks for Efficient Text Classification
Automatic Detection of Machine Generated Text: A Critical Survey
Adversarial Training for Large Neural Language Models
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
TALM: Tool Augmented Language Models
Training Language Models with Language Feedback
Toxicity in Multilingual Machine Translation at Scale
PEER: A Collaborative Language Model
On the Multilingual Capabilities of Very Large-Scale English Language Models
LLaMA: Open and Efficient Foundation Language Models
SECure: A Social and Environmental Certificate for AI Systems
Gaussian Error Linear Units (GELUs)
RoFormer: Enhanced Transformer with Rotary Position Embedding
Measuring Massive Multitask Language Understanding
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making
Leveraging QA Datasets to Improve Generative Data Augmentation
Decoupled Weight Decay Regularization
A Distributional Approach to Controlled Text Generation
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
The Turking Test: Can Language Models Understand Instructions?
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Language Models (Mostly) Know What They Know
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Towards Understanding and Mitigating Social Biases in Language Models
Discovering and Categorising Language Biases in Reddit
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
Training Verifiers to Solve Math Word Problems
The Curse of Recursion: Training on Generated Data Makes Models Forget
Compositional Semantic Parsing with Large Language Models
Transforming Question Answering Datasets Into Natural Language Inference Datasets
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
The Values Encoded in Machine Learning Research
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems
Ethical and social risks of harm from Language Models
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Understanding HTML with Large Language Models
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
AudioLM: a Language Modeling Approach to Audio Generation
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Behavior Cloned Transformers are Neurosymbolic Reasoners
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Thou shalt not hate: Countering Online Hate Speech
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Participation is not a Design Fix for Machine Learning
Retrieval Augmentation Reduces Hallucination in Conversation
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize
How Many Data Samples is an Additional Instruction Worth?
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Crosslingual Generalization through Multitask Finetuning
The Curious Case of Neural Text Degeneration
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Evaluating the Social Impact of Generative AI Systems in Systems and Society
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
Towards A Rigorous Science of Interpretable Machine Learning
An Analysis of the Automatic Bug Fixing Performance of ChatGPT
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models
Defending Against Neural Fake News
Analyzing Dynamic Adversarial Training Data in the Limit
Criticality in Formal Languages and Statistical Physics
Generating Wikipedia by Summarizing Long Sequences
Gender Bias in Contextualized Word Embeddings
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Deep Generative Dual Memory Network for Continual Learning
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Persistent Anti-Muslim Bias in Large Language Models
Mirages: On Anthropomorphism in Dialogue Systems
Deep Learning for Symbolic Mathematics
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
A Survey On Universal Adversarial Attack
Atlas: Few-shot Learning with Retrieval Augmented Language Models
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
A framework for the extraction of Deep Neural Networks by leveraging public data
Recipes for building an open-domain chatbot
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Measuring the Effects of Data Parallelism on Neural Network Training
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
X-SQL: reinforce schema representation with context
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
FastText.zip: Compressing text classification models
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
A General Language Assistant as a Laboratory for Alignment
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Deep Learning Based Text Classification: A Comprehensive Review
Automated Hate Speech Detection and the Problem of Offensive Language
Multi-Dimensional Gender Bias Classification
Extracting Training Data from Large Language Models
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
FlowQA: Grasping Flow in History for Conversational Machine Comprehension
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey
Improving alignment of dialogue agents via targeted human judgements
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing
Explanation in Artificial Intelligence: Insights from the Social Sciences
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Revealing Persona Biases in Dialogue Systems
GeDi: Generative Discriminator Guided Sequence Generation
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
UL2: Unifying Language Learning Paradigms
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Does Gender Matter? Towards Fairness in Dialogue Systems
Energy and Policy Considerations for Deep Learning in NLP
The False Promise of Imitating Proprietary LLMs
Directional Bias Amplification
Hierarchical Text-Conditional Image Generation with CLIP Latents
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Task-aware Retrieval with Instructions
Do Prompt-Based Models Really Understand the Meaning of their Prompts?
Reading Wikipedia to Answer Open-Domain Questions
Supervising Model Attention with Human Explanations for Robust Natural Language Inference
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Teaching language models to support answers with verified quotes
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
MasakhaNER: Named Entity Recognition for African Languages
Predicting the Type and Target of Offensive Posts in Social Media
Learning to Model Editing Processes
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles
Quantifying the Carbon Emissions of Machine Learning
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Chasing Carbon: The Elusive Environmental Footprint of Computing
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion
Distilling Reasoning Capabilities into Smaller Language Models
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
WebGPT: Browser-assisted question-answering with human feedback
Making Large Language Models Better Reasoners with Step-Aware Verifier
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
SGPT: GPT Sentence Embeddings for Semantic Search
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models
Building a Conversational Agent Overnight with Dialogue Self-Play
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Neural Machine Translation of Rare Words with Subword Units
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Know What You Don't Know: Unanswerable Questions for SQuAD
Longformer: The Long-Document Transformer
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
A Constructive Prediction of the Generalization Error Across Scales
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
KERMIT: Generative Insertion-Based Modeling for Sequences
mGPT: Few-Shot Learners Go Multilingual
The Natural Language Decathlon: Multitask Learning as Question Answering
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
A Survey of Race, Racism, and Anti-Racism in NLP
Unraveling the Hidden Environmental Impacts of AI Solutions for Environment
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Hyperbolic Image-Text Representations
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Pretraining Language Models with Human Preferences
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English
MTEB: Massive Text Embedding Benchmark
Interscript: A dataset for interactive learning of scripts through error feedback
Looped Transformers as Programmable Computers
Inner Monologue: Embodied Reasoning through Planning with Language Models
No Language Left Behind: Scaling Human-Centered Machine Translation
Collaborative Storytelling with Large-scale Neural Language Models
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Recipes for Safety in Open-domain Chatbots
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Pre-Trained Language Models for Interactive Decision-Making
Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Formal Algorithms for Transformers
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
HellaSwag: Can a Machine Really Finish Your Sentence?
Teaching Language Models to Self-Improve through Interactive Demonstrations
Ranking LLM-Generated Loop Invariants for Program Verification
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
When can transformers reason with abstract symbols?
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Language Models are Few-shot Multilingual Learners
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
AutoMix: Automatically Mixing Language Models
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Pre-trained Summarization Distillation
TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Improving Large Language Model Fine-tuning for Solving Math Problems
Language Models are General-Purpose Interfaces
Llemma: An Open Language Model For Mathematics
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Gender Bias in Machine Translation
Towards a Human-like Open-Domain Chatbot
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
A Network-based End-to-End Trainable Task-oriented Dialogue System
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Cloze-driven Pretraining of Self-attention Networks
Universal Language Model Fine-tuning for Text Classification
OPT: Open Pre-trained Transformer Language Models
Towards Zero-Label Language Learning
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models
Fine-tuned Language Models are Continual Learners
3D-GPT: Procedural 3D Modeling with Large Language Models
PAL: Program-aided Language Models
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Large Language Models for Software Engineering: Survey and Open Problems
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
Self-critiquing models for assisting human evaluators
Towards Understanding Sycophancy in Language Models
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Finetuned Language Models Are Zero-Shot Learners
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Generating Sequences by Learning to Self-Correct
The Depth-to-Width Interplay in Self-Attention
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
Internet-augmented language models through few-shot prompting for open-domain question answering
GLM-130B: An Open Bilingual Pre-trained Model
Three scenarios for continual learning
Eureka: Human-Level Reward Design via Coding Large Language Models
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
An Explanation of In-context Learning as Implicit Bayesian Inference
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Snapshot Ensembles: Train 1, get M for free
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
On the Planning Abilities of Large Language Models -- A Critical Investigation
Efficient Estimation of Word Representations in Vector Space
Visualizing the Loss Landscape of Neural Nets
Contrastive Preference Learning: Learning from Human Feedback without RL
High-Resolution Image Synthesis with Latent Diffusion Models
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
H2O Open Ecosystem for State-of-the-art Large Language Models
Calibrate Before Use: Improving Few-Shot Performance of Language Models
All-in-One Image-Grounded Conversational Agents
Interactive Task Planning with Language Models
Can AI-Generated Text be Reliably Detected?
BitNet: Scaling 1-bit Transformers for Large Language Models
Scaling Laws for Neural Language Models
Self-Refine: Iterative Refinement with Self-Feedback
Adversarial Environment Generation for Learning to Navigate the Web
Cross-Lingual Language Model Meta-Pretraining
Creative Robot Tool Use with Large Language Models
Simple and Effective Multi-Paragraph Reading Comprehension
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
VeRA: Vector-based Random Matrix Adaptation
Open-Ended Learning Leads to Generally Capable Agents
Exploring the Boundaries of GPT-4 in Radiology
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Eliciting Human Preferences with Language Models
One-Shot Learning from a Demonstration with Hierarchical Latent Language
OpenAgents: An Open Platform for Language Agents in the Wild
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Specific versus General Principles for Constitutional AI
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Task2Vec: Task Embedding for Meta-Learning
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Tuna: Instruction Tuning using Feedback from Large Language Models
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Transcending Scaling Laws with 0.1% Extra Compute
InstructExcel: A Benchmark for Natural Language Instruction in Excel
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets
Understanding Retrieval Augmentation for Long-Form Question Answering
A Neural Conversational Model
Exploring the Limits of Language Modeling
Scaling Instruction-Finetuned Language Models
Learning Performance-Improving Code Edits
Training Compute-Optimal Large Language Models
Instruction Tuning with GPT-4
Holistic Evaluation of Language Models
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Large Language Models as Analogical Reasoners
Negative Training for Neural Dialogue Response Generation
On the Opportunities and Risks of Foundation Models
Dissecting In-Context Learning of Translations in GPTs
Carbon Emissions and Large Neural Network Training
Faithful Reasoning Using Large Language Models
Detecting Pretraining Data from Large Language Models
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Unified Language Model Pre-training for Natural Language Understanding and Generation
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Predictability and Surprise in Large Generative Models
Alignment of Language Agents
Zephyr: Direct Distillation of LM Alignment
Binding Language Models in Symbolic Languages
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
The Evolved Transformer
Detecting Hate Speech with GPT-3
Learning to summarize from human feedback
Efficient Large Scale Language Modeling with Mixtures of Experts
Jailbreaking Black Box Large Language Models in Twenty Queries
How do Language Models Bind Entities in Context?
Program Synthesis with Large Language Models
Challenges in Detoxifying Language Models
A Deep Reinforced Model for Abstractive Summarization
Moral Foundations of Large Language Models
Training Production Language Models without Memorizing User Data
A Deep Reinforcement Learning Chatbot
RT-1: Robotics Transformer for Real-World Control at Scale
Entity Tracking in Language Models
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Controlled Decoding from Language Models
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
FP8-LM: Training FP8 Large Language Models
The Perils & Promises of Fact-checking with Large Language Models
Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)?
Unsolved Problems in ML Safety
Woodpecker: Hallucination Correction for Multimodal Large Language Models
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Data-Centric Financial Large Language Models
CodeFusion: A Pre-trained Diffusion Model for Code Generation
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Personas as a Way to Model Truthfulness in Language Models
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
CLEX: Continuous Length Extrapolation for Large Language Models
ALCUNA: Large Language Models Meet New Knowledge
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Large Language Models as Generalizable Policies for Embodied Tasks
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Linear Representations of Sentiment in Large Language Models
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
MM-VID: Advancing Video Understanding with GPT-4V(ision)
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
ChipNeMo: Domain-Adapted LLMs for Chip Design
What's In My Big Data?
Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve
Idempotent Generative Network
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
NEFTune: Noisy Embeddings Improve Instruction Finetuning
The Impact of Depth and Width on Transformer Language Model Generalization
FlashDecoding++: Faster Large Language Model Inference on GPUs
Skywork: A More Open Bilingual Foundation Model
GRIM: GRaph-based Interactive narrative visualization for gaMes
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Does GPT-4 Pass the Turing Test?
Text Rendering Strategies for Pixel Language Models
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Learning From Mistakes Makes LLM Better Reasoner
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Ultra-Long Sequence Distributed Transformer
Ziya2: Data-centric Learning is All LLMs Need
GLaMM: Pixel Grounding Large Multimodal Model
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
Unveiling Safety Vulnerabilities of Large Language Models
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Levels of AGI: Operationalizing Progress on the Path to AGI
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
CogVLM: Visual Expert for Pretrained Language Models
Tailoring Self-Rationalizers with Multi-Reward Distillation
NExT-Chat: An LMM for Chat, Detection and Segmentation
The Efficiency Misnomer
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Training Dynamics of Contextual N-Grams in Language Models
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Gzip versus bag-of-words for text classification
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
GPT4All: An Ecosystem of Open Source Compressed Language Models
Evaluating Large Language Models: A Comprehensive Survey
Leveraging Large Language Models for Automated Proof Synthesis in Rust
GPTScore: Evaluate as You Desire
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Simple and Controllable Music Generation
Can LLMs Follow Simple Rules?
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Memory Augmented Language Models through Mixture of Word Experts
Language Models can be Logical Solvers
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
ADaPT: As-Needed Decomposition and Planning with Language Models
FinGPT: Large Generative Models for a Small Language
Simplifying Transformer Blocks
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Prompt Engineering a Prompt Engineer
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
Accelerating Large Language Model Decoding with Speculative Sampling
Alternating Updates for Efficient Transformers
White-Box Transformers via Sparse Rate Reduction
ChatAnything: Facetime Chat with LLM-Enhanced Personas
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
LayoutPrompter: Awaken the Design Ability of Large Language Models
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Trusted Source Alignment in Large Language Models
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
The ART of LLM Refinement: Ask, Refine, and Trust
Fine-tuning Language Models for Factuality
A Survey on Language Models for Code
DiLoCo: Distributed Low-Communication Training of Language Models
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks
Fusion-Eval: Integrating Evaluators with LLMs
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
SiRA: Sparse Mixture of Low Rank Adaptation
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Contrastive Chain-of-Thought Prompting
Learning to Filter Context for Retrieval-Augmented Generation
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
System 2 Attention (is something you might need too)
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Language Models are Multilingual Chain-of-Thought Reasoners
ProAgent: From Robotic Process Automation to Agentic Process Automation
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Exponentially Faster Language Modelling
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Testing Language Model Agents Safely in the Wild
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Orca 2: Teaching Small Language Models How to Reason
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
SelfEval: Leveraging the discriminative nature of generative models for evaluation
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Transformer Memory as a Differentiable Search Index
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
DeiT III: Revenge of the ViT
Scaling Vision Transformers to 22 Billion Parameters
On Calibration of Modern Neural Networks
A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Attention Is All You Need
Acceleration via Fractal Learning Rate Schedules
Transformers learn in-context by gradient descent
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Toy Models of Superposition
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Unified Scaling Laws for Routed Language Models
CLIPPO: Image-and-Language Understanding from Pixels Only
Task-Specific Skill Localization in Fine-tuned Language Models
Discovering Latent Knowledge in Language Models Without Supervision
OCR-free Document Understanding Transformer
Language Models are Few-Shot Learners
Progress measures for grokking via mechanistic interpretability
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems
Language Models as Agent Models
Learning Models of Individual Behavior in Chess
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Ask Me Anything: A simple strategy for prompting language models
Training language models to follow instructions with human feedback
Sequence to Sequence Learning with Neural Networks
SegGPT: Segmenting Everything In Context
A data-driven approach for learning to control computers
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Unifying Vision, Text, and Layout for Universal Document Processing
Memorizing Transformers
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
A Succinct Summary of Reinforcement Learning
Symbolic Discovery of Optimization Algorithms
Confronting Reward Model Overoptimization with Constrained RLHF
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
A Cookbook of Self-Supervised Learning
Training Language Models with Language Feedback at Scale
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Adversarial Examples for Evaluating Reading Comprehension Systems
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
ImageBind: One Embedding Space To Bind Them All
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Scaling Data-Constrained Language Models
Efficient LLM Inference on CPUs
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Efficiently Scaling Transformer Inference
One Model To Learn Them All
Brain decoding: toward real-time reconstruction of visual perception
GLU Variants Improve Transformer
HyperNetworks
InRank: Incremental Low-Rank Learning
Text-to-Image Diffusion Models are Zero-Shot Classifiers
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
MAGVLT: Masked Generative Vision-and-Language Transformer
DINOv2: Learning Robust Visual Features without Supervision
What learning algorithm is in-context learning? Investigations with linear models
Any-to-Any Generation via Composable Diffusion
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Shortformer: Better Language Modeling using Shorter Inputs
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
PaLI: A Jointly-Scaled Multilingual Language-Image Model
The alignment problem from a deep learning perspective
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Multimodal Analogical Reasoning over Knowledge Graphs
Segment Everything Everywhere All at Once
DocPrompting: Generating Code by Retrieving the Docs
Emergent Tool Use From Multi-Agent Autocurricula
Root Mean Square Layer Normalization
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Efficient Training of Language Models to Fill in the Middle
AI for Mathematics: A Cognitive Science Perspective
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The First Room-Temperature Ambient-Pressure Superconductor
Segment Anything
Less is More: Parameter-Free Text Classification with Gzip
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
A Generalist Agent
Meet in the Middle: A New Pre-training Paradigm
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Can Humans Do Less-Than-One-Shot Learning?
Diffusion-LM Improves Controllable Text Generation
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Text-to-3D using Gaussian Splatting
Precise Zero-Shot Dense Retrieval without Relevance Labels
Brainformers: Trading Simplicity for Efficiency
DETRs Beat YOLOs on Real-time Object Detection
OtterHD: A High-Resolution Multi-modality Model
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
ConvNets Match Vision Transformers at Scale
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models
Scaling Robot Learning with Semantically Imagined Experience
Do LLMs exhibit human-like response biases? A case study in survey design
READ: Recurrent Adaptation of Large Transformers
Benchmarking Neural Network Training Algorithms
Automatic Gradient Descent: Deep Learning without Hyperparameters
Layer Normalization
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Implicit Representations of Meaning in Neural Language Models
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable
SqueezeLLM: Dense-and-Sparse Quantization
Optimisation & Generalisation in Networks of Neurons
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Transformers as Recognizers of Formal Languages: A Survey on Expressivity
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Decoupled Context Processing for Context Augmented Language Modeling
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
The Transient Nature of Emergent In-Context Learning in Transformers
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Matryoshka Diffusion Models
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Beyond neural scaling laws: beating power law scaling via data pruning
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Going Deeper with Convolutions
TimeGPT-1
Capabilities of GPT-4 on Medical Challenge Problems
Training Large Language Models Efficiently with Sparsity and Dataflow
Optimal Policies Tend to Seek Power
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Thinking Like Transformers
Why think step by step? Reasoning emerges from the locality of experience
Mixture-of-Experts with Expert Choice Routing
GPT-4 Technical Report
Scaling Expert Language Models with Unsupervised Domain Discovery
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Mass-Editing Memory in a Transformer
Erasing Concepts from Diffusion Models
Physics of Language Models: Part 1, Context-Free Grammar
Flamingo: a Visual Language Model for Few-Shot Learning
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Semantic Tokenizer for Enhanced Natural Language Processing
A Survey of Large Language Models
Affordances from Human Videos as a Versatile Representation for Robotics
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Conditioning Predictive Models: Risks and Strategies
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Vision Transformers with Mixed-Resolution Tokenization
Implicit Chain of Thought Reasoning via Knowledge Distillation
Scaling Laws for Transfer
Risks from Learned Optimization in Advanced Machine Learning Systems
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Bayesian Optimization of Catalysts With In-context Learning
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
Knowledge Graphs
Language Modelling with Pixels
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Retrofitting Word Vectors to Semantic Lexicons
CoLT5: Faster Long-Range Transformers with Conditional Computation
Deep contextualized word representations
Boosted Prompt Ensembles for Large Language Models
Recurrent Memory Transformer
Multitask Prompted Training Enables Zero-Shot Task Generalization
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
On the Turing Completeness of Modern Neural Network Architectures
Generalized Out-of-Distribution Detection: A Survey
AugGPT: Leveraging ChatGPT for Text Data Augmentation
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Human-Timescale Adaptation in an Open-Ended Task Space
Sigmoid Loss for Language Image Pre-Training
OpenScene: 3D Scene Understanding with Open Vocabularies
Nougat: Neural Optical Understanding for Academic Documents
SoundStorm: Efficient Parallel Audio Generation
Text and Code Embeddings by Contrastive Pre-Training
Fine-Tuning Language Models from Human Preferences
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Effective Theory of Transformers at Initialization
ST-MoE: Designing Stable and Transferable Sparse Expert Models
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Natural Selection Favors AIs over Humans
ART: Automatic multi-step reasoning and tool-use for large language models
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Visual Instruction Tuning
Efficiently Modeling Long Sequences with Structured State Spaces
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Mastering Diverse Domains through World Models
Simplified State Space Layers for Sequence Modeling
Offline RL for Natural Language Generation with Implicit Language Q Learning
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Self-supervised Learning: Generative or Contrastive
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Story Planning
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Dota 2 with Large Scale Deep Reinforcement Learning
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
The Matrix Calculus You Need For Deep Learning
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
DeepNet: Scaling Transformers to 1,000 Layers
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
LLMs cannot find reasoning errors, but can correct them!
Pretraining Without Attention
Large language models are not zero-shot communicators
Semi-supervised Sequence Learning
Improving language models by retrieving from trillions of tokens
Synthetic Data from Diffusion Models Improves ImageNet Classification
Level Generation Through Large Language Models
How Does Generative Retrieval Scale to Millions of Passages?
State Spaces Aren't Enough: Machine Translation Needs Attention
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Evaluating Large Language Models Trained on Code
Injecting structural hints: Using language models to study inductive biases in language learning
The case for 4-bit precision: k-bit Inference Scaling Laws
Downstream Datasets Make Surprisingly Good Pretraining Corpora
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
Fast Transformer Decoding: One Write-Head is All You Need
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
Towards Deep Learning Models Resistant to Adversarial Attacks
A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Large Language Models as General Pattern Machines
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Fast and forward stable randomized algorithms for linear least-squares problems
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Twist Decoding: Diverse Generators Guide Each Other
Monolith: Real Time Recommendation System With Collisionless Embedding Table
On-Device Training Under 256KB Memory
Meta-Learning in Neural Networks: A Survey
The Linear Representation Hypothesis and the Geometry of Large Language Models
The Power of Scale for Parameter-Efficient Prompt Tuning
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Spreading vectors for similarity search
REFINER: Reasoning Feedback on Intermediate Representations
Low-code LLM: Visual Programming over LLMs
Decoding speech perception from non-invasive brain recordings
Towards Agile Text Classifiers for Everyone
Cramming: Training a Language Model on a Single GPU in One Day
Text-to-Table: A New Way of Information Extraction
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
ViperGPT: Visual Inference via Python Execution for Reasoning
Spatial-Language Attention Policies for Efficient Robot Learning
Improved Baselines with Visual Instruction Tuning
Decision Transformer: Reinforcement Learning via Sequence Modeling
What Algorithms can Transformers Learn? A Study in Length Generalization
Tracking Everything Everywhere All at Once
Bad Global Minima Exist and SGD Can Reach Them
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Optimizing Memory Mapping Using Deep Reinforcement Learning
A General Theoretical Paradigm to Understand Learning from Human Preferences
Beyond Words: A Comprehensive Survey of Sentence Representations
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Adding Gradient Noise Improves Learning for Very Deep Networks
Positional Description Matters for Transformers Arithmetic
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Calibrated Language Models Must Hallucinate
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Online Decision Transformer
Benchmarking Large Language Models for News Summarization
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Scalable Extraction of Training Data from (Production) Language Models
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Visual In-Context Prompting
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
GAIA: a benchmark for General AI Assistants
More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text
Chain-of-Thought Reasoning is a Policy Improvement Operator
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Thinking Fast and Slow in Large Language Models
Towards Accurate Differential Diagnosis with Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Vanishing Gradients in Reinforcement Finetuning of Language Models
The History and Risks of Reinforcement Learning and Human Feedback
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Video Language Planning
Thread of Thought Unraveling Chaotic Contexts
PaSS: Parallel Speculative Sampling
SeaLLMs -- Large Language Models for Southeast Asia
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
An LLM Compiler for Parallel Function Calling
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Magicoder: Source Code Is All You Need
SILC: Improving Vision Language Pretraining with Self-Distillation
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
An Early Evaluation of GPT-4V(ision)
Farzi Data: Autoregressive Data Distillation
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Towards a Unified View of Parameter-Efficient Transfer Learning
Beyond Surface: Probing LLaMA Across Scales and Layers
TiC-CLIP: Continual Training of CLIP Models
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
GOAT: GO to Any Thing
Nash Learning from Human Feedback
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Axiomatic Preference Modeling for Longform Question Answering
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Efficient Monotonic Multihead Attention
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Are LLMs Useful in the Poorest Schools? theTeacherAI in Sierra Leone
De-Diffusion Makes Text a Strong Cross-Modal Interface
Dolphins: Multimodal Language Model for Driving
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Efficient Transformer Knowledge Distillation: A Performance Review
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments
Instruction-tuning Aligns LLMs to the Human Brain
Large Language Model Alignment: A Survey
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Instruction-Following Evaluation for Large Language Models
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Pre-Training to Learn in Context
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Large Language Models for Mathematicians
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Language Model Inversion
Training Chain-of-Thought via Latent-Variable Inference
The Quantization Model of Neural Scaling
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
TinyGSM: achieving >80% on GSM8k with small language models
Context Tuning for Retrieval Augmented Generation
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
TigerBot: An Open Multilingual Multitask LLM
PromptBench: A Unified Library for Evaluation of Large Language Models
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Challenges with unsupervised LLM knowledge discovery
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Honeybee: Locality-enhanced Projector for Multimodal LLM
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
ProTIP: Progressive Tool Retrieval Improves Planning
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
SparQ Attention: Bandwidth-Efficient LLM Inference
Silkie: Preference Distillation for Large Visual Language Models
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Mathematical Language Models: A Survey
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Pixel Aligned Language Models
PathFinder: Guided Search over Multi-Step Reasoning Paths
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Vision-Language Models as a Source of Rewards
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Language-Informed Visual Concept Learning
Evaluation of Large Language Models for Decision Making in Autonomous Driving
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Extending Context Window of Large Language Models via Semantic Compression
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Large Language Models on Graphs: A Comprehensive Survey
Merlin: Empowering Multimodal LLMs with Foresight Minds
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
Generating Illustrated Instructions
Alignment for Honesty
Paloma: A Benchmark for Evaluating Language Model Fit
Self-Evaluation Improves Selective Generation in Large Language Models
Nomic Embed: Training a Reproducible Long Context Text Embedder
Rejuvenating image-GPT as Strong Visual Representation Learners
Object Recognition as Next Token Prediction
Foundation Models in Robotics: Applications, Challenges, and the Future
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Data Management For Large Language Models: A Survey
Knowledge Distillation of Large Language Models
Faithful Persona-based Conversational Dataset Generation with Large Language Models
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Weight subcloning: direct initialization of transformers using larger pretrained ones
Segment and Caption Anything
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
OneLLM: One Framework to Align All Modalities with Language
Steering Llama 2 via Contrastive Activation Addition
VILA: On Pre-training for Visual Language Models
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
HyperAttention: Long-context Attention in Near-Linear Time
LLM360: Towards Fully Transparent Open-Source LLMs
Efficient Transformers with Dynamic Token Pooling
GIVT: Generative Infinite-Vocabulary Transformers
Modeling Context in Referring Expressions
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Text-Conditioned Resampler For Long Form Video Understanding
Gemini: A Family of Highly Capable Multimodal Models
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Cascade Speculative Drafting for Even Faster LLM Inference
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
VideoPoet: A Large Language Model for Zero-Shot Video Generation
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
AppAgent: Multimodal Agents as Smartphone Users
Time is Encoded in the Weights of Finetuned Language Models
Generative Multimodal Models are In-Context Learners
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
An In-depth Look at Gemini's Language Abilities
Retrieval-Augmented Generation for Large Language Models: A Survey
Intriguing Properties of Quantization at Scale
Parrot Captions Teach CLIP to Spot Text
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
YAYI 2: Multilingual Open-Source Large Language Models
Reasons to Reject? Aligning Language Models with Judgments
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Exploiting Novel GPT-4 APIs
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
LLM4VG: Large Language Models Evaluation for Video Grounding
Shai: A large language model for asset management
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Supervised Knowledge Makes Large Language Models Better In-context Learners
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
The LLM Surgeon
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Task Contamination: Language Models May Not Be Few-Shot Anymore
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Learning Vision from Models Rivals Learning Vision from Data
TinyLlama: An Open-Source Small Language Model
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Making Large Language Models A Better Foundation For Dense Retrieval
LARP: Language-Agent Role Play for Open-World Games
A Survey of Reasoning with Foundation Models
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Towards the Law of Capacity Gap in Distilling Language Models
At Which Training Stage Does Code Data Help LLMs Reasoning?
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
A Comprehensive Study of Knowledge Editing for Large Language Models
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Orion-14B: Open-source Multilingual Large Language Models
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
DocLLM: A layout-aware generative language model for multimodal document understanding
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
GeoGalactica: A Scientific Large Language Model in Geoscience
Improving Text Embeddings with Large Language Models
Boosting Large Language Model for Speech Synthesis: An Empirical Study
TrustLLM: Trustworthiness in Large Language Models
Unicron: Economizing Self-Healing LLM Training at Scale
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Proving Test Set Contamination in Black Box Language Models
LLaMA Pro: Progressive LLaMA with Block Expansion
LLM Augmented LLMs: Expanding Capabilities through Composition
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Understanding LLMs: A Comprehensive Overview from Training to Inference
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
A Vision Check-up for Language Models
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope
GPT-4V(ision) is a Generalist Web Agent, if Grounded
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Mind2Web: Towards a Generalist Agent for the Web
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DocGraphLM: Documental Graph Language Model for Information Extraction
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
TOFU: A Task of Fictitious Unlearning for LLMs
Transformers are Multi-State RNNs
Secrets of RLHF in Large Language Models Part II: Reward Modeling
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
Towards Conversational Diagnostic AI
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Efficient LLM inference solution on Intel GPU
I am a Strange Dataset: Metalinguistic Tests for Language Models
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models
The Impact of Reasoning Step Length on Large Language Models
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Mixtral of Experts
ChatQA: Building GPT-4 Level Conversational QA Models
TeleChat Technical Report
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
MaLA-500: Massive Language Adaptation of Large Language Models
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Theory of Mind Abilities of Large Language Models in Human-Robot Interaction: An Illusion?
State of What Art? A Call for Multi-Prompt LLM Evaluation
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Compressing Context to Enhance Inference Efficiency of Large Language Models
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks
VMamba: Visual State Space Model
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Self-Rewarding Language Models
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Asynchronous Local-SGD Training for Language Modeling
ReFT: Reasoning with Reinforced Fine-Tuning
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Tuning Language Models by Proxy
Scalable Pre-training of Large Autoregressive Image Models
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Extending LLMs' Context Window with 100 Samples
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
SPADE: Synthesizing Assertions for Large Language Model Pipelines
Foundations of Vector Retrieval
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Evaluating the Moral Beliefs Encoded in LLMs
Boosting Theory-of-Mind Performance in Large Language Models via Prompting
MambaByte: Token-free Selective State Space Model
MM-LLMs: Recent Advances in MultiModal Large Language Models
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Small Language Model Meets with Reinforced Vision Vocabulary
WARM: On the Benefits of Weight Averaged Reward Models
In-Context Learning for Extreme Multi-Label Classification
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Mission: Impossible Language Models
Benchmarking LLMs via Uncertainty Quantification
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
H2O-Danube-1.8B Technical Report
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Representation Engineering: A Top-Down Approach to AI Transparency
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Efficient Tool Use with Chain-of-Abstraction Reasoning
YOLO-World: Real-Time Open-Vocabulary Object Detection
Weaver: Foundation Models for Creative Writing
Weak-to-Strong Jailbreaking on Large Language Models
Transfer Learning for Text Diffusion Models
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Generative Expressive Robot Behaviors using Large Language Models
Efficient Exploration for LLMs
Can Large Language Models Understand Context?
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
OLMo: Accelerating the Science of Language Models
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
CroissantLLM: A Truly Bilingual French-English Language Model
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
Transforming and Combining Rewards for Aligning Large Language Models
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Scaling Laws for Downstream Task Performance of Large Language Models
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Seven Failure Points When Engineering a Retrieval Augmented Generation System
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
Multi-line AI-assisted Code Authoring
Self-Discover: Large Language Models Self-Compose Reasoning Structures
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Training-Free Consistent Text-to-Image Generation
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Rethinking Optimization and Architecture for Tiny Language Models
LiPO: Listwise Preference Optimization through Learning-to-Rank
BlackMamba: Mixture of Experts for State-Space Models
Rethinking Interpretability in the Era of Large Language Models
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
K-Level Reasoning with Large Language Models
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Specialized Language Models with Cheap Inference from Limited Domain Data
Repeat After Me: Transformers are Better than State Space Models at Copying
A Survey on Hallucination in Large Vision-Language Models
Corrective Retrieval Augmented Generation
A Comprehensive Survey of Compression Algorithms for Language Models
Leveraging Large Language Models for NLG Evaluation: A Survey
The Power of Noise: Redefining Retrieval for RAG Systems
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Red Teaming Visual Language Models
Knowledge Fusion of Large Language Models
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Lexinvariant Language Models
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Mathematical Capabilities of ChatGPT
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Large Language Models for Mathematical Reasoning: Progresses and Challenges
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Driving Everywhere with Large Language Model Policy Adaptation
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
SpiRit-LM: Interleaved Spoken and Written Language Model
Multilingual E5 Text Embeddings: A Technical Report
In-Context Principle Learning from Mistakes
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Hydragen: High-Throughput LLM Inference with Shared Prefixes
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Fast Timing-Conditioned Latent Audio Diffusion
Direct Language Model Alignment from Online AI Feedback
Grandmaster-Level Chess Without Search
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs