Last active May 24, 2024 00:40
Updated 2024-05-23
- Cedille: A large autoregressive French language model
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
- Query2doc: Query Expansion with Large Language Models
- The Internal State of an LLM Knows When It's Lying
- Structured information extraction from complex scientific text with fine-tuned large language models
- TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
- Large Language Models Encode Clinical Knowledge
- PoET: A generative model of protein families as sequences-of-sequences
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
- Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
- Modeling Protein Using Large-scale Pretrain Language Model
- A Watermark for Large Language Models
- GPT is becoming a Turing machine: Here are some ways to program it
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Large Language Models are Zero-Shot Reasoners
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
- How is ChatGPT's behavior changing over time?
- Meta-Transformer: A Unified Framework for Multimodal Learning
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Getting More out of Large Language Models for Proofs
- Teaching Small Language Models to Reason
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- Learning to Retrieve In-Context Examples for Large Language Models
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Context-Aware Abbreviation Expansion Using Large Language Models
- Focused Transformer: Contrastive Training for Context Scaling
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- Long-range Language Modeling with Self-retrieval
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Towards Generalist Biomedical AI
- Shortcut Learning of Large Language Models in Natural Language Understanding
- Quantifying Memorization Across Neural Language Models
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
- Copy Is All You Need
- Automatic Chain of Thought Prompting in Large Language Models
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Evaluating the Text-to-SQL Capabilities of Large Language Models
- On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Are Emergent Abilities of Large Language Models a Mirage?
- Enhancing Network Management Using Code Generated by Large Language Models
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
- ThinkSum: Probabilistic reasoning over sets using large language models
- On the Tool Manipulation Capability of Open-source Large Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- WavJourney: Compositional Audio Creation with Large Language Models
- ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course
- Secrets of RLHF in Large Language Models Part I: PPO
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
- Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- Challenges and Applications of Large Language Models
- SPOT: Knowledge-Enhanced Language Representations for Information Extraction
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
- SKILL: Structured Knowledge Infusion for Large Language Models
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Understanding Social Reasoning in Language Models with Language Models
- The Science of Detecting LLM-Generated Texts
- CausalLM is not optimal for in-context learning
- Questioning the Survey Responses of Large Language Models
- Extending Context Window of Large Language Models via Positional Interpolation
- ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing
- Probing Factually Grounded Content Transfer with Factual Ablation
- Teach LLMs to Personalize -- An Approach inspired by Writing Education
- Pre-Trained Large Language Models for Industrial Control
- WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
- LongNet: Scaling Transformers to 1,000,000,000 Tokens
- Self-Alignment with Instruction Backtranslation
- Guiding Pretraining in Reinforcement Learning with Large Language Models
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Model evaluation for extreme risks
- Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
- A Simple and Effective Pruning Approach for Large Language Models
- Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
- VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming
- PIPPA: A Partially Synthetic Conversational Dataset
- Let's Verify Step by Step
- Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
- SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
- Large Language Models Are Reasoning Teachers
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
- Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
- Connecting Neural Response measurements & Computational Models of language: a non-comprehensive guide
- Accelerating LLM Inference with Staged Speculative Decoding
- Large Language Models for Supply Chain Optimization
- Do Large Language Models know what humans know?
- Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
- Faithful Chain-of-Thought Reasoning
- AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
- Superposition of many models into one
- Learning to Model the World with Language
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Unifying Large Language Models and Knowledge Graphs: A Roadmap
- RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
- Co-Writing with Opinionated Language Models Affects Users' Views
- Language models show human-like content effects on reasoning
- Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
- Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
- OpenAGI: When LLM Meets Domain Experts
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
- Beyond Generating Code: Evaluating GPT on a Data Visualization Course
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- Studying Large Language Model Generalization with Influence Functions
- Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
- From Sparse to Soft Mixtures of Experts
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
- INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
- Large Language Model Guided Tree-of-Thought
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
- When Geometric Deep Learning Meets Pretrained Protein Language Models
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
- Language models are weak learners
- How Many Demonstrations Do You Need for In-context Learning?
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- Gorilla: Large Language Model Connected with Massive APIs
- Automatic Generation of Programming Exercises and Code Explanations using Large Language Models
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- Language Models can Solve Computer Tasks
- ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
- Invariant Language Modeling
- Solving Quantitative Reasoning Problems with Language Models
- Personality Traits in Large Language Models
- Prompting Large Language Models with Speech Recognition Abilities
- Selective Annotation Makes Language Models Better Few-Shot Learners
- Using Captum to Explain Generative Language Models
- Fine-Tuning Language Models with Just Forward Passes
- In-context Autoencoder for Context Compression in a Large Language Model
- Entity Projection via Machine Translation for Cross-Lingual NER
- OctoPack: Instruction Tuning Code Large Language Models
- AlpaGasus: Training A Better Alpaca with Fewer Data
- Large Language Models Are Human-Level Prompt Engineers
- DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach
- Large Language Models Can Self-Improve
- Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
- More Agents Is All You Need
- Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
- Teaching Algorithmic Reasoning via In-context Learning
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
- The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Automatic Evaluation of Attribution by Large Language Models
- Generative Agents: Interactive Simulacra of Human Behavior
- ALERT: Adapting Language Models to Reasoning Tasks
- How does the pre-training objective affect what large language models learn about linguistic properties?
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
- Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
- Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
- FLIRT: Feedback Loop In-context Red Teaming
- News Summarization and Evaluation in the Era of GPT-3
- Galactica: A Large Language Model for Science
- Towards Reasoning in Large Language Models: A Survey
- Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
- Shepherd: A Critic for Language Model Generation
- Emergent autonomous scientific research capabilities of large language models
- Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
- Social Simulacra: Creating Populated Prototypes for Social Computing Systems
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
- Complexity-Based Prompting for Multi-Step Reasoning
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
- Scaling TransNormer to 175 Billion Parameters
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
- A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
- Learning ASR pathways: A sparse multilingual ASR model
- Stay on topic with Classifier-Free Guidance
- Constitutional AI: Harmlessness from AI Feedback
- Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
- Teaching Arithmetic to Small Transformers
- Demystifying GPT Self-Repair for Code Generation
- Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
- Link-Context Learning for Multimodal LLMs
- Large Language Models Perform Diagnostic Reasoning
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- AgentBench: Evaluating LLMs as Agents
- Simple synthetic data reduces sycophancy in large language models
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Re-visiting Automated Topic Model Evaluation with Large Language Models
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- Adaptive Test Generation Using a Large Language Model
- Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- PaLM: Scaling Language Modeling with Pathways
- Teaching Large Language Models to Self-Debug
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Urdu text in natural scene images: a new dataset and preliminary text detection
- LIMA: Less Is More for Alignment
- Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs
- GPT-NER: Named Entity Recognition via Large Language Models
- Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
- Code as Policies: Language Model Programs for Embodied Control
- Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
- From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
- Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models
- Inspecting and Editing Knowledge Representations in Language Models
- TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
- Large language models effectively leverage document-level context for literary translation, but critical errors persist
- Med-Flamingo: a Multimodal Medical Few-shot Learner
- Jigsaw: Large Language Models meet Program Synthesis
- Large Language Models Struggle to Learn Long-Tail Knowledge
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Textbooks Are All You Need
- Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Three Bricks to Consolidate Watermarks for Large Language Models
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- One-shot Machine Teaching: Cost Very Few Examples to Converge Faster
- Theory of Mind May Have Spontaneously Emerged in Large Language Models
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
- Tiny LVLM-eHub: Early Multimodal Experiments with Bard
- Language Is Not All You Need: Aligning Perception with Language Models
- Mind's Eye: Grounded Language Model Reasoning through Simulation
- StarCoder: may the source be with you!
- Self-Critique Prompting with Large Language Models for Inductive Instructions
- PaLM 2 Technical Report
- Repository-Level Prompt Generation for Large Language Models of Code
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- Differentially Private Fine-tuning of Language Models
- A Latent Space Theory for Emergent Abilities in Large Language Models
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
- LEACE: Perfect linear concept erasure in closed form
- Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
- A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- FinGPT: Open-Source Financial Large Language Models
- Block Belief Propagation for Parameter Learning in Markov Random Fields
- Lost in the Middle: How Language Models Use Long Contexts
- Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
- Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation
- Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
- BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- The Hydra Effect: Emergent Self-repair in Language Model Computations
- Educational data augmentation in physics education research using ChatGPT
- PolyLM: An Open Source Polyglot Large Language Model
- Towards Expert-Level Medical Question Answering with Large Language Models
- Is GPT-4 a Good Data Analyst?
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
- ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
- Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- ReAct: Synergizing Reasoning and Acting in Language Models
- Augmenting Language Models with Long-Term Memory
- BloombergGPT: A Large Language Model for Finance
- A Systematic Evaluation of Large Language Models of Code
- GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
- Robot Task Planning and Situation Handling in Open Worlds
- Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
- Emergent Abilities of Large Language Models
- Can Large Language Models design a Robot?
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models
- Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
- DarkBERT: A Language Model for the Dark Side of the Internet
- Measuring Faithfulness in Chain-of-Thought Reasoning
- Retentive Network: A Successor to Transformer for Large Language Models
- Dissociating language and thought in large language models: a cognitive perspective
- Large Language Models are Better Reasoners with Self-Verification
- Can large language models reason about medical questions?
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
- ARB: Advanced Reasoning Benchmark for Large Language Models
- Rethinking with Retrieval: Faithful Large Language Model Inference
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
- Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
- Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
- Large Language Models as Corporate Lobbyists
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
- Talking About Large Language Models
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs
- Large Language Models Can Be Easily Distracted by Irrelevant Context
- Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
- OpenICL: An Open-Source Framework for In-context Learning
- Emergence of Maps in the Memories of Blind Navigation Agents
- PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- Learning to Reason and Memorize with Self-Notes
- ChemCrow: Augmenting large-language models with chemistry tools
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Learning to Compress Prompts with Gist Tokens
- Unlimiformer: Long-Range Transformers with Unlimited Length Input
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data
- ChatGPT: Applications, Opportunities, and Threats
- Memory Augmented Large Language Models are Computationally Universal
- PaLM-E: An Embodied Multimodal Language Model
- M2T: Masking Transformers Twice for Faster Decoding
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
- Auditing large language models: a three-layered approach
- Language models in molecular discovery
- Offsite-Tuning: Transfer Learning without Full Model
- MusicLM: Generating Music From Text
- Context-faithful Prompting for Large Language Models
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
- The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
- GPTutor: a ChatGPT-powered programming tool for code explanation
- Larger language models do in-context learning differently
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
- ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
- Multimodal Chain-of-Thought Reasoning in Language Models
- Recitation-Augmented Language Models
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Eight Things to Know about Large Language Models
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
- A Survey on Model Compression for Large Language Models
- Active Retrieval Augmented Generation
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Evaluating Verifiability in Generative Search Engines
- Augmented Language Models: a Survey
- Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
- Giraffe: Adventures in Expanding Context Lengths in LLMs
- LLM As DBA
- Scaling Transformer to 1M tokens and beyond with RMT
- TidyBot: Personalized Robot Assistance with Large Language Models
- Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
- Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
- Active Prompting with Chain-of-Thought for Large Language Models
- A Categorical Archive of ChatGPT Failures
- Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
- Better Language Models of Code through Self-Improvement
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- The Capacity for Moral Self-Correction in Large Language Models
- Poisoning Language Models During Instruction Tuning
- Prompt2Model: Generating Deployable Models from Natural Language Instructions
- Data Selection for Language Models via Importance Resampling
- Enabling Conversational Interaction with Mobile UI using Large Language Models
- Evidence of Meaning in Language Models Trained on Programs
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Symbol tuning improves in-context learning in language models
- REPLUG: Retrieval-Augmented Black-Box Language Models
- Why do Nearest Neighbor Language Models Work?
- Prismer: A Vision-Language Model with An Ensemble of Experts
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
- CALYPSO: LLMs as Dungeon Masters' Assistants
- Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
- Code Llama: Open Foundation Models for Code
- Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models
- Faithful to Whom? Questioning Interpretability Measures in NLP
- Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
- Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
- How Good Are Large Language Models at Out-of-Distribution Detection?
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
- Can Large Language Models Find And Fix Vulnerable Software?
- Large Language Models for Software Engineering: A Systematic Literature Review
- Informed Named Entity Recognition Decoding for Generative Language Models
- Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
- Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
- Better Zero-Shot Reasoning with Role-Play Prompting
- Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
- Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
- A Survey on Large Language Model based Autonomous Agents
- Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
- Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
- Evaluating ChatGPT and GPT-4 for Visual Programming
- Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification
- Cabrita: closing the gap for foreign languages
- GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
- ProAgent: Building Proactive Cooperative AI with Large Language Models
- Instruction Position Matters in Sequence Generation with Large Language Models
- Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
- SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
- LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Large Language Model as Autonomous Decision Maker
- Large Language Models as Superpositions of Cultural Perspectives
- Activation Addition: Steering Language Models Without Optimization
- Enhancing Recommender Systems with Large Language Model Reasoning Graphs
- GPTEval: A Survey on Assessments of ChatGPT and GPT-4
- An Empirical Study on Challenging Math Problem Solving with GPT-4
- Forward-Backward Reasoning in Large Language Models for Verification
- Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI
- Dynamic Planning with a LLM
- "Guinea Pig Trials" Utilizing GPT: A Novel Smart Agent-Based Modeling Approach for Studying Firm Competition and Collusion
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
- Bridging the Gap: Deciphering Tabular Data Using Large Language Model
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- Prompting Is Programming: A Query Language for Large Language Models
- EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
- Knowledge Graph Prompting for Multi-Document Question Answering
- GPT detectors are biased against non-native English writers
- GradientCoin: A Peer-to-Peer Decentralized Large Language Models
- RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models
- IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
- Time Travel in LLMs: Tracing Data Contamination in Large Language Models
- Can Language Models Learn to Listen?
- Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
- Towards an Understanding of Large Language Models in Software Engineering Tasks
- YaRN: Efficient Context Window Extension of Large Language Models
- An Examination of the Compositionality of Large Generative Vision-Language Models
Company Similarity using Large Language Models | |
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs | |
Instruction Tuning for Large Language Models: A Survey | |
Language to Rewards for Robotic Skill Synthesis | |
Is There Any Social Principle for LLM-Based Agents? | |
A Study on Robustness and Reliability of Large Language Model Code Generation | |
Leveraging Large Language Models for Pre-trained Recommender Systems | |
Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models | |
LLaSM: Large Language and Speech Model | |
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation | |
DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue | |
FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt | |
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA | |
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework | |
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | |
Pretraining on the Test Set Is All You Need | |
The AI Revolution in Education: Will AI Replace or Assist Teachers in Higher Education? | |
Reinforced Self-Training (ReST) for Language Modeling | |
Fast Inference from Transformers via Speculative Decoding | |
LoRA: Low-Rank Adaptation of Large Language Models | |
Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models | |
AI Deception: A Survey of Examples, Risks, and Potential Solutions | |
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | |
Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects | |
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | |
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | |
Blockwise Parallel Decoding for Deep Autoregressive Models | |
Assigning AI: Seven Approaches for Students, with Prompts | |
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | |
Attention: Marginal Probability is All You Need? | |
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test | |
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time | |
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records | |
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following | |
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models | |
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models | |
XGen-7B Technical Report | |
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models | |
Can Programming Languages Boost Each Other via Instruction Tuning? | |
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | |
Efficient RLHF: Reducing the Memory Usage of PPO | |
Universal Self-adaptive Prompting | |
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models | |
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior | |
One Wide Feedforward is All You Need | |
Better Zero-Shot Reasoning with Self-Adaptive Prompting | |
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge | |
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | |
Graph of Thoughts: Solving Elaborate Problems with Large Language Models | |
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | |
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models | |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | |
SoTaNa: The Open-Source Software Development Assistant | |
GPT Can Solve Mathematical Problems Without a Calculator | |
Physically Grounded Vision-Language Models for Robotic Manipulation | |
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios | |
FLM-101B: An Open LLM and How to Train It with $100K Budget | |
LaMDA: Language Models for Dialog Applications | |
LMDX: Language Model-based Document Information Extraction and Localization | |
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers | |
Do Multilingual Language Models Think Better in English? | |
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute | |
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild | |
Textbooks Are All You Need II: phi-1.5 technical report | |
Replacing softmax with ReLU in Vision Transformers | |
Investigating Answerability of LLMs for Long-Form Question Answering | |
Vector Search with OpenAI Embeddings: Lucene Is All You Need | |
The Rise and Potential of Large Language Model Based Agents: A Survey | |
Cure the headache of Transformers via Collinear Constrained Attention | |
Uncovering mesa-optimization algorithms in Transformers | |
Large Language Models for Compiler Optimization | |
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages | |
Chain-of-Verification Reduces Hallucination in Large Language Models | |
AstroLLaMA: Towards Specialized Foundation Models in Astronomy | |
Compositional Foundation Models for Hierarchical Planning | |
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents | |
Sparse Autoencoders Find Highly Interpretable Features in Language Models | |
DreamLLM: Synergistic Multimodal Comprehension and Creation | |
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) | |
Improving Language Models with Advantage-based Offline Policy Gradients | |
Improving Factuality and Reasoning in Language Models through Multiagent Debate | |
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting | |
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model | |
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models | |
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? | |
Multimodal Foundation Models: From Specialists to General-Purpose Assistants | |
Boolformer: Symbolic Regression of Logic Functions with Transformers | |
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? | |
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models | |
TP-Aware Dequantization | |
LASER: LLM Agent with State-Space Exploration for Web Navigation | |
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models | |
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs | |
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | |
Baichuan 2: Open Large-scale Language Models | |
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer | |
Efficient Benchmarking (of Language Models) | |
Context is Environment | |
Analyzing Transformer Dynamics as Movement through Embedding Space | |
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs | |
RMT: Retentive Networks Meet Vision Transformers | |
Stack-and-Delay: a new codebook pattern for music generation | |
Neurons in Large Language Models: Dead, N-gram, Positional | |
Large Language Model for Science: A Study on P vs. NP | |
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | |
Data Augmentation for Spoken Language Understanding via Pretrained Language Models | |
Petals: Collaborative Inference and Fine-tuning of Large Models | |
Scaling Laws for Sparsely-Connected Foundation Models | |
Kosmos-2.5: A Multimodal Literate Model | |
PDFTriage: Question Answering over Long, Structured Documents | |
Statistical Rejection Sampling Improves Preference Optimization | |
Stabilizing RLHF through Advantage Model and Selective Rehearsal | |
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset | |
Leveraging Contextual Information for Effective Entity Salience Detection | |
NExT-GPT: Any-to-Any Multimodal LLM | |
Are Emergent Abilities in Large Language Models just In-Context Learning? | |
RACE: Large-scale ReAding Comprehension Dataset From Examinations | |
Large-Scale Automatic Audiobook Creation | |
Recovering from Privacy-Preserving Masking with Large Language Models | |
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | |
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations | |
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology | |
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning | |
RAIN: Your Language Models Can Align Themselves without Finetuning | |
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale | |
Hypothesis Search: Inductive Reasoning with Language Models | |
Agents: An Open-source Framework for Autonomous Language Agents | |
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models | |
Gated recurrent neural networks discover attention | |
Contrastive Decoding Improves Reasoning in Large Language Models | |
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts | |
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning | |
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | |
Adapting Large Language Models via Reading Comprehension | |
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention | |
MindAgent: Emergent Gaming Interaction | |
Graph Neural Prompting with Large Language Models | |
Sparks of Artificial General Intelligence: Early experiments with GPT-4 | |
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | |
Efficient Post-training Quantization with FP8 Formats | |
Taken out of context: On measuring situational awareness in LLMs | |
Jointly Training Large Autoregressive Multimodal Models | |
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" | |
Curriculum Learning with Adam: The Devil Is in the Wrong Details | |
OWL: A Large Language Model for IT Operations | |
Faith and Fate: Limits of Transformers on Compositionality | |
CodePlan: Repository-level Coding using LLMs and Planning | |
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | |
Efficient Memory Management for Large Language Model Serving with PagedAttention | |
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models | |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | |
SCREWS: A Modular Framework for Reasoning with Revisions | |
Transformer models: an introduction and catalog | |
Small-scale proxies for large-scale Transformer training instabilities | |
Effective Long-Context Scaling of Foundation Models | |
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | |
Qwen Technical Report | |
Attention Approximates Sparse Distributed Memory | |
Calibrating LLM-Based Evaluator | |
Ambiguity-Aware In-Context Learning with Large Language Models | |
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | |
Vision Transformers Need Registers | |
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic | |
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction | |
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models | |
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | |
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition | |
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval | |
Language Modeling Is Compression | |
MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models | |
Aligning Large Multimodal Models with Factually Augmented RLHF | |
Large Language Models as Optimizers | |
SlimPajama-DC: Understanding Data Combinations for LLM Training | |
Finite Scalar Quantization: VQ-VAE Made Simple | |
Physics of Language Models: Part 3.2, Knowledge Manipulation | |
Efficient Streaming Language Models with Attention Sinks | |
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | |
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution | |
LLM-grounded Video Diffusion Models | |
Enable Language Models to Implicitly Learn Self-Improvement From Data | |
Emergent Analogical Reasoning in Large Language Models | |
RA-DIT: Retrieval-Augmented Dual Instruction Tuning | |
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation | |
Large Language Models Cannot Self-Correct Reasoning Yet | |
SmartPlay: A Benchmark for LLMs as Intelligent Agents | |
Language Models Represent Space and Time | |
Retrieval meets Long Context Large Language Models | |
Borges and AI | |
Can large language models provide useful feedback on research papers? A large-scale empirical analysis | |
Ring Attention with Blockwise Transformers for Near-Infinite Context | |
Can Language Models be Instructed to Protect Personal Information? | |
QuIP: 2-Bit Quantization of Large Language Models With Guarantees | |
Who's Harry Potter? Approximate Unlearning in LLMs | |
Low-Resource Languages Jailbreak GPT-4 | |
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | |
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning | |
EcoAssistant: Using LLM Assistant More Affordably and Accurately | |
How FaR Are Large Language Models From Agents with Theory-of-Mind? | |
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | |
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation | |
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | |
HeaP: Hierarchical Policies for Web Actions using LLMs | |
A Long Way to Go: Investigating Length Correlations in RLHF | |
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation | |
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors | |
Think before you speak: Training Language Models With Pause Tokens | |
Mistral 7B | |
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | |
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity | |
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading | |
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | |
Large Language Models can Learn Rules | |
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | |
Large Language Models Are Zero-Shot Time Series Forecasters | |
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | |
Learning Interactive Real-World Simulators | |
FireAct: Toward Language Agent Fine-tuning | |
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | |
Text Embeddings Reveal (Almost) As Much As Text | |
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation | |
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics | |
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models | |
Lemur: Harmonizing Natural Language and Code for Language Agents | |
LangNav: Language as a Perceptual Representation for Navigation | |
The LAMBADA dataset: Word prediction requiring a broad discourse context | |
Octopus: Embodied Vision-Language Programmer from Environmental Feedback | |
Toward Joint Language Modeling for Speech Units and Text | |
MemGPT: Towards LLMs as Operating Systems | |
A Zero-Shot Language Agent for Computer Control with Structured Reflection | |
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | |
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training | |
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | |
The Consensus Game: Language Model Generation via Equilibrium Search | |
Table-GPT: Table-tuned GPT for Diverse Table Tasks | |
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens | |
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation | |
Deep Learning Scaling is Predictable, Empirically | |
MLQA: Evaluating Cross-lingual Extractive Question Answering | |
OpenAssistant Conversations -- Democratizing Large Language Model Alignment | |
Intersectional Bias in Hate Speech and Abusive Language Datasets | |
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | |
Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning | |
AI Ethics Issues in Real World: Evidence from AI Incident Database | |
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | |
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT | |
Measuring Mathematical Problem Solving With the MATH Dataset | |
Can Machines Learn Morality? The Delphi Experiment | |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | |
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts | |
AndroidEnv: A Reinforcement Learning Platform for Android | |
Demoting Racial Bias in Hate Speech Detection | |
Social Bias Frames: Reasoning about Social and Power Implications of Language | |
Characterising Bias in Compressed Models | |
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | |
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | |
Towards Robust Toxic Content Classification | |
The Challenge of Value Alignment: from Fairer Algorithms to AI Safety | |
Towards Continual Knowledge Learning of Language Models | |
The Pushshift Reddit Dataset | |
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs | |
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation | |
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? | |
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling | |
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack | |
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems | |
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus | |
One Epoch Is All You Need | |
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading | |
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango | |
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System | |
Plug and Play Language Models: A Simple Approach to Controlled Text Generation | |
NewsQA: A Machine Comprehension Dataset | |
AmbiPun: Generating Humorous Puns with Ambiguous Context | |
Deal or No Deal? End-to-End Learning for Negotiation Dialogues | |
Competition-Level Code Generation with AlphaCode | |
STaR: Bootstrapping Reasoning With Reasoning | |
Efficient Neural Architecture Search via Parameter Sharing | |
Recursively Summarizing Books with Human Feedback | |
Habitat: A Platform for Embodied AI Research | |
Generate & Rank: A Multi-task Framework for Math Word Problems | |
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | |
Mitigating Statistical Bias within Differentially Private Synthetic Data | |
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | |
RecGPT: Generative Pre-training for Text-based Recommendation | |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | |
An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models | |
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks | |
Controlling Style in Generated Dialogue | |
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation | |
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search | |
Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation | |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | |
Societal Biases in Language Generation: Progress and Challenges | |
Counterfactual Fairness in Text Classification through Robustness | |
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions | |
Deep Double Descent: Where Bigger Models and More Data Hurt | |
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations | |
InCoder: A Generative Model for Code Infilling and Synthesis | |
Back to the Future: On Potential Histories in NLP | |
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | |
Sharp Minima Can Generalize For Deep Nets | |
Self-attention Does Not Need $O(n^2)$ Memory | |
Measuring the Carbon Intensity of AI in Cloud Instances | |
SocialIQA: Commonsense Reasoning about Social Interactions | |
Generating Long Sequences with Sparse Transformers | |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | |
QAmeleon: Multilingual QA with Only 5 Examples | |
CTRL: A Conditional Transformer Language Model for Controllable Generation | |
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models | |
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models | |
Impact of Pretraining Term Frequencies on Few-Shot Reasoning | |
Is neural language acquisition similar to natural? A chronological probing study | |
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent | |
Buffer Overflow in Mixture of Experts | |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | |
Bag of Tricks for Efficient Text Classification | |
Automatic Detection of Machine Generated Text: A Critical Survey | |
Adversarial Training for Large Neural Language Models | |
Diffsound: Discrete Diffusion Model for Text-to-sound Generation | |
TALM: Tool Augmented Language Models | |
Training Language Models with Language Feedback | |
Toxicity in Multilingual Machine Translation at Scale | |
PEER: A Collaborative Language Model | |
On the Multilingual Capabilities of Very Large-Scale English Language Models | |
LLaMA: Open and Efficient Foundation Language Models | |
SECure: A Social and Environmental Certificate for AI Systems | |
Gaussian Error Linear Units (GELUs) | |
RoFormer: Enhanced Transformer with Rotary Position Embedding | |
Measuring Massive Multitask Language Understanding | |
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension | |
To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making | |
Leveraging QA Datasets to Improve Generative Data Augmentation | |
Decoupled Weight Decay Regularization | |
A Distributional Approach to Controlled Text Generation | |
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | |
The Turking Test: Can Language Models Understand Instructions? | |
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | |
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | |
Language Models (Mostly) Know What They Know | |
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | |
Towards Understanding and Mitigating Social Biases in Language Models | |
Discovering and Categorising Language Biases in Reddit | |
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation | |
Training Verifiers to Solve Math Word Problems | |
The Curse of Recursion: Training on Generated Data Makes Models Forget | |
Compositional Semantic Parsing with Large Language Models | |
Transforming Question Answering Datasets Into Natural Language Inference Datasets | |
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets | |
The Values Encoded in Machine Learning Research | |
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning | |
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | |
Ethical and social risks of harm from Language Models | |
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | |
Understanding HTML with Large Language Models | |
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning | |
AudioLM: a Language Modeling Approach to Audio Generation | |
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | |
Behavior Cloned Transformers are Neurosymbolic Reasoners | |
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review | |
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models | |
Thou shalt not hate: Countering Online Hate Speech | |
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) | |
Participation is not a Design Fix for Machine Learning | |
Retrieval Augmentation Reduces Hallucination in Conversation | |
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize | |
How Many Data Samples is an Additional Instruction Worth? | |
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | |
Crosslingual Generalization through Multitask Finetuning | |
The Curious Case of Neural Text Degeneration | |
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction | |
VinaLLaMA: LLaMA-based Vietnamese Foundation Model | |
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference | |
Evaluating the Social Impact of Generative AI Systems in Systems and Society | |
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | |
Towards A Rigorous Science of Interpretable Machine Learning | |
An Analysis of the Automatic Bug Fixing Performance of ChatGPT | |
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender | |
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models | |
Defending Against Neural Fake News | |
Analyzing Dynamic Adversarial Training Data in the Limit | |
Criticality in Formal Languages and Statistical Physics | |
Generating Wikipedia by Summarizing Long Sequences | |
Gender Bias in Contextualized Word Embeddings | |
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | |
Deep Generative Dual Memory Network for Continual Learning | |
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | |
Persistent Anti-Muslim Bias in Large Language Models | |
Mirages: On Anthropomorphism in Dialogue Systems | |
Deep Learning for Symbolic Mathematics | |
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | |
A Survey On Universal Adversarial Attack | |
Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | |
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | |
A framework for the extraction of Deep Neural Networks by leveraging public data | |
Recipes for building an open-domain chatbot | |
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent | |
Measuring the Effects of Data Parallelism on Neural Network Training | |
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports | |
Kosmos-G: Generating Images in Context with Multimodal Large Language Models | |
X-SQL: reinforce schema representation with context | |
Constructing Datasets for Multi-hop Reading Comprehension Across Documents | |
FastText.zip: Compressing text classification models | |
The State and Fate of Linguistic Diversity and Inclusion in the NLP World | |
A General Language Assistant as a Laboratory for Alignment | |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | |
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly | |
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms | |
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech | |
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model | |
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | |
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection | |
Deep Learning Based Text Classification: A Comprehensive Review | |
Automated Hate Speech Detection and the Problem of Offensive Language | |
Multi-Dimensional Gender Bias Classification | |
Extracting Training Data from Large Language Models | |
ProsocialDialog: A Prosocial Backbone for Conversational Agents | |
Cross-Task Generalization via Natural Language Crowdsourcing Instructions | |
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection | |
FlowQA: Grasping Flow in History for Conversational Machine Comprehension | |
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey | |
Improving alignment of dialogue agents via targeted human judgements | |
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing | |
Explanation in Artificial Intelligence: Insights from the Social Sciences | |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
Revealing Persona Biases in Dialogue Systems | |
GeDi: Generative Discriminator Guided Sequence Generation | |
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech | |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
UL2: Unifying Language Learning Paradigms | |
Self-Instruct: Aligning Language Models with Self-Generated Instructions | |
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings | |
Does Gender Matter? Towards Fairness in Dialogue Systems | |
Energy and Policy Considerations for Deep Learning in NLP | |
The False Promise of Imitating Proprietary LLMs | |
Directional Bias Amplification | |
Hierarchical Text-Conditional Image Generation with CLIP Latents | |
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | |
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | |
Task-aware Retrieval with Instructions | |
Do Prompt-Based Models Really Understand the Meaning of their Prompts? | |
Reading Wikipedia to Answer Open-Domain Questions | |
Supervising Model Attention with Human Explanations for Robust Natural Language Inference | |
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis | |
Latent Retrieval for Weakly Supervised Open Domain Question Answering | |
Teaching language models to support answers with verified quotes | |
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension | |
MasakhaNER: Named Entity Recognition for African Languages | |
Predicting the Type and Target of Offensive Posts in Social Media | |
Learning to Model Editing Processes | |
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | |
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering | |
Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles | |
Quantifying the Carbon Emissions of Machine Learning | |
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping | |
Chasing Carbon: The Elusive Environmental Footprint of Computing | |
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion | |
Distilling Reasoning Capabilities into Smaller Language Models | |
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning | |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks | |
WebGPT: Browser-assisted question-answering with human feedback | |
Making Large Language Models Better Reasoners with Step-Aware Verifier | |
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | |
SGPT: GPT Sentence Embeddings for Semantic Search | |
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models | |
Building a Conversational Agent Overnight with Dialogue Self-Play | |
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks | |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets | |
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection | |
Neural Machine Translation of Rare Words with Subword Units | |
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | |
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation | |
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | |
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | |
Know What You Don't Know: Unanswerable Questions for SQuAD | |
Longformer: The Long-Document Transformer | |
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus | |
A Constructive Prediction of the Generalization Error Across Scales | |
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases | |
KERMIT: Generative Insertion-Based Modeling for Sequences | |
mGPT: Few-Shot Learners Go Multilingual | |
The Natural Language Decathlon: Multitask Learning as Question Answering | |
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents | |
A Survey of Race, Racism, and Anti-Racism in NLP | |
Unraveling the Hidden Environmental Impacts of AI Solutions for Environment | |
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding | |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | |
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering | |
Hyperbolic Image-Text Representations | |
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey | |
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models | |
Pretraining Language Models with Human Preferences | |
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English | |
MTEB: Massive Text Embedding Benchmark | |
Interscript: A dataset for interactive learning of scripts through error feedback | |
Looped Transformers as Programmable Computers | |
Inner Monologue: Embodied Reasoning through Planning with Language Models | |
No Language Left Behind: Scaling Human-Centered Machine Translation | |
Collaborative Storytelling with Large-scale Neural Language Models | |
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge | |
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation | |
Recipes for Safety in Open-domain Chatbots | |
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | |
Pre-Trained Language Models for Interactive Decision-Making | |
Can Large Language Models Really Improve by Self-critiquing Their Own Plans? | |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | |
Formal Algorithms for Transformers | |
An Emulator for Fine-Tuning Large Language Models using Small Language Models | |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
Democratizing Reasoning Ability: Tailored Learning from Large Language Model | |
HellaSwag: Can a Machine Really Finish Your Sentence? | |
Teaching Language Models to Self-Improve through Interactive Demonstrations | |
Ranking LLM-Generated Loop Invariants for Program Verification | |
Approximating Two-Layer Feedforward Networks for Efficient Transformers | |
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets | |
When can transformers reason with abstract symbols? | |
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models | |
Language Models are Few-shot Multilingual Learners | |
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | |
AutoMix: Automatically Mixing Language Models | |
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models | |
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | |
Pre-trained Summarization Distillation | |
TEQ: Trainable Equivalent Transformation for Quantization of LLMs | |
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning | |
Improving Large Language Model Fine-tuning for Solving Math Problems | |
Language Models are General-Purpose Interfaces | |
Llemma: An Open Language Model For Mathematics | |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | |
Gender Bias in Machine Translation | |
Towards a Human-like Open-Domain Chatbot | |
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation | |
A Network-based End-to-End Trainable Task-oriented Dialogue System | |
Safe RLHF: Safe Reinforcement Learning from Human Feedback | |
Cloze-driven Pretraining of Self-attention Networks | |
Universal Language Model Fine-tuning for Text Classification | |
OPT: Open Pre-trained Transformer Language Models | |
Towards Zero-Label Language Learning | |
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems | |
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | |
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models | |
Fine-tuned Language Models are Continual Learners | |
3D-GPT: Procedural 3D Modeling with Large Language Models | |
PAL: Program-aided Language Models | |
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | |
Large Language Models for Software Engineering: Survey and Open Problems | |
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | |
Self-critiquing models for assisting human evaluators | |
Towards Understanding Sycophancy in Language Models | |
SALMONN: Towards Generic Hearing Abilities for Large Language Models | |
Finetuned Language Models Are Zero-Shot Learners | |
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them | |
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search | |
Generating Sequences by Learning to Self-Correct | |
The Depth-to-Width Interplay in Self-Attention | |
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning | |
Internet-augmented language models through few-shot prompting for open-domain question answering | |
GLM-130B: An Open Bilingual Pre-trained Model | |
Three scenarios for continual learning | |
Eureka: Human-Level Reward Design via Coding Large Language Models | |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model | |
An Explanation of In-context Learning as Implicit Bayesian Inference | |
AgentTuning: Enabling Generalized Agent Abilities for LLMs | |
Snapshot Ensembles: Train 1, get M for free | |
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model | |
On the Planning Abilities of Large Language Models -- A Critical Investigation | |
Efficient Estimation of Word Representations in Vector Space | |
Visualizing the Loss Landscape of Neural Nets | |
Contrastive Preference Learning: Learning from Human Feedback without RL | |
High-Resolution Image Synthesis with Latent Diffusion Models | |
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents | |
H2O Open Ecosystem for State-of-the-art Large Language Models | |
Calibrate Before Use: Improving Few-Shot Performance of Language Models | |
All-in-One Image-Grounded Conversational Agents | |
Interactive Task Planning with Language Models | |
Can AI-Generated Text be Reliably Detected? | |
BitNet: Scaling 1-bit Transformers for Large Language Models | |
Scaling Laws for Neural Language Models | |
Self-Refine: Iterative Refinement with Self-Feedback | |
Adversarial Environment Generation for Learning to Navigate the Web | |
Cross-Lingual Language Model Meta-Pretraining | |
Creative Robot Tool Use with Large Language Models | |
Simple and Effective Multi-Paragraph Reading Comprehension | |
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | |
VeRA: Vector-based Random Matrix Adaptation | |
Open-Ended Learning Leads to Generally Capable Agents | |
Exploring the Boundaries of GPT-4 in Radiology | |
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs | |
High-Dimensional Continuous Control Using Generalized Advantage Estimation | |
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | |
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | |
Eliciting Human Preferences with Language Models | |
One-Shot Learning from a Demonstration with Hierarchical Latent Language | |
OpenAgents: An Open Platform for Language Agents in the Wild | |
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | |
Specific versus General Principles for Constitutional AI | |
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | |
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | |
Task2Vec: Task Embedding for Meta-Learning | |
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams | |
Tuna: Instruction Tuning using Feedback from Large Language Models | |
In-Context Pretraining: Language Modeling Beyond Document Boundaries | |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | |
Transcending Scaling Laws with 0.1% Extra Compute | |
InstructExcel: A Benchmark for Natural Language Instruction in Excel | |
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | |
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning | |
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets | |
Understanding Retrieval Augmentation for Long-Form Question Answering | |
A Neural Conversational Model | |
Exploring the Limits of Language Modeling | |
Scaling Instruction-Finetuned Language Models | |
Learning Performance-Improving Code Edits | |
Training Compute-Optimal Large Language Models | |
Instruction Tuning with GPT-4 | |
Holistic Evaluation of Language Models | |
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | |
Large Language Models as Analogical Reasoners | |
Negative Training for Neural Dialogue Response Generation | |
On the Opportunities and Risks of Foundation Models | |
Dissecting In-Context Learning of Translations in GPTs | |
Carbon Emissions and Large Neural Network Training | |
Faithful Reasoning Using Large Language Models | |
Detecting Pretraining Data from Large Language Models | |
Motif: Intrinsic Motivation from Artificial Intelligence Feedback | |
Unified Language Model Pre-training for Natural Language Understanding and Generation | |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | |
Predictability and Surprise in Large Generative Models | |
Alignment of Language Agents | |
Zephyr: Direct Distillation of LM Alignment | |
Binding Language Models in Symbolic Languages | |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | |
The Evolved Transformer | |
Detecting Hate Speech with GPT-3 | |
Learning to summarize from human feedback | |
Efficient Large Scale Language Modeling with Mixtures of Experts | |
Jailbreaking Black Box Large Language Models in Twenty Queries | |
How do Language Models Bind Entities in Context? | |
Program Synthesis with Large Language Models | |
Challenges in Detoxifying Language Models | |
A Deep Reinforced Model for Abstractive Summarization | |
Moral Foundations of Large Language Models | |
Training Production Language Models without Memorizing User Data | |
A Deep Reinforcement Learning Chatbot | |
RT-1: Robotics Transformer for Real-World Control at Scale | |
Entity Tracking in Language Models | |
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval | |
Controlled Decoding from Language Models | |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | |
FP8-LM: Training FP8 Large Language Models | |
The Perils & Promises of Fact-checking with Large Language Models | |
Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)? | |
Unsolved Problems in ML Safety | |
Woodpecker: Hallucination Correction for Multimodal Large Language Models | |
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications | |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | |
Data-Centric Financial Large Language Models | |
CodeFusion: A Pre-trained Diffusion Model for Code Generation | |
TRAMS: Training-free Memory Selection for Long-range Language Modeling | |
Personas as a Way to Model Truthfulness in Language Models | |
PockEngine: Sparse and Efficient Fine-tuning in a Pocket | |
LLM-FP4: 4-Bit Floating-Point Quantized Transformers | |
CLEX: Continuous Length Extrapolation for Large Language Models | |
ALCUNA: Large Language Models Meet New Knowledge | |
JudgeLM: Fine-tuned Large Language Models are Scalable Judges | |
Large Language Models as Generalizable Policies for Embodied Tasks | |
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers | |
ControlLLM: Augment Language Models with Tools by Searching on Graphs | |
Linear Representations of Sentiment in Large Language Models | |
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | |
The Generative AI Paradox: "What It Can Create, It May Not Understand" | |
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | |
MM-VID: Advancing Video Understanding with GPT-4V(ision) | |
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation | |
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | |
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | |
ChipNeMo: Domain-Adapted LLMs for Chip Design | |
What's In My Big Data? | |
Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve | |
Idempotent Generative Network | |
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning | |
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | |
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | |
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? | |
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | |
NEFTune: Noisy Embeddings Improve Instruction Finetuning | |
The Impact of Depth and Width on Transformer Language Model Generalization | |
FlashDecoding++: Faster Large Language Model Inference on GPUs | |
Skywork: A More Open Bilingual Foundation Model | |
GRIM: GRaph-based Interactive narrative visualization for gaMes | |
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery | |
Does GPT-4 Pass the Turing Test? | |
Text Rendering Strategies for Pixel Language Models | |
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | |
Learning From Mistakes Makes LLM Better Reasoner | |
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning | |
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation | |
Ultra-Long Sequence Distributed Transformer | |
Ziya2: Data-centric Learning is All LLMs Need | |
GLaMM: Pixel Grounding Large Multimodal Model | |
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | |
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | |
Unveiling Safety Vulnerabilities of Large Language Models | |
Prompt Cache: Modular Attention Reuse for Low-Latency Inference | |
Levels of AGI: Operationalizing Progress on the Path to AGI | |
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model | |
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning | |
Co-training and Co-distillation for Quality Improvement and Compression of Language Models | |
CogVLM: Visual Expert for Pretrained Language Models | |
Tailoring Self-Rationalizers with Multi-Reward Distillation | |
NExT-Chat: An LMM for Chat, Detection and Segmentation | |
The Efficiency Misnomer | |
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion | |
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs | |
Training Dynamics of Contextual N-Grams in Language Models | |
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | |
Large Language Models Understand and Can be Enhanced by Emotional Stimuli | |
Gzip versus bag-of-words for text classification | |
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models | |
GPT4All: An Ecosystem of Open Source Compressed Language Models | |
Evaluating Large Language Models: A Comprehensive Survey | |
Leveraging Large Language Models for Automated Proof Synthesis in Rust | |
GPTScore: Evaluate as You Desire | |
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | |
S-LoRA: Serving Thousands of Concurrent LoRA Adapters | |
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency | |
Finding Neurons in a Haystack: Case Studies with Sparse Probing | |
Simple and Controllable Music Generation | |
Can LLMs Follow Simple Rules? | |
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM | |
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models | |
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | |
Memory Augmented Language Models through Mixture of Word Experts | |
Language Models can be Logical Solvers | |
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | |
ADaPT: As-Needed Decomposition and Planning with Language Models | |
FinGPT: Large Generative Models for a Small Language | |
Simplifying Transformer Blocks | |
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | |
Prompt Engineering a Prompt Engineer | |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | |
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves | |
Accelerating Large Language Model Decoding with Speculative Sampling | |
Alternating Updates for Efficient Transformers | |
White-Box Transformers via Sparse Rate Reduction | |
ChatAnything: Facetime Chat with LLM-Enhanced Personas | |
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data | |
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | |
LayoutPrompter: Awaken the Design Ability of Large Language Models | |
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations | |
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation | |
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning | |
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text | |
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models | |
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer | |
Trusted Source Alignment in Large Language Models | |
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations | |
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | |
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" | |
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | |
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure | |
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | |
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming | |
The ART of LLM Refinement: Ask, Refine, and Trust | |
Fine-tuning Language Models for Factuality | |
A Survey on Language Models for Code | |
DiLoCo: Distributed Low-Communication Training of Language Models | |
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks | |
Fusion-Eval: Integrating Evaluators with LLMs | |
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers | |
SiRA: Sparse Mixture of Low Rank Adaptation | |
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives | |
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation | |
UT5: Pretraining Non autoregressive T5 with unrolled denoising | |
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | |
Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying | |
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | |
Contrastive Chain-of-Thought Prompting | |
Learning to Filter Context for Retrieval-Augmented Generation | |
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery | |
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | |
System 2 Attention (is something you might need too) | |
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | |
Language Models are Multilingual Chain-of-Thought Reasoners | |
ProAgent: From Robotic Process Automation to Agentic Process Automation | |
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers | |
Exponentially Faster Language Modelling | |
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | |
ToolTalk: Evaluating Tool-Usage in a Conversational Setting | |
Testing Language Model Agents Safely in the Wild | |
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort | |
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning | |
Orca 2: Teaching Small Language Models How to Reason | |
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | |
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | |
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | |
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | |
SelfEval: Leveraging the discriminative nature of generative models for evaluation | |
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | |
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework | |
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores | |
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
HiPPO: Recurrent Memory with Optimal Polynomial Projections | |
Transformer Memory as a Differentiable Search Index | |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | |
DeiT III: Revenge of the ViT | |
Scaling Vision Transformers to 22 Billion Parameters | |
On Calibration of Modern Neural Networks | |
A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks | |
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers | |
Attention Is All You Need | |
Acceleration via Fractal Learning Rate Schedules | |
Transformers learn in-context by gradient descent | |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models | |
Toy Models of Superposition | |
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | |
Unified Scaling Laws for Routed Language Models | |
CLIPPO: Image-and-Language Understanding from Pixels Only | |
Task-Specific Skill Localization in Fine-tuned Language Models | |
Discovering Latent Knowledge in Language Models Without Supervision | |
OCR-free Document Understanding Transformer | |
Language Models are Few-Shot Learners | |
Progress measures for grokking via mechanistic interpretability | |
Learning Transferable Visual Models From Natural Language Supervision | |
Zero-Shot Text-to-Image Generation | |
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | |
muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems | |
Language Models as Agent Models | |
Learning Models of Individual Behavior in Chess | |
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | |
Ask Me Anything: A simple strategy for prompting language models | |
Training language models to follow instructions with human feedback | |
Sequence to Sequence Learning with Neural Networks | |
SegGPT: Segmenting Everything In Context
A data-driven approach for learning to control computers
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Unifying Vision, Text, and Layout for Universal Document Processing
Memorizing Transformers
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
A Succinct Summary of Reinforcement Learning
Symbolic Discovery of Optimization Algorithms
Confronting Reward Model Overoptimization with Constrained RLHF
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
A Cookbook of Self-Supervised Learning
Training Language Models with Language Feedback at Scale
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Adversarial Examples for Evaluating Reading Comprehension Systems
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
ImageBind: One Embedding Space To Bind Them All
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Scaling Data-Constrained Language Models
Efficient LLM Inference on CPUs
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Efficiently Scaling Transformer Inference
One Model To Learn Them All
Brain decoding: toward real-time reconstruction of visual perception
GLU Variants Improve Transformer
Vision Transformers with Mixed-Resolution Tokenization
HyperNetworks
InRank: Incremental Low-Rank Learning
Text-to-Image Diffusion Models are Zero-Shot Classifiers
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
MAGVLT: Masked Generative Vision-and-Language Transformer
DINOv2: Learning Robust Visual Features without Supervision
What learning algorithm is in-context learning? Investigations with linear models
Any-to-Any Generation via Composable Diffusion
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Shortformer: Better Language Modeling using Shorter Inputs
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
PaLI: A Jointly-Scaled Multilingual Language-Image Model
The alignment problem from a deep learning perspective
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Jailbreaking is Best Solved by Definition
Multimodal Analogical Reasoning over Knowledge Graphs
Segment Everything Everywhere All at Once
DocPrompting: Generating Code by Retrieving the Docs
Emergent Tool Use From Multi-Agent Autocurricula
Root Mean Square Layer Normalization
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Efficient Training of Language Models to Fill in the Middle
AI for Mathematics: A Cognitive Science Perspective
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The First Room-Temperature Ambient-Pressure Superconductor
Segment Anything
Less is More: Parameter-Free Text Classification with Gzip
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
A Generalist Agent
Meet in the Middle: A New Pre-training Paradigm
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Can Humans Do Less-Than-One-Shot Learning?
Diffusion-LM Improves Controllable Text Generation
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Text-to-3D using Gaussian Splatting
Precise Zero-Shot Dense Retrieval without Relevance Labels
Brainformers: Trading Simplicity for Efficiency
DETRs Beat YOLOs on Real-time Object Detection
OtterHD: A High-Resolution Multi-modality Model
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
ConvNets Match Vision Transformers at Scale
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models
Scaling Robot Learning with Semantically Imagined Experience
Do LLMs exhibit human-like response biases? A case study in survey design
READ: Recurrent Adaptation of Large Transformers
Benchmarking Neural Network Training Algorithms
Automatic Gradient Descent: Deep Learning without Hyperparameters
Layer Normalization
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Implicit Representations of Meaning in Neural Language Models
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable
SqueezeLLM: Dense-and-Sparse Quantization
Optimisation & Generalisation in Networks of Neurons
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Transformers as Recognizers of Formal Languages: A Survey on Expressivity
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Decoupled Context Processing for Context Augmented Language Modeling
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
The Transient Nature of Emergent In-Context Learning in Transformers
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Matryoshka Diffusion Models
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Beyond neural scaling laws: beating power law scaling via data pruning
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Going Deeper with Convolutions
TimeGPT-1
Capabilities of GPT-4 on Medical Challenge Problems
Training Large Language Models Efficiently with Sparsity and Dataflow
Optimal Policies Tend to Seek Power
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Thinking Like Transformers
Why think step by step? Reasoning emerges from the locality of experience
Mixture-of-Experts with Expert Choice Routing
GPT-4 Technical Report
Scaling Expert Language Models with Unsupervised Domain Discovery
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Mass-Editing Memory in a Transformer
Erasing Concepts from Diffusion Models
Physics of Language Models: Part 1, Context-Free Grammar
Flamingo: a Visual Language Model for Few-Shot Learning
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Semantic Tokenizer for Enhanced Natural Language Processing
On Limitations of the Transformer Architecture
A Survey of Large Language Models
Affordances from Human Videos as a Versatile Representation for Robotics
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Conditioning Predictive Models: Risks and Strategies
Implicit Chain of Thought Reasoning via Knowledge Distillation
Scaling Laws for Transfer
Risks from Learned Optimization in Advanced Machine Learning Systems
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Bayesian Optimization of Catalysts With In-context Learning
Teach LLMs to Phish: Stealing Private Information from Language Models
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
Knowledge Graphs
Language Modelling with Pixels
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Chinchilla Scaling: A replication attempt
Retrofitting Word Vectors to Semantic Lexicons
CoLT5: Faster Long-Range Transformers with Conditional Computation
Deep contextualized word representations
Boosted Prompt Ensembles for Large Language Models
Recurrent Memory Transformer
Multitask Prompted Training Enables Zero-Shot Task Generalization
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
On the Turing Completeness of Modern Neural Network Architectures
Generalized Out-of-Distribution Detection: A Survey
AugGPT: Leveraging ChatGPT for Text Data Augmentation
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Human-Timescale Adaptation in an Open-Ended Task Space
Sigmoid Loss for Language Image Pre-Training
OpenScene: 3D Scene Understanding with Open Vocabularies
Nougat: Neural Optical Understanding for Academic Documents
SoundStorm: Efficient Parallel Audio Generation
Text and Code Embeddings by Contrastive Pre-Training
Fine-Tuning Language Models from Human Preferences
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Effective Theory of Transformers at Initialization
ST-MoE: Designing Stable and Transferable Sparse Expert Models
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Natural Selection Favors AIs over Humans
ART: Automatic multi-step reasoning and tool-use for large language models
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Visual Instruction Tuning
Efficiently Modeling Long Sequences with Structured State Spaces
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Mastering Diverse Domains through World Models
Simplified State Space Layers for Sequence Modeling
Offline RL for Natural Language Generation with Implicit Language Q Learning
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Self-supervised Learning: Generative or Contrastive
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Story Planning
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Dota 2 with Large Scale Deep Reinforcement Learning
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
The Matrix Calculus You Need For Deep Learning
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
DeepNet: Scaling Transformers to 1,000 Layers
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
LLMs cannot find reasoning errors, but can correct them!
Pretraining Without Attention
Large language models are not zero-shot communicators
Semi-supervised Sequence Learning
Improving language models by retrieving from trillions of tokens
Synthetic Data from Diffusion Models Improves ImageNet Classification
Level Generation Through Large Language Models
How Does Generative Retrieval Scale to Millions of Passages?
State Spaces Aren't Enough: Machine Translation Needs Attention
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Evaluating Large Language Models Trained on Code
Injecting structural hints: Using language models to study inductive biases in language learning
The case for 4-bit precision: k-bit Inference Scaling Laws
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Downstream Datasets Make Surprisingly Good Pretraining Corpora
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
Fast Transformer Decoding: One Write-Head is All You Need
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
Towards Deep Learning Models Resistant to Adversarial Attacks
A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Large Language Models as General Pattern Machines
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Fast and forward stable randomized algorithms for linear least-squares problems
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Twist Decoding: Diverse Generators Guide Each Other
Monolith: Real Time Recommendation System With Collisionless Embedding Table
On-Device Training Under 256KB Memory
Meta-Learning in Neural Networks: A Survey
The Linear Representation Hypothesis and the Geometry of Large Language Models
The Power of Scale for Parameter-Efficient Prompt Tuning
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Spreading vectors for similarity search
REFINER: Reasoning Feedback on Intermediate Representations
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Low-code LLM: Visual Programming over LLMs
Decoding speech perception from non-invasive brain recordings
Towards Agile Text Classifiers for Everyone
Cramming: Training a Language Model on a Single GPU in One Day
Text-to-Table: A New Way of Information Extraction
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
ViperGPT: Visual Inference via Python Execution for Reasoning
Spatial-Language Attention Policies for Efficient Robot Learning
Improved Baselines with Visual Instruction Tuning
Decision Transformer: Reinforcement Learning via Sequence Modeling
What Algorithms can Transformers Learn? A Study in Length Generalization
Tracking Everything Everywhere All at Once
Bad Global Minima Exist and SGD Can Reach Them
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Optimizing Memory Mapping Using Deep Reinforcement Learning
A General Theoretical Paradigm to Understand Learning from Human Preferences
Beyond Words: A Comprehensive Survey of Sentence Representations
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Adding Gradient Noise Improves Learning for Very Deep Networks
Positional Description Matters for Transformers Arithmetic
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Calibrated Language Models Must Hallucinate
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Online Decision Transformer
Benchmarking Large Language Models for News Summarization
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Scalable Extraction of Training Data from (Production) Language Models
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Visual In-Context Prompting
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
GAIA: a benchmark for General AI Assistants
More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text
Chain-of-Thought Reasoning is a Policy Improvement Operator
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Thinking Fast and Slow in Large Language Models
Towards Accurate Differential Diagnosis with Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Vanishing Gradients in Reinforcement Finetuning of Language Models
The History and Risks of Reinforcement Learning and Human Feedback
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Video Language Planning
Thread of Thought Unraveling Chaotic Contexts
PaSS: Parallel Speculative Sampling
SeaLLMs -- Large Language Models for Southeast Asia
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
An LLM Compiler for Parallel Function Calling
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Magicoder: Source Code Is All You Need
SILC: Improving Vision Language Pretraining with Self-Distillation
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
An Early Evaluation of GPT-4V(ision)
Farzi Data: Autoregressive Data Distillation
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Towards a Unified View of Parameter-Efficient Transfer Learning
Beyond Surface: Probing LLaMA Across Scales and Layers
TiC-CLIP: Continual Training of CLIP Models
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
GOAT: GO to Any Thing
Nash Learning from Human Feedback
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Axiomatic Preference Modeling for Longform Question Answering
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Efficient Monotonic Multihead Attention
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Are LLMs Useful in the Poorest Schools? theTeacherAI in Sierra Leone
De-Diffusion Makes Text a Strong Cross-Modal Interface
Dolphins: Multimodal Language Model for Driving
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Efficient Transformer Knowledge Distillation: A Performance Review
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments
Instruction-tuning Aligns LLMs to the Human Brain
Large Language Model Alignment: A Survey
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Instruction-Following Evaluation for Large Language Models
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Pre-Training to Learn in Context
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Large Language Models for Mathematicians
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Language Model Inversion
Training Chain-of-Thought via Latent-Variable Inference
The Quantization Model of Neural Scaling
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
TinyGSM: achieving >80% on GSM8k with small language models
Context Tuning for Retrieval Augmented Generation
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
TigerBot: An Open Multilingual Multitask LLM
PromptBench: A Unified Library for Evaluation of Large Language Models
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Challenges with unsupervised LLM knowledge discovery
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Honeybee: Locality-enhanced Projector for Multimodal LLM
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
ProTIP: Progressive Tool Retrieval Improves Planning
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
SparQ Attention: Bandwidth-Efficient LLM Inference
Silkie: Preference Distillation for Large Visual Language Models
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Algorithmic Collusion by Large Language Models
Mathematical Language Models: A Survey
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Pixel Aligned Language Models
PathFinder: Guided Search over Multi-Step Reasoning Paths
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Vision-Language Models as a Source of Rewards
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Language-Informed Visual Concept Learning
Evaluation of Large Language Models for Decision Making in Autonomous Driving
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Extending Context Window of Large Language Models via Semantic Compression
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Formal Aspects of Language Modeling
Large Language Models on Graphs: A Comprehensive Survey
Merlin:Empowering Multimodal LLMs with Foresight Minds
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
Generating Illustrated Instructions
Alignment for Honesty
Paloma: A Benchmark for Evaluating Language Model Fit
Self-Evaluation Improves Selective Generation in Large Language Models
Nomic Embed: Training a Reproducible Long Context Text Embedder
Rejuvenating image-GPT as Strong Visual Representation Learners
Object Recognition as Next Token Prediction
Foundation Models in Robotics: Applications, Challenges, and the Future
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Data Management For Large Language Models: A Survey
AtP*: An efficient and scalable method for localizing LLM behaviour to components
Knowledge Distillation of Large Language Models
Faithful Persona-based Conversational Dataset Generation with Large Language Models
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Weight subcloning: direct initialization of transformers using larger pretrained ones
Segment and Caption Anything
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
OneLLM: One Framework to Align All Modalities with Language
Steering Llama 2 via Contrastive Activation Addition
VILA: On Pre-training for Visual Language Models
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
HyperAttention: Long-context Attention in Near-Linear Time
LLM360: Towards Fully Transparent Open-Source LLMs
Efficient Transformers with Dynamic Token Pooling
GIVT: Generative Infinite-Vocabulary Transformers
Modeling Context in Referring Expressions
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Text-Conditioned Resampler For Long Form Video Understanding
Gemini: A Family of Highly Capable Multimodal Models
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Cascade Speculative Drafting for Even Faster LLM Inference
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
VideoPoet: A Large Language Model for Zero-Shot Video Generation
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
AppAgent: Multimodal Agents as Smartphone Users
Time is Encoded in the Weights of Finetuned Language Models
Generative Multimodal Models are In-Context Learners
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
An In-depth Look at Gemini's Language Abilities
Retrieval-Augmented Generation for Large Language Models: A Survey
Intriguing Properties of Quantization at Scale
Parrot Captions Teach CLIP to Spot Text
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
YAYI 2: Multilingual Open-Source Large Language Models
Reasons to Reject? Aligning Language Models with Judgments
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Exploiting Novel GPT-4 APIs
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
LLM4VG: Large Language Models Evaluation for Video Grounding
Shai: A large language model for asset management
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | |
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | |
Supervised Knowledge Makes Large Language Models Better In-context Learners | |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | |
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases | |
The LLM Surgeon | |
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | |
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices | |
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | |
Task Contamination: Language Models May Not Be Few-Shot Anymore | |
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | |
Learning Vision from Models Rivals Learning Vision from Data | |
TinyLlama: An Open-Source Small Language Model | |
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | |
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation
Making Large Language Models A Better Foundation For Dense Retrieval
LARP: Language-Agent Role Play for Open-World Games
A Survey of Reasoning with Foundation Models
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Towards the Law of Capacity Gap in Distilling Language Models
At Which Training Stage Does Code Data Help LLMs Reasoning?
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
A Comprehensive Study of Knowledge Editing for Large Language Models
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Orion-14B: Open-source Multilingual Large Language Models
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
DocLLM: A layout-aware generative language model for multimodal document understanding
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
GeoGalactica: A Scientific Large Language Model in Geoscience
Improving Text Embeddings with Large Language Models
Boosting Large Language Model for Speech Synthesis: An Empirical Study
TrustLLM: Trustworthiness in Large Language Models
Unicron: Economizing Self-Healing LLM Training at Scale
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Proving Test Set Contamination in Black Box Language Models
LLaMA Pro: Progressive LLaMA with Block Expansion
LLM Augmented LLMs: Expanding Capabilities through Composition
LLaVA-$φ$: Efficient Multi-Modal Assistant with Small Language Model
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Understanding LLMs: A Comprehensive Overview from Training to Inference
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
A Vision Check-up for Language Models
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope
GPT-4V(ision) is a Generalist Web Agent, if Grounded
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Mind2Web: Towards a Generalist Agent for the Web
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DocGraphLM: Documental Graph Language Model for Information Extraction
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
TOFU: A Task of Fictitious Unlearning for LLMs
Transformers are Multi-State RNNs
Secrets of RLHF in Large Language Models Part II: Reward Modeling
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
Towards Conversational Diagnostic AI
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Efficient LLM inference solution on Intel GPU
I am a Strange Dataset: Metalinguistic Tests for Language Models
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models
The Impact of Reasoning Step Length on Large Language Models
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Mixtral of Experts
ChatQA: Building GPT-4 Level Conversational QA Models
TeleChat Technical Report
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
MaLA-500: Massive Language Adaptation of Large Language Models
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?
State of What Art? A Call for Multi-Prompt LLM Evaluation
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Compressing Context to Enhance Inference Efficiency of Large Language Models
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks
VMamba: Visual State Space Model
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Self-Rewarding Language Models
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Asynchronous Local-SGD Training for Language Modeling
ReFT: Reasoning with Reinforced Fine-Tuning
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Tuning Language Models by Proxy
Scalable Pre-training of Large Autoregressive Image Models
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Extending LLMs' Context Window with 100 Samples
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
SPADE: Synthesizing Assertions for Large Language Model Pipelines
Foundations of Vector Retrieval
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Evaluating the Moral Beliefs Encoded in LLMs
Boosting Theory-of-Mind Performance in Large Language Models via Prompting
MambaByte: Token-free Selective State Space Model
RakutenAI-7B: Extending Large Language Models for Japanese
MM-LLMs: Recent Advances in MultiModal Large Language Models
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Small Language Model Meets with Reinforced Vision Vocabulary
WARM: On the Benefits of Weight Averaged Reward Models
In-Context Learning for Extreme Multi-Label Classification
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
What Are Tools Anyway? A Survey from the Language Model Perspective
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Mission: Impossible Language Models
Benchmarking LLMs via Uncertainty Quantification
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
H2O-Danube-1.8B Technical Report
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Representation Engineering: A Top-Down Approach to AI Transparency
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Efficient Tool Use with Chain-of-Abstraction Reasoning
YOLO-World: Real-Time Open-Vocabulary Object Detection
Weaver: Foundation Models for Creative Writing
Weak-to-Strong Jailbreaking on Large Language Models
Transfer Learning for Text Diffusion Models
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Watermarking Makes Language Models Radioactive
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Generative Expressive Robot Behaviors using Large Language Models
Efficient Exploration for LLMs
Can Large Language Models Understand Context?
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
OLMo: Accelerating the Science of Language Models
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
CroissantLLM: A Truly Bilingual French-English Language Model
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
Transforming and Combining Rewards for Aligning Large Language Models
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Scaling Laws for Downstream Task Performance of Large Language Models
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Seven Failure Points When Engineering a Retrieval Augmented Generation System
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
Multi-line AI-assisted Code Authoring
Self-Discover: Large Language Models Self-Compose Reasoning Structures
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Training-Free Consistent Text-to-Image Generation
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Rethinking Optimization and Architecture for Tiny Language Models
LiPO: Listwise Preference Optimization through Learning-to-Rank
BlackMamba: Mixture of Experts for State-Space Models
Rethinking Interpretability in the Era of Large Language Models
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
K-Level Reasoning with Large Language Models
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Specialized Language Models with Cheap Inference from Limited Domain Data
Repeat After Me: Transformers are Better than State Space Models at Copying
A Survey on Hallucination in Large Vision-Language Models
Corrective Retrieval Augmented Generation
A Comprehensive Survey of Compression Algorithms for Language Models
Leveraging Large Language Models for NLG Evaluation: A Survey
The Power of Noise: Redefining Retrieval for RAG Systems
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Red Teaming Visual Language Models
Knowledge Fusion of Large Language Models
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Lexinvariant Language Models
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Mathematical Capabilities of ChatGPT
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Large Language Models for Mathematical Reasoning: Progresses and Challenges
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Driving Everywhere with Large Language Model Policy Adaptation
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
SpiRit-LM: Interleaved Spoken and Written Language Model
Multilingual E5 Text Embeddings: A Technical Report
In-Context Principle Learning from Mistakes
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Hydragen: High-Throughput LLM Inference with Shared Prefixes
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Fast Timing-Conditioned Latent Audio Diffusion
Direct Language Model Alignment from Online AI Feedback
Grandmaster-Level Chess Without Search
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Tandem Transformers for Inference Efficient LLMs
World Model on Million-Length Video And Language With RingAttention
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
Suppressing Pink Elephants with Direct Principle Feedback
Policy Improvement using Language Feedback Models
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Scaling Laws for Fine-Grained Mixture of Experts
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
ODIN: Disentangled Reward Mitigates Hacking in RLHF
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Generative Representational Instruction Tuning
ChemLLM: A Chemical Large Language Model
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
DeAL: Decoding-time Alignment for Large Language Models
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
SubGen: Token Generation in Sublinear Time and Memory
Keyframer: Empowering Animation Design using Large Language Models
Large Language Model for Table Processing: A Survey
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Approaching Human-Level Forecasting with Language Models
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Premise Order Matters in Reasoning with Large Language Models
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Chain-of-Thought Reasoning Without Prompting
BitDelta: Your Fine-Tune May Only Be Worth One Bit
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Data Engineering for Scaling Language Models to 128K Context
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
How to Train Data-Efficient LLMs
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Arrows of Time for Large Language Models
Coercing LLMs to do and reveal (almost) anything
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
User-LLM: Efficient LLM Contextualization with User Embeddings
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Instruction-tuned Language Models are Better Knowledge Learners
The FinBen: An Holistic Financial Benchmark for Large Language Models
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
The boundary of neural network trainability is fractal
Reformatted Alignment
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
OneBit: Towards Extremely Low-bit Large Language Models
CoLLaVO: Crayon Large Language and Vision mOdel
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
RLVF: Learning from Verbal Feedback without Overgeneralization
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Efficient Guided Generation for Large Language Models
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Generative Language Modeling for Automated Theorem Proving
Automated Unit Test Improvement using Large Language Models at Meta
LLM Agents can Autonomously Hack Websites
Large Language Models: A Survey
In-Context Retrieval-Augmented Language Models
Consolidating Attention Features for Multi-view Image Editing
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Scaling Up LLM Reviews for Google Ads Content Moderation
Subobject-level Image Tokenization
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
EvoPrompting: Language Models for Code-Level Neural Architecture Search
Goal Driven Discovery of Distributional Differences via Language Descriptions
ChatMusician: Understanding and Generating Music Intrinsically with LLM
GPTVQ: The Blessing of Dimensionality for LLM Quantization
FuseChat: Knowledge Fusion of Chat Models
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Large Language Models for Data Annotation: A Survey
LoRA+: Efficient Low Rank Adaptation of Large Models
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Towards Optimal Learning of Language Models
Evaluating Very Long-Term Conversational Memory of LLM Agents
Training-Free Long-Context Scaling of Large Language Models
Disentangled 3D Scene Generation with Layout Learning
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Nemotron-4 15B Technical Report
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Towards Open-ended Visual Quality Comparison
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
MOSAIC: A Modular System for Assistive and Interactive Cooking
Priority Sampling of Large Language Models for Compilers
Simple linear attention language models balance the recall-throughput tradeoff
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
StarCoder 2 and The Stack v2: The Next Generation
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Simulacra as Conscious Exotica
Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence
Enhancing Vision-Language Pre-training with Rich Supervisions
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Emergent and Predictable Memorization in Large Language Models
Design2Code: How Far Are We From Automating Front-End Engineering?
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Empowering Large Language Model Agents through Action Learning
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
RT-H: Action Hierarchies Using Language
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Datasets for Large Language Models: A Comprehensive Survey
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
Do Efficient Transformers Really Save Computation?
MathPrompter: Mathematical Reasoning using Large Language Models
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Can Large Language Models Reason and Plan?
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Common 7B Language Models Already Possess Strong Math Capabilities
Yi: Open Foundation Models by 01.AI
Teaching Large Language Models to Reason with Reinforcement Learning
SaulLM-7B: A pioneering Large Language Model for Law
Online Adaptation of Language Models with a Memory of Amortized Contexts
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Learning to Decode Collaboratively with Multiple Language Models
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
The Unreasonable Effectiveness of Eccentric Automatic Prompts
A Survey on Evaluation of Large Language Models
The pitfalls of next-token prediction
Stealing Part of a Production Language Model
Algorithmic progress in language models
Thinking Tokens for Language Modeling
Is Cosine-Similarity of Embeddings Really About Similarity?
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Can't Remember Details in Long Documents? You Need Some R&R
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Retrieval-Augmented Generation for AI-Generated Content: A Survey
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
3D-VLA: A 3D Vision-Language-Action Generative World Model
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
GPT on a Quantum Computer
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
GiT: Towards Generalist Vision Transformer through Universal Language Interface
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Social Skill Training with Large Language Models
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Veagle: Advancements in Multimodal Representation Learning
Simple and Scalable Strategies to Continually Pre-train Large Language Models
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents
Language models scale reliably with over-training and on downstream tasks
Gemma: Open Models Based on Gemini Research and Technology
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
On the Societal Impact of Open Foundation Models
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Chronos: Learning the Language of Time Series
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
ORPO: Monolithic Preference Optimization without Reference Model
MoAI: Mixture of All Intelligence for Large Language and Vision Models
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
DeepSeek-VL: Towards Real-World Vision-Language Understanding
How Far Are We from Intelligent Visual Deductive Reasoning?
Small Models are Valuable Plug-ins for Large Language Models
Backtracing: Retrieving the Cause of the Query
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Learning to Generate Better Than Your LLM
Meta-in-context learning in large language models
LERF: Language Embedded Radiance Fields
Eliciting Latent Predictions from Transformers with the Tuned Lens
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Resurrecting Recurrent Neural Networks for Long Sequences
An Overview on Language Models: Recent Developments and Outlook
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library
A Survey of Evaluation Metrics Used for NLG Systems
SummEval: Re-evaluating Summarization Evaluation
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
Logits of API-Protected LLMs Leak Proprietary Information
Knowledge Conflicts for LLMs: A Survey
Revolutionizing Mental Health Care through LangChain: A Journey with a Large Language Model
Will GPT-4 Run DOOM?
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Large language models surpass human experts in predicting neuroscience results
Reliable, Adaptable, and Attributable Language Models with Retrieval
You Need to Pay Better Attention
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Stable LM 2 1.6B Technical Report
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
A Survey on Data Selection for Language Models
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
Repetition Improves Language Model Embeddings
How Transformers Learn Causal Structure with Gradient Descent
Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models
Analysing The Impact of Sequence Composition on Language Model Pre-Training
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Bayesian Reward Models for LLM Alignment
KMMLU: Measuring Massive Multitask Language Understanding in Korean
Dissecting Human and LLM Preferences
Exploring Value Biases: How LLMs Deviate Towards the Ideal
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Why are Sensitive Functions Hard for Transformers?
Agents Need Not Know Their Purpose
Copyright Traps for Large Language Models
DoRA: Weight-Decomposed Low-Rank Adaptation
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Rethinking Machine Unlearning for Large Language Models
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Improving Black-box Robustness with In-Context Rewriting
Secret Collusion Among Generative AI Agents
Natural Language Reinforcement Learning
Universal Neural Functionals
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
LESS: Selecting Influential Data for Targeted Instruction Tuning
Building Your Own Product Copilot: Challenges, Opportunities, and Needs
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Continual Learning for Large Language Models: A Survey
Towards Efficient and Exact Optimization of Language Model Alignment
HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Spike No More: Stabilizing the Pre-training of Large Language Models
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention
Zoology: Measuring and Improving Recall in Efficient Language Models
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
LoBaSS: Gauging Learnability in Supervised Fine-tuning Data
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Instruction Tuning with Human Curriculum
MatFormer: Nested Transformer for Elastic Inference
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
xVal: A Continuous Number Encoding for Large Language Models
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Human Feedback is not Gold Standard
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Do language models plan ahead for future tokens? | |
CAME: Confidence-guided Adaptive Memory Efficient Optimization | |
Improving Language Plasticity via Pretraining with Active Forgetting | |
AdANNS: A Framework for Adaptive Semantic Search | |
Strategic Reasoning with Language Models | |
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies | |
Sparse is Enough in Scaling Transformers | |
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | |
A Theory on Adam Instability in Large-Scale Machine Learning | |
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | |
Are Language Models Worse than Humans at Following Prompts? It's Complicated | |
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition | |
Transformer Language Models without Positional Encodings Still Learn Positional Information | |
Sequence Parallelism: Long Sequence Training from System Perspective | |
Bio-inspired Structure Identification in Language Embeddings | |
Transformers without Tears: Improving the Normalization of Self-Attention | |
Neural Text Generation with Unlikelihood Training | |
MASS: Masked Sequence to Sequence Pre-training for Language Generation | |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs | |
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | |
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | |
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | |
TnT-LLM: Text Mining at Scale with Large Language Models | |
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | |
Larimar: Large Language Models with Episodic Memory Control | |
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | |
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | |
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback | |
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | |
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | |
RAFT: Adapting Language Model to Domain Specific RAG | |
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | |
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations | |
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | |
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews | |
Language Agents as Optimizable Graphs | |
Comparative Study of Large Language Model Architectures on Frontier | |
Optimizing Distributed Training on Frontier for Large Language Models | |
Striped Attention: Faster Ring Attention for Causal Transformers | |
Block-Recurrent Transformers | |
Addressing Some Limitations of Transformers with Feedback Memory | |
Reverse Training to Nurse the Reversal Curse | |
Evaluating Frontier Models for Dangerous Capabilities | |
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | |
When Do We Not Need Larger Vision Models? | |
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | |
Towards 3D Molecule-Text Interpretation in Language Models | |
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | |
Mixture of Soft Prompts for Controllable Data Generation | |
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | |
Evolutionary Optimization of Model Merging Recipes | |
Semiparametric Token-Sequence Co-Supervision | |
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | |
On Learning to Summarize with Large Language Models as References | |
Scalable Prompt Generation for Semi-supervised Learning with Language Models | |
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | |
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | |
MyVLM: Personalizing VLMs for User-Specific Queries | |
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | |
Recourse for reclamation: Chatting with generative language models | |
On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial | |
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging | |
The MiniPile Challenge for Data-Efficient Language Models | |
OmniNet: Omnidirectional Representations from Transformers | |
Arcee's MergeKit: A Toolkit for Merging Large Language Models | |
FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications | |
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | |
The Case for Co-Designing Model Architectures with Hardware | |
The Unreasonable Ineffectiveness of the Deeper Layers | |
Improving Text-to-Image Consistency via Automatic Prompt Optimization | |
InternLM2 Technical Report | |
AIOS: LLM Agent Operating System | |
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression | |
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | |
Can large language models explore in-context? | |
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series | |
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | |
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models | |
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | |
VidLA: Video-Language Alignment at Scale | |
Compiler generated feedback for Large Language Models | |
sDPO: Don't Use Your Data All at Once | |
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | |
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners | |
LLM4Decompile: Decompiling Binary Code with Large Language Models | |
Getting the most out of your tokenizer for pre-training and domain adaptation | |
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese | |
Wider and Deeper LLM Networks are Fairer LLM Evaluators | |
Editing Large Language Models: Problems, Methods, and Opportunities | |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | |
Long-form factuality in large language models | |
Towards a World-English Language Model for On-Device Virtual Assistants | |
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | |
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling | |
STaR-GATE: Teaching Language Models to Ask Clarifying Questions | |
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding | |
LITA: Language Instructed Temporal-Localization Assistant | |
TextCraftor: Your Text Encoder Can be Image Quality Controller | |
Mechanistic Design and Scaling of Hybrid Architectures | |
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines | |
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore | |
Blockwise Parallel Transformer for Large Context Models | |
Large Language Models Can Be Strong Differentially Private Learners | |
Head-wise Shareable Attention for Large Language Models | |
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | |
ReALM: Reference Resolution As Language Modeling | |
Gecko: Versatile Text Embeddings Distilled from Large Language Models | |
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs | |
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer | |
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | |
DiJiang: Efficient Large Language Models through Compact Kernelization | |
Jamba: A Hybrid Transformer-Mamba Language Model | |
Localizing Paragraph Memorization in Language Models | |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | |
Group Preference Optimization: Few-Shot Alignment of Large Language Models | |
Communicative Agents for Software Development | |
Preference Ranking Optimization for Human Alignment | |
The CRINGE Loss: Learning what language not to model | |
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | |
Attribute First, then Generate: Locally-attributable Grounded Text Generation | |
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models | |
FABLES: Evaluating faithfulness and content selection in book-length summarization | |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | |
WavLLM: Towards Robust and Adaptive Speech Large Language Model | |
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text | |
ST-LLM: Large Language Models Are Effective Temporal Learners | |
Advancing LLM Reasoning Generalists with Preference Trees | |
Best Practices and Lessons Learned on Synthetic Data for Language Models | |
Long-context LLMs Struggle with Long In-context Learning | |
HyperCLOVA X Technical Report | |
Poro 34B and the Blessing of Multilinguality | |
Octopus v2: On-device language model for super agent | |
Are large language models superhuman chemists? | |
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model | |
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course | |
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | |
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models | |
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models | |
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers | |
Auxiliary task demands mask the capabilities of smaller language models | |
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | |
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | |
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? | |
Data Interpreter: An LLM Agent For Data Science | |
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | |
Training LLMs over Neurally Compressed Text | |
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | |
ReFT: Representation Finetuning for Language Models | |
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models | |
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | |
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | |
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | |
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models | |
Noise-Aware Training of Layout-Aware Language Models | |
AI and the Problem of Knowledge Collapse | |
Learning to Plan and Generate Text with Citations | |
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models | |
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models | |
ALOHa: A New Measure for Hallucination in Captioning Models | |
Efficient Multi-Vector Dense Retrieval Using Bit Vectors | |
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization | |
Iterative Forward Tuning Boosts In-context Learning in Language Models | |
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | |
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | |
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | |
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences | |
Stream of Search (SoS): Learning to Search in Language | |
Large Product Key Memory for Pretrained Language Models | |
Large Memory Layers with Product Keys | |
BRAVE: Broadening the visual encoding of vision-language models | |
Adapting LLaMA Decoder to Vision Transformer | |
RULER: What's the Real Context Size of Your Long-Context Language Models? | |
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | |
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | |
Reconstructing Hand-Held Objects in 3D | |
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | |
MuPT: A Generative Symbolic Music Pretrained Transformer | |
OmniFusion Technical Report | |
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | |
CodecLM: Aligning Language Models with Tailored Synthetic Data | |
SambaLingo: Teaching Large Language Models New Languages | |
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | |
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | |
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | |
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws | |
Koala: Key frame-conditioned long video-LLM | |
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | |
Understanding Emergent Abilities of Language Models from the Loss Perspective | |
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code | |
Making Large Language Models Better Data Creators | |
On Surgical Fine-tuning for Language Encoders | |
AdaLomo: Low-memory Optimization with Adaptive Learning Rate | |
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | |
Embedding Democratic Values into Social Media AIs via Societal Objective Functions | |
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | |
Less is More: Selective Layer Finetuning with SubTuning | |
Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning | |
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling | |
Cut the CARP: Fishing for zero-shot story evaluation | |
LLoCO: Learning Long Contexts Offline | |
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | |
Rho-1: Not All Tokens Are What You Need | |
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | |
Audio Dialogues: Dialogues dataset for audio and music understanding | |
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | |
JetMoE: Reaching Llama2 Performance with 0.1M Dollars | |
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents | |
Entity-Level Sentiment Analysis (ELSA): An exploratory task survey | |
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models | |
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | |
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators | |
Mechanics of Next Token Prediction with Self-Attention | |
Scaling Laws of RoPE-based Extrapolation | |
Pre-training Small Base LMs with Fewer Tokens | |
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies | |
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | |
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search | |
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers | |
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data | |
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | |
Toward a Theory of Tokenization in LLMs | |
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | |
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca | |
Learn Your Reference Model for Real Good Alignment | |
Large Language Models are as persuasive as humans, but why? About the cognitive effort and moral-emotional language of LLM arguments | |
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models | |
TransformerFAM: Feedback attention is working memory | |
On Speculative Decoding for Multimodal Large Language Models | |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | |
Generative Disco: Text-to-Video Generation for Music Visualization | |
Self-playing Adversarial Language Game Enhances LLM Reasoning | |
Compression Represents Intelligence Linearly | |
The Illusion of State in State-Space Models | |
ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past | |
A Thorough Examination of Decoding Methods in the Era of LLMs | |
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA | |
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? | |
Should You Mask 15% in Masked Language Modeling? | |
Finetuning Pretrained Transformers into RNNs | |
BLINK: Multimodal Large Language Models Can See but Not Perceive | |
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | |
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment | |
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | |
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data | |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | |
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | |
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | |
Fewer Truncations Improve Language Modeling | |
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection | |
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | |
Many-Shot In-Context Learning | |
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | |
Exploring the landscape of large language models: Foundations, techniques, and challenges | |
Automated Social Science: Language Models as Scientist and Subjects | |
Language Models Still Struggle to Zero-shot Reason about Time Series | |
Stepwise Alignment for Constrained Language Model Policy Optimization | |
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents | |
Language Imbalance Can Boost Cross-lingual Generalisation | |
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge | |
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | |
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency | |
TextSquare: Scaling up Text-Centric Visual Instruction Tuning | |
Large Language Models are Few-Shot Health Learners | |
How Far Can We Go with Practical Function-Level Program Repair? | |
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation | |
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | |
A Survey on Retrieval-Augmented Text Generation for Large Language Models | |
A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs | |
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior | |
State Space Model for New-Generation Network Alternative to Transformers: A Survey | |
LLM In-Context Recall is Prompt Dependent | |
Reducing hallucination in structured outputs via Retrieval-Augmented Generation | |
Towards Large Language Models as Copilots for Theorem Proving in Lean | |
Characterizing LLM Abstention Behavior in Science QA with Context Perturbations | |
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function | |
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences | |
Aligning language models with human preferences | |
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding | |
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation | |
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation | |
RAR-b: Reasoning as Retrieval Benchmark | |
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models | |
Deep Reinforcement Learning with a Natural Language Action Space | |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | |
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study | |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | |
FlowMind: Automatic Workflow Generation with LLMs | |
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference | |
DataComp: In search of the next generation of multimodal datasets | |
Stable and low-precision training for large-scale vision-language models | |
Multi-Head Mixture-of-Experts | |
Transformers Can Represent $n$-gram Language Models | |
Pegasus-v1 Technical Report | |
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs | |
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework | |
SnapKV: LLM Knows What You are Looking for Before Generation | |
SpaceByte: Towards Deleting Tokenization from Large Language Modeling | |
A Survey on Self-Evolution of Large Language Models | |
Retrieval Head Mechanistically Explains Long-Context Factuality | |
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels | |
SPLATE: Sparse Late Interaction Retrieval | |
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | |
AgentKit: Flow Engineering with Graphs, not Coding | |
Rethinking LLM Memorization through the Lens of Adversarial Compression | |
What's the Magic Word? A Control Theory of LLM Prompting | |
Adapting Language Models to Compress Contexts | |
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design | |
LMentry: A Language Model Benchmark of Elementary Language Tasks | |
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | |
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners | |
Graph Machine Learning in the Era of Large Language Models (LLMs) | |
NExT: Teaching Large Language Models to Reason about Code Execution | |
"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment | |
Can Language Models Solve Olympiad Programming? | |
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs | |
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models | |
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | |
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | |
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving | |
Make Your LLM Fully Utilize the Context | |
Weak-to-Strong Extrapolation Expedites Alignment | |
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | |
Continual Learning of Large Language Models: A Comprehensive Survey | |
PuLID: Pure and Lightning ID Customization via Contrastive Alignment | |
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding | |
Tele-FLM Technical Report | |
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning | |
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | |
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | |
MoDE: CLIP Data Experts via Clustering | |
Universal Adversarial Triggers Are Not Universal | |
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | |
Improving Dictionary Learning with Gated Sparse Autoencoders | |
BASS: Batched Attention-optimized Speculative Sampling | |
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | |
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data | |
Image Segmentation Using Text and Image Prompts | |
Holistic Safety and Responsibility Evaluations of Advanced AI Models | |
WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | |
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models | |
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | |
Efficient Continual Pre-training for Building Domain Specific Large Language Models | |
DeLighT: Deep and Light-weight Transformer | |
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically | |
GeckOpt: LLM System Efficiency via Intent-Based Tool Selection | |
Better Synthetic Data by Retrieving and Transforming Existing Datasets | |
Relational Graph Convolutional Networks for Sentiment Analysis | |
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | |
Foundational Challenges in Assuring Alignment and Safety of Large Language Models | |
Nyonic Technical Report | |
LLM Evaluators Recognize and Favor Their Own Generations | |
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | |
A Survey of Generative Search and Recommendation in the Era of Large Language Models | |
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | |
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections | |
A Primer on the Inner Workings of Transformer-based Language Models | |
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | |
zkLLM: Zero Knowledge Proofs for Large Language Models | |
A Survey on the Memory Mechanism of Large Language Model based Agents | |
Large Language Model Agent as a Mechanical Designer | |
Near to Mid-term Risks and Opportunities of Open Source Generative AI | |
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks | |
Benchmarking Mobile Device Control Agents across Diverse Configurations | |
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark | |
Assessing The Potential Of Mid-Sized Language Models For Clinical QA | |
Conformal Prediction for Natural Language Processing: A Survey | |
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision | |
Dual Modalities of Text: Visual and Textual Generative Pre-training | |
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback | |
Predicting Emergent Abilities with Infinite Resolution Evaluation | |
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | |
Hallucination of Multimodal Large Language Models: A Survey | |
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | |
Benchmarking Benchmark Leakage in Large Language Models | |
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | |
ChuXin: 1.6B Technical Report | |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models | |
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval | |
LEGENT: Open Platform for Embodied Agents | |
From Persona to Personalization: A Survey on Role-Playing Language Agents | |
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | |
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models | |
Autonomous LLM-driven research from data to human-verifiable research papers | |
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | |
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | |
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation | |
Beyond Words: A Mathematical Framework for Interpreting Large Language Models | |
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers | |
Ranked List Truncation for Large Language Model-based Re-Ranking | |
Building a Large Japanese Web Corpus for Large Language Models | |
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | |
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | |
DOCCI: Descriptions of Connected and Contrasting Images | |
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | |
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | |
Better & Faster Large Language Models via Multi-token Prediction | |
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | |
Extending Llama-3's Context Ten-Fold Overnight | |
Octopus v4: Graph of language models | |
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics | |
ChatGPTest: opportunities and cautionary tales of utilizing AI for questionnaire pretesting | |
How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library | |
Faster Convergence for Transformer Fine-tuning with Line Search Methods | |
Linear Transformers Are Secretly Fast Weight Programmers | |
FLAME: Factuality-Aware Alignment for Large Language Models | |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | |
In-Context Learning Creates Task Vectors | |
WildChat: 1M ChatGPT Interaction Logs in the Wild | |
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval" | |
LLM-AD: Large Language Model based Audio Description System | |
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval | |
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | |
Self-Play Preference Optimization for Language Model Alignment | |
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | |
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge | |
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | |
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound | |
Automatic Creative Selection with Cross-Modal Matching | |
Harmonic LLMs are Trustworthy | |
On Training a Neural Network to Explain Binaries | |
In-Context Learning with Long-Context Models: An In-Depth Exploration | |
Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning | |
Aligning LLM Agents by Learning Latent Preference from User Edits | |
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | |
Neural Networks Learn Statistics of Increasing Complexity | |
Emerging Properties in Self-Supervised Vision Transformers | |
Advancing Multimodal Medical Capabilities of Gemini | |
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | |
D2PO: Discriminator-Guided DPO with Response Evaluation Models | |
Controllable Text Generation in the Instruction-Tuning Era | |
MANTIS: Interleaved Multi-Image Instruction Tuning | |
A Philosophical Introduction to Language Models - Part II: The Way Forward | |
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | |
How do Large Language Models Handle Multilingualism? | |
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models | |
Modeling Emotions and Ethics with Large Language Models | |
Structured Chemistry Reasoning with Large Language Models | |
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | |
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO | |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | |
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences | |
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | |
Characterising the Creative Process in Humans and Large Language Models | |
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction | |
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering | |
AlphaMath Almost Zero: process Supervision without process | |
MAmmoTH2: Scaling Instructions from the Web | |
Is Flash Attention Stable? | |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions | |
What matters when building vision-language models? | |
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | |
Understanding LLMs Requires More Than Statistical Generalization | |
Efficient and Economic Large Language Model Inference with Attention Offloading | |
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law | |
Large Language Models are Inconsistent and Biased Evaluators | |
101 Billion Arabic Words Dataset | |
What is Sentiment Meant to Mean to Language Models? | |
GPT-4 passes most of the 297 written Polish Board Certification Examinations | |
Text Quality-Based Pruning for Efficient Training of Language Models | |
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant | |
On the Evaluation of Machine-Generated Reports | |
Automatic Programming: Large Language Models and Beyond | |
Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs | |
Multi-hop Question Answering over Knowledge Graphs using Large Language Models | |
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | |
Parallel Structures in Pre-training Data Yield In-Context Learning | |
BooookScore: A systematic exploration of book-length summarization in the era of LLMs | |
Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | |
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders | |
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | |
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training | |
Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
ReZero is All You Need: Fast Convergence at Large Depth | |
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | |
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | |
A Transformer with Stack Attention | |
xLSTM: Extended Long Short-Term Memory | |
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | |
The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring | |
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | |
Granite Code Models: A Family of Open Foundation Models for Code Intelligence | |
FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference | |
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches | |
Assemblage: Automatic Binary Dataset Construction for Machine Learning | |
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | |
Modeling Caption Diversity in Contrastive Vision-Language Pretraining | |
CLLMs: Consistency Large Language Models | |
You Only Cache Once: Decoder-Decoder Architectures for Language Models | |
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | |
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control | |
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals | |
Chain of Thoughtlessness: An Analysis of CoT in Planning | |
LLMs Can Patch Up Missing Relevance Judgments in Evaluation | |
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures | |
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | |
How Susceptible are Large Language Models to Ideological Manipulation? | |
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | |
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | |
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | |
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models | |
Can We Use Large Language Models to Fill Relevance Judgment Holes? | |
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | |
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | |
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics | |
The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models | |
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models | |
Automating the Enterprise with Foundation Models | |
Enhancing Q-Learning with Large Language Model Heuristics | |
Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure | |
Language Modeling Using Tensor Trains | |
PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation | |
Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models | |
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | |
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations | |
Large Language Models (LLMs) as Agents for Augmented Democracy | |
Scaling Laws for Forgetting When Fine-Tuning Large Language Models | |
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence | |
Natural Language Processing RELIES on Linguistics | |
Probing Multimodal LLMs as World Models for Driving | |
AttacKG+:Boosting Attack Knowledge Graph Construction with Large Language Models | |
Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | |
A Causal Explainable Guardrails for Large Language Models | |
In-Context Symbolic Regression: Leveraging Language Models for Function Discovery | |
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models | |
Value Augmented Sampling for Language Model Alignment and Personalization | |
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology | |
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models | |
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory | |
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | |
Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages | |
Sub-goal Distillation: A Method to Improve Small Language Agents | |
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | |
Linearizing Large Language Models | |
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | |
LMD3: Language Model Data Density Dependence | |
State-Free Inference of State-Space Models: The Transfer Function Approach | |
Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance | |
Masked Structural Growth for 2x Faster Language Model Pre-training | |
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | |
A Generalist Learner for Multifaceted Medical Image Interpretation | |
The Platonic Representation Hypothesis | |
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | |
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | |
Zero-Shot Tokenizer Transfer | |
RLHF Workflow: From Reward Modeling to Online RLHF | |
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation | |
SUTRA: Scalable Multilingual Language Model Architecture | |
ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization | |
Large Language Models as Planning Domain Generators | |
Explaining Text Similarity in Transformer Models | |
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | |
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent | |
Exposing Attention Glitches with Flip-Flop Language Modeling | |
CodeT5+: Open Code Large Language Models for Code Understanding and Generation | |
CinePile: A Long Video Question Answering Dataset and Benchmark | |
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | |
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory | |
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models | |
Understanding the performance gap between online and offline alignment algorithms | |
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models | |
SpeechVerse: A Large-scale Generalizable Audio Language Model | |
Compositional Text-to-Image Generation with Dense Blob Representations | |
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | |
People cannot distinguish GPT-4 from a human in a Turing test | |
LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities | |
What Can Natural Language Processing Do for Peer Review? | |
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | |
Improving Transformers with Dynamically Composable Multi-Head Attention | |
Word2World: Generating Stories and Worlds through Large Language Models | |
Ask Again, Then Fail: Large Language Models' Vacillations in Judgement | |
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models | |
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | |
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis | |
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | |
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | |
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models | |
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | |
UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | |
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | |
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | |
Chameleon: Mixed-Modal Early-Fusion Foundation Models | |
Many-Shot In-Context Learning in Multimodal Foundation Models | |
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | |
LoRA Learns Less and Forgets Less | |
Using ChatGPT for Thematic Analysis | |
Are Large Pre-Trained Language Models Leaking Your Personal Information? | |
Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre | |
HMT: Hierarchical Memory Transformer for Long Context Language Processing | |
Air Gap: Protecting Privacy-Conscious Conversational Agents | |
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models | |
LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages | |
MarkLLM: An Open-Source Toolkit for LLM Watermarking | |
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations | |
Towards Uncertainty-Aware Language Agent | |
Observational Scaling Laws and the Predictability of Language Model Performance | |
Layer-Condensed KV Cache for Efficient Inference of Large Language Models | |
CELA: Cost-Efficient Language Model Alignment for CTR Prediction | |
RDRec: Rationale Distillation for LLM-based Recommendation | |
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | |
INDUS: Effective and Efficient Language Models for Scientific Applications | |
Dynamic data sampler for cross-language transfer learning in large language models | |
Grounded 3D-LLM with Referent Tokens | |
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | |
Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining | |
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization | |
WavCraft: Audio Editing and Generation with Large Language Models | |
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives | |
Transformers learn to implement preconditioned gradient descent for in-context learning | |
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting | |
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | |
Imp: Highly Capable Large Multimodal Models for Mobile Devices | |
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | |
Towards Modular LLMs by Building and Reusing a Library of LoRAs | |
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework | |
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks | |
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | |
Latent State Estimation Helps UI Agents to Reason | |
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | |
Large Language Models Meet NLP: A Survey | |
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | |
Your Transformer is Secretly Linear | |
Can AI Relate: Testing Large Language Model Response for Mental Health Support | |
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! | |
Large Language Models are Biased Reinforcement Learners | |
ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | |
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | |
Keep It Private: Unsupervised Privatization of Online Text | |
Generative AI and Large Language Models for Cyber Security: All Insights You Need | |
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | |
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations | |
Leveraging Reinforcement Learning and Large Language Models for Code Optimization | |
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | |
Large Language Models Are Not Robust Multiple Choice Selectors | |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model |