Collections
Collections including paper arxiv:2402.10200

Collection:
- Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs
  Paper • 2407.00653 • Published • 11
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
  Paper • 2406.18629 • Published • 42
- Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
  Paper • 2406.14562 • Published • 28
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 29

Collection:
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
  Paper • 2405.07990 • Published • 18
- Large Language Models as Planning Domain Generators
  Paper • 2405.06650 • Published • 11
- AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
  Paper • 2404.12753 • Published • 42
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
  Paper • 2404.07972 • Published • 48

Collection:
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 116
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 61
- Do language models plan ahead for future tokens?
  Paper • 2404.00859 • Published • 2

Collection:
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 107
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 54
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 47

Collection:
- Linearity of Relation Decoding in Transformer Language Models
  Paper • 2308.09124 • Published • 2
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- Mission: Impossible Language Models
  Paper • 2401.06416 • Published • 3

Collection:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 5
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 10
- Language Model Evaluation Beyond Perplexity
  Paper • 2106.00085 • Published

Collection:
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 38
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 52
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13

Collection:
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Paper • 2211.00593 • Published • 2
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 23
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 11