Collections
Collections including paper arxiv:2310.08491

- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1
- Specific versus General Principles for Constitutional AI
  Paper • 2310.13798 • Published • 3
- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 25
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47

- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
  Paper • 2306.05685 • Published • 33
- Generative Judge for Evaluating Alignment
  Paper • 2310.05470 • Published • 1
- Humans or LLMs as the Judge? A Study on Judgement Biases
  Paper • 2402.10669 • Published
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 34

- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 34
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
  Paper • 2306.05685 • Published • 33
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
  Paper • 2303.16634 • Published • 3
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54

- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
  Paper • 2401.15391 • Published • 6
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 25
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 34
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54

- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 34
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 20

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 41
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 20
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 37

- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
  Paper • 2306.05685 • Published • 33
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 38
- Leveraging Large Language Models for NLG Evaluation: A Survey
  Paper • 2401.07103 • Published • 4
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54
- prometheus-eval/Feedback-Collection
  Viewer • Updated • 100k • 538 • 107
- prometheus-eval/prometheus-7b-v1.0
  Text2Text Generation • Updated • 119 • 30
- prometheus-eval/prometheus-13b-v1.0
  Text2Text Generation • Updated • 2.49k • 134
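
These collections can also be retrieved programmatically. Below is a minimal sketch, assuming huggingface_hub v0.19 or later, whose list_collections helper accepts an item filter in item_type/item_id form (here "paper/2310.08491"):

```python
# Minimal sketch, assuming huggingface_hub >= 0.19: list community
# collections on the Hub that include arXiv paper 2310.08491 (Prometheus).
from huggingface_hub import get_collection, list_collections

for preview in list_collections(item="paper/2310.08491", limit=10):
    # Listing results only preview a few items per collection, so fetch
    # the full collection by slug before printing its contents.
    collection = get_collection(preview.slug)
    print(collection.title)
    for item in collection.items:
        print(f"  {item.item_type}: {item.item_id}")
```

Each group above shows at most four items per collection, which appears consistent with the truncated previews the listing endpoint returns; get_collection fetches the complete item list.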