---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
language:
- en
- de
- fr
- it
- pt
- es
pipeline_tag: text-generation
tags:
- llama
- atla
- evaluation
- llm-as-a-judge
- meta
- conversational
- lm-judge
license: apache-2.0
---
🛝 Selene Mini Playground | 🧑‍⚖️ Atla Blog | 📄 Technical report | 💻 [GitHub](https://github.com/atla-ai/selene-mini)
# Model Summary

Atla Selene Mini is a **state-of-the-art small language model-as-a-judge (SLMJ)**. Selene Mini achieves comparable performance to models 10x its size, **outperforming GPT-4o on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench), EvalBiasBench, and AutoJ**.

Post-trained from Llama-3.1-8B across a wide range of evaluation tasks and scoring criteria, Selene Mini **outperforms prior small models overall across 11 benchmarks covering three different types of tasks:**
- Absolute scoring, e.g. "Evaluate the harmlessness of this response on a scale of 1-5."
- Classification, e.g. "Does this response address the user query? Answer Yes or No."
- Pairwise preference, e.g. "Which of the following responses is more logically consistent - A or B?"

It is also the **#1 8B generative model on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench)**.

We are launching the large version of this model soon. Sign up [here](https://www.atla-ai.com/sign-up-waitlist?utm_source=huggingface&utm_medium=community&utm_campaign=WL_HF_modelcard_communitypost_sel1minilaunch) to be the first to access it.

## Model Details
- **Developed by:** [Atla](https://www.atla-ai.com/sign-up-waitlist?utm_source=huggingface&utm_medium=community&utm_campaign=WL_HF_modelcard_communitypost_sel1minilaunch)
- **Model type:** Post-trained from Llama-3.1-8B
- **Language(s) (NLP):** Primarily English; also supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai

## Model Use

Selene Mini can be used as a **general-purpose evaluation model**. It supports different inputs and scoring scales, generates structured evaluation outputs, and provides qualitative critiques with reasoning.

Try our cookbooks to get started with two popular use cases below:
- [Absolute scoring](https://colab.research.google.com/github/atla-ai/selene-mini/blob/main/cookbooks/HF_Quickstart_Absolute_Scoring.ipynb)
- [RAG hallucination](https://colab.research.google.com/github/atla-ai/selene-mini/blob/main/cookbooks/HF_Quickstart_Hallucination.ipynb)

To achieve best results, **we provide the prompts we used for training [here](https://github.com/atla-ai/selene-mini/tree/main/prompt-templates).** Remember to apply the Llama 3 conversation template - not doing so might lead to unexpected behavior. You can find the conversation class at this [link](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py), or you can refer to the code below, which applies it.

## Quickstart (HF Transformers):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run generation on

model_id = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Replace with your evaluation prompt; the prompt templates used during
# training are at github.com/atla-ai/selene-mini/tree/main/prompt-templates
prompt = "I heard you can evaluate my responses?"
messages = [{"role": "user", "content": prompt}]

# Apply the Llama 3 chat template before tokenizing
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)

# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Contact
support@atla-ai.com
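
## Example: absolute scoring (illustrative)

For reference, the sketch below shows what an absolute-scoring evaluation call might look like, reusing the `model`, `tokenizer`, and `device` from the Quickstart above. The prompt wording here is illustrative only and is not the official template - for best results, use the exact prompt templates linked in the Model Use section.

```python
# Illustrative absolute-scoring prompt (NOT the official template; see the
# prompt-templates repo linked above for the exact wording used in training).
eval_prompt = """You are tasked with evaluating a response based on a given instruction and a scoring rubric.
Provide feedback on the response quality strictly adhering to the rubric, followed by a score between 1 and 5.

Instruction: Explain why the sky is blue.
Response: The sky is blue because molecules in the air scatter blue light from the sun more than red light.

Scoring Rubric: Score 1 if the response is inaccurate or irrelevant; score 5 if it is accurate, relevant, and clearly explained."""

# Same generation flow as the Quickstart: chat template, generate, strip prompt tokens
messages = [{"role": "user", "content": eval_prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
critique_and_score = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(critique_and_score)  # expect a qualitative critique followed by a score
```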