EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval

About the Model

This model is fine-tuned to grade retrieval in a RAG pipeline: given a question and a retrieved context, it answers "예" (yes) or "아니오" (no) to indicate whether the context contains enough information to answer the question.

The base model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.

Prompt Template

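The model expects the Korean prompt below; pass it verbatim. In English it reads: "Given a question and some information, evaluate whether the information is sufficient to answer the question. Answer 'yes' (예) or 'no' (아니오)." The section labels 질문, 정보, and 평가 mean question, information, and evaluation.

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가: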
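The template above can be filled programmatically. The following is a minimal sketch, assuming the layout shown in this card; `PROMPT_TEMPLATE` and `build_prompt` are illustrative names and not part of the model or of any library.

```python
# Korean grading prompt, copied from the template above.
PROMPT_TEMPLATE = (
    "주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n"
    '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
    "### 질문:\n{question}\n\n"
    "### 정보:\n{context}\n\n"
    "### 평가:\n"
)


def build_prompt(question: str, context: str) -> str:
    """Fill the template with one question and one retrieved context."""
    # Surrounding whitespace is stripped so the section layout stays fixed.
    return PROMPT_TEMPLATE.format(
        question=question.strip(), context=context.strip()
    )


if __name__ == "__main__":
    print(build_prompt("동아리 종강총회가 언제인가요?", "종강총회 날짜는 6월 21일입니다."))
```

The prompt ends with the `### 평가:` label and a newline, so the model's first generated tokens are the verdict itself.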

How to Use It

import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'
query = {
    "question": "동아리 종강총회가 언제인가요?",
    "context": "종강총회 날짜는 6월 21일입니다."
}

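# Move the tokenized prompt to the same device as the model
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to(model.device)
# max_new_tokens takes precedence over max_length, so only max_new_tokens is set
output = model.generate(**model_inputs, max_new_tokens=100)
# Decode the generated token IDs back to text (the prompt followed by the answer)
print(tokenizer.decode(output[0]))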
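In a RAG pipeline, a grader like this is typically applied to each retrieved passage so that insufficient ones are dropped before answer generation. The following is a minimal sketch of that step, assuming a hypothetical `grade` callable that wraps the tokenizer and `model.generate` calls above and returns `True` when the model answers "예". The names `filter_contexts` and `stub_grade` are illustrative only; the stub stands in for the model so the example runs without a GPU.

```python
from typing import Callable, List


def filter_contexts(
    question: str,
    contexts: List[str],
    grade: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the contexts that the grader judges sufficient."""
    return [c for c in contexts if grade(question, c)]


def stub_grade(question: str, context: str) -> bool:
    """Stand-in for the real model call (assumption, for illustration only)."""
    return "날짜" in context  # "날짜" means "date"


if __name__ == "__main__":
    retrieved = [
        "종강총회 날짜는 6월 21일입니다.",   # states the date of the meeting
        "동아리방은 학생회관 3층입니다.",    # made-up unrelated passage
    ]
    kept = filter_contexts("동아리 종강총회가 언제인가요?", retrieved, stub_grade)
    print(kept)
```

Passing the grader in as a callable keeps the filtering logic independent of how the model is loaded or quantized.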

Example Output

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>
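Because the decoded output repeats the prompt before the answer, downstream code usually needs only the final verdict. The following is a minimal sketch, assuming the output has the layout shown above, with the answer following the last `### 평가:` label. `parse_verdict` is an illustrative helper, not part of the model or of Transformers.

```python
from typing import Optional

ANSWER_MARKER = "### 평가:"


def parse_verdict(decoded: str) -> Optional[bool]:
    """Return True for "예", False for "아니오", None if neither is found."""
    # Keep only the text after the last answer marker.
    _, marker, tail = decoded.rpartition(ANSWER_MARKER)
    if not marker:
        return None  # the marker is missing, so the text is not a graded output
    answer = tail.strip()
    # startswith tolerates a trailing special token such as <|end_of_text|>.
    if answer.startswith("예"):
        return True
    if answer.startswith("아니오"):
        return False
    return None


if __name__ == "__main__":
    sample = "### 질문:\n...\n\n### 정보:\n...\n\n### 평가:\n예<|end_of_text|>"
    print(parse_verdict(sample))
```

Returning `None` for unexpected text lets the caller decide how to treat answers that are neither "예" nor "아니오".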

Training Data

Metrics

Korean LLM Benchmark

| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |

Generated Dataset

| Model | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
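The card does not say how these scores were computed. The following is a minimal sketch of the standard binary definitions, assuming "예" (sufficient) is the positive class; `binary_metrics` and the sample labels are illustrative and are not the evaluation data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with True as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [True, True, True, False, False]  # made-up labels
    pred = [True, True, False, True, False]  # made-up predictions
    print(binary_metrics(gold, pred))
```

Under these definitions F1 is the harmonic mean of precision and recall; for the fine-tuned row, 2 × 0.903 × 0.848 / (0.903 + 0.848) ≈ 0.875, which matches the reported value.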