※ There was a problem with the chat template during ORPO training. Retraining is planned.

Description

  • ์Šคํƒ€์ผ ๋ฐ ๋‹จ์–ด์žฅ์„ ์„ธํŒ…ํ•  ์ˆ˜ ์žˆ๋Š” ๋กœ์ปฌ ์˜->ํ•œ ๋ฒˆ์—ญ ๋ชจ๋ธ.
๋‹ค์Œ ํ…์ŠคํŠธ๋ฅผ ํ•œ๊ตญ์–ด๋กœ ๋ฒˆ์—ญํ•ด ์ฃผ์„ธ์š”.
๋ฒˆ์—ญ ์Šคํƒ€์ผ: ์ผ๋ฐ˜ ๋Œ€์ค‘, ๋ฐ˜๋ง, ๋…ธ๋ž˜ ๊ฐ€์‚ฌ, ๋ถ€๋“œ๋Ÿฌ์›€, ~ํ•˜๋„ค
๋‹จ์–ด์žฅ: {'the director':'๋Œ€์ฐฝ์„ญ', 'back to normal':'์ •์ƒํ™”'}

The director finally gets Maple back to normal.

# ์ถœ๋ ฅ
๋Œ€์ฐฝ์„ญ์€ ๋งˆ์นจ๋‚ด ๋ฉ”์ดํ”Œ์„ ์ •์ƒํ™”์‹œํ‚จ๋‹ค๋„ค.'
  • ์Šคํƒ€์ผ์€ ์ „๋‹ฌ ๋Œ€์ƒ, ์กด๋Œ“๋ง/๋ฐ˜๋ง ์—ฌ๋ถ€, ๋ฌธ์ฒด, ์–ด๋ฏธ ๋“ฑ์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • ์œ ํ˜•: [๋ช…์‚ฌํ˜•(Nominal), ํ‰์„œ๋ฌธ (Declarative), ์˜๋ฌธ๋ฌธ (Interrogative), ๋ช…๋ น๋ฌธ (Imperative), ๊ฐํƒ„๋ฌธ (Exclamatory), ์ฒญ์œ ๋ฌธ (Propositive)]
    • ๋Œ€์ƒ: [์ผ๋ฐ˜ ๋Œ€์ค‘ (General), ์ „๋ฌธ๊ฐ€ ์ง‘๋‹จ (Specialist), ์•„๋™ (Children), ๊ฐœ์ธ (Individual)]
    • ๋ฌธ์ฒด: [๊ฒฉ์‹์ฒด (Formal), ๋น„๊ฒฉ์‹์ฒด (Informal), ๋”ฑ๋”ฑํ•จ (Stiff), ๋ถ€๋“œ๋Ÿฌ์›€ (Soft), ์นœ๊ทผํ•จ (Friendly), ์ •์ค‘ํ•จ (Polite)]
    • ๋ถ„์•ผ: [ํ•™์ˆ ์  (Academic), ๋ฒ•๋ฅ ์  (Legal), ์—…๋ฌด์  (Professional), ๊ธฐ์ˆ ์  (Technical), ๋ฌธํ•™์  (Literary), ์ผ์ƒ์  (Casual)]
    • ์–ดํˆฌ: [๋ฐ˜๋ง, ์กด๋Œ“๋ง]
    • ์–ด๋ฏธ: [~๋‹ค, ~๋‹ˆ๋‹ค, ~์˜ค, ~์š”, ~ํ•ด]
    • EXAONE-3.5-7.8B ๋ณธ์—ฐ์˜ ๋Šฅ๋ ฅ ๋•๋ถ„์—, ํ•™์Šตํ•˜์ง€ ์•Š์€ ์Šคํƒ€์ผ ์„ค์ •๋„ ์–ด๋Š ์ •๋„ ๋ฐ˜์˜ํ•ด ์ค๋‹ˆ๋‹ค.
  • ๋‹จ์–ด์žฅ์˜ ๊ฒฝ์šฐ, dictionary ํ˜•ํƒœ๋กœ ์ œ๊ณต๋˜์–ด์•ผ ํ•˜๋ฉฐ, ์ž…๋ ฅ๋œ ๋‹จ์–ด์žฅ์„ ๊ณผ์ž‰ ๋ฐ˜์˜ํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์œผ๋ฏ€๋กœ ์‹ ์ค‘ํ•˜๊ฒŒ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

  • Pure translation quality by itself is lower than expected.
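
The style tags and the glossary are plain text inside the user turn, so requests can be assembled programmatically. A minimal sketch, where build_request is a hypothetical helper (str() renders the dict with slightly different spacing than the example above, which should not matter):

def build_request(text, style=None, glossary=None):
    # Mirrors the prompt layout shown in the Description section.
    lines = ["다음 텍스트를 한국어로 번역해 주세요."]
    if style:  # list of style tags, e.g. ["일반 대중", "반말", "~하네"]
        lines.append("번역 스타일: " + ", ".join(style))
    if glossary:  # dict mapping source terms to required target terms
        lines.append("단어장: " + str(glossary))
    lines += ["", text]  # blank line, then the source text
    return "\n".join(lines)

request = build_request(
    "The director finally gets Maple back to normal.",
    style=["일반 대중", "반말", "노래 가사", "부드러움", "~하네"],
    glossary={"the director": "대창섭", "back to normal": "정상화"},
)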

Training method

  • SFT
  1. Collected a total of 3M Korean-English translation pairs from the data sources included in werty1248/Open-KoEn-Parallel-Style-Tag and from AI Hub
  2. Sampled 10% of the data and generated style tags for it (300K pairs) using the methodology introduced in werty1248/Open-KoEn-Parallel-Style-Tag
  3. Trained a LoRA adapter on the full 3M pairs on top of EXAONE-3.5-7.8B-Instruct
Axolotl Config
base_model: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/KoEn-Parallel-Full-Conv
    field_messages: conversations
    train_on_eos: turn
    type: chat_template
    chat_template: tokenizer_default

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.1
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 1
optimizer: paged_ademamix_32bit
lr_scheduler: cosine
learning_rate: 0.000005
weight_decay: 0.1

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: 
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 1
eval_table_size:

deepspeed: ./deepspeed_configs/zero3_bf16.json
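
As a sanity check on adapter strength: with peft_use_rslora: true, the LoRA update is scaled by $\alpha/\sqrt{r}$ (rank-stabilized LoRA) rather than the standard $\alpha/r$, so this r=32, α=16 config gives

\gamma_{\mathrm{rsLoRA}} = \frac{\alpha}{\sqrt{r}} = \frac{16}{\sqrt{32}} \approx 2.83
\qquad\text{vs.}\qquad
\gamma_{\mathrm{LoRA}} = \frac{\alpha}{r} = \frac{16}{32} = 0.5

i.e. a substantially stronger adapter than the raw α/r ratio would suggest.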
  • RL
  1. Trained on the werty1248/Open-KoEn-Parallel-Style-Glossary-DPO data using ORPO (objective sketched below)
  2. As in the SFT stage, training was done with a LoRA adapter
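
For reference, ORPO needs no frozen reference model: as described in the ORPO paper, it adds an odds-ratio penalty over chosen/rejected pairs ($y_w$/$y_l$) on top of the SFT loss, weighted by $\lambda$:

\mathcal{L}_{\mathrm{ORPO}}
= \mathbb{E}_{(x,\,y_w,\,y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}}\big],
\qquad
\mathcal{L}_{\mathrm{OR}}
= -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}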
Axolotl Config

โ€ป ์ด๋Œ€๋กœ ์‹คํ–‰ํ•˜๋ฉด ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. Axolotl์˜ ORPO์ชฝ ๋ฐ์ดํ„ฐ ๋กœ๋”ฉ & chat_template ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•ด์„œ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.

base_model: werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag
model_type: AutoModelForCausalLM
tokenizer_config: werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

rl: orpo
datasets:
  - path: werty1248/Open-KoEn-Parallel-Style-Glossary-DPO
    name: wo_system
    type: chat_template.default
    field_messages: messages
    field_chosen: chosen
    field_rejected: rejected
    message_field_role: role
    message_field_content: content

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 8192

sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.1
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1

num_epochs: 1

optimizer: adamw_torch

lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: auto

gradient_checkpointing: true
flash_attention: true
saves_per_epoch: 1

logging_steps: 1
warmup_steps: 20

#deepspeed: ./deepspeed_configs/zero3_bf16.json
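
Since both stages train LoRA adapters, the ORPO adapter presumably gets merged into the SFT checkpoint before serving. A minimal sketch with peft, assuming the adapter was written to the output_dir above (/workspace/data); the save path is a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "/workspace/data")  # ORPO LoRA output dir (assumption)
model = model.merge_and_unload()  # fold the adapter weights back into the base model
model.save_pretrained("./merged-orpo-model")  # placeholder path
AutoTokenizer.from_pretrained(
    "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag"
).save_pretrained("./merged-orpo-model")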

vLLM example (basic translation vs. translation with style vs. translation with glossary vs. translation with style & glossary)

!pip install vllm
  • Inference code
from vllm import LLM, SamplingParams
name = "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag-DPO"

llm = LLM(model=name, max_model_len=2048)

sampling_params = SamplingParams(temperature=0, max_tokens=512, stop=['[|assistant|]',])  # stop on the raw turn marker: workaround for the chat-template issue noted at the top


normal_request = """다음 텍스트를 한국어로 번역해 주세요.

The director finally gets Maple back to normal."""


style_request = """다음 텍스트를 한국어로 번역해 주세요.
번역 스타일: 일반 대중, 반말, 노래 가사, 부드러움, ~하네

The director finally gets Maple back to normal."""


glossary_request = """다음 텍스트를 한국어로 번역해 주세요.
단어장: {'the director':'대창섭', 'back to normal':'정상화'}

The director finally gets Maple back to normal."""


style_glossary_request = """다음 텍스트를 한국어로 번역해 주세요.
번역 스타일: 일반 대중, 반말, 노래 가사, 부드러움, ~하네
단어장: {'the director':'대창섭', 'back to normal':'정상화'}

The director finally gets Maple back to normal."""


input_list = [[{"role":"user","content":normal_request}],
              [{"role":"user","content":style_request}],
              [{"role":"user","content":glossary_request}],
              [{"role":"user","content":style_glossary_request}]]


outputs = llm.chat(input_list, sampling_params)
pred_list = [x.outputs[0].text for x in outputs]

print("์˜์–ด: The director finally gets Maple back to normal.\n")
print("\n".join(pred_list))
  • Output
영어: The director finally gets Maple back to normal.

감독은 마침내 메이플을 정상으로 돌려놓는다. # normal_request
감독은 마침내 메이플을 정상으로 돌려놓네.  # style_request
대창섭은 마침내 메이플을 정상화시킵니다.' # glossary_request

대창섭은 마침내 메이플을 정상화시킨다네.'  # style_glossary_request