※ There was a problem with the chat template during ORPO training. Retraining is planned.

Description

  • ์Šคํƒ€์ผ ๋ฐ ๋‹จ์–ด์žฅ์„ ์„ธํŒ…ํ•  ์ˆ˜ ์žˆ๋Š” ๋กœ์ปฌ ์˜->ํ•œ ๋ฒˆ์—ญ ๋ชจ๋ธ.
๋‹ค์Œ ํ…์ŠคํŠธ๋ฅผ ํ•œ๊ตญ์–ด๋กœ ๋ฒˆ์—ญํ•ด ์ฃผ์„ธ์š”.
๋ฒˆ์—ญ ์Šคํƒ€์ผ: ์ผ๋ฐ˜ ๋Œ€์ค‘, ๋ฐ˜๋ง, ๋…ธ๋ž˜ ๊ฐ€์‚ฌ, ๋ถ€๋“œ๋Ÿฌ์›€, ~ํ•˜๋„ค
๋‹จ์–ด์žฅ: {'the director':'๋Œ€์ฐฝ์„ญ', 'back to normal':'์ •์ƒํ™”'}

The director finally gets Maple back to normal.

# ์ถœ๋ ฅ
๋Œ€์ฐฝ์„ญ์€ ๋งˆ์นจ๋‚ด ๋ฉ”์ดํ”Œ์„ ์ •์ƒํ™”์‹œํ‚จ๋‹ค๋„ค.'
  • ์Šคํƒ€์ผ์€ ์ „๋‹ฌ ๋Œ€์ƒ, ์กด๋Œ“๋ง/๋ฐ˜๋ง ์—ฌ๋ถ€, ๋ฌธ์ฒด, ์–ด๋ฏธ ๋“ฑ์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • ์œ ํ˜•: [๋ช…์‚ฌํ˜•(Nominal), ํ‰์„œ๋ฌธ (Declarative), ์˜๋ฌธ๋ฌธ (Interrogative), ๋ช…๋ น๋ฌธ (Imperative), ๊ฐํƒ„๋ฌธ (Exclamatory), ์ฒญ์œ ๋ฌธ (Propositive)]
    • ๋Œ€์ƒ: [์ผ๋ฐ˜ ๋Œ€์ค‘ (General), ์ „๋ฌธ๊ฐ€ ์ง‘๋‹จ (Specialist), ์•„๋™ (Children), ๊ฐœ์ธ (Individual)]
    • ๋ฌธ์ฒด: [๊ฒฉ์‹์ฒด (Formal), ๋น„๊ฒฉ์‹์ฒด (Informal), ๋”ฑ๋”ฑํ•จ (Stiff), ๋ถ€๋“œ๋Ÿฌ์›€ (Soft), ์นœ๊ทผํ•จ (Friendly), ์ •์ค‘ํ•จ (Polite)]
    • ๋ถ„์•ผ: [ํ•™์ˆ ์  (Academic), ๋ฒ•๋ฅ ์  (Legal), ์—…๋ฌด์  (Professional), ๊ธฐ์ˆ ์  (Technical), ๋ฌธํ•™์  (Literary), ์ผ์ƒ์  (Casual)]
    • ์–ดํˆฌ: [๋ฐ˜๋ง, ์กด๋Œ“๋ง]
    • ์–ด๋ฏธ: [~๋‹ค, ~๋‹ˆ๋‹ค, ~์˜ค, ~์š”, ~ํ•ด]
    • EXAONE-3.5-7.8B ๋ณธ์—ฐ์˜ ๋Šฅ๋ ฅ ๋•๋ถ„์—, ํ•™์Šตํ•˜์ง€ ์•Š์€ ์Šคํƒ€์ผ ์„ค์ •๋„ ์–ด๋Š ์ •๋„ ๋ฐ˜์˜ํ•ด ์ค๋‹ˆ๋‹ค.
  • ๋‹จ์–ด์žฅ์˜ ๊ฒฝ์šฐ, dictionary ํ˜•ํƒœ๋กœ ์ œ๊ณต๋˜์–ด์•ผ ํ•˜๋ฉฐ, ์ž…๋ ฅ๋œ ๋‹จ์–ด์žฅ์„ ๊ณผ์ž‰ ๋ฐ˜์˜ํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์œผ๋ฏ€๋กœ ์‹ ์ค‘ํ•˜๊ฒŒ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

  • Pure translation quality by itself is lower than expected.
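
The style tags and the glossary are plain text inside the user turn, so requests can be assembled programmatically. A minimal sketch, where build_request is a hypothetical helper (str() renders the dict with slightly different spacing than the example above, which should not matter):

def build_request(text, style=None, glossary=None):
    # Mirrors the prompt layout shown in the Description section.
    lines = ["다음 텍스트를 한국어로 번역해 주세요."]
    if style:  # list of style tags, e.g. ["일반 대중", "반말", "~하네"]
        lines.append("번역 스타일: " + ", ".join(style))
    if glossary:  # dict mapping source terms to required target terms
        lines.append("단어장: " + str(glossary))
    lines += ["", text]  # blank line, then the source text
    return "\n".join(lines)

request = build_request(
    "The director finally gets Maple back to normal.",
    style=["일반 대중", "반말", "노래 가사", "부드러움", "~하네"],
    glossary={"the director": "대창섭", "back to normal": "정상화"},
)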

Training method

  • SFT
  1. Collected a total of 3M Korean-English translation pairs from the data sources included in werty1248/Open-KoEn-Parallel-Style-Tag and from AI Hub
  2. Sampled 10% of the data and generated style tags for it (300K pairs) using the methodology introduced in werty1248/Open-KoEn-Parallel-Style-Tag
  3. Trained a LoRA adapter on the full 3M pairs on top of EXAONE-3.5-7.8B-Instruct
Axolotl Config
base_model: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/KoEn-Parallel-Full-Conv
    field_messages: conversations
    train_on_eos: turn
    type: chat_template
    chat_template: tokenizer_default

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.1
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 1
optimizer: paged_ademamix_32bit
lr_scheduler: cosine
learning_rate: 0.000005
weight_decay: 0.1

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: 
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 1
eval_table_size:

deepspeed: ./deepspeed_configs/zero3_bf16.json
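
As a sanity check on adapter strength: with peft_use_rslora: true, the LoRA update is scaled by $\alpha/\sqrt{r}$ (rank-stabilized LoRA) rather than the standard $\alpha/r$, so this r=32, α=16 config gives

\gamma_{\mathrm{rsLoRA}} = \frac{\alpha}{\sqrt{r}} = \frac{16}{\sqrt{32}} \approx 2.83
\qquad\text{vs.}\qquad
\gamma_{\mathrm{LoRA}} = \frac{\alpha}{r} = \frac{16}{32} = 0.5

i.e. a substantially stronger adapter than the raw α/r ratio would suggest.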
  • RL
  1. Trained on the werty1248/Open-KoEn-Parallel-Style-Glossary-DPO data using ORPO (objective sketched below)
  2. As in the SFT stage, training was done with a LoRA adapter
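
For reference, ORPO needs no frozen reference model: as described in the ORPO paper, it adds an odds-ratio penalty over chosen/rejected pairs ($y_w$/$y_l$) on top of the SFT loss, weighted by $\lambda$:

\mathcal{L}_{\mathrm{ORPO}}
= \mathbb{E}_{(x,\,y_w,\,y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}}\big],
\qquad
\mathcal{L}_{\mathrm{OR}}
= -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}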
Axolotl Config

โ€ป ์ด๋Œ€๋กœ ์‹คํ–‰ํ•˜๋ฉด ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. Axolotl์˜ ORPO์ชฝ ๋ฐ์ดํ„ฐ ๋กœ๋”ฉ & chat_template ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•ด์„œ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.

base_model: werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag
model_type: AutoModelForCausalLM
tokenizer_config: werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

rl: orpo
datasets:
  - path: werty1248/Open-KoEn-Parallel-Style-Glossary-DPO
    name: wo_system
    type: chat_template.default
    field_messages: messages
    field_chosen: chosen
    field_rejected: rejected
    message_field_role: role
    message_field_content: content

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 8192

sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.1
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1

num_epochs: 1

optimizer: adamw_torch

lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: auto

gradient_checkpointing: true
flash_attention: true
saves_per_epoch: 1

logging_steps: 1
warmup_steps: 20

#deepspeed: ./deepspeed_configs/zero3_bf16.json
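
Since both stages train LoRA adapters, the ORPO adapter presumably gets merged into the SFT checkpoint before serving. A minimal sketch with peft, assuming the adapter was written to the output_dir above (/workspace/data); the save path is a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "/workspace/data")  # ORPO LoRA output dir (assumption)
model = model.merge_and_unload()  # fold the adapter weights back into the base model
model.save_pretrained("./merged-orpo-model")  # placeholder path
AutoTokenizer.from_pretrained(
    "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag"
).save_pretrained("./merged-orpo-model")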

vLLM example (basic translation vs. translation with style vs. translation with glossary vs. translation with style & glossary)

!pip install vllm
  • Inference code
from vllm import LLM, SamplingParams
name = "werty1248/EXAONE-3.5-7.8B-SFT-Translation-Style-Tag-DPO"

llm = LLM(model=name, max_model_len=2048)

sampling_params = SamplingParams(temperature=0, max_tokens=512, stop=['[|assistant|]',])  # stop on the raw turn marker: workaround for the chat-template issue noted at the top


normal_request = """다음 텍스트를 한국어로 번역해 주세요.

The director finally gets Maple back to normal."""


style_request = """다음 텍스트를 한국어로 번역해 주세요.
번역 스타일: 일반 대중, 반말, 노래 가사, 부드러움, ~하네

The director finally gets Maple back to normal."""


glossary_request = """다음 텍스트를 한국어로 번역해 주세요.
단어장: {'the director':'대창섭', 'back to normal':'정상화'}

The director finally gets Maple back to normal."""


style_glossary_request = """다음 텍스트를 한국어로 번역해 주세요.
번역 스타일: 일반 대중, 반말, 노래 가사, 부드러움, ~하네
단어장: {'the director':'대창섭', 'back to normal':'정상화'}

The director finally gets Maple back to normal."""


input_list = [[{"role":"user","content":normal_request}],
              [{"role":"user","content":style_request}],
              [{"role":"user","content":glossary_request}],
              [{"role":"user","content":style_glossary_request}]]


outputs = llm.chat(input_list, sampling_params)
pred_list = [x.outputs[0].text for x in outputs]

print("์˜์–ด: The director finally gets Maple back to normal.\n")
print("\n".join(pred_list))
  • Output
영어: The director finally gets Maple back to normal.

감독은 마침내 메이플을 정상으로 돌려놓는다. # normal_request
감독은 마침내 메이플을 정상으로 돌려놓네.  # style_request
대창섭은 마침내 메이플을 정상화시킵니다.' # glossary_request

대창섭은 마침내 메이플을 정상화시킨다네.'  # style_glossary_request