---
title: ko-translation-leaderbaord
app_file: leaderboard.py
sdk: gradio
sdk_version: 3.50.2
---

# Iris Translation

![iris-icon.jpeg](assets%2Firis-icon.jpeg)

Welcome to Iris Translation, a project designed to evaluate Korean-to-English translation models. Our project provides a comprehensive framework for evaluating the Iris model that we have developed.

## Models

The following models are used to compare translation quality. All of them can be run, and their results can be inspected.

- [davidkim205/iris-7b](https://huggingface.co/davidkim205/iris-7b)
- [squarelike/Gugugo-koen-7B-V1.1](https://huggingface.co/squarelike/Gugugo-koen-7B-V1.1)
- [maywell/Synatra-7B-v0.3-Translation](https://huggingface.co/maywell/Synatra-7B-v0.3-Translation)
- [Unbabel/TowerInstruct-7B-v0.1](https://huggingface.co/Unbabel/TowerInstruct-7B-v0.1)
- [jbochi/madlad400-10b-mt](https://huggingface.co/jbochi/madlad400-10b-mt)
- [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt)
- [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)

## Installation

```
conda create -n translation python=3.10
conda activate translation

pip install -r requirements.txt
```

## Usage

The default input file is `./data/komt-1810k-test.jsonl`. The following is an example of the JSON schema of the data. The `human` turn holds a Korean instruction ("Translate the following sentence into Korean.") followed by the sentence to translate, and the `gpt` turn holds the reference translation.

```json
{
    "conversations":[
        {
            "from":"human",
            "value":"다음 문장을 한글로 번역하세요.\nLet's make a graph here showing different levels of interest in activities."
        },
        {
            "from":"gpt",
            "value":"활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다."
        }
    ],
    "src":"aihub-MTPE"
}
```
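
As a minimal sketch of how one record of this schema could be read, the snippet below parses a single JSONL line and separates the instruction, the source sentence, and the reference. The helper name `split_record` is hypothetical and not part of this repository, and splitting the `human` turn at the first newline is an assumption based on the example above.

```python
import json


def split_record(line: str) -> dict:
    """Parse one JSONL line into instruction, source text, and reference."""
    record = json.loads(line)
    turns = record["conversations"]
    human = next(t["value"] for t in turns if t["from"] == "human")
    gpt = next(t["value"] for t in turns if t["from"] == "gpt")
    # Assumption: the instruction and the sentence to translate are
    # separated by the first newline, as in the example record.
    instruction, _, text = human.partition("\n")
    return {
        "instruction": instruction,
        "text": text,
        "reference": gpt,
        "src": record["src"],
    }


# Rebuild the example record as one JSONL line and parse it.
sample_line = json.dumps(
    {
        "conversations": [
            {
                "from": "human",
                "value": "다음 문장을 한글로 번역하세요.\n"
                "Let's make a graph here showing different levels of interest in activities.",
            },
            {
                "from": "gpt",
                "value": "활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다.",
            },
        ],
        "src": "aihub-MTPE",
    },
    ensure_ascii=False,
)
parsed = split_record(sample_line)
print(parsed["text"])
```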

### translate(Bleu)

Computes the BLEU score by comparing the model's translation with the reference translation.

```
python translation.py --model davidkim205/iris-7b
```

The result file is written to `results_bleu/iris-7b-result.jsonl`.

JSON schema example

- reference: the ground-truth reference translation
- generation: the translation generated by the model

```json
{
    "index":0,
    "reference":"활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다.",
    "generation":"여기서 활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다.",
    "bleu":0.917,
    "lang":"en",
    "model":"davidkim205/iris-7b",
    "src":"aihub-MTPE",
    "conversations":[
        {
            "from":"human",
            "value":"다음 문장을 한글로 번역하세요.\nLet's make a graph here showing different levels of interest in activities."
        },
        {
            "from":"gpt",
            "value":"활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다."
        }
    ]
}
```
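
To make the comparison between `reference` and `generation` concrete, here is a simplified sentence-level BLEU in plain Python. It is an illustration only: the whitespace tokenization, the add-one smoothing, and the function name `simple_bleu` are assumptions, not the scoring code used by `translation.py`, so it will not necessarily reproduce the `bleu` values shown above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, generation: str, max_n: int = 4) -> float:
    """Simplified sentence BLEU: clipped n-gram precision with brevity penalty."""
    ref, hyp = reference.split(), generation.split()  # assumption: whitespace tokens
    if not ref or not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1 and overlap == 0:
            return 0.0  # no shared words at all
        # Add-one smoothing keeps short sentences from collapsing to zero.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty: penalize generations shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다."
generation = "여기서 활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다."
score = simple_bleu(reference, generation)
print(round(score, 3))
```

A dedicated library is the better choice for reported numbers, because tokenization and smoothing choices change BLEU noticeably, especially for Korean.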

### translate_self(SBleu)

Translates the model's output back into the source language and computes the BLEU score against the original text.

```
python translation_self.py --model davidkim205/iris-7b
```

The result file is written to `results_self/iris-7b-result.jsonl`.

JSON schema example

- reference: the original text
- generation: the model's back-translation
- generation1: the model's first translation

```json
{
    "index":0,
    "reference":"Let's make a graph here showing different levels of interest in activities.",
    "generation":"let's create a graph that shows different levels of interest in activities here",
    "generation1":"여기서 활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다.",
    "bleu":0.49,
    "lang":"en",
    "model":"davidkim205/iris-7b",
    "src":"aihub-MTPE",
    "conversations":[
        {
            "from":"human",
            "value":"다음 문장을 한글로 번역하세요.\nLet's make a graph here showing different levels of interest in activities."
        },
        {
            "from":"gpt",
            "value":"활동에 대한 다양한 수준의 관심을 보여주는 그래프를 만들어 보겠습니다."
        }
    ]
}
```
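
The round trip can be sketched as a small function that takes the translator and the scorer as parameters. Everything here is hypothetical scaffolding: `round_trip`, `toy_translate`, and `exact_match` are stand-ins so the example runs without a model, and the exact-match scorer replaces BLEU only for illustration.

```python
from typing import Callable


def round_trip(
    text: str,
    translate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> dict:
    """Translate en -> ko -> en and score the back-translation against the original."""
    generation1 = translate(text, "ko")        # first pass: model translation
    generation = translate(generation1, "en")  # second pass: back-translation
    return {
        "reference": text,
        "generation": generation,
        "generation1": generation1,
        "bleu": score(text, generation),
    }


# Toy stand-ins (assumptions for the demo, not real models or metrics).
TOY_TABLE = {("Hello", "ko"): "안녕하세요", ("안녕하세요", "en"): "Hello"}


def toy_translate(text: str, target_lang: str) -> str:
    return TOY_TABLE[(text, target_lang)]


def exact_match(reference: str, generation: str) -> float:
    return 1.0 if reference == generation else 0.0


result = round_trip("Hello", toy_translate, exact_match)
print(result)
```

The returned keys mirror the `reference`, `generation`, and `generation1` fields of the result schema above.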

### translate2(Bleu and SBleu)

Runs both translate and translate_self so that BLEU and SBLEU can be compared together.

```
python translation2.py --model davidkim205/iris-7b
```

- Runs translate and saves the output to `results_bleu/iris-7b-result.jsonl`
- Runs translate_self and saves the output to `results_self/iris-7b-result.jsonl`

Each file contains the same results as the corresponding file generated in the two sections above.

## Evaluation

Translation results are verified in two ways.

1. Evaluate by comparing the model translation with the reference translation

```
python evaluate.py results_bleu/
```

output

```
bleu scores
result_bleu-nllb200.jsonl: 0.26, out_of_range_count=3, duplicate=1
result_bleu-madlad400.jsonl: 0.29, out_of_range_count=6, duplicate=3
result_bleu-TowerInstruct.jsonl: 0.32, out_of_range_count=9, duplicate=1
result_bleu-gugugo.jsonl: 0.32, out_of_range_count=3, duplicate=1
result_bleu-Synatra-7B-v0.3-Translation.jsonl: 0.35, out_of_range_count=2, duplicate=1
result_bleu-deepl.jsonl: 0.39, out_of_range_count=1, duplicate=0
result_bleu-azure.jsonl: 0.40, out_of_range_count=2, duplicate=0
result_bleu-google.jsonl: 0.40, out_of_range_count=3, duplicate=0
result_bleu-papago.jsonl: 0.43, out_of_range_count=3, duplicate=0
result_bleu-iris_7b.jsonl: 0.40, out_of_range_count=3, duplicate=0
```

2. Evaluate by comparing the original text with the result of translating it twice (English -> Korean -> English)

```
python evaluate.py results_self/
```

output

```
bleu scores
result_self-nllb200.jsonl: 0.30, out_of_range_count=1, duplicate=1
result_self-gugugo.jsonl: 0.36, out_of_range_count=1, duplicate=1
result_self-madlad400.jsonl: 0.38, out_of_range_count=3, duplicate=2
result_self-TowerInstruct.jsonl: 0.39, out_of_range_count=3, duplicate=0
result_self-Synatra-7B-v0.3-Translation.jsonl: 0.41, out_of_range_count=2, duplicate=1
result_self-deepl.jsonl: 0.45, out_of_range_count=0, duplicate=0
result_self-papago.jsonl: 0.49, out_of_range_count=0, duplicate=0
result_self-azure.jsonl: 0.49, out_of_range_count=0, duplicate=1
result_self-google.jsonl: 0.49, out_of_range_count=0, duplicate=0
result_self-papago.jsonl: 0.51, out_of_range_count=0, duplicate=0
result_self-iris_7b.jsonl: 0.43, out_of_range_count=1, duplicate=0
```
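
The per-file averages in the output above can be approximated with a short script. This is a sketch under assumptions, not the contents of `evaluate.py`: it only averages the `bleu` field of each record, and the function name `mean_bleu_per_file` and the demo file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def mean_bleu_per_file(result_dir: str) -> dict:
    """Average the `bleu` field of every record in each *.jsonl file."""
    scores = {}
    for path in sorted(Path(result_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            values = [json.loads(line)["bleu"] for line in f if line.strip()]
        if values:  # skip empty files instead of dividing by zero
            scores[path.name] = sum(values) / len(values)
    return scores


# Demo on a temporary directory so the sketch runs without the real results.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo-result.jsonl"
    demo_file.write_text('{"bleu": 0.5}\n{"bleu": 0.25}\n', encoding="utf-8")
    demo_scores = mean_bleu_per_file(tmp)

for name, value in demo_scores.items():
    print(f"{name}: {value:.3f}")
```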
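
**Evaluation factors**

- BLEU: average BLEU score between the reference translation and the model translation
- SBLEU: average BLEU score between the original text and its back-translation
- Duplicate: cases where the model generates duplicated text while translating
- Length Exceeds: mismatch between the length of the model translation and the reference translation (criterion: 0.2 < length < 2)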
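
The Length Exceeds criterion can be read as a bound on the length ratio between the model translation and the reference. The sketch below assumes that "length" means the character-length ratio `len(generation) / len(reference)` with strict bounds; the README does not state the unit, and the helper names are hypothetical.

```python
def length_ratio(generation: str, reference: str) -> float:
    """Character-length ratio of the model output to the reference (assumed unit)."""
    return len(generation) / len(reference) if reference else float("inf")


def is_out_of_range(generation: str, reference: str,
                    low: float = 0.2, high: float = 2.0) -> bool:
    """True when the ratio falls outside the open interval (low, high)."""
    return not (low < length_ratio(generation, reference) < high)


# (generation, reference) pairs made up for the demo.
pairs = [
    ("short", "short text"),                     # ratio 0.5: within range
    ("x", "a much longer reference sentence"),   # far too short
    ("a a a a a a", "a a"),                      # far too long
]
out_of_range_count = sum(is_out_of_range(g, r) for g, r in pairs)
print(out_of_range_count)
```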

### BLEU

These are the evaluation results for each model. The evaluation of the iris-7b model is summarized below.

- Higher translation performance than the existing open models on every metric
- On average, translation performance on par with cloud translation services
- Duplicate-sentence generation and length-exceeding issues at the same level as cloud translation services

![plot-bleu.png](assets%2Fplot-bleu.png)

Duplicate (duplicate sentence generation) and Length Exceeds are metrics from the translation (BLEU) run.

| Type        | Model                               | BLEU | SBLEU | Duplicate | Length Exceeds |
| ----------- | :---------------------------------- | ---- | ----- | --------- | -------------- |
| HuggingFace | facebook/nllb-200-distilled-1.3B    | 0.26 | 0.30  | 1         | 3              |
| HuggingFace | jbochi/madlad400-10b-mt             | 0.29 | 0.38  | 3         | 6              |
| HuggingFace | Unbabel/TowerInstruct-7B-v0.1       | 0.32 | 0.39  | 1         | 9              |
| HuggingFace | squarelike/Gugugo-koen-7B-V1.1      | 0.32 | 0.36  | 1         | 3              |
| HuggingFace | maywell/Synatra-7B-v0.3-Translation | 0.35 | 0.41  | 1         | 2              |
| Cloud       | deepl                               | 0.39 | 0.45  | 0         | 1              |
| Cloud       | azure                               | 0.40 | 0.49  | 0         | 3              |
| Cloud       | google                              | 0.40 | 0.49  | 0         | 2              |
| Cloud       | papago                              | 0.43 | 0.51  | 0         | 3              |
| HuggingFace | davidkim205/iris-7b (**ours**)      | 0.40 | 0.43  | 0         | 3              |

* SBLEU: Self-evaluation BLEU

### BLEU by source

These are the results of evaluating translation quality on the test dataset by domain. The evaluation of the iris-7b model is summarized below.

- Performance that far exceeds the existing translation models in every domain
- Comparable to or better than cloud translation services in many domains
- Very strong translation quality in the science and neologism domains

![plot-bleu-by-src.png](assets%2Fplot-bleu-by-src.png)

| Type | Model | Average | MTPE | techsci2 | expertise | humanities | sharegpt-deepl-ko-translation | MT-new-corpus | socialsci | korean-parallel-corpora | parallel-translation | food | techsci | para_pat | speechtype-based-machine-translation | koopus100 | basicsci | broadcast-content | patent | colloquial |
| ----------- | :---------------------------------- | ------- | ---: | -------: | --------: | ---------: | ----------------------------: | ------------: | --------: | ----------------------: | -------------------: | ---: | ------: | -------: | -----------------------------------: | --------: | -------: | ----------------: | -----: | ---------: |
| HuggingFace | facebook/nllb-200-distilled-1.3B | 0.26 | 0.44 | 0.28 | 0.16 | 0.23 | 0.44 | 0.34 | 0.27 | 0.10 | 0.23 | 0.37 | 0.28 | 0.19 | 0.29 | 0.23 | 0.15 | 0.33 | 0.09 | 0.29 |
| HuggingFace | jbochi/madlad400-10b-mt | 0.29 | 0.45 | 0.29 | 0.20 | 0.29 | 0.40 | 0.36 | 0.39 | 0.12 | 0.22 | 0.46 | 0.30 | 0.23 | 0.48 | 0.23 | 0.19 | 0.36 | 0.01 | 0.33 |
| HuggingFace | Unbabel/TowerInstruct-7B-v0.1 | 0.32 | 0.46 | 0.33 | 0.28 | 0.27 | 0.30 | 0.39 | 0.37 | 0.14 | 0.35 | 0.47 | 0.39 | 0.29 | 0.41 | 0.21 | 0.22 | 0.36 | 0.15 | 0.33 |
| HuggingFace | squarelike/Gugugo-koen-7B-V1.1 | 0.32 | 0.46 | 0.27 | 0.28 | 0.22 | 0.66 | 0.33 | 0.36 | 0.10 | 0.29 | 0.45 | 0.34 | 0.24 | 0.42 | 0.22 | 0.23 | 0.42 | 0.20 | 0.26 |
| HuggingFace | maywell/Synatra-7B-v0.3-Translation | 0.35 | 0.43 | 0.36 | 0.27 | 0.23 | 0.70 | 0.37 | 0.31 | 0.13 | 0.34 | 0.52 | 0.35 | 0.29 | 0.44 | 0.21 | 0.24 | 0.46 | 0.28 | 0.37 |
| Cloud | deepl | 0.39 | 0.59 | 0.33 | 0.31 | 0.32 | 0.70 | 0.48 | 0.38 | 0.14 | 0.38 | 0.55 | 0.41 | 0.33 | 0.48 | 0.24 | 0.28 | 0.42 | 0.37 | 0.36 |
| Cloud | azure | 0.40 | 0.57 | 0.36 | 0.35 | 0.29 | 0.63 | 0.46 | 0.39 | 0.16 | 0.38 | 0.56 | 0.39 | 0.33 | 0.54 | 0.22 | 0.29 | 0.52 | 0.35 | 0.41 |
| Cloud | google | 0.40 | 0.62 | 0.39 | 0.32 | 0.32 | 0.60 | 0.45 | 0.45 | 0.14 | 0.38 | 0.59 | 0.43 | 0.34 | 0.45 | 0.22 | 0.28 | 0.47 | 0.39 | 0.36 |
| Cloud | papago | 0.43 | 0.56 | 0.43 | 0.41 | 0.30 | 0.55 | 0.58 | 0.56 | 0.16 | 0.37 | 0.67 | 0.52 | 0.35 | 0.53 | 0.21 | 0.35 | 0.45 | 0.37 | 0.46 |
| HuggingFace | davidkim205/iris-7b (**ours**) | 0.40 | 0.49 | 0.37 | 0.34 | 0.31 | 0.72 | 0.48 | 0.43 | 0.11 | 0.33 | 0.56 | 0.46 | 0.34 | 0.43 | 0.20 | 0.30 | 0.47 | 0.41 | 0.40 |

### BLEU by sentence length

These are the average scores obtained by sampling 50 items for each of four text-length ranges and translating them. The datasets used for this evaluation are as follows.

- `data/komt-dataset-100.jsonl`
- `data/komt-dataset-500.jsonl`
- `data/komt-dataset-1000.jsonl`
- `data/komt-dataset-1500.jsonl`

The translations and BLEU scores are stored under `results_length/`.

Notably, the iris-7b model outperforms most cloud translation services in every range.

- ~100: (0, 100]
- ~500: (100, 500]
- ~1000: (500, 1000]
- ~1500: (1000, 1500]
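
The four half-open ranges above map naturally onto a sorted list of upper bounds. The sketch below assigns a text to its range with `bisect_left`; measuring length in characters is an assumption (the README does not state the unit), and `length_bucket` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Optional

UPPER_BOUNDS = [100, 500, 1000, 1500]
LABELS = ["~100", "~500", "~1000", "~1500"]


def length_bucket(text: str) -> Optional[str]:
    """Return the range label for a text, or None if it fits no range."""
    n = len(text)  # assumption: length is counted in characters
    if n == 0:
        return None
    # bisect_left finds the first upper bound >= n, which matches the
    # right-closed intervals (0, 100], (100, 500], and so on.
    idx = bisect_left(UPPER_BOUNDS, n)
    return LABELS[idx] if idx < len(LABELS) else None


for size in (100, 101, 1500, 1501):
    print(size, length_bucket("a" * size))
```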

![plot-bleu-by-sentence-length.png](assets%2Fplot-bleu-by-sentence-length.png)

| Type        | Model                               | Average | ~100(50) | ~500(50) | ~1000(50) | ~1500(50) |
| ----------- | :---------------------------------- | ------- | -------: | -------: | --------: | --------: |
| HuggingFace | facebook/nllb-200-distilled-1.3B    | 0.24    |     0.31 |     0.31 |      0.22 |      0.13 |
| HuggingFace | jbochi/madlad400-10b-mt             | 0.22    |     0.35 |     0.37 |      0.08 |      0.10 |
| HuggingFace | Unbabel/TowerInstruct-7B-v0.1       | 0.32    |     0.41 |     0.31 |      0.24 |      0.32 |
| HuggingFace | squarelike/Gugugo-koen-7B-V1.1      | 0.45    |     0.37 |     0.48 |      0.52 |      0.43 |
| HuggingFace | maywell/Synatra-7B-v0.3-Translation | 0.50    |     0.41 |     0.57 |      0.57 |      0.51 |
| Cloud       | deepl                               | 0.53    |     0.44 |     0.56 |      0.64 |      0.50 |
| Cloud       | azure                               | 0.47    |     0.46 |     0.47 |      0.52 |      0.44 |
| Cloud       | google                              | 0.51    |     0.50 |     0.49 |      0.54 |      0.51 |
| Cloud       | papago                              | 0.46    |     0.50 |     0.46 |      0.43 |      0.45 |
| HuggingFace | davidkim205/iris-7b (**ours**)      | 0.56    |     0.51 |     0.58 |      0.62 |      0.54 |

## test dataset info |
|
|
|
ํ
์คํธ ๋ฐ์ดํฐ์
์ 18๊ฐ์ง ๋ถ์ผ์ ๋ฐ์ดํฐ 10๊ฐ๋ก, ์ด 180๊ฐ๋ก ์ด๋ฃจ์ด์ ธ ์์ต๋๋ค. |
|
|
|
`koopus100` ๋ฐ์ดํฐ์
์ ๊ธธ์ด๊ฐ ์งง๊ณ ์๋ฌธ๊ณผ ๋ฒ์ญ๋ฌธ์ด ์ผ์นํ์ง ์๋ ๋ฐ์ดํฐ๊ฐ ์กด์ฌํ์ฌ ํ์ง์ด ๋ฎ์ต๋๋ค. |
|
|
|
``` |
|
text: All right |
|
translation: ๋ณ๋ก ๊ทธ๋ด ๊ธฐ๋ถ ์๋์ผ - I'm not in the mood. |
|
|
|
text: Do you have a fever? |
|
translation: ๋ญ๋ผ๊ณ ํ์ด? |
|
``` |
|
|
|
`korean-parallel-corpora` ๋ฐ์ดํฐ์
์ ๋ฒ์ญ๋ฌธ์ ํ์์ด ํผ์ฉ๋๊ฑฐ๋, ์๋ชป ๋ฒ์ญ๋์ด ํ์ง์ด ๋ฎ์ต๋๋ค. |
|
|
|
``` |
|
text: S. Korea mulls missile defense system ํ๊ตญ, ์์ฒด์ ๋ฏธ์ฌ์ผ ๋ฐฉ์ด์ฒด๊ณ ์๋ฆฝ ๊ฒํ ย ย ย 2007.03 |
|
translation: South Korea maintains a mandatory draft system under which all able-bodied men over 20 must serve in the military for 24 to 27 months. |
|
|
|
text: A United States intelligence agency has been collecting data on the phone calls of tens of millions of Americans, a report in USA Today has alleged. |
|
translation: NSA collects Americansโphone clall data๋ฏธ ๊ตญ๊ฐ์๋ณด๊ตญ, ๋ฏธ๊ตญ๋ฏผ ํตํ ๋ด์ฉ ์์ง2006.07 |
|
|
|
text: I see the guy as more like John Wayne, which is to say I don't like his politics but he's endearing in a strange, goofy, awkward way, and he did capture the imagination of the country,\" he said. |
|
translation: ๋ฒ ํธ๋จ์ ์ ์ฐธ์ ํ๋ ์คํค ๊ฐ๋
์ ๋นํ์ ์ผ๋ก ํธํ์ ๋ฐ๊ณ ์ ์น์ ์ธ ์ฑํฅ์ด ๋ง์ ์ํ๋ฅผ ์ ์ํ ๊ฒ์ผ๋ก ์ ๋ช
ํ๋ค. |
|
|
|
text: The Sahara is advancing into Ghana and Nigeria at the rate of 3,510 square kilometers per year. |
|
translation: ์นด์ํ์คํ ๋ํ ์ฌ๋งํ๋ก ์ธํด 1980๋
์ดํ ๋๊ฒฝ์ง์ 50%๊ฐ ์ฌ๋ผ์ก์ผ๋ฉฐ ์ฌํ๋ผ ์ฌ๋ง์ ๋งค๋
3510ใข์ฉ ์ปค์ ธ๊ฐ๋ฉฐ ๊ฐ๋์ ๋์ด์ง๋ฆฌ์๋ฅผ ์ํํ๊ณ ์๋ค. |
|
``` |

The table below summarizes the ratio and description of each `src`. Each `src` contributes 10 items.

| src                                        | ratio | description                                                  |
| ------------------------------------------ | ----- | ------------------------------------------------------------ |
| aihub-MTPE                                 | 5.56% | Dataset for post-hoc verification of machine translation quality |
| aihub-techsci2                             | 5.56% | Korean-English translation dataset for technical science fields such as ICT and electrical/electronic engineering |
| aihub-expertise                            | 5.56% | Korean-English translation dataset for specialized fields such as medicine, finance, and sports |
| aihub-humanities                           | 5.56% | Korean-English translation dataset for the humanities |
| sharegpt-deepl-ko-translation              | 5.56% | shareGPT dataset converted from question-answer format into Korean-English translation format |
| aihub-MT-new-corpus                        | 5.56% | Korean-English translation dataset for building machine translation apps |
| aihub-socialsci                            | 5.56% | Korean-English translation dataset for social science fields such as law, education, and economics |
| korean-parallel-corpora                    | 5.56% | Korean-English parallel translation dataset |
| aihub-parallel-translation                 | 5.56% | Korean-English translation dataset by utterance type and domain |
| aihub-food                                 | 5.56% | English-Korean translation dataset for the food domain |
| aihub-techsci                              | 5.56% | Korean-English translation dataset for technical science fields such as ICT and electrical/electronic engineering |
| para_pat                                   | 5.56% | English-Korean subset of the ParaPat dataset |
| aihub-speechtype-based-machine-translation | 5.56% | English-Korean translation dataset by utterance type |
| koopus100                                  | 5.56% | English-Korean subset of the OPUS-100 dataset |
| aihub-basicsci                             | 5.56% | Korean-English translation dataset for basic science fields such as mathematics and physics |
| aihub-broadcast-content                    | 5.56% | Korean-English translation dataset for broadcast content |
| aihub-patent                               | 5.56% | English-Korean translation dataset of patent specifications |
| aihub-colloquial                           | 5.56% | Colloquial Korean-English translation dataset including neologisms and abbreviations |
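
The 5.56% in each row follows from 10 items out of 180. A small sketch of how such ratios could be recomputed from the `src` field of the records is shown below; the `source-N` names are placeholders for the demo, not real dataset names, and `src_ratios` is a hypothetical helper.

```python
from collections import Counter


def src_ratios(records) -> dict:
    """Share of each `src` value among the given records."""
    counts = Counter(r["src"] for r in records)
    total = sum(counts.values())
    return {src: count / total for src, count in counts.items()}


# 18 placeholder sources with 10 records each, mirroring the 180-item test set.
records = [{"src": f"source-{i}"} for i in range(18) for _ in range(10)]
ratios = src_ratios(records)
print(f"{ratios['source-0']:.2%}")  # 5.56%
```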