---
language: id
license: mit
datasets:
- indonli
- MoritzLaurer/multilingual-NLI-26lang-2mil7
pipeline_tag: zero-shot-classification
widget:
- text: Saya suka makan kentang goreng.
  candidate_labels: positif, netral, negatif
  hypothesis_template: Kalimat ini mengandung tema {}.
  multi_class: false
  example_title: Sentiment
- text: Apple umumkan harga iPhone 14.
  candidate_labels: teknologi, olahraga, kuliner, bisnis
  hypothesis_template: Kalimat ini mengandung tema {}.
  multi_class: true
  example_title: News
model-index:
- name: ilos-vigil/bigbird-small-indonesian-nli
  results:
  - task:
      type: natural-language-inference
      name: Natural Language Inference
    dataset:
      name: indonli
      type: indonli
      config: indonli
      split: test_expert
    metrics:
    - type: accuracy
      value: 0.5385388739946381
      name: Accuracy
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWRhZDkxNmI2NzE3MzRlYmNlMWFjZDVmNWUwYmMwN2IxYzNjMWE4YzY4NWI3NDZkYTMzY2NjN2MyZGQ5YzEwZSIsInZlcnNpb24iOjF9.AgizskHeXOzs0v93DNojNoqR_-1bQsYBokL8jcfelFm-zt-r5YXt89WXBDLLg4oKv-Roj8sLhUwe7ei0Mf1-Ag
    - type: f1
      value: 0.530444188199697
      name: F1 Macro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjk2YTFhY2E3NGIzNzgxY2M5YzUzNGUzYTAwOWZkNGU3Y2I5MDA1MTc0YzM4Yjg0MmIzY2Y5M2EzOGYxNjY4NiIsInZlcnNpb24iOjF9.YZ_fTuVftTCM6SFfkFCLPbJWYmYNMYL9PNHUwNFHQXZeknf6OCBgQtr1gF6VM9mX6WuU4OKEl12tsAytlkm7Ag
    - type: f1
      value: 0.5385388739946381
      name: F1 Micro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2MxMGUyZmJhZTYzN2M4NDlkMTZmMzllOGVhMjRiODhkMGVkMGMxMjY2NDBkZWM3ZWY2ZjhmZTNmYWU5ZjEzMyIsInZlcnNpb24iOjF9.f0HQlPRx4VFnOOHsrvMKFni8g1B1OJfheOyADsf47GnrvCcW_dakDgBy5c_yy4TehQYRa6ToYGHnuQnemvhnBg
    - type: f1
      value: 0.5299257731385174
      name: F1 Weighted
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTgzZjJkZWU0NDgyMGU5MDFmNzk2OWY1OWY4MzA2NTE3MDAxN2Y2MWExODJkYjdlN2I1YzgzYjljNjdkMTc1YiIsInZlcnNpb24iOjF9.lWB7MZlAiDjskKM-lx-XtLxTQYuWLz3QjyseDuZe_AxtyOKt2GZkP2NDOZxEWketHjRiTCQfBUvSfzFId-FCAg
    - type: precision
      value: 0.5592571894118881
      name: Precision Macro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDQxYTFlNTNjNDAwMWIxYmJlMzRkN2U5OWY1NWNjN2YyYTE2NzRjNjM3ZWNhMzM4NjFhYWM4MzJkYjY3MzU0YSIsInZlcnNpb24iOjF9.6OI4_M1wLX1Z1BztKUfZ-382F3coCeJjarsWc-J04TKpsFCddLjuF5ZDuBFmokpz4goRgx-FlH-5jCAsFkzkBg
    - type: precision
      value: 0.5385388739946381
      name: Precision Micro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzRmY2I4YTAzMTRkMjFjNTE1NTEwZDlmZGQ4NDUyYTAxY2JhOTliMDRhNWY3OGY4OWRlNTlkNzcxODc0MDMwYyIsInZlcnNpb24iOjF9.X7ekS-JYOXH5eNmSfKQ_no1rNAbuQ3C0pNYvorPVfcna6RU8n6O6FNQor0AWvatAWdefJG6H3J7_GoC6M5zECw
    - type: precision
      value: 0.5586108016541553
      name: Precision Weighted
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjUwNjMxYjEwMTEzNzAwNzQwZDQwMTRmZDM2ZDk0ZDc3YTUxOTQzNDE5ZWI2NWI4MmJmODAxYTlmN2E0Nzk2MCIsInZlcnNpb24iOjF9.nAO1wRFHMtm5kem9VhuuRg54fpvA2uzwEutjzsnZoyemUHbI2U_1TK_dDmR4bmpPjVnCZt5sF-jEq4oZIaIbDQ
    - type: recall
      value: 0.5385813032215204
      name: Recall Macro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzVkNjliYTM0Njc3MTUzMDBmYTE5NDRkNzFjNzg2NzA0NzEyMTg4YTlkNGFlZWMxZWUwOGQzYzY1ZGU0ZmIwNyIsInZlcnNpb24iOjF9.cnEbDBJR8m3UqiuzCq_g4RUFLE8BVzXDebKguVrwPgY-Biu4sBFXVQvFyZScsLGEnaHYsE-R8ctTEGDdQONVBw
    - type: recall
      value: 0.5385388739946381
      name: Recall Micro
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODZkMmNjZWY4ZDYyYjU3NjQ2ZGNhZjkyNTQyOTg2ZjNmNDgwNDYxYmU2ZDA5M2EwOWRlMjMyYmI4MGU3MGMxNCIsInZlcnNpb24iOjF9.BfMB4_MZ-SYj1YbTES8pqgKNQkNnevSOjAwUqdoL6wsNpsKKWxPHmq0Kt9XufxHoQoyTkGvPfxh-0jEe3B1nBg
    - type: recall
      value: 0.5385388739946381
      name: Recall Weighted
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmE3Yjg3OTVhMjdlMDk1YWFjMWIwNjMyZTA2Yzc3MjBlNjI1YWY5MzE0MjNkMDNiMmU5ZmIxYWExNmViYWE1NSIsInZlcnNpb24iOjF9.S9Bo-wq3wikFS-FqMQerxahu87PJyYx141G5PCWDtOs2wH1nf4texnJYWfHeVCJKZcKmS2RWn5XOjjJ9RoNJAA
    - type: loss
      value: 1.062397837638855
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTFmNDI0ZmQ2YmNlZjJlZTdmZTYwOGVkMjdjMjJkMDIzNzhlOWFiNWQzNjFiMmU5NTdiM2Y1YjYxMjU4ZjQ2ZSIsInZlcnNpb24iOjF9.15RsFRkFpbarlU1L8UyV0o0_5WCveO_mT9CdO0UYwvQsOVjScheJ8fOqHBAC-C-CMTlfFNsmMhNrU_np8c_ZCQ
---

# Indonesian small BigBird model NLI

## Source Code

The source code used to create this model and run the benchmarks is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).

## Model Description

This model is based on [bigbird-small-indonesian](https://huggingface.co/ilos-vigil/bigbird-small-indonesian) and was fine-tuned on two NLI datasets. It is intended for zero-shot text classification.
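
Zero-shot classification with an NLI model is commonly done by treating the input text as the premise and turning each candidate label into a hypothesis through a template; the label whose hypothesis is most strongly entailed is chosen. The sketch below illustrates only that pairing step. `build_nli_pairs` is a hypothetical helper written for this illustration and is not part of `transformers`.

```python
def build_nli_pairs(sequence, candidate_labels, hypothesis_template):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [
        (sequence, hypothesis_template.format(label))
        for label in candidate_labels
    ]


# Text and template taken from the widget examples of this model card.
pairs = build_nli_pairs(
    sequence='Apple umumkan harga iPhone 14.',
    candidate_labels=['teknologi', 'olahraga'],
    hypothesis_template='Kalimat ini mengandung tema {}.',
)

for premise, hypothesis in pairs:
    print(f'premise: {premise!r} | hypothesis: {hypothesis!r}')
```

Each pair is then scored by the NLI model, so the wording of the template directly affects the result.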

## How to use

> Inference for ZSC (Zero Shot Classification) task

```py
>>> from transformers import pipeline
>>> # './tmp/checkpoint-28832' is a local training checkpoint;
>>> # the published model ID is 'ilos-vigil/bigbird-small-indonesian-nli'.
>>> pipe = pipeline(
...     task='zero-shot-classification',
...     model='./tmp/checkpoint-28832'
... )
>>> pipe(
...     sequences='Fakta nomor 7 akan membuat ada terkejut',
...     candidate_labels=['clickbait', 'bukan clickbait'],
...     hypothesis_template='Judul video ini {}.',
...     multi_label=False
... )
{
    'sequence': 'Fakta nomor 7 akan membuat ada terkejut',
    'labels': ['clickbait', 'bukan clickbait'],
    'scores': [0.6102734804153442, 0.38972654938697815]
}
>>> pipe(
...     sequences='Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
...     candidate_labels=['teknologi', 'olahraga', 'bisnis', 'politik', 'kesehatan', 'kuliner'],
...     hypothesis_template='Kategori berita ini adalah {}.',
...     multi_label=True
... )
{
    'sequence': 'Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
    'labels': ['politik', 'teknologi', 'kesehatan', 'bisnis', 'olahraga', 'kuliner'],
    'scores': [0.7390161752700806, 0.6657379269599915, 0.4459509551525116, 0.38407933712005615, 0.3679264783859253, 0.14181996881961823]
}
```
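
In the first call above the two scores sum to 1, while in the second call (`multi_label=True`) they do not. The sketch below shows one way this difference can arise, and it follows how the `transformers` zero-shot pipeline is generally understood to normalize scores: a softmax over the entailment logits of all labels in the single-label case, and an independent entailment-versus-contradiction softmax per label in the multi-label case. The logits are made-up numbers, not outputs of this model, and the helper names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Labels compete with each other, so the scores sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(entailment_logits, contradiction_logits):
    """Each label is scored on its own: P(entailment) vs. P(contradiction)."""
    return [
        softmax([contradiction, entailment])[1]
        for entailment, contradiction in zip(entailment_logits, contradiction_logits)
    ]


# Made-up logits for three candidate labels (assumption, for illustration only).
entailment = [2.0, 1.0, -1.0]
contradiction = [0.5, 1.5, 2.0]

single = single_label_scores(entailment)
multi = multi_label_scores(entailment, contradiction)

print('single-label:', single, 'sum =', sum(single))
print('multi-label: ', multi, 'sum =', sum(multi))
```

With `multi_label=True`, several labels can therefore receive a high score at the same time, or none at all.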

> Inference for NLI (Natural Language Inference) task

```py
>>> from transformers import pipeline
>>> pipe = pipeline(
...     task='text-classification',
...     model='./tmp/checkpoint-28832',
...     return_all_scores=True
... )
>>> pipe({
...     'text': 'Nasi adalah makanan pokok.',  # Premise
...     'text_pair': 'Saya mau makan nasi goreng.'  # Hypothesis
... })
[
    {'label': 'entailment', 'score': 0.25495028495788574},
    {'label': 'neutral', 'score': 0.40920916199684143},
    {'label': 'contradiction', 'score': 0.33584052324295044}
]
>>> pipe({
...     'text': 'Python sering digunakan untuk web development dan AI research.',
...     'text_pair': 'AI research biasanya tidak menggunakan bahasa pemrograman Python.'
... })
[
    {'label': 'entailment', 'score': 0.12508109211921692},
    {'label': 'neutral', 'score': 0.22146646678447723},
    {'label': 'contradiction', 'score': 0.653452455997467}
]
```
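
## Limitation and bias

This model inherits the limitations and biases of its parent model and of the two datasets used for fine-tuning. Like most language models, it is also sensitive to changes in the input. In the example below, the same text is classified with three different hypothesis templates, and the score of the correct label drops from about 0.77 to about 0.60.
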
```py
>>> from transformers import pipeline
>>> pipe = pipeline(
...     task='zero-shot-classification',
...     model='./tmp/checkpoint-28832'
... )
>>> text = 'Resep sate ayam enak dan mudah.'
>>> candidate_labels = ['kuliner', 'olahraga']
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='Kategori judul artikel ini adalah {}.',
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.7711364030838013, 0.22886358201503754]
}
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='Kelas kalimat ini {}.',
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.7043636441230774, 0.295636385679245]
}
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='{}.',
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.5986711382865906, 0.4013288915157318]
}
```

## Training, evaluation and testing data

This model was fine-tuned on [IndoNLI](https://huggingface.co/datasets/indonli) and [multilingual-NLI-26lang-2mil7](https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7). Although the `multilingual-NLI-26lang-2mil7` dataset is machine-translated, it slightly improves results on the NLI benchmark and substantially improves results on the ZSC benchmark. Evaluation and test data both come from the IndoNLI dataset only.

## Training Procedure

The model was fine-tuned on a single RTX 3060 for 16 epochs (28,832 steps) with an accumulated batch size of 64. The AdamW optimizer was used with a learning rate of 1e-4, a weight decay of 0.05, learning rate warmup over the first 6% of steps (1,730 steps), and linear decay of the learning rate afterwards. Note that while the model weights at epoch 9 have the lowest loss and the highest accuracy, they perform slightly worse on the ZSC benchmark. Additional information is available in the TensorBoard training logs.
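
The step counts above can be checked with a little arithmetic, and the learning rate schedule can be sketched as a plain function. The sketch assumes the common "linear warmup from zero, then linear decay to zero" shape; the text only states that the decay is linear, so the end value of zero is an assumption. `learning_rate` is a hypothetical helper, not the training code from the repository.

```python
TOTAL_STEPS = 28832
EPOCHS = 16
PEAK_LR = 1e-4
WARMUP_RATIO = 0.06

# 28832 steps over 16 epochs gives the number of optimizer steps per epoch.
steps_per_epoch = TOTAL_STEPS // EPOCHS

# 6% of 28832 is 1729.92, which rounds to the 1730 warmup steps quoted above.
warmup_steps = round(WARMUP_RATIO * TOTAL_STEPS)


def learning_rate(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup=warmup_steps):
    """Linear warmup to peak_lr, then linear decay (assumed to end at zero)."""
    if step < warmup:
        return peak_lr * step / warmup
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup)


for step in (0, warmup_steps // 2, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f'step {step:>5}: lr = {learning_rate(step):.3e}')
```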

## Benchmark as NLI model

Both benchmarks include results for two other models for comparison. Additional benchmarks on the IndoNLI dataset are available in its paper, [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://aclanthology.org/2021.emnlp-main.821/).

| Model                                      | bigbird-small-indonesian-nli | xlm-roberta-large-xnli | mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 |
| ------------------------------------------ | ---------------------------- | ---------------------- | -------------------------------------------- |
| Parameters                                 | 30.6M                        | 559.9M                 | 278.8M                                       |
| Multilingual                               |                              | V                      | V                                            |
| Finetuned on IndoNLI                       | V                            |                        | V                                            |
| Finetuned on multilingual-NLI-26lang-2mil7 | V                            |                        |                                              |
| Test (Lay)                                 | 0.6888                       | 0.2226                 | 0.8151                                       |
| Test (Expert)                              | 0.5734                       | 0.3505                 | 0.7775                                       |

## Benchmark as ZSC model

[Indonesian-Twitter-Emotion-Dataset](https://github.com/meisaputri21/Indonesian-Twitter-Emotion-Dataset/) is used for the ZSC benchmark. The benchmark covers four parameter settings (multi-label on or off, hypothesis template on or off), which affect the performance of each model differently. The hypothesis templates for this benchmark are `Kalimat ini mengekspresikan perasaan {}.` and `{}.`. Note that the F1 score is computed using only the label with the highest probability.

| Model                                        | Multi-label | Use template | F1 Score     |
| -------------------------------------------- | ----------- | ------------ | ------------ |
| bigbird-small-indonesian-nli                 | V           | V            | 0.3574       |
|                                              | V           |              | 0.3654       |
|                                              |             | V            | 0.3985       |
|                                              |             |              | _0.4160_     |
| xlm-roberta-large-xnli                       | V           | V            | _**0.6292**_ |
|                                              | V           |              | 0.5596       |
|                                              |             | V            | 0.5737       |
|                                              |             |              | 0.5433       |
| mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 | V           | V            | 0.5324       |
|                                              | V           |              | _0.5499_     |
|                                              |             | V            | 0.5269       |
|                                              |             |              | 0.5228       |
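
Scoring by "the label with the highest probability" can be sketched as follows: each pipeline output is reduced to its top-scoring label, and F1 is computed over those predictions. The benchmark description does not state which F1 averaging is used, so macro averaging here is an assumption; the labels, scores and gold answers are made-up, and `top_label` and `macro_f1` are hypothetical helpers.

```python
def top_label(result):
    """Return the label with the highest score from a pipeline-style result dict."""
    scores = result['scores']
    best = max(range(len(scores)), key=scores.__getitem__)
    return result['labels'][best]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_per_class = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)


# Made-up results in the same shape as the zero-shot pipeline output.
results = [
    {'labels': ['senang', 'sedih'], 'scores': [0.7, 0.3]},
    {'labels': ['senang', 'sedih'], 'scores': [0.4, 0.6]},
    {'labels': ['sedih', 'senang'], 'scores': [0.2, 0.8]},
]
y_true = ['senang', 'sedih', 'sedih']  # made-up gold labels

y_pred = [top_label(r) for r in results]
score = macro_f1(y_true, y_pred)
print(y_pred, score)
```

Taking only the top label means that, even in the multi-label rows, every example contributes exactly one prediction to the score.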