nguyenvulebinh committed "add description"
Commit 3ce9c96 · Parent: 05b18be

README.md CHANGED
@@ -32,10 +32,12 @@ widget:
 
 This model is intended to be used for QA in the Vietnamese language, so the validation set is Vietnamese only (but English works fine). The evaluation results below use 10% of the Vietnamese dataset.
 
-
-
-
-
+
+| Model | EM | F1 |
+| ------------- | ------------- | ------------- |
+| [base](https://huggingface.co/nguyenvulebinh/vi-mrc-base) | 76.43 | 84.16 |
+| [large](https://huggingface.co/nguyenvulebinh/vi-mrc-large) | 77.32 | 85.46 |
+
 
 [MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) uses [XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html) as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words, but in my implementation I re-combine the sub-word representations (after they are encoded by the BERT layers) into word representations using a sum strategy.
 
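The sum strategy described in the hunk above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name `sum_pool_subwords` is made up for this example, and it assumes a `(seq_len, hidden)` tensor of encoder states plus the sub-word-to-word mapping that a Hugging Face fast tokenizer's `word_ids()` returns.

```python
import torch

def sum_pool_subwords(subword_states, word_ids):
    """Sum the sub-word vectors that belong to the same word.

    subword_states: (seq_len, hidden) encoder hidden states.
    word_ids: mapping from sub-word position to word index, as returned
              by a fast tokenizer's word_ids() (None marks special tokens).
    Returns a (num_words, hidden) tensor of word representations.
    """
    buckets = {}
    for pos, wid in enumerate(word_ids):
        if wid is None:
            continue  # skip special tokens such as <s> and </s>
        buckets.setdefault(wid, []).append(subword_states[pos])
    return torch.stack(
        [torch.stack(vecs).sum(dim=0) for _, vecs in sorted(buckets.items())]
    )
```

Summing (rather than averaging or keeping only the first sub-word) preserves a contribution from every sub-word piece, which is the design choice the README names.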
@@ -55,6 +57,7 @@ QA_input = {
 }
 res = nlp(QA_input)
 print('pipeline: {}'.format(res))
+#{'score': 0.5782045125961304, 'start': 45, 'end': 68, 'answer': 'xử lý ngôn ngữ tự nhiên'}
 ```
 
 - A more accurate inference process ([**using the sum-features strategy**](https://github.com/nguyenvulebinh/extractive-qa-mrc))
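For context, the snippet this hunk edits is the standard transformers question-answering pipeline; a self-contained version might look like the sketch below. The checkpoint name is taken from the evaluation table above, and the question/context strings are placeholders rather than the README's originals, so the printed dict will differ from the output line added in the commit (whose answer, 'xử lý ngôn ngữ tự nhiên', is Vietnamese for "natural language processing").

```python
from transformers import pipeline

# Checkpoint from the evaluation table above; the inputs below are
# illustrative placeholders, not the README's original Vietnamese example.
model_checkpoint = "nguyenvulebinh/vi-mrc-large"
nlp = pipeline("question-answering",
               model=model_checkpoint, tokenizer=model_checkpoint)

QA_input = {
    "question": "What language does this model target?",
    "context": "This model performs extractive question answering "
               "for the Vietnamese language.",
}
res = nlp(QA_input)
print('pipeline: {}'.format(res))
# prints a dict of the form {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```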
@@ -80,4 +83,5 @@ outputs = model(**inputs_ids)
 answer = extract_answer(inputs, outputs, tokenizer)
 
 print(answer)
+# answer: Google Developer Expert. Score start: 0.9926977753639221, Score end: 0.9909810423851013
 ```
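The `extract_answer` helper comes from the linked extractive-qa-mrc repo; the hunk only shows its call site. As a rough stand-in, a plain argmax decode over start/end logits looks like the sketch below. It loads the checkpoint through the generic `AutoModelForQuestionAnswering` class rather than the repo's word-level `MRCQuestionAnswering` model, so it skips the sum-pooling step and only approximates the "more accurate" path; the question/context strings are illustrative (the answer string is borrowed from the output line in the commit).

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Approximation of the decode step behind extract_answer(): pick the
# highest-scoring start and end positions and decode the span between them.
# Loading this checkpoint via the generic Auto classes is an assumption;
# the repo ships its own MRCQuestionAnswering class for the word-level model.
checkpoint = "nguyenvulebinh/vi-mrc-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "What is Binh's title?"               # illustrative inputs, not
context = "Binh is a Google Developer Expert."   # the README's original ones
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)
```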