nguyenvulebinh committed "add description"
Commit 3ce9c96 · Parent: 05b18be

README.md CHANGED
@@ -32,10 +32,12 @@ widget:
 
 This model is intended to be used for QA in the Vietnamese language, so the validation set is Vietnamese only (but English works fine). The evaluation results below use 10% of the Vietnamese dataset.
 
-
-
-
-
+
+| Model | EM | F1 |
+| ------------- | ------------- | ------------- |
+| [base](https://huggingface.co/nguyenvulebinh/vi-mrc-base) | 76.43 | 84.16 |
+| [large](https://huggingface.co/nguyenvulebinh/vi-mrc-large) | 77.32 | 85.46 |
+
 
 [MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) uses [XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html) as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words, but in my implementation I re-combine the sub-word representations (after they are encoded by the BERT layers) into word representations using a sum strategy.
 
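The sum strategy described in the hunk above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name `sum_pool_subwords` is made up for this example, and it assumes a `(seq_len, hidden)` tensor of encoder states plus the sub-word-to-word mapping that a Hugging Face fast tokenizer's `word_ids()` returns.

```python
import torch

def sum_pool_subwords(subword_states, word_ids):
    """Sum the sub-word vectors that belong to the same word.

    subword_states: (seq_len, hidden) encoder hidden states.
    word_ids: mapping from sub-word position to word index, as returned
              by a fast tokenizer's word_ids() (None marks special tokens).
    Returns a (num_words, hidden) tensor of word representations.
    """
    buckets = {}
    for pos, wid in enumerate(word_ids):
        if wid is None:
            continue  # skip special tokens such as <s> and </s>
        buckets.setdefault(wid, []).append(subword_states[pos])
    return torch.stack(
        [torch.stack(vecs).sum(dim=0) for _, vecs in sorted(buckets.items())]
    )
```

Summing (rather than averaging or keeping only the first sub-word) preserves a contribution from every sub-word piece, which is the design choice the README names.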
@@ -55,6 +57,7 @@ QA_input = {
 }
 res = nlp(QA_input)
 print('pipeline: {}'.format(res))
+#{'score': 0.5782045125961304, 'start': 45, 'end': 68, 'answer': 'xử lý ngôn ngữ tự nhiên'}
 ```
 
 - A more accurate inference process ([**using the sum-features strategy**](https://github.com/nguyenvulebinh/extractive-qa-mrc))
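For context, the snippet this hunk edits is the standard transformers question-answering pipeline; a self-contained version might look like the sketch below. The checkpoint name is taken from the evaluation table above, and the question/context strings are placeholders rather than the README's originals, so the printed dict will differ from the output line added in the commit (whose answer, 'xử lý ngôn ngữ tự nhiên', is Vietnamese for "natural language processing").

```python
from transformers import pipeline

# Checkpoint from the evaluation table above; the inputs below are
# illustrative placeholders, not the README's original Vietnamese example.
model_checkpoint = "nguyenvulebinh/vi-mrc-large"
nlp = pipeline("question-answering",
               model=model_checkpoint, tokenizer=model_checkpoint)

QA_input = {
    "question": "What language does this model target?",
    "context": "This model performs extractive question answering "
               "for the Vietnamese language.",
}
res = nlp(QA_input)
print('pipeline: {}'.format(res))
# prints a dict of the form {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```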
@@ -80,4 +83,5 @@ outputs = model(**inputs_ids)
 answer = extract_answer(inputs, outputs, tokenizer)
 
 print(answer)
+# answer: Google Developer Expert. Score start: 0.9926977753639221, Score end: 0.9909810423851013
 ```
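The `extract_answer` helper comes from the linked extractive-qa-mrc repo; the hunk only shows its call site. As a rough stand-in, a plain argmax decode over start/end logits looks like the sketch below. It loads the checkpoint through the generic `AutoModelForQuestionAnswering` class rather than the repo's word-level `MRCQuestionAnswering` model, so it skips the sum-pooling step and only approximates the "more accurate" path; the question/context strings are illustrative (the answer string is borrowed from the output line in the commit).

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Approximation of the decode step behind extract_answer(): pick the
# highest-scoring start and end positions and decode the span between them.
# Loading this checkpoint via the generic Auto classes is an assumption;
# the repo ships its own MRCQuestionAnswering class for the word-level model.
checkpoint = "nguyenvulebinh/vi-mrc-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "What is Binh's title?"               # illustrative inputs, not
context = "Binh is a Google Developer Expert."   # the README's original ones
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)
```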