Finnish-NLP/roberta-large-finnish

aapot committed on Sep 21, 2021

Commit 8b141e7 · 1 Parent(s): fa9a284

Update README.md

Files changed (1)

README.md +2 -2

@@ -8,9 +8,9 @@ tags:
 datasets:
 - mc4
 - wikipedia
-pipeline_tag: fill-mask
 widget:
 - text: "Moikka olen <mask> kielimalli."
 ---
 # RoBERTa large model for Finnish
@@ -105,7 +105,7 @@ neutral. Therefore, the model can have biased predictions.
 ## Training data
 This Finnish RoBERTa model was pretrained on the combination of five datasets:
-- [mc4](https://huggingface.co/datasets/mc4), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. Based on Common Crawl dataset. We used the Finnish subset of the mC4 dataset
+- [mc4](https://huggingface.co/datasets/mc4), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. We used the Finnish subset of the mC4 dataset
 - [wikipedia](https://huggingface.co/datasets/wikipedia) We used the Finnish subset of the wikipedia (August 2021) dataset
 - [Yle Finnish News Archive](http://urn.fi/urn:nbn:fi:lb-2017070501)
 - [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001)