---
license: apache-2.0
base_model: google/byt5-small
language: de
model-index:
- name: ybracke/transnormer-19c-beta-v02
results:
- task:
name: Historic Text Normalization
type: translation
dataset:
name: DTA EvalCorpus
type: N/A
split: test
metrics:
- name: Word Accuracy
type: accuracy
value: 0.98878
- name: Word Accuracy (case insensitive)
type: accuracy
value: 0.99343
---
# Transnormer 19th century (beta v02)
This model normalizes spelling variants in historical German text to modern spelling.
We fine-tuned [google/byt5-small](https://huggingface.co/google/byt5-small) on a modified version of the [DTA EvalCorpus](https://kaskade.dwds.de/~moocow/software/dtaec/) (1780–1901).
## Model description
### Demo Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("ybracke/transnormer-19c-beta-v02")
model = AutoModelForSeq2SeqLM.from_pretrained("ybracke/transnormer-19c-beta-v02")

# Historical German sentence with the long s ("ſ") and outdated spellings
sentence = "Die Königinn ſaß auf des Pallaſtes mittlerer Tribune."

inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# >>> ['Die Königin saß auf des Palastes mittlerer Tribüne.']
```
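To normalize several sentences at once, the inputs can be padded into a batch. A minimal sketch (the second example sentence and the `max_length` value are illustrative, not taken from the model card):

```python
# Batched normalization: pad the byte sequences to a common length
sentences = [
    "Die Königinn ſaß auf des Pallaſtes mittlerer Tribune.",
    "Er gieng zur Thüre hinaus.",  # illustrative historical spelling
]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```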
Or use this model with the [pipeline API](https://huggingface.co/transformers/main_classes/pipelines.html) like this:
```python
from transformers import pipeline
transnormer = pipeline('text2text-generation', model='ybracke/transnormer-19c-beta-v02')
sentence = "Die Königinn ſaß auf des Pallaſtes mittlerer Tribune."
print(transnormer(sentence))
# >>> [{'generated_text': 'Die Königin saß auf des Palastes mittlerer Tribüne.'}]
```
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
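The word accuracy scores in the card header were measured on the test split of the modified DTA EvalCorpus. As a rough illustration only (not the authors' evaluation script), word accuracy over aligned sentence pairs could be computed like this:

```python
def word_accuracy(references, hypotheses, case_insensitive=False):
    """Fraction of aligned word positions where hypothesis matches reference.

    Illustrative sketch: assumes one-to-one sentence alignment and
    whitespace tokenization, which may differ from the authors' setup.
    """
    correct = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        hyp_tokens = hyp.split()
        for r, h in zip(ref_tokens, hyp_tokens):
            if case_insensitive:
                r, h = r.lower(), h.lower()
            correct += r == h
            total += 1
        # Count any length mismatch between the sequences as errors
        total += abs(len(ref_tokens) - len(hyp_tokens))
    return correct / total if total else 0.0
```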
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
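
Assuming the standard Hugging Face `Seq2SeqTrainer` was used (the card does not name the training script), these settings would correspond roughly to the following configuration. This is a sketch, not the authors' actual code; `output_dir` is a hypothetical placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters above as Seq2SeqTrainingArguments
# (the Adam betas/epsilon are the values listed in the card, which
# are also the optimizer defaults).
training_args = Seq2SeqTrainingArguments(
    output_dir="transnormer-19c",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
)
```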
### Framework versions
- Transformers 4.31.0
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.13.3