# Classifying Text into DB07 Codes

This model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned to classify descriptions of activities into [NACE Rev. 2](https://ec.europa.eu/eurostat/web/nace-rev2) codes.


## Data
The model was fine-tuned on 2.5 million descriptions of activities from Norwegian and Danish businesses. To improve the model's multilingual performance, random samples were machine-translated into the following languages:
- English
- German
- Spanish
- French
- Finnish


## Quick Start

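```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
model_name = "erst/xlm-roberta-base-finetuned-db07"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# "text-classification" is the canonical task name; "sentiment-analysis" is an alias for it
pl = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

# By default the pipeline returns only the highest-scoring label and its score
print(pl("We sell clothes"))
```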
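
A description can plausibly fit more than one code, so it may help to look at several candidates instead of only the top one. The following is a minimal sketch, not part of the original model card: the helper `keep_confident` and the `0.1` threshold are hypothetical names and values chosen for illustration, and it assumes a reasonably recent `transformers` release in which the text-classification pipeline accepts `top_k` at call time and returns a list of `{"label": ..., "score": ...}` dictionaries for a single input string.

```python
from typing import Any, Dict, List

Prediction = Dict[str, Any]  # expected keys: "label" (str) and "score" (float)


def keep_confident(predictions: List[Prediction], threshold: float = 0.1) -> List[Prediction]:
    """Keep predictions scoring at least `threshold`, highest score first.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    kept = [p for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


def main() -> None:
    # Imported here so the helper above can be used without transformers installed.
    from transformers import pipeline

    pl = pipeline("text-classification", model="erst/xlm-roberta-base-finetuned-db07")

    # top_k=3 asks the pipeline for the three highest-scoring labels
    # (assumes a transformers version that supports top_k for this task).
    candidates = pl("We sell clothes", top_k=3)

    for p in keep_confident(candidates, threshold=0.1):
        print(p["label"], round(p["score"], 3))
```

Calling `main()` downloads the model on first use and prints the remaining candidate labels with their scores. What the label strings look like depends on the model's configuration, so treat them as opaque identifiers until you have inspected the output.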