# Classifying Text into DB07 Codes

This model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned to classify descriptions of activities into [NACE Rev. 2](https://ec.europa.eu/eurostat/web/nace-rev2) codes.

## Data

The fine-tuning data consist of 2.5 million descriptions of activities from Norwegian and Danish businesses. To improve the model's multilingual performance, random samples were machine-translated into the following languages:

- English
- German
- Spanish
- French
- Finnish

## Quick Start

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("erst/xlm-roberta-base-finetuned-db07")
model = AutoModelForSequenceClassification.from_pretrained("erst/xlm-roberta-base-finetuned-db07")

# "text-classification" is the canonical name of the task that
# "sentiment-analysis" aliases. By default the pipeline returns only
# the highest-scoring label.
pl = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

# Classify a single activity description.
pl("We sell clothes")
```
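When a description is ambiguous, it can help to inspect several candidate codes rather than only the best one. The following is a minimal sketch of one way to do that, not part of the model itself: `top_codes` is a hypothetical helper, and the labels and scores in `sample` are invented placeholders rather than real model output. It assumes predictions arrive as a list of `{"label": ..., "score": ...}` dictionaries, which is the shape the text-classification pipeline produces.

```python
from typing import Dict, List, Union

# One prediction as produced by a text-classification pipeline.
Prediction = Dict[str, Union[str, float]]


def top_codes(
    predictions: List[Prediction], k: int = 3, min_score: float = 0.0
) -> List[Prediction]:
    """Return up to k predictions with score >= min_score, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p for p in ranked if p["score"] >= min_score][:k]


# Hypothetical labels and scores, invented for illustration only.
sample = [
    {"label": "code_b", "score": 0.12},
    {"label": "code_a", "score": 0.81},
    {"label": "code_d", "score": 0.02},
    {"label": "code_c", "score": 0.05},
]

# Keep at most three candidates that score at least 0.10.
for pred in top_codes(sample, k=3, min_score=0.10):
    print(f'{pred["label"]}: {pred["score"]:.2f}')
```

In recent versions of `transformers`, calling the pipeline as `pl("We sell clothes", top_k=None)` should return scores for every label, which can then be passed to `top_codes`. Depending on the installed version, the result may be nested one level deeper, so check its shape first.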