(NER) roberta-base : conll2012_ontonotesv5-english-v4

This roberta-base NER model was fine-tuned on the english-v4 version of the conll2012_ontonotesv5 dataset.
See the NER-System repository for more information.

Dataset

conll2012_ontonotesv5
- Language : English
- Version : v4

Dataset	Examples
Training	75187
Testing	9479

Evaluation

Precision: 88.88
Recall: 90.69
F1-Score: 89.78
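
As a quick sanity check, the F1-score can be recomputed from the reported precision and recall, assuming it is the usual harmonic mean of the two. The helper name `f1_from` is hypothetical and only used for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above, in percent.
reported_precision = 88.88
reported_recall = 90.69

f1 = f1_from(reported_precision, reported_recall)
print(round(f1, 2))  # agrees with the reported F1-Score of 89.78
```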

See the eval.log file for the evaluation metrics and the classification report.

                precision    recall  f1-score   support

    CARDINAL       0.84      0.85      0.85       935
        DATE       0.85      0.90      0.87      1602
       EVENT       0.67      0.76      0.71        63
         FAC       0.74      0.72      0.73       135
         GPE       0.97      0.96      0.96      2240
    LANGUAGE       0.83      0.68      0.75        22
         LAW       0.66      0.62      0.64        40
         LOC       0.74      0.80      0.77       179
       MONEY       0.85      0.89      0.87       314
        NORP       0.93      0.96      0.95       841
     ORDINAL       0.81      0.89      0.85       195
         ORG       0.90      0.91      0.91      1795
     PERCENT       0.90      0.92      0.91       349
      PERSON       0.95      0.95      0.95      1988
     PRODUCT       0.74      0.83      0.78        76
    QUANTITY       0.76      0.80      0.78       105
        TIME       0.62      0.67      0.65       212
 WORK_OF_ART       0.58      0.69      0.63       166

   micro avg       0.89      0.91      0.90     11257
   macro avg       0.80      0.82      0.81     11257
weighted avg       0.89      0.91      0.90     11257
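
The summary rows can be cross-checked against the per-class rows. The sketch below assumes that "macro avg" is the unweighted mean over the 18 entity types and that the support column sums to the total; it works from the rounded values printed in the report, so only two-decimal agreement should be expected. The variable names are illustrative.

```python
# Per-class rows copied from the report: (precision, recall, f1, support)
report = {
    "CARDINAL":    (0.84, 0.85, 0.85, 935),
    "DATE":        (0.85, 0.90, 0.87, 1602),
    "EVENT":       (0.67, 0.76, 0.71, 63),
    "FAC":         (0.74, 0.72, 0.73, 135),
    "GPE":         (0.97, 0.96, 0.96, 2240),
    "LANGUAGE":    (0.83, 0.68, 0.75, 22),
    "LAW":         (0.66, 0.62, 0.64, 40),
    "LOC":         (0.74, 0.80, 0.77, 179),
    "MONEY":       (0.85, 0.89, 0.87, 314),
    "NORP":        (0.93, 0.96, 0.95, 841),
    "ORDINAL":     (0.81, 0.89, 0.85, 195),
    "ORG":         (0.90, 0.91, 0.91, 1795),
    "PERCENT":     (0.90, 0.92, 0.91, 349),
    "PERSON":      (0.95, 0.95, 0.95, 1988),
    "PRODUCT":     (0.74, 0.83, 0.78, 76),
    "QUANTITY":    (0.76, 0.80, 0.78, 105),
    "TIME":        (0.62, 0.67, 0.65, 212),
    "WORK_OF_ART": (0.58, 0.69, 0.63, 166),
}

n_classes = len(report)
total_support = sum(row[3] for row in report.values())

# Macro average: every class counts equally, regardless of its support.
macro_precision = sum(row[0] for row in report.values()) / n_classes
macro_recall = sum(row[1] for row in report.values()) / n_classes
macro_f1 = sum(row[2] for row in report.values()) / n_classes

print(total_support)
print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
```

The gap between the macro and micro averages comes mostly from low-support types such as LAW, EVENT and WORK_OF_ART, which weigh as much as GPE or PERSON in the macro average.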

Usage

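from transformers import pipeline

# aggregation_strategy='simple' groups adjacent tokens that share an
# entity label into a single entity span.
ner_pipeline = pipeline(
    'token-classification',
    model='djagatiya/ner-roberta-base-ontonotesv5-englishv4',
    aggregation_strategy='simple'
)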

TEST 1

ner_pipeline("India is a beautiful country")

# Output
[{'entity_group': 'GPE',
  'score': 0.99186057,
  'word': ' India',
  'start': 0,
  'end': 5}]

TEST 2

ner_pipeline("On September 1st George won 1 dollar while watching Game of Thrones.")

# Output
[{'entity_group': 'DATE',
  'score': 0.99720246,
  'word': ' September 1st',
  'start': 3,
  'end': 16},
 {'entity_group': 'PERSON',
  'score': 0.99071586,
  'word': ' George',
  'start': 17,
  'end': 23},
 {'entity_group': 'MONEY',
  'score': 0.9872978,
  'word': ' 1 dollar',
  'start': 28,
  'end': 36},
 {'entity_group': 'WORK_OF_ART',
  'score': 0.9946732,
  'word': ' Game of Thrones',
  'start': 52,
  'end': 67}]
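
Note that the 'word' fields above carry a leading space from the tokenizer. One way to get clean entity text is to slice the input with the 'start' and 'end' character offsets instead. The following is a minimal sketch that reuses the Test 2 output as sample data; `clean_entities` and its `min_score` parameter are hypothetical helpers, not part of transformers.

```python
def clean_entities(text, entities, min_score=0.0):
    """Return (label, surface_text, score) tuples for pipeline output.

    Hypothetical helper: the surface text is taken from the character
    offsets, and entities scoring below min_score are dropped.
    """
    cleaned = []
    for ent in entities:
        if ent['score'] < min_score:
            continue
        surface = text[ent['start']:ent['end']]
        cleaned.append((ent['entity_group'], surface, float(ent['score'])))
    return cleaned


text = "On September 1st George won 1 dollar while watching Game of Thrones."

# Output of Test 2, copied from above.
sample_output = [
    {'entity_group': 'DATE', 'score': 0.99720246,
     'word': ' September 1st', 'start': 3, 'end': 16},
    {'entity_group': 'PERSON', 'score': 0.99071586,
     'word': ' George', 'start': 17, 'end': 23},
    {'entity_group': 'MONEY', 'score': 0.9872978,
     'word': ' 1 dollar', 'start': 28, 'end': 36},
    {'entity_group': 'WORK_OF_ART', 'score': 0.9946732,
     'word': ' Game of Thrones', 'start': 52, 'end': 67},
]

for label, surface, score in clean_entities(text, sample_output):
    print(f"{label:12} {surface!r} {score:.4f}")
```

With a live pipeline, the same helper could be called as `clean_entities(text, ner_pipeline(text))`.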
