📢 If you want to equip an LLM with NER for English texts, I can recommend spaCy. Sharing a wrapper of spaCy NER models for bulk-ner, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py
What you need to know about spaCy NER models:
☑️ Models are Python packages; they can be installed directly into the environment or via the Python CLI.
☑️ The library has a pipeline for optimized handling of requests in batches.
☑️ Architecture: DNN embedding-based models (not transformers)
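To illustrate the batching point above, here is a minimal sketch of running NER through spaCy's `nlp.pipe`. This is not the code of the linked wrapper; the helper names `load_default_model` and `extract_entities` and the batch size of 64 are assumptions made for the example, and it assumes the model package was installed beforehand (for instance with `python -m spacy download en_core_web_sm`).

```python
from typing import Iterable, Iterator, List, Tuple

# (entity text, entity label, start char offset, end char offset)
Entity = Tuple[str, str, int, int]


def load_default_model(name: str = "en_core_web_sm"):
    """Load an installed spaCy model package (hypothetical helper).

    The import is kept local so the rest of the sketch can be read
    and exercised without spaCy being installed.
    """
    import spacy

    return spacy.load(name)


def extract_entities(nlp, texts: Iterable[str], batch_size: int = 64) -> Iterator[List[Entity]]:
    """Yield one list of entities per input text.

    `nlp.pipe` streams the texts through the pipeline in batches,
    which is generally faster than calling `nlp(text)` one by one.
    """
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
```

Typical usage would be `for ents in extract_entities(load_default_model(), rows): ...`, where `rows` is any iterable of strings read from a CSV or JSONL file.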
🤖 List of models (or see screenshot below):
https://huggingface.co/spacy
📋 Supported NER types:
https://github.com/explosion/spaCy/discussions/9147
⚠️ NOTE: chunking does not seem to be applicable, due to the specifics of the models and their use of the internal pipeline mechanism
🚀 Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)
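For anyone who wants to check a throughput figure like this on their own hardware, here is a rough sketch of a timing helper. It is not the benchmark behind the number above; the function name `sentences_per_second` and the `process` callable are assumptions for illustration, and results will vary with hardware, sentence length, and batch size.

```python
import time
from typing import Callable, Iterable, Sequence


def sentences_per_second(process: Callable[[Sequence[str]], Iterable], sentences: Sequence[str]) -> float:
    """Time one full pass of `process` over `sentences`.

    `process` is any callable that returns an iterable of results,
    e.g. `lambda s: nlp.pipe(s, batch_size=64)` for a loaded spaCy model.
    """
    start = time.perf_counter()
    for _ in process(sentences):
        pass  # consume the stream so every sentence is actually processed
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on extremely short runs.
    return len(sentences) / elapsed if elapsed > 0 else float("inf")
```

Loading the model should happen before the call so that load time is not counted in the measurement.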
🌌 Other wrappers for bulk-ner in nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner