Pierre-Carl Langlais

Pclanglais

AI & ML interests

Open data & open LLMs

Recent Activity

published a model about 14 hours ago
LLMDH/350m_ocr
updated a model 1 day ago
LLMDH/350m_treasoning_complete
published a model 1 day ago
LLMDH/350m_treasoning_complete

Organizations

AgentPublic · BigScience Data · Kheops SAS · Blog-explorers · OpenLLM France · ZeroGPU Explorers · INAGUA · PleIAs · :probabl. · Social Post Explorers · LLM - Digital Humanities

Posts 6

Post
Today we release our first foundation model and experiment with a new category: specialized pre-training.

OCRonos-Vintage is a 124M-parameter model trained end-to-end by Pleias on llm.c from 18 billion tokens of cultural heritage archives. Despite its small size, it achieves nearly state-of-the-art results for OCR correction of historical English sources. OCRonos-Vintage is also a historical model with an unusual cut-off date: December 29th, 1955…

We look forward to replicating this approach very soon on other "hard" tasks commonly associated with generalist LLMs/SLMs: RAG, function calling, summarization, document segmentation…

OCRonos-Vintage: PleIAs/OCRonos-Vintage
CPU Demo: PleIAs/OCRonos-Vintage-CPU
GPU Demo: PleIAs/OCRonos-Vintage-GPU
Our announcement and call for specialized pre-training: https://huggingface.co/blog/Pclanglais/specialized-pre-training
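As a minimal sketch of how one might query OCRonos-Vintage for OCR correction with the `transformers` library: the `### Text ###` / `### Correction ###` prompt delimiters below are an assumption about the model's training format, so check the model card before relying on them.

```python
# Hedged sketch: wrapping a noisy OCR line in a correction prompt for
# OCRonos-Vintage. The delimiter convention is an assumption, not a
# confirmed specification of the model's training format.

def build_prompt(ocr_text: str) -> str:
    """Wrap raw OCR output in the assumed correction prompt format."""
    return f"### Text ###\n{ocr_text}\n\n### Correction ###\n"

prompt = build_prompt("Tlie quick brovvn fox jmnped ovcr the lazy dog.")

# Generation itself needs the model weights (small enough to run on CPU,
# as the CPU demo above suggests):
# from transformers import pipeline
# corrector = pipeline("text-generation", model="PleIAs/OCRonos-Vintage")
# corrected = corrector(prompt, max_new_tokens=128)[0]["generated_text"]
```

The commented-out generation call is the standard `transformers` text-generation pipeline; the model completes the prompt with the corrected text after the `### Correction ###` marker.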

Articles 7


They Said It Couldn’t Be Done