DIY AI For Journalists
Compiling resources useful for journalists building prototypes with AI
Note: This Space provides a version of Whisper (a speech-to-text model) with speaker diarization, so you can transcribe audio containing speech and also record who is speaking.
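Combining a transcript with diarization usually comes down to matching two sets of timestamps. The following is a minimal sketch of one common approach, assigning each transcript segment to the speaker whose turn overlaps it the most. It is not necessarily how this Space does it; the `Segment` and `Turn` classes, the `label_segments` helper, and the sample data are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of audio with start/end times in seconds."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Pair each transcript segment with the speaker who overlaps it the most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


# Hypothetical data, made up for illustration.
segments = [
    Segment(0.0, 4.0, "Thanks for joining us today."),
    Segment(4.2, 9.0, "Happy to be here."),
    Segment(20.0, 22.0, "(inaudible)"),
]
turns = [
    Turn(0.0, 4.1, "SPEAKER_00"),
    Turn(4.1, 9.5, "SPEAKER_01"),
]

for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Segments that overlap no speaker turn at all keep the `UNKNOWN` label, which makes gaps in the diarization easy to spot before quoting anyone.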
pyannote/speaker-diarization
Automatic Speech Recognition
Note: This model performs speaker diarization, that is, identifying who is speaking at each point in an audio recording.
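Diarization results are often exchanged as RTTM files, a plain-text format with one line per speaker turn. As a rough sketch, assuming the diarization output has already been exported to RTTM, the snippet below totals how long each speaker talks. The `speaking_time` helper and the sample content are hypothetical and made up for illustration.

```python
from collections import defaultdict


def speaking_time(rttm_text):
    """Sum the duration (seconds) of every SPEAKER line, grouped by speaker label."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # RTTM SPEAKER lines are space-separated:
        # type, file, channel, onset, duration, NA, NA, speaker, NA, NA
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])
        speaker = fields[7]
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content, made up for illustration.
sample_rttm = """\
SPEAKER interview 1 0.00 4.50 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER interview 1 4.50 10.25 <NA> <NA> SPEAKER_01 <NA> <NA>
SPEAKER interview 1 14.75 2.50 <NA> <NA> SPEAKER_00 <NA> <NA>
"""

totals = speaking_time(sample_rttm)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.2f} s")
```

Speaker labels such as `SPEAKER_00` are anonymous; mapping them to real names still requires listening to the recording.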
copenlu/scientific-exaggeration-detection
Text Classification
Note: This model measures the causal claim strength of a scientific sentence. Comparing the strength of two sentences shows whether one exaggerates the causal claim made by the other.
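The comparison step can be sketched without the model itself. The example below assumes each sentence has already been scored on an ordinal claim-strength scale; the four-level mapping, the `compare_claim_strength` helper, and the example levels are assumptions for illustration and should be checked against the labels the model actually returns.

```python
# ASSUMPTION: an ordinal claim-strength scale. This mapping is illustrative only;
# check the model card for the labels the model actually emits.
STRENGTH_LABELS = {
    0: "no relation",
    1: "correlational",
    2: "conditional causal",
    3: "direct causal",
}


def compare_claim_strength(source_level, report_level):
    """Say whether the report states a stronger, weaker, or equal claim."""
    for level in (source_level, report_level):
        if level not in STRENGTH_LABELS:
            raise ValueError(f"unknown claim strength level: {level}")
    if report_level > source_level:
        return "exaggerates"
    if report_level < source_level:
        return "downplays"
    return "same"


# Hypothetical levels: the paper reports a correlation,
# while the press release states a direct causal link.
paper_level = 1
press_level = 3

verdict = compare_claim_strength(paper_level, press_level)
print(
    f"Paper: {STRENGTH_LABELS[paper_level]}; "
    f"press release: {STRENGTH_LABELS[press_level]}; "
    f"verdict: {verdict}"
)
```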
PDF OCR
📝 Convert PDF to text using OCR
Note: A Space that performs OCR on PDF documents.
Grobid CRF only
🌍 Extract bibliographic data from academic papers and patents
Note: GROBID is a machine learning library that extracts, parses, and restructures raw documents such as PDFs into structured XML/TEI documents, with a particular focus on technical and scientific publications.
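Once a document has been converted to TEI, the structured fields can be read with any XML parser. The following is a small sketch using only the Python standard library. The sample document is a hypothetical, heavily trimmed header written for illustration, and the `extract_header` helper is not part of GROBID; real output contains many more elements.

```python
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, trimmed TEI header made up for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">A Hypothetical Study of Newsroom Tools</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename type="first">Ada</forename><surname>Example</surname></persName>
            </author>
            <author>
              <persName><forename type="first">Ben</forename><surname>Sample</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>
"""


def extract_header(tei_xml):
    """Return the title and author names found in a TEI header."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forenames = [f.text for f in pers.findall("tei:forename", TEI_NS) if f.text]
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(" ".join(forenames + [surname]).strip())
    return {"title": title.strip(), "authors": authors}


info = extract_header(SAMPLE_TEI)
print(info["title"])
for name in info["authors"]:
    print(" -", name)
```

Passing the namespace mapping to every lookup matters: without it, ElementTree will not match any of the TEI elements.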
Coconut
🥥 Explore text data with various visualization tools
Note: Coconut Library Tool is an all-in-one data mining and text analysis tool.
tomaarsen/span-marker-bert-base-uncased-keyphrase-inspec
Token Classification
Note: This Named Entity Recognition model is trained to extract keyphrases from text.
Argilla Space
Note: Creating your own annotated data is sometimes useful for training or evaluating machine learning models. Tools such as Argilla can help with the annotation process.