PDF to Dataset
Spaces and utilities for creating datasets and getting them on the Hub
Convert PDFs to a dataset and upload to Hugging Face
Note This Space extracts embedded text from PDFs and pushes the resulting text to a Hugging Face Hub dataset.
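As a rough illustration of this kind of workflow (not the Space's actual code), the sketch below reads the embedded text layer of a PDF page by page and shapes it into records that could be pushed to the Hub. It assumes the third-party `pypdf` and `datasets` packages; the function names, the record fields (`file`, `page`, `text`), and the `repo_id` argument are hypothetical choices made for this example.

```python
from pathlib import Path


def pages_to_records(pdf_name, page_texts):
    """Turn a list of per-page strings into one record per page."""
    return [
        {"file": pdf_name, "page": number, "text": (text or "").strip()}
        for number, text in enumerate(page_texts, start=1)
    ]


def extract_pdf_records(pdf_path):
    """Read the embedded text layer of a PDF (no OCR is performed)."""
    from pypdf import PdfReader  # third-party, imported lazily (assumed installed)

    reader = PdfReader(str(pdf_path))
    page_texts = [page.extract_text() for page in reader.pages]
    return pages_to_records(Path(pdf_path).name, page_texts)


def push_records(records, repo_id):
    """Upload records as a dataset; assumes you are already logged in to the Hub."""
    from datasets import Dataset  # third-party, imported lazily (assumed installed)

    Dataset.from_list(records).push_to_hub(repo_id)
```

Scanned PDFs without a text layer would yield empty `text` fields here, which is one reason a page-image route can be the better starting point for such files.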
Convert PDFs to page images for dataset creation
Note This Space converts one or more PDFs to a set of images, one per page, and optionally pushes the images to a Hugging Face dataset. It can help generate an initial dataset for annotation or further processing.
Create a Hugging Face dataset from text files
Note Corpus Creator is a tool for transforming a collection of text files into a Hugging Face dataset, perfect for various natural language processing (NLP) tasks. Whether you're preparing data for synthetic generation, building pipelines, or setting up annotation tasks, this app simplifies the process.
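For a sense of what such a transformation involves, here is a minimal sketch (not Corpus Creator's own implementation) that gathers text files from a folder into a list of records using only the standard library. The function name, the `*.txt` pattern, and the `file` and `text` fields are assumptions made for illustration.

```python
from pathlib import Path


def collect_text_files(folder, pattern="*.txt", encoding="utf-8"):
    """Collect non-empty text files under `folder` as a list of records."""
    records = []
    # Sort the paths so the record order is reproducible across runs.
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding=encoding, errors="replace")
        if text.strip():  # skip files that contain only whitespace
            records.append({"file": path.name, "text": text})
    return records
```

A list in this shape can be passed to `datasets.Dataset.from_list` to build a dataset, which can then be uploaded with `push_to_hub`.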