Papers
arxiv:2502.02904

ScholaWrite: A Dataset of End-to-End Scholarly Writing Process

Published on Feb 5
Authors:
,
,
,
,

Abstract

Writing is a cognitively demanding task involving continuous decision-making, heavy use of working memory, and frequent switching between multiple activities. Scholarly writing is particularly complex as it requires authors to coordinate many pieces of multiform knowledge. To fully understand writers' cognitive thought process, one should fully decode the end-to-end writing data (from individual ideas to final manuscript) and understand their complex cognitive mechanisms in scholarly writing. We introduce ScholaWrite dataset, the first-of-its-kind keystroke logs of an end-to-end scholarly writing process for complete manuscripts, with thorough annotations of cognitive writing intentions behind each keystroke. Our dataset includes LaTeX-based keystroke data from five preprints with nearly 62K total text changes and annotations across 4 months of paper writing. ScholaWrite shows promising usability and applications (e.g., iterative self-writing) for the future development of AI writing assistants for academic research, which necessitate complex methods beyond LLM prompting. Our experiments clearly demonstrated the importance of collection of end-to-end writing data, rather than the final manuscript, for the development of future writing assistants to support the cognitive thinking process of scientists. Our de-identified dataset, demo, and code repository are available on our project page.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.02904 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.02904 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.02904 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.