Here you will find links to datasets from various LLiS projects.



Dataset contents: Human gaze data recorded during self-paced reading of real-world English text (5247 tokens) with interruptions, together with pre- and post-test scores

Number of participants: 50

Contact: Francesca Zermiani

The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:

Francesca Zermiani, Prajit Dhar, Ekta Sood, Fabian Koegel, Andreas Bulling, and Maria Wirzberger. 2024. InteRead: An Eye Tracking Dataset of Interrupted Reading. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9154–9169, Torino, Italy. ELRA and ICCL.

Dataset license agreement

This dataset - with all the files it contains - is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). By using this dataset, you agree to the license terms. The major license terms include:
  • Attribution: You must give appropriate credit to the original creators of the dataset.
  • Non-Commercial: You may not use the dataset for commercial purposes.
  • Share Alike: If you remix, transform, or build upon the dataset, you must distribute your contributions under the same license as the original.


The full dataset can be downloaded here
