Transkribus: Automated Text Recognition for Historical Documents

Louise Seaward, University College London (United Kingdom)

Abstract:

Transkribus (https://transkribus.eu/Transkribus/) is a research infrastructure for the automated recognition, transcription and searching of handwritten historical collections. It is the main output of the EU-funded Recognition and Enrichment of Archival Documents (READ) project (http://read.transkribus.eu/). READ’s mission is to make archival material more accessible through the development and dissemination of Handwritten Text Recognition (HTR) and other cutting-edge technologies. These innovations make it possible for computers to automatically recognise documents written in different periods and in various languages and formats. The recognition of text is just the beginning: information such as dates, titles and page numbers can also be extracted from structured pages, which will be of huge benefit to researchers, archivists and members of the public who wish to explore large collections of historical documents. This paper will summarise the technical workings of handwriting recognition, provide evidence of its accuracy and explore some of the issues which arise when processing documents with difficult handwriting or complex layouts. It will discuss real-world examples to show how numerous scholars and archives are already benefiting from this technology, from providing full-text search of digital collections to establishing a more efficient workflow for scholarly editing projects.

« back to „DARIAH-CZ Workshop on Digital Humanities 2018“