Transkribus
Transkribus is an AI-based transcription platform that converts scans of handwritten and printed documents into editable text. It presents itself as a comprehensive digital solution, with an intuitive dashboard and controls, cloud storage, and support for multiple formats. It is a popular choice for people working with historical documents, especially medieval texts and handwritten manuscripts.
- uses OCR and HTR technology to convert images into text
- can handle complex document structures and layouts
- 300+ public models for historical documents
- ability to train new AI models
- smart search across the entire collection
Pricing Tiers: Free (limited to 50 credits per month, with a reduced feature set); Scholar (19.99€/month or 99€/year); Team (69€/month or 399€/year). 1 credit = 1 handwritten page or 2 printed pages. An organization-level license is also available.
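The credit conversion above is simple enough to script when budgeting a project. The sketch below is a minimal Python illustration; the function names are hypothetical, and rounding a leftover odd printed page up to a full credit is an assumption, not documented Transkribus behavior.

```python
# Rates taken from the Transkribus pricing notes:
# 1 credit = 1 handwritten page or 2 printed pages; the free tier gives 50 credits/month.
HANDWRITTEN_PAGES_PER_CREDIT = 1
PRINTED_PAGES_PER_CREDIT = 2
FREE_CREDITS_PER_MONTH = 50


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def credits_needed(handwritten_pages: int, printed_pages: int) -> int:
    """Estimate the credits needed for a batch of pages (hypothetical helper).

    Assumption: an odd leftover printed page still costs a whole credit.
    """
    if handwritten_pages < 0 or printed_pages < 0:
        raise ValueError("page counts must be non-negative")
    return (_ceil_div(handwritten_pages, HANDWRITTEN_PAGES_PER_CREDIT)
            + _ceil_div(printed_pages, PRINTED_PAGES_PER_CREDIT))


def free_tier_months(credits: int) -> int:
    """Months of the free allowance needed to cover `credits`."""
    if credits < 0:
        raise ValueError("credits must be non-negative")
    return _ceil_div(credits, FREE_CREDITS_PER_MONTH)


if __name__ == "__main__":
    needed = credits_needed(handwritten_pages=120, printed_pages=75)
    print(needed)                    # 120 + 38 = 158 credits
    print(free_tier_months(needed))  # 4 months at 50 credits/month
```

Integer ceiling division is used instead of `math.ceil` on a float so that very large page counts cannot pick up floating-point rounding errors.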
Recommendations: Transkribus may be a good, intuitive choice for smaller projects. For projects with many pages or documents, a less expensive or free alternative may be a better fit.
Arkindex
https://teklia.com/our-solutions/arkindex/
Arkindex is a platform developed by Teklia for the automatic processing of large collections of scanned documents. It can run any document-processing algorithm: OCR, HTR, feature extraction, captioning, translation, etc. Its architecture is deliberately generic, so it can store any type of result using configurable types.
- can apply OCR and HTR to images
- customizable workflow design
- seamless integration with custom and open source components
- IIIF-based and accessible through a REST API
- can be hosted locally or in the cloud
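Because Arkindex is IIIF-based, page images and regions can in principle be addressed with standard IIIF Image API URLs, which is handy for pulling out the crop of a single line or zone. The sketch below only builds such a URL following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`; the server URL, image identifier, and function name are hypothetical, and whether a given Arkindex deployment exposes its images this way depends on the image server it is configured with.

```python
from typing import Optional
from urllib.parse import quote


def iiif_region_url(base_url: str, identifier: str,
                    x: int, y: int, w: int, h: int,
                    width: Optional[int] = None, api_version: int = 3) -> str:
    """Build an IIIF Image API URL for a rectangular region of an image.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    region = f"{x},{y},{w},{h}"  # pixel region: x,y,w,h
    if width is not None:
        size = f"{width},"  # scale to this width, preserving aspect ratio
    else:
        # Full-size output is "max" in Image API 3.0; "full" is accepted across 2.x.
        size = "max" if api_version >= 3 else "full"
    # Identifiers must be URI-encoded, including any "/" characters.
    ident = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{ident}/{region}/{size}/0/default.jpg"


if __name__ == "__main__":
    # Hypothetical image server and identifier, for illustration only.
    print(iiif_region_url("https://iiif.example.org/iiif/3",
                          "manuscripts/page_001.jpg", 100, 200, 1200, 80))
```

Rotation is fixed at `0` and quality at `default`, both valid in IIIF Image API 2.x and 3.0; a fuller helper could expose them as parameters.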
Pricing Tiers: 1) Arkindex Open-Source (user-managed) is free for public and private projects. 2) Hosted by Teklia: cost depends on volume. 3) Enterprise Edition (user-managed, with High-Performance Computing integration) offers more granular permissions and security for larger projects, at 1500€/month.
Recommendations: Arkindex might be a good solution for those interested in a cost-effective way of working with a large collection of texts or a corpus that spans many pages.
eScriptorium
https://escriptorium.rich.ru.nl/
eScriptorium is a web application that provides a workspace for managing the various steps of a transcription campaign. Steps can be manual or automatic, and can be applied to printed or handwritten documents. The application uses Kraken as its segmentation and transcription engine.
- can apply OCR/HTR to images of printed and handwritten documents
- manual transcription through a browser interface for editing segmentations and transcriptions
- can create new models or fine-tune existing ones to improve automatic recognition
- can import and export models and text transcriptions in a variety of formats
- accessible through a full REST API
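As an illustration of scripting against the REST API, the sketch below builds a request that lists the documents visible to a user. It assumes the instance exposes a Django REST Framework-style `/api/documents/` endpoint and accepts an API token in an `Authorization: Token <key>` header; check the API documentation of the instance you use before relying on either. The helper names are hypothetical, and nothing is sent over the network unless `list_documents` is called.

```python
import json
import urllib.request
from typing import Any

# Instance URL from this section; a self-hosted eScriptorium could be used instead.
DEFAULT_BASE_URL = "https://escriptorium.rich.ru.nl"


def build_documents_request(token: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the document list.

    Assumptions: the endpoint path is /api/documents/ and the API accepts
    token authentication in the "Authorization: Token <key>" form.
    """
    url = base_url.rstrip("/") + "/api/documents/"
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_documents(token: str, base_url: str = DEFAULT_BASE_URL) -> Any:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_documents_request(token, base_url), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the authentication details easy to inspect and test without network access.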
Pricing Tiers: eScriptorium is provided as Free and Open Source Software, not as a service.
Recommendations: This is another promising free alternative to Transkribus, but the hosted instance linked above is currently available by invitation only. As open-source software, eScriptorium can also be self-hosted.
MS Azure Document Intelligence
Azure AI Document Intelligence is a machine learning service that automatically extracts text, key-value pairs, tables, and structure from documents. It provides prebuilt models that cover many common scenarios, and it also supports custom models tailored to specific document layouts (usually for more complex business or scientific texts).
- ready-to-use models for common document types
- can handle complex structure and layout in documents
- can extract text, key-value pairs, and tables
- REST API provided for custom code solutions
- can extract formulae and barcodes
Pricing Tiers: Free tier (F0): 500 pages per month. Standard tier (S0): $1.50 per 1,000 pages for the first 1 million pages.
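A rough cost estimate for Document Intelligence follows directly from the per-1,000-page rate in the pricing notes. This is a sketch under stated assumptions: billing is treated as pro-rated per page, and volumes beyond the first million pages are rejected because their rate is not given here. The function names are hypothetical.

```python
FREE_TIER_PAGES = 500           # free tier allowance from the pricing notes
S0_PRICE_PER_1000_PAGES = 1.50  # USD, S0 rate for the first 1 million pages
S0_FIRST_BAND_PAGES = 1_000_000


def fits_free_tier(pages: int) -> bool:
    """True if a page count stays within the free tier allowance."""
    return 0 <= pages <= FREE_TIER_PAGES


def s0_cost_usd(pages: int) -> float:
    """Estimated S0 cost in USD, assuming pro-rated per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if pages > S0_FIRST_BAND_PAGES:
        raise ValueError("rate beyond the first 1 million pages is not covered here")
    return round(pages * S0_PRICE_PER_1000_PAGES / 1000, 2)
```

For example, `s0_cost_usd(20_000)` gives 30.0, i.e. about $30 for a 20,000-page collection.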
Recommendations: A good choice for relatively contemporary documents, including ones with fairly complex layouts. Document Intelligence does a good job of separating the regions of a page and identifying which text elements belong together. For example, in court transcripts it isolates the line numbers running down the left margin and keeps them separate from the paragraphs of reported dialog.
AWS Textract
Amazon Textract is a machine learning (ML) service that extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple OCR to identify and extract specific data (usually for business purposes). Because it relies on pretrained models, Textract can extract data in minutes rather than the hours or days that manual extraction might take.
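To show what Textract output looks like programmatically, here is a minimal sketch using the AWS SDK for Python (boto3). `detect_document_text` returns a list of `Blocks` (pages, lines, and words); the helper below keeps only the `LINE` blocks. Calling `ocr_image_file` assumes boto3 is installed and AWS credentials and a region are configured; the helper names and the confidence filter are illustrative choices, not part of Textract itself.

```python
from typing import Any, Dict, List


def lines_from_textract(response: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Return the text of LINE blocks, in the order Textract returned them.

    Textract reports Confidence on a 0-100 scale.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def ocr_image_file(path: str, min_confidence: float = 0.0) -> List[str]:
    """Run synchronous text detection on a local image and return its lines."""
    import boto3  # imported here so the parser above works without the AWS SDK

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return lines_from_textract(response, min_confidence)
```

Keeping the parser separate from the API call makes it possible to test it on saved responses and to apply a confidence threshold when flagging lines for manual review.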
Pricing Tiers: Free (1,000 pages/month of general text detection) for the first three months; after that, $1.50 per 1,000 pages for the first 1 million pages in a month.
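A monthly Textract text-detection estimate can fold in the three-month free allowance. This is a sketch under stated assumptions: the free pages are treated as offsetting each month's usage, billing is pro-rated per page, and volumes above the first million pages are rejected because their rate is not listed here. The function name is hypothetical.

```python
FREE_PAGES_PER_MONTH = 1_000    # general text detection, free tier
FREE_TIER_MONTHS = 3
PRICE_PER_1000_PAGES = 1.50     # USD, first 1 million pages in a month
FIRST_BAND_PAGES = 1_000_000


def textract_monthly_cost_usd(pages: int, month: int) -> float:
    """Estimated text-detection cost for one month.

    `month` counts from the start of the free tier (1 = first month).
    Assumptions: free pages offset that month's usage, and billing is
    pro-rated per page.
    """
    if pages < 0 or month < 1:
        raise ValueError("pages must be >= 0 and month >= 1")
    if pages > FIRST_BAND_PAGES:
        raise ValueError("rate beyond the first 1 million pages is not covered here")
    billable = pages
    if month <= FREE_TIER_MONTHS:
        billable = max(0, pages - FREE_PAGES_PER_MONTH)
    return round(billable * PRICE_PER_1000_PAGES / 1000, 2)
```

Under these assumptions, 3,000 pages cost about $3.00 in the first month and $4.50 from the fourth month on.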
Recommendations: A good choice for printed texts and relatively contemporary handwriting. It will likely struggle with historical handwritten documents, since its models are trained largely on contemporary documents of a business or technical nature.