Cloud-based simulation of page curling in the copying of documents
The Challenge
Digitisation of books is important both for commercialisation and for the preservation of older texts. The Bookscanner© product turns pages mechanically and automates the scanning process. However, scanning a bound book produces a 'page curling' effect near the spine, where the pages are attached. The scanned pages then need to be digitally flattened, a tedious and expensive step that this experiment aimed to improve and commercialise.
The Solution
A Deep Neural Network (DNN) was trained on 1 million images with simulated page curling. The simulation algorithm takes two already cropped pages from a book image and outputs an artificially curled book page. Using these artificially curled pages, a state-of-the-art deep Convolutional Encoder-Decoder (CED) neural network was trained to perform the de-curling. After training, the DNN can de-curl newly scanned pages with a very good success rate: evaluation showed a curling-correction accuracy of over 90% in most cases. With the computational power provided by HPC, training the DNN was at least 30 times faster than on a typical workstation.
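The idea of generating training data by artificially curling flat pages can be illustrated with a minimal sketch. It is not the project's actual simulation algorithm, which is not described in detail here. The function names `curl_page` and `make_training_pair`, the parameters `strength` and `decay`, the exponential curl profile, and the choice of a single grayscale page with the spine on the left edge are all assumptions made for illustration.

```python
import math


def curl_page(page, strength=0.2, decay=0.25, background=255):
    """Return a synthetically curled copy of a flat grayscale page.

    `page` is a list of rows, each a list of 0-255 pixel values, with the
    spine assumed to lie along the left edge (column 0).
    """
    height, width = len(page), len(page[0])
    centre = (height - 1) / 2.0
    curled = [[background] * width for _ in range(height)]
    for x in range(width):
        # Normalised distance from the spine: 0 at the spine, 1 at the outer edge.
        distance = x / max(width - 1, 1)
        # Assumed curl profile: strongest at the spine, decaying exponentially.
        bend = strength * math.exp(-distance / decay)
        for y in range(height):
            # Inverse mapping: sample the flat page further from the vertical
            # centre, so content looks squeezed towards the centre near the spine.
            source_y = int(round(centre + (y - centre) * (1.0 + bend)))
            if 0 <= source_y < height:
                curled[y][x] = page[source_y][x]
    return curled


def make_training_pair(page):
    """Pair a curled input with its flat target for supervised training."""
    return curl_page(page), page
```

Each pair produced this way gives a de-curling network a distorted input and the flat page it should recover, which is why large training sets can be generated without manually flattened scans.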
Business Impact
The only existing method for page curling correction relies on a projected laser grid and requires each page to be scanned twice. The new solution, called CURLO, removes the need for additional laser grid projection equipment and provides a 50% improvement over the standard curling correction procedure.
As a result of this experiment, the CURLO solution can be offered as a post-processing service to accompany the Bookscanner© product. The collaboration with Arctur has improved the quality of batch-mode scanning. CURLO will be delivered as a Software as a Service (SaaS) framework for scanned page de-curling.
In addition to the digital content market, the partners in this experiment are ready to address the digitisation needs of a paperless economy, e.g. insurance and paperless banking. For the banking sector in particular, recent studies estimate that productivity improves by up to 39% when electronic forms replace paper and workflows are used to streamline processes.
Benefits
- CURLO has been added to the Bookscanner software, allowing automatic de-curling of pages and saving time compared to the previous method.
- The solution is scalable thanks to the use of HPC and can give many scanning application domains the ability to automatically correct digital artefacts introduced during scanning.
- This is the first such service to be offered via the Fortissimo Marketplace, paving the road to further business development opportunities.
- Over the next 3-5 years, the monetary benefits could increase revenues for the participating SMEs by 10-15%. This additional revenue is estimated at approximately €300,000 - €350,000 by the end of the third year after the project ends. In this case, the payback period for the CURLO investment is 3-4 years.
Organizations involved
End User: Bookscanner S.A
ISV: VERTOYO O.E
HPC Expert: Laboratory of Robotics and Automation, Democritus University of Thrace
HPC Provider & Host Centre: Arctur