Cloud-based simulation of page curling in the copying of documents

Bookscanner is a Greek SME whose business is the precise and reliable automatic digitisation of printed material using state-of-the-art methods. Modern scanners can scan hundreds of pages per minute, but they require the books to be physically altered: the bound edge must be cut off and each book turned into a stack of loose pages. This is a manual and destructive process. The new Bookscanner© product aims to make this alteration obsolete. Its scanner turns pages automatically and automates the whole book-scanning process, with no manual intervention and no damage to the book.

The new scanner is operational. However, one major improvement is planned, and this experiment addresses it: removing the page-curling artefact that the physical properties of bound books introduce into the images. The only current method for correcting page curling is a painstaking process that slows down scanning and requires expensive equipment. This experiment addresses the development of a new method using cloud-based HPC, which will be faster and decoupled from the physical scanning. This will enable the current Bookscanner© software to offer a unique “de-curling” feature. The experiment will demonstrate a Proof of Concept based on the SaaS model, readily showing the potential of the enhanced software for commercialisation and for use by third-party vendors.

Vertoyo, also from Greece, is an SME specialising in technology development and software service provisioning in the digitisation field. HPC expertise was provided by the Laboratory of Robotics and Automation, Democritus University of Thrace. Arctur, the Slovenian HPC centre, was the HPC Provider.

The Challenge

Digitisation of books is important both for commercialisation and for the preservation of older texts. The Bookscanner© product turns pages physically and automatically, automating the scanning process. However, this produces a 'page curling' effect near the spine, where the pages are bound. The scanned pages then need to be flattened digitally, a tedious and expensive process that this experiment set out to improve and commercialise.

The Solution

A Deep Neural Network (DNN) was trained on 1 million images with simulated page curling. The simulation algorithm takes two already-cropped pages from a book image and outputs an artificially curled book page. These artificially curled pages were used to train a state-of-the-art deep Convolutional Encoder-Decoder (CED) neural network to perform the de-curling. After training, the network de-curls newly scanned pages with a high success rate: evaluation showed a curling-correction accuracy above 90% in most cases. Thanks to the computational power provided by HPC, training was at least 30 times faster than on a typical workstation.
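The data-generation step above can be illustrated with a toy model. The sketch below is an assumption, not the CURLO simulator: it uses NumPy, treats pages as 2-D grayscale arrays, and models curl as a vertical shift of each pixel column that grows quadratically toward the spine. The function names `simulate_curl` and `make_training_pair` and the parameters `max_shift` and `curl_width` are hypothetical. Each call yields a (curled input, flat target) pair of the kind a de-curling encoder-decoder could be trained on.

```python
import numpy as np


def simulate_curl(page: np.ndarray, spine_side: str = "left",
                  max_shift: float = 12.0, curl_width: float = 0.25,
                  fill: int = 255) -> np.ndarray:
    """Hypothetical toy curl model: bend the text lines of a flat grayscale
    page near the spine by shifting each pixel column downwards."""
    if page.ndim != 2:
        raise ValueError("expected a 2-D grayscale page")
    if spine_side not in ("left", "right"):
        raise ValueError("spine_side must be 'left' or 'right'")
    h, w = page.shape
    cols = np.arange(w, dtype=float)
    denom = max(w - 1, 1)
    # Normalised distance of each column from the spine: 0 at the spine, 1 at the far edge.
    dist = cols / denom if spine_side == "left" else (denom - cols) / denom
    # Curl strength (assumed profile): 1 at the spine, falling quadratically to 0 at curl_width.
    t = np.clip(1.0 - dist / curl_width, 0.0, 1.0)
    shift = max_shift * t ** 2  # vertical displacement per column, in pixels
    rows = np.arange(h, dtype=float)[:, None]
    # Inverse mapping: each output pixel samples the source row `shift` pixels above it.
    src = np.rint(rows - shift[None, :]).astype(int)
    valid = (src >= 0) & (src < h)
    col_idx = np.broadcast_to(np.arange(w), (h, w))
    out = np.full_like(page, fill)  # pixels with no source become blank paper
    out[valid] = page[src[valid], col_idx[valid]]
    return out


def make_training_pair(left: np.ndarray, right: np.ndarray,
                       max_shift: float = 12.0) -> tuple[np.ndarray, np.ndarray]:
    """Join two cropped pages into a book spread; return (curled input, flat target)."""
    flat = np.hstack([left, right])
    curled = np.hstack([
        simulate_curl(left, spine_side="right", max_shift=max_shift),  # spine at the left page's right edge
        simulate_curl(right, spine_side="left", max_shift=max_shift),  # spine at the right page's left edge
    ])
    return curled, flat


if __name__ == "__main__":
    page = np.full((100, 50), 255, dtype=np.uint8)
    page[40, :] = 0  # one horizontal "text line"
    curled, flat = make_training_pair(page, page, max_shift=10.0)
    # Row of the line at the spine (column 49) and at the outer edge (column 0).
    print(np.argmin(curled[:, 49]), np.argmin(curled[:, 0]))  # prints: 50 40
```

A realistic simulator would also model foreshortening, shading near the spine and camera geometry; this sketch only bends text lines, which is enough to show how a flat ground truth and a synthetically distorted input are produced together.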

Business Impact

The only method currently available for page-curling correction relies on a projected laser grid, which requires each page to be scanned twice. The new solution, called CURLO, removes the need for the additional laser-grid projection equipment and delivers a 50% improvement over this standard curling-correction procedure.

As a result of this experiment, CURLO can be offered as a post-processing service accompanying the Bookscanner© product. The collaboration with Arctur has improved the quality of batch-mode scanning, and the de-curling capability will be offered as a Software as a Service (SaaS) framework.

Beyond the digital content market, the partners in this experiment are ready to address the digitisation needs of a paperless economy, e.g. insurance and paperless banking. In the banking sector in particular, recent studies estimate that productivity improves by up to 39% when electronic forms replace paper and workflows are used to streamline processes.

Benefits

  • CURLO has been added to the Bookscanner software, allowing automatic de-curling of pages and saving time compared with the previous method.
  • The solution is scalable thanks to HPC and can bring automatic correction of artefacts introduced during scanning to many scanning application domains.
  • This is the first such service to be offered via the Fortissimo Marketplace, paving the way for further business development opportunities.
  • Over the next 3-5 years, the monetary benefits could increase the revenues of the participating SMEs by 10-15%. On this basis, the payback period for the CURLO investment is 3-4 years. The additional revenue is estimated at approximately €300,000 - €350,000 by the end of the third year after the project ends.

Organizations involved

End User: Bookscanner S.A.
ISV: VERTOYO O.E.
HPC Expert: Laboratory of Robotics and Automation, Democritus University of Thrace
HPC Provider & Host Centre: Arctur