Deform-and-Recover: Scene Text Recognition Robust to Corruptions

¹Hanbat National University, ²Chungnam National University
Teaser figure
A dataset containing the various kinds of noise that occur in logistics environments. In this paper, we introduce a text recognition dataset with logistics-environment corruptions.

Abstract

Modern scene text recognition (STR) methods based on deep neural networks achieve high accuracy in noise-free environments, but their performance drops significantly in the presence of real-world noise. The cause is a mismatch between the distributions of the training data and the test data, which sharply reduces accuracy at test time. A few techniques have been proposed to improve performance in test environments that contain noise absent from the training data, but they have not been applied to or analyzed in the field of STR. In this work, we propose a technique that makes text recognition robust to noise not included in the training data, and we release an inventory text dataset (ITD) containing real-world noise. The proposed Deform-and-Recover (DAR) is an adversarial data processing technique that makes the text recognition model robust to real-world noise without additional computational overhead at test time. DAR not only significantly improves accuracy on our new ITD but also improves accuracy on the public ICDAR2015 benchmark (2,077-image version).
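The abstract reports results as recognition accuracy. The sketch below shows one common way word-level accuracy is computed for STR benchmarks: a prediction counts as correct only if it matches the ground-truth word exactly after lower-casing and dropping non-alphanumeric characters. This normalization is an assumption borrowed from common STR evaluation practice. The page does not state the exact protocol used for ITD or ICDAR2015, and the function names are hypothetical.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Lower-case and keep only [0-9a-z] (an assumed, commonly used STR convention)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of word images whose normalized prediction equals the normalized label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["Apple", "B0X", "milk!"]
    gts = ["apple", "box", "Milk"]
    print(word_accuracy(preds, gts))
```

In this example, "B0X" (with a zero) is counted as wrong and the other two words match after normalization, so the accuracy is 2/3. Computing this score separately on clean and corrupted subsets of a test set is one simple way to quantify the performance drop described above.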


Our ITD was collected with smartphones in various stores and contains real-world inventory information. We cropped approximately 20,000 words from 7,502 images of item tags, boxes, and statements and annotated each crop with its text label. Because we compiled a real-world inventory directly, the data contain many damaged images.

Dataset statistics

Sample images in our ITD. (a) A normal image without damage. (b) An image with motion blur caused by the user's movement. (c) An image with viewpoint distortion caused by the user's shooting angle. (d) An image damaged by various illumination conditions. (e) An inverted image.
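ITD itself consists of real smartphone photographs. The code below therefore does not describe how the dataset was built, and it is not the DAR method. As an illustrative assumption, it approximates four of the corruption types shown in panels (b) to (e) on a grayscale uint8 word crop using only NumPy, which can be handy for quick robustness checks. All function names and default parameters are hypothetical.

```python
import numpy as np


def _to_uint8(x: np.ndarray) -> np.ndarray:
    """Round and clip a float image back to the 0-255 uint8 range."""
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)


def motion_blur(img: np.ndarray, length: int = 9) -> np.ndarray:
    """(b) Horizontal motion blur: average each pixel over `length` horizontal neighbours."""
    kernel = np.ones(length) / length
    left = length // 2
    # Edge padding keeps the output width equal to the input width without dark borders.
    padded = np.pad(img.astype(np.float64), ((0, 0), (left, length - 1 - left)), mode="edge")
    rows = [np.convolve(row, kernel, mode="valid") for row in padded]
    return _to_uint8(np.stack(rows))


def viewpoint_shear(img: np.ndarray, shear: float = 0.3) -> np.ndarray:
    """(c) Approximate a viewpoint change with a horizontal shear (nearest-neighbour sampling)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: output pixel (y, x) reads source column x - shear * (y - h / 2).
    src_x = np.rint(xs - shear * (ys - h / 2)).astype(int)
    valid = (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)  # pixels mapped from outside the image stay black
    out[valid] = img[ys[valid], src_x[valid]]
    return out


def illumination(img: np.ndarray, gamma: float = 1.8, strength: float = 0.5) -> np.ndarray:
    """(d) Uneven lighting: a gamma change plus a left-to-right brightness ramp."""
    x = (img.astype(np.float64) / 255.0) ** gamma
    ramp = np.linspace(1.0 - strength, 1.0, img.shape[1])[None, :]
    return _to_uint8(x * ramp * 255.0)


def invert(img: np.ndarray) -> np.ndarray:
    """(e) Inverted image: swap dark and light intensities."""
    return (255 - img.astype(np.int16)).astype(np.uint8)


CORRUPTIONS = {
    "motion_blur": motion_blur,
    "viewpoint": viewpoint_shear,
    "illumination": illumination,
    "inverted": invert,
}


def random_corruption(img: np.ndarray, rng: np.random.Generator):
    """Apply one corruption chosen uniformly at random; return its name and the result."""
    names = list(CORRUPTIONS)
    name = names[rng.integers(len(names))]
    return name, CORRUPTIONS[name](img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crop = rng.integers(0, 256, size=(32, 100), dtype=np.uint8)  # stand-in for a word crop
    name, corrupted = random_corruption(crop, rng)
    print(name, corrupted.shape, corrupted.dtype)
```

Every function preserves the input shape and returns uint8, so the corruptions can be chained or swapped freely. Real-world damage such as true perspective distortion or specular glare is richer than these simple models.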


Circular charts of the statistical analysis of each corruption type in the ITD. (a) Viewpoint changes, (b) various illumination conditions, (c) motion blur, and (d) ITD category.


Comparison of existing real-world datasets with our ITD. Our ITD does not contain cursive text, heavy occlusion, or low-resolution images. Instead, it is a unique dataset of close-up shots of target objects recorded by users with smartphones.


In the modern distribution industry, building a systematic and accurate inventory management algorithm is essential. Without one, it is difficult to track logistics inventory accurately, which directly affects production management and sales.

However, building an accurate and fast inventory management algorithm is expensive. For example, a logistics management system that relies on laser-based barcode readers has a very high initial cost, and such a reader cannot read multiple barcodes at once, so this approach is not preferred in small stores. We therefore propose an optical character recognition (OCR)-based inventory management algorithm to solve this practical problem.

Because the OCR-based inventory management algorithm is camera-based, it needs only RGB information and does not require a dedicated sensor such as a laser. Moreover, unlike a laser-based barcode reader, it can scan all logistics items in view at once. Despite these advantages, RGB-based recognition has a major weakness: it is highly sensitive to the many kinds of image damage that occur in the real world.