DROIDI in a Nutshell
Resolving the address text in a mail-piece image has to be done in two phases. First, you locate the area that contains the address block; then you pass that area to an optical character recognition (OCR) tool that converts it to machine-readable text. Passing the whole mail-piece to OCR is usually not an option: it takes too long and, more importantly, you lose the location information needed to distinguish the recipient's address from the sender's address. In mail sorting, confusing those two is counter-productive and thus expensive.
Finding the address block in a letter with a few simple rules is easy: apply some smart binarization that filters out irrelevant smears and logos, find clusters of connected components (blobs) of a suitable size, ignore the top x % of the mail-piece (unless dealing with a flat), favor the lower area or the one closest to the centre when there are several candidates, and so on. Depending on the mail-flow, a set of simple rules might recognize up to 70 percent of the address blocks, and with further tweaking you might get to 80 percent. However, for a mail-sorting system that is not good enough. Furthermore, the layout of mail-pieces varies from country to country, which means the manual rules have to be adjusted for each deployment. Manual work is expensive and does not generalize well.
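To make the rule-based approach concrete, here is a minimal sketch of the selection step only. It assumes that binarization and connected-component clustering have already produced candidate bounding boxes. The names Box and pick_address_block and all threshold values are hypothetical, chosen for illustration, and are not taken from any Syslore product. Where the text says to favor the lower area or the one closest to the centre, this sketch picks the candidate closest to the centre and uses the lower position only as a tie-breaker.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Bounding box of a candidate blob cluster, in pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def pick_address_block(
    candidates: Sequence[Box],
    image_width: int,
    image_height: int,
    top_skip: float = 0.25,       # assumed: ignore the top 25 % of the mail-piece
    min_area_frac: float = 0.01,  # assumed: drop specks below 1 % of the image
    max_area_frac: float = 0.40,  # assumed: drop areas above 40 % of the image
) -> Optional[Box]:
    """Apply hand-written rules and return the most plausible address block."""
    image_area = image_width * image_height
    centre_x, centre_y = image_width / 2, image_height / 2

    kept = []
    for box in candidates:
        area = (box.right - box.left) * (box.bottom - box.top)
        # Rule 1: the cluster must have a suitable size.
        if not (min_area_frac * image_area <= area <= max_area_frac * image_area):
            continue
        # Rule 2: ignore anything that starts in the top part of the mail-piece.
        if box.top < top_skip * image_height:
            continue
        kept.append(box)

    if not kept:
        return None

    def distance_to_centre(box: Box) -> float:
        box_x = (box.left + box.right) / 2
        box_y = (box.top + box.bottom) / 2
        return ((box_x - centre_x) ** 2 + (box_y - centre_y) ** 2) ** 0.5

    # Rule 3: closest to the centre wins; on a tie, the lower box wins.
    return min(kept, key=lambda b: (distance_to_centre(b), -b.top))


candidates = [
    Box(800, 20, 950, 120),    # stamp in the top right corner
    Box(30, 30, 300, 110),     # sender's address in the top left corner
    Box(350, 280, 700, 420),   # recipient's address near the centre
    Box(100, 500, 110, 505),   # small smear
]
best = pick_address_block(candidates, image_width=1000, image_height=600)
```

Every threshold here would need retuning for a different mail-flow or country, which is exactly the maintenance cost described above.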
Syslore approaches the problem of locating the address block, or region of interest (ROI) in postal lingo, from a deep learning angle. We train a deep learning system with a set of sample images, and the system generalizes the training samples into a model. Syslore DROIDI (an acronym for Deep-learning Region-Of-Interest Detection Instrument) is fundamentally a probabilistic system. There are many phases and mechanisms in the process, but the essential decision boils down to this: given this kind of mail-piece, where is the desired field most likely to be?
This parcel image depicts a mail-piece from the trial data set, in which Syslore DROIDI has found several ROIs. A real-life application is to find and recognise data fields on the CN22 and CN23 labels of international consignments for customs.
DROIDI employs three effectiveness metrics: precision (or positive predictive value), recall (or sensitivity) and F1-score. Here, they are measured over area, that is, in pixels relative to the minimal rectangle encompassing the address block. A precision of 100% means that the result contains only address text and no barcodes, customer codes or any other non-address text or blob. A recall of 100% means that the result contains all the required address pixels and misses none. Obviously, there is a trade-off between the two: returning the whole letter image gives 100% recall but very low precision, while returning just one correct pixel gives 100% precision but very low recall. For system comparisons, we need a single evaluation measure. To this end, we combine precision and recall into the F1-score, which is simply the harmonic mean of the two. Unlike the arithmetic mean, the harmonic mean is small if either of its components is small.
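The two extreme cases can be checked with a small sketch of area-based scoring. It assumes that both the predicted region and the true address block are axis-aligned rectangles given in pixel coordinates. The names Rect and precision_recall_f1 are illustrative and do not describe how DROIDI computes its metrics internally.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def area(r: Rect) -> int:
    return max(0, r.right - r.left) * max(0, r.bottom - r.top)


def overlap_area(a: Rect, b: Rect) -> int:
    """Number of pixels shared by the two rectangles."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return max(0, width) * max(0, height)


def precision_recall_f1(predicted: Rect, truth: Rect) -> Tuple[float, float, float]:
    """Pixel-area precision, recall and F1 of a predicted region."""
    correct = overlap_area(predicted, truth)
    precision = correct / area(predicted) if area(predicted) else 0.0
    recall = correct / area(truth) if area(truth) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = Rect(100, 200, 500, 400)          # minimal rectangle around the address
whole_letter = Rect(0, 0, 1000, 600)      # returning the entire image
one_pixel = Rect(100, 200, 101, 201)      # returning a single correct pixel

letter_scores = precision_recall_f1(whole_letter, truth)
pixel_scores = precision_recall_f1(one_pixel, truth)
exact_scores = precision_recall_f1(truth, truth)
```

With these made-up rectangles, returning the whole letter gives full recall but a precision of about 0.13 and an F1 of about 0.24, while the single pixel gives full precision but an F1 close to zero. In both cases the harmonic mean is pulled down by the weaker component.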