BROIDI in a Nutshell
Resolving the address text in a mail-piece image has to be done in two phases. First, you must locate the area where the address block is, and then pass that area to an optical character recognition tool that converts it to machine-readable text. You usually cannot pass the whole mail-piece to optical character recognition: it takes too long and, more importantly, you lose the location information needed to distinguish the recipient's address from the sender's address. In mail sorting, confusing those two is counter-productive and thus expensive.
Finding the address block in a letter with a few simple rules is easy: apply some smart binarization that filters out non-relevant smears and logos, find clusters of connected components, i.e., blobs, of a suitable size, ignore the top x % of the mail-piece (unless dealing with a flat), in case of multiple areas favor the lower one or the one closest to the centre, etc. Depending on the mail-flow, a set of simple rules might recognize up to 70 percent of the address blocks, and tweaking it some more, you might get to 80 percent. However, for a mail-sorting system that is not good enough. Furthermore, the layout of mail-pieces varies from country to country, which means the manual rules have to be adjusted for each deployment. Manual work is expensive and it does not generalize very well.
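To make the rule-based approach concrete, here is a minimal sketch of such hand-written rules in Python. It assumes that binarization and connected-component clustering have already produced blob bounding boxes. The names Blob and pick_address_block, the thresholds, and the 25 % top margin are illustrative assumptions, not part of any Syslore product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """Bounding box of a cluster of connected components, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self):
        return self.width * self.height

    @property
    def centre(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def pick_address_block(blobs, image_width, image_height,
                       top_ignore_fraction=0.25,   # assumed value for "top x %"
                       min_area=2_000, max_area=200_000):  # assumed size limits
    """Return the blob favoured by the simple rules, or None if none qualifies."""
    cutoff_y = top_ignore_fraction * image_height
    candidates = [
        b for b in blobs
        if min_area <= b.area <= max_area   # rule: suitable size
        and b.y >= cutoff_y                 # rule: ignore the top of the mail-piece
    ]
    if not candidates:
        return None

    cx, cy = image_width / 2, image_height / 2

    def distance_to_centre(b):
        bx, by = b.centre
        return ((bx - cx) ** 2 + (by - cy) ** 2) ** 0.5

    # Rule: favour the blob closest to the centre; on a tie, the lower one.
    return min(candidates, key=lambda b: (distance_to_centre(b), -b.y))


# Example: sender block and stamp near the top, recipient block mid-letter.
blobs = [Blob(40, 30, 300, 80), Blob(400, 300, 350, 120), Blob(900, 20, 100, 100)]
best = pick_address_block(blobs, image_width=1000, image_height=600)
```

Every constant in this sketch is a hand-tuned parameter, which is exactly the kind of per-deployment adjustment that makes manual rules costly.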
Syslore approaches the problem of locating the address block, or region-of-interest in postal lingo, from a machine learning angle. We train a machine learning system with a set of sample images, and the system generalizes the training samples into a model. Syslore BROIDI (an acronym for Bayesian Region-Of-Interest Detection Instrument) is fundamentally a probabilistic system. There are many phases and mechanisms in the process, but the essential decision boils down to this: given this kind of mail-piece, where is the recipient's address likeliest to be?
This letter image depicts a mail-piece from the trial data set, where Syslore BROIDI has found several candidate areas. The likeliest address block is plotted and bordered in blue, and the weaker candidates are in green.
To gauge the generalization ability of Syslore BROIDI, we trained it with various amounts of data. The graph below illustrates the effectiveness of Syslore BROIDI when trained with data randomly sampled from a national-level postal operator's mail-flow. The evaluation was done with a separate evaluation set of 8,000 mail-pieces randomly drawn from the same flow. We employ three different effectiveness metrics: precision (or positive predictive value), recall (or sensitivity) and F1-score. Here, they measure area, that is, pixels with respect to the minimal rectangle encompassing the address block. A precision of 100% means that the result contains exactly the address text and no additional barcodes, customer codes or any other non-address text or blob. A recall of 100% means the result contains all the required address pixels and we miss none. Obviously, there is a trade-off between the two: we get 100% recall by returning the whole letter image, but the precision would be very low. By returning just one correct pixel we obtain 100% precision, but a very low recall. For system comparisons, we need a single evaluation measure. To this end, we combine precision and recall into the F1-score, which is simply the harmonic mean of the two. Unlike the arithmetic mean, the harmonic mean is small if either of its components is small.
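The area-based metrics can be illustrated with a short Python sketch. It assumes that both the returned region and the true address block are axis-aligned rectangles given as (left, top, right, bottom) in pixels. The function names and the coordinate convention are assumptions made for this example, not the evaluation code used for the graph.

```python
def rect_area(rect):
    """Area in pixels of a (left, top, right, bottom) rectangle; 0 if it is empty."""
    left, top, right, bottom = rect
    return max(0, right - left) * max(0, bottom - top)


def intersection(a, b):
    """Overlap of two rectangles; may be an empty (inverted) rectangle."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def area_scores(predicted, truth):
    """Return (precision, recall, F1) measured on pixel area."""
    overlap = rect_area(intersection(predicted, truth))
    predicted_area = rect_area(predicted)
    truth_area = rect_area(truth)
    precision = overlap / predicted_area if predicted_area else 0.0
    recall = overlap / truth_area if truth_area else 0.0
    # F1 is the harmonic mean: small whenever either component is small.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


truth = (100, 100, 300, 200)                            # true address block
whole_letter = area_scores((0, 0, 1000, 600), truth)    # full recall, low precision
single_pixel = area_scores((100, 100, 101, 101), truth) # full precision, low recall
exact_match = area_scores(truth, truth)                 # both 100%
```

Both degenerate strategies score a perfect value on one metric yet end up with an F1 close to zero, which is why F1 is the more useful single number for comparing systems.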
So, how much data is needed? In this evaluation, you get a decent model with just 100 samples. After that, each further improvement requires more and more training data. Still, a one-percentage-point improvement translates to millions of letters a year in a large mail-flow. Although quantity has a quality of its own, the quality of the training data is more important. The training data must be a representative sample of the real mail-flow. In other words, it must show the system what kind of mail-pieces are to be expected.
The training data does not come about by itself; it needs to be made. The data associates the orientation of the image and the address block coordinates with the mail-piece image. The illustration on the right depicts the Syslore tool with which the training data is produced (available for licensed users). Integrating Syslore BROIDI into the tool is effectively bootstrapping: based on earlier examples, the tool suggests a candidate address block for a new image. In most cases the suggestion is correct, and the user merely needs to acknowledge it. In some cases the address block must be tagged manually.
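As a rough illustration of what one training record might hold, the following Python sketch pairs an image with its orientation and address block coordinates, and models the acknowledge-or-correct step. The field names, the use of degrees for orientation, and the label_sample helper are hypothetical; the actual format of the Syslore tool is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed convention: (left, top, right, bottom) in pixels.
Rect = Tuple[int, int, int, int]


@dataclass
class TrainingSample:
    """One labelled mail-piece image (hypothetical record layout)."""
    image_path: str
    orientation_degrees: int   # assumed encoding, e.g. 0, 90, 180 or 270
    address_block: Rect


def label_sample(image_path: str, orientation_degrees: int,
                 suggested: Optional[Rect],
                 manual: Optional[Rect] = None) -> TrainingSample:
    """Use the manual tag if one was given, otherwise accept the suggestion."""
    block = manual if manual is not None else suggested
    if block is None:
        raise ValueError("no suggested or manually tagged address block")
    return TrainingSample(image_path, orientation_degrees, block)


# The user acknowledges the suggestion for one image and corrects another.
accepted = label_sample("letter_001.png", 0, suggested=(400, 300, 750, 420))
corrected = label_sample("letter_002.png", 180, suggested=(10, 10, 50, 50),
                         manual=(380, 290, 760, 430))
```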