Matching, Recognition, Results

The Best-Performing Address Matching and Recognition Solutions for
Postal and Logistics Companies

Syslore BROIDI

Syslore BROIDI is the smartest Region of Interest block detector in the industry

Product Overview

In the postal business, a Region of Interest (ROI) is an area in a mail piece image that contains or depicts some specific kind of information. Typically, the region of interest is the recipient address, but other regions, such as stamps, logos, bar codes, special symbols, or sender addresses, may also matter for revenue protection or special delivery options.

BROIDI is a system designed for detecting ROI automatically from an image in 40-100 milliseconds. The B in BROIDI stands for Bayesian, a form of modeling which makes the system data-driven: the system parameters are tuned automatically in a training process involving little manual work. The training data consists of mail piece images with annotated ROI areas. BROIDI simply reads these images and generalizes probabilistic models for ROI location.

As an example, with a carefully sampled data-set of 8000 mail-pieces, BROIDI reaches 94-95% accuracy on the total Finnish mail-flow.

Download an evaluation version

Commercially Proven Technology

Syslore BROIDI is part of Syslore OCR and is used in several high-volume installations, in both national and private postal companies. BROIDI is patent-pending technology.

Product Benefits

BROIDI is a system for Region-Of-Interest detection that is:

  • Data-driven: All modeling is based on generalizations of training data using applied machine learning techniques.
  • Simple: Only three parameters are required for training, and only the BROIDI model is needed for production.
  • Effective: Depending on the dataset, the accuracy hovers around 90-95%.
  • Efficient: Depending on the resolution of the images, a decision takes on average 20-40 milliseconds.
  • Versatile: Can deal with hand-written and machine printed data.
  • Flexible: All the algorithms and models are easily replaceable.

Licensing and Evaluation

The evaluation version is fully functional (with the exception that you cannot create new models) and allows you as much time as you need to ensure that it meets your requirements. We offer free ad-hoc support during development, so if you have any technical queries, please drop us an email.

For commercial licensing options, please view the licensing structure.

Frequently Asked Questions

Please see the FAQ for the most frequently asked questions.


BROIDI in a Nutshell

Resolving the address text in a mail-piece image has to be done in two phases. First, you must locate the area where the address block is, and then pass that area to an optical character recognition tool that converts it to machine-readable text. You cannot usually pass the whole mail-piece to optical character recognition: it takes too long and, more importantly, you lose the location information needed to distinguish the recipient's address from the sender's address. In mail sorting, confusing those two is counter-productive and thus expensive.


Candidate Areas

Finding the address block in a letter with a few simple rules is easy: apply a smart binarization that filters out irrelevant smears and logos; find clusters of connected components (blobs) of a suitable size; ignore the top x % of the mail-piece (unless dealing with a flat); if several areas remain, favor the lower one or the one closest to the centre; and so on. Depending on the mail-flow, such a set of simple rules might find up to 70 percent of the address blocks, and with further tweaking you might get to 80 percent. For a mail-sorting system, however, that is not good enough. Furthermore, the layout of mail-pieces varies from country to country, which means the manual rules have to be adjusted for each deployment. Manual work is expensive, and it does not generalize very well.
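The kind of hand-written rule set described above can be sketched in a few lines of Python. This is only an illustration of the rule-based approach, not Syslore code: the Box type, the pick_address_block function and all thresholds (top_skip, min_area, max_area) are hypothetical, and binarization and blob detection are assumed to have already produced the candidate boxes.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical candidate area in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def area(self) -> int:
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

    @property
    def centre(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def pick_address_block(candidates, image_width, image_height,
                       top_skip=0.25, min_area=2_000, max_area=200_000):
    """Apply the simple rules; return the chosen Box or None.

    All thresholds are made-up defaults that would need manual tuning
    for every mail-flow, which is exactly the weakness of this approach.
    """
    plausible = [
        box for box in candidates
        if min_area <= box.area <= max_area       # blobs of a suitable size
        and box.top >= top_skip * image_height    # ignore the top x %
    ]
    if not plausible:
        return None

    cx, cy = image_width / 2, image_height / 2

    def squared_distance_to_centre(box):
        x, y = box.centre
        return (x - cx) ** 2 + (y - cy) ** 2

    # Among the survivors, favor the one closest to the centre.
    return min(plausible, key=squared_distance_to_centre)


if __name__ == "__main__":
    candidates = [
        Box(20, 10, 320, 90),      # sender address near the top edge
        Box(450, 300, 850, 450),   # recipient address
        Box(900, 500, 910, 510),   # speck, too small
    ]
    print(pick_address_block(candidates, image_width=1000, image_height=600))
```

Every constant in this sketch encodes an assumption about one particular mail-flow, which is why such rules have to be re-tuned for each deployment.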

Syslore approaches the problem of locating the address block, or region-of-interest in postal lingo, from a machine learning angle. We train a machine learning system with a set of sample images. The system generalizes the training samples into a model. Syslore BROIDI (an acronym for Bayesian Region-Of-Interest Detection Instrument) is fundamentally a probabilistic system. There are many phases and mechanisms in the process, but the essential decision boils down to this: given this kind of mail-piece, where is the recipient's address likeliest to be?
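To make the idea of a learned location model concrete, here is a deliberately small Python sketch. It is not BROIDI's actual model, whose internals are not described here. It assumes a toy setup in which annotated address-block centres (normalised to the range 0..1) are counted on a coarse grid, and candidates are ranked by the estimated probability of their grid cell. The names GRID, cell_of, train_location_model and likeliest_candidate are hypothetical.

```python
from collections import Counter

GRID = 4  # hypothetical: the mail-piece is divided into GRID x GRID cells


def cell_of(cx, cy):
    """Map a normalised centre (0..1, 0..1) to a (row, col) grid cell."""
    col = min(int(cx * GRID), GRID - 1)
    row = min(int(cy * GRID), GRID - 1)
    return row, col


def train_location_model(annotated_centres, smoothing=1.0):
    """Estimate P(cell) from annotated address-block centres.

    With smoothing=1.0 this is add-one smoothing, i.e. the posterior
    mean under a uniform Dirichlet prior, so unseen cells keep a small
    non-zero probability.
    """
    counts = Counter(cell_of(cx, cy) for cx, cy in annotated_centres)
    total = len(annotated_centres) + smoothing * GRID * GRID
    return {
        (row, col): (counts[(row, col)] + smoothing) / total
        for row in range(GRID)
        for col in range(GRID)
    }


def likeliest_candidate(model, candidate_centres):
    """Return the candidate centre whose grid cell is most probable."""
    return max(candidate_centres, key=lambda centre: model[cell_of(*centre)])


if __name__ == "__main__":
    # Toy training data: most address blocks sit right of and below the centre.
    training = [(0.60, 0.60), (0.65, 0.55), (0.70, 0.60), (0.10, 0.10)]
    model = train_location_model(training)
    print(likeliest_candidate(model, [(0.10, 0.12), (0.62, 0.58)]))
```

The point of the sketch is that nothing here is tuned by hand: swap in training data from another mail-flow and the preferred location shifts accordingly.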

This letter image depicts a mail-piece from the trial data set, where Syslore BROIDI has found several candidate areas. The likeliest address block is plotted and bordered in blue, and the weaker candidates are in green.


F1 Score

To gauge the generalization ability of Syslore BROIDI, we trained it with various amounts of data. The graph below illustrates the effectiveness of Syslore BROIDI when trained with data randomly sampled from a national-level postal operator's mail-flow. The evaluation was done with a separate 8000 mail-piece evaluation set randomly drawn from the same flow.

We employ three different effectiveness metrics: precision (or positive predictive value), recall (or sensitivity) and F1-score. Here, they measure area, that is, pixels with respect to the minimal rectangle encompassing the address block. A 100% precision means that the result contains exactly the address text and no additional barcodes, customer codes or any other non-address text or blob. A 100% recall means the result contains all the required address pixels and we miss none.

Obviously, there is a trade-off between the two: we get 100% recall by returning the whole letter image, but the precision would be very low. By returning just one correct pixel we obtain 100% precision, but a very low recall. For system comparisons, we need a single evaluation measure. To this end, we combine precision and recall into the F1-score, which is just the harmonic mean of the two. Unlike the arithmetic mean, the harmonic mean is small if either of its components is small.
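The area-based metrics are easy to compute for a pair of rectangles. The Python sketch below assumes boxes are given as (left, top, right, bottom) pixel tuples with exclusive right and bottom edges. The function names are illustrative and not part of any Syslore API.

```python
def area(box):
    """Pixel area of a (left, top, right, bottom) box; empty boxes give 0."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def intersection(a, b):
    """Overlapping rectangle of a and b (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def area_scores(predicted, truth):
    """Return (precision, recall, f1) measured in pixels."""
    overlap = area(intersection(predicted, truth))
    precision = overlap / area(predicted) if area(predicted) else 0.0
    recall = overlap / area(truth) if area(truth) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    truth = (100, 100, 300, 200)          # annotated address block
    whole_image = (0, 0, 1000, 600)       # "return everything"
    half_block = (100, 100, 200, 200)     # left half of the address block

    print(area_scores(whole_image, truth))  # full recall, very low precision
    print(area_scores(half_block, truth))   # full precision, recall of one half
```

Returning the whole image yields a recall of 1 but an F1 of only 2/31, about 0.065, which shows how the harmonic mean penalises a result that does well on one metric alone.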


Training Tool

So, how much data is needed? In this evaluation, you get a decent model with just 100 samples. After that, each further improvement requires more and more training data. Still, a one percentage point improvement translates to millions of letters a year in a large mail-flow. Although quantity has a quality of its own, the quality of the training data is more important. The training data must be a representative sample of the real mail-flow. In other words, it must show the system what kind of mail-pieces are to be expected.

The training data does not come about by itself; it needs to be made. The data associates the orientation of the image and the address block coordinates with the mail-piece image. The illustration on the right depicts the Syslore tool with which the training data is produced (available for licensed users). Integrating Syslore BROIDI into it is effectively bootstrapping: based on earlier examples, the tool suggests a candidate address block for a new image. In most cases it is correct, and the user merely needs to acknowledge it. In some cases the address block must be tagged manually.


Download a free evaluation version