• Best Font for Optical Character Recognition?

    #399642

    What are the best fonts for OCR, based on your experience rather than theory?

    I got into an argument today (what a surprise!) by saying that resumes, if printed, should use a commonly recognised font. I quoted Courier and Times New Roman, because I remembered those two from (15?) years ago with ZyIndex software, and felt that the serifs aided OCR, whereas Arial, while common, might suffer inaccuracy during recognition.

    I don’t use OCR, but if YOU do, I’d like to hear the names of fonts which work for YOU. I’m not too interested in hearing that Moldavia or Yilgarn or WingDings don’t work. I’d like a half-dozen fonts which I could suggest as being suitable for sending your hard-copy off into the world without knowing the explicit characteristics of the OCR in use by the recipient.

    • #773207

      Hi Chris

      From my very limited experience with OCR, I’ve noticed that serif fonts suffer from difficulty in recognizing a 1 (one) properly, which is often identified as an “i”, and similarly 0 (zero) is confused with “o”. I’m guessing the top serif on the “1” is confused with the dot over the “i”, and the aspect ratios of the “0” and “o” are too close to distinguish. Sans-serif fonts come out better, to my recollection. Australia Post recommends using a fixed-pitch font such as Courier 12 point, 10 pitch, but this is possibly due to other considerations.

      But the biggest variation I’ve seen is a result of the software quality and scanner settings. Some OCR software I’ve seen is not worth the medium it’s recorded on, and increasing scanning resolution from 200dpi to 400dpi has sometimes made a world of difference with the software that does work (sometimes). This stuff is all witchcraft to me, though.

      Alan
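
To put rough numbers on the resolution point above, here is a minimal sketch of how many pixels one character cell gets at each setting. It assumes the Courier 12 point, 10 pitch case mentioned in the post, takes the point size as the nominal line height, and uses a hypothetical helper name (`glyph_box_pixels`); real glyphs occupy somewhat less than the full cell.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def glyph_box_pixels(point_size, pitch, dpi):
    """Return (width, height) in pixels of one fixed-pitch character cell.

    pitch is characters per inch; point_size is the nominal type size.
    """
    width = dpi / pitch
    height = dpi * point_size / POINTS_PER_INCH
    return width, height


if __name__ == "__main__":
    for dpi in (200, 400):
        w, h = glyph_box_pixels(point_size=12, pitch=10, dpi=dpi)
        print(f"{dpi} dpi: about {w:.0f} x {h:.0f} pixels per character cell")
```

Doubling the resolution doubles both dimensions, so each character is sampled with four times as many pixels, which may be why small features such as a serif or the dot on an "i" survive better at 400dpi.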

      • #773221

        Hi Alan,

        > Australia Post recommends

        For my current purposes I’m suspicious of any recommendations along these lines (also “Canada Post recommends”, “JobMart recommends”, “Brill Agencies recommends”, etc.) because I suspect that it might be a limitation imposed by conservative management, or unknowledgeable workers.
        As you say, it could be due to other considerations, such as extremely old OCR software. I figure things have got better over the years.

        It is witchcraft to me too. That’s why I’m wondering who amongst the loungers is “up” on this and can give a suitable general recommendation.

        Ten years ago I toured the Canada Post sorting plant in Toronto; I was told then that they had handwriting recognition. Whether they have ever updated it is anyone’s guess, though. Their policy for typed addresses that failed to be recognised was to pass them through the machine twice more, in case that removed enough gloss from the envelope, then send them for three passes through the older, slower machines, in case reduced speed helped… it makes sense to exhaust your automation before falling back to manual.

        • #773223

          With today’s versions of OCR software, I have NOT had any problems with any font, even with some of the obscure ones.

          DaveA I am so far behind, I think I am First
          Genealogy....confusing the dead and annoying the living

          • #773470

            > NOT had any problems with any font,

            Dave, thanks for this feedback. I’m obviously out of date.

            Perhaps I should assemble a test-deck of documents and try them out at a sales centre.
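
One way to score such a test-deck is a character error rate: the edit distance between the text you printed and the text the OCR returned, divided by the length of the original. The sketch below is only an illustration under that assumption. The function names are hypothetical, and the sample strings are invented placeholders, not results from any real OCR package or font.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth, ocr_text):
    """Edit distance divided by the length of the known-correct text."""
    if not truth:
        raise ValueError("truth must not be empty")
    return edit_distance(truth, ocr_text) / len(truth)


if __name__ == "__main__":
    # Invented placeholder data: what was printed, and what two
    # hypothetical test pages might come back as after OCR.
    truth = "Invoice 1001 for 10 items"
    deck = {
        "page_a": "Invoice 1001 for 10 items",
        "page_b": "lnvoice 1OO1 for lO iterns",
    }
    for name, text in deck.items():
        print(f"{name}: CER = {char_error_rate(truth, text):.2%}")
```

Printing the same text in each candidate font and comparing the rates would give a like-for-like ranking for one particular scanner and OCR package.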

          • #773921

            I think it depends on the application you require the OCR for.

            We’re currently implementing a new cashiering system at work (in-house development), and we’re using hardware to scan a line of text that includes the account number, amount due, and other small fields.

            We’ve discovered that, with the hardware we’re using, only a few fonts will work correctly. It’s just a matter of finding the right font, and the right placement on the page, for the hardware. The hardware connects between the keyboard and the PS/2 port on the PC. I believe this is called a ‘wedge’ device.

            Now, I’d otherwise agree with you that they’re all mostly compatible, if you’re using OCR software to decode from a flatbed scanner.

        • #773230

          > … it makes sense to exhaust your automation before falling back to manual.

          Reminds me of working in Indonesia many moons ago, when calculators first started to become common/affordable there. Vendors seemed not to trust them, so results would often be double-checked on the ever-reliable abacus… in half the time.

          Alan

    • #773474

      Chris

      Since most of the known universe produces documents in Arial, you will be aware of the problems with
      smiling million adornment marmalade
      (not words from the subject of a recent spam message, but chosen to show the difficulty of picking out “i”(small i) and “l”(small l) when adjacent, and that “r” next to “n” looks like “m”), because of the letter shapes and unhappy choice of kerning. These cause real problems on my HP 3330MFP with Iris OCR software.

      On to speculation:
      I would say the easiest font to recognise is a fixed-pitch one, the only common one being Courier. This is also heavily serifed, which is a Good Thing (for OCR).
      After that, I would suspect that TNR is comparatively simple to recognise.

      Just for interest, does anyone know when the first attempts at OCR started? Or voice recognition? I’m suspecting we’re talking about at least 30 years… and they still haven’t got it 100% right!
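
The letter pairs described above can be hunted for mechanically before a document is printed. This is a minimal sketch, assuming a hand-picked list of risky sequences taken only from the examples in this post ("i" next to "l", and "r" next to "n"); the names `RISKY_SEQUENCES` and `risky_spots` are hypothetical, and the list says nothing about how any particular OCR engine actually behaves.

```python
import re

# Assumed list, based only on the examples in the post above.
RISKY_SEQUENCES = {
    "il": 'adjacent "i" and "l" are hard to tell apart',
    "li": 'adjacent "l" and "i" are hard to tell apart',
    "rn": '"r" next to "n" can look like "m"',
}


def risky_spots(text):
    """Return sorted (position, sequence) pairs for each risky sequence found."""
    hits = []
    for seq in RISKY_SEQUENCES:
        for match in re.finditer(re.escape(seq), text):
            hits.append((match.start(), seq))
    return sorted(hits)


if __name__ == "__main__":
    sample = "smiling million adornment marmalade"
    for position, seq in risky_spots(sample):
        print(f"offset {position}: '{seq}' ({RISKY_SEQUENCES[seq]})")
```

A count of such spots per page is only a rough risk indicator; whether they actually misread depends on the font, kerning, scanner and software in use.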

      • #773546

        > common one being Courier. … serifed, … TNR ….
        These were the mainstays of my argument, and I was arguing from my guesses, speculation, and logic. I had no hard evidence to back up my theories.

        > when the first attempts at OCR started?

        If you include the mechanical font that appears on the foot of cheques, Control Data Corporation (CDC) used what seemed to be a recognition font in their 3300 series reference manuals. I was reading those back in 1970.

        • #773915

          Forgive me for being pedantic, but the printing on the foot of cheques is MICR, or Magnetic Ink Character Recognition, not optical, although perhaps it’s optically read now. I did have experience with optical character recognition of a similar font on NCR adding machine tapes, used for data input by Reynolds & Reynolds, an automotive dealership accounting service supplier I worked for back in the early ’70s.

          • #774844

            > printing on the foot of cheques is MICR

            Quite so. The MICR was the closest I could think of to describe the appearance of the CDC material.

            Looking back, I remember that the CDC material seemed so futuristic. We were on the wave of computers, at the threshold, and here was the Brave New Typeface to suit the times. It never did catch on, really.
