Topic: Moving documents from paper to bits to text @ AskWoody

This topic has 4 replies, 5 voices, and was last updated 11 years, 5 months ago.

Kathleen Atkins

Member

December 18, 2013 at 12:37 pm #492518

BEST SOFTWARE

Moving documents from paper to bits to text

By Lincoln Spector

Remember the paperless office? Never happened; we still have piles of paper that take up too much room, can be difficult to search, and can’t be encrypted. OCR software lets you scan important documents and turn them into searchable PDFs. But the technology is still far from perfect.
On the other hand, you’ll miss a lot of junk, too.

The full text of this column is posted at windowssecrets.com/best-software/moving-documents-from-paper-to-bits-to-text/ (paid content, opens in a new window/tab).

Columnists typically cannot reply to comments here, but do incorporate the best tips into future columns.

WSDavidFB
AskWoody Lounger

December 19, 2013 at 4:57 pm #1430193

Thanks, Lincoln.
Another option you didn’t mention is getting the software with a scanner. I got Fujitsu’s ScanSnap S1300i. It’s a full duplex scanner (both sides) that makes scanning office docs, from business cards to 8.5×14 docs, fast and easy. Much easier than a flatbed. It auto-corrects and adjusts all the typical things (colour or not, duplex or not, straightening, etc.). You can set it to default to various formats, with or without OCR. I typically have it scan to PDF, then OCR as a separate step those documents that would benefit from it. (I also scan photos, notes, etc.) It came with ABBYY – the only limitation being that it checks the metadata tags to ensure the document was scanned with the ScanSnap.

I’ve now processed thousands of pages with it – the old file cabinets, archives, shurlock books, and binders of grad work. Vastly easier than a page by page flatbed. And now all fully searchable and quotable.

I’ve also used ABBYY and Fujitsu scanners professionally in a shop that processed thousands of pages to PDF daily so I knew they were both excellent and high quality.

MrJimPhelps
AskWoody MVP

December 19, 2013 at 5:08 pm #1430198

How accurate are your scans? In my experience, you don’t get perfect accuracy from OCR, so you always need to proofread after scanning.

I haven’t done it in a good while, so maybe things have improved.

Group "L" (Linux Mint)
with Windows 10 running in a remote session on my file server

- Anonymous
  Inactive
  
  December 20, 2013 at 6:12 pm #1430424
  
  I think you’re wrong about “Window’s own search tool” not finding words inside pdfs. I just checked that with Windows 7 SP1 by typing an unusual word which occurs in a dozen of my pdfs into the Start Button’s search box. It instantly popped up all of those pdfs in the search results window under “documents”. (That said, I still greatly prefer X1 Search over Windows search – far more flexible, and also instantly displays, with the search term highlighted, the files it finds if they are a common file type (.doc, .xls, pdf, etc).) It will also index and search Outlook, although I don’t use that feature.
  
  Also, for those applications that require proofing, editing, etc., you might also consider Omnipage, from Nuance. I haven’t used ABBYY FineReader, but from reading your description, it sounds like Omnipage is about the same price ($150 list) and will do everything you describe, plus more. Omnipage may be a bit more complicated to use because of the extra capabilities. It includes the ability to scan multipage documents, including large books, where it is worthwhile to optimize the recognition accuracy for the specific peculiarities of the font used in the document being scanned. I’ve found this particularly valuable when scanning old genealogical documents found in libraries or on the web, where the documents may have already been copied or scanned in a sub-optimal manner, with relatively poor quality.
  
  Did you check how ABBYY FineReader handles multi-column documents? If a document is to undergo further editing, it’s really important (and not easy) for the OCR app to properly interpret the column-to-column flow.
  
  And a final comment: If a document is to undergo further editing after OCR, I recommend using plain text output from the OCR application, rather than .doc. I’ve found that, while the .doc file may look just fine, it is very difficult to do additional formatting, because the styles applied by Omnipage (and I assume also ABBYY FineReader – I think this is a fundamental problem) are very complex and also somewhat haphazard – changing to fit the local context.
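
To illustrate the column-to-column flow problem raised above, here is a minimal Python sketch. It assumes the OCR step has already produced text blocks with page coordinates. The `Block` class, the `reading_order` function, and the `column_gap` threshold are all hypothetical names for illustration; this is not how ABBYY FineReader or Omnipage work internally.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A recognized chunk of text and the position of its top-left corner."""
    text: str
    x: float  # left edge, in arbitrary page units
    y: float  # top edge; smaller values are higher on the page


def reading_order(blocks, column_gap=50.0):
    """Group blocks into columns by left edge, then read each column top to bottom.

    column_gap is an assumed threshold: a block whose left edge is within this
    distance of a column's leftmost block is treated as part of that column.
    """
    columns = []
    for block in sorted(blocks, key=lambda b: b.x):
        if columns and block.x - columns[-1][0].x <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y))
    return ordered


# A two-column page: two blocks on the left, two on the right.
page = [
    Block("left-1", 40, 100),
    Block("right-1", 320, 100),
    Block("left-2", 42, 300),
    Block("right-2", 318, 300),
]

# Naive top-to-bottom ordering interleaves the columns.
naive = [b.text for b in sorted(page, key=lambda b: (b.y, b.x))]
print(naive)  # ['left-1', 'right-1', 'left-2', 'right-2']

# Column-aware ordering keeps each column together.
print([b.text for b in reading_order(page)])  # ['left-1', 'left-2', 'right-1', 'right-2']
```

Real pages are harder than this: headlines that span columns, sidebars, and skewed scans all break a simple left-edge grouping, which is one reason the flow is worth checking before editing.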
  
WSmiketate
AskWoody Lounger

January 18, 2014 at 11:13 am #1434404

I also think Lincoln is wrong about “Window’s own search tool” not finding words inside PDFs.
However, getting it to work may require installing the “Adobe PDF iFilter”.
I think the latest is version 11, and there are 32-bit and 64-bit variants, but a Google search will reveal all.
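
As a rough illustration of why a filter matters, here is a minimal Python sketch of an indexer that can only search inside file types it has a text extractor for. Everything in it is hypothetical (`FILTERS`, `read_plain_text`, `build_index`); it is not Windows Search or the Adobe iFilter, and the ".pdf" extractor in the usage lines is a stand-in that simply reads the file as text rather than parsing real PDF data.

```python
from collections import defaultdict
from pathlib import Path


def read_plain_text(path):
    """Return the contents of a plain-text file."""
    return Path(path).read_text(encoding="utf-8", errors="ignore")


# Hypothetical registry: file extension -> function that extracts the file's text.
FILTERS = {".txt": read_plain_text}


def build_index(paths, filters=FILTERS):
    """Map each lowercase word to the set of file names it was found in.

    The file name is always indexed. The contents are indexed only when a
    filter is registered for the file's extension.
    """
    index = defaultdict(set)
    for path in map(Path, paths):
        words = path.stem.lower().split()
        extractor = filters.get(path.suffix.lower())
        if extractor is not None:
            words += extractor(path).lower().split()
        for word in words:
            index[word].add(path.name)
    return index
```

With only the ".txt" filter registered, a word that appears inside `scan.pdf` is not found, although the file name still is. Registering an extractor for ".pdf" makes the contents searchable:

```python
# Stand-in only: a real PDF filter would parse the PDF format.
with_pdf = {**FILTERS, ".pdf": read_plain_text}
# index = build_index(my_files, with_pdf)   # my_files is a placeholder list of paths
```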
