<< My question, oh list, is this: Have any of you had occasion to use selective OCR software that can be told not only "pull this information from here" but also can be told "this will be two letters and 8 digits"?>>

There are several solutions on the market that allow zonal OCR with templates that do exactly what you are asking.  For example, you can set up templates for the different types of timesheets.  Each template would have zones identified with the information you need to capture.  The individual zones can be configured to look for specific information and/or compare data to a known source, and then either force changes (a leading "0R" is corrected to "OR") or send results that do not meet the specified criteria (10 characters, ORNNNNNNNN, with a match to one of the EMP_NUM values from Emp.db) to an operator for correction.
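
To make the zone rule concrete, here is a minimal sketch in Python of the kind of check a template zone performs.  It is not taken from any particular OCR product: the function name, the confusion tables, and the use of a plain set in place of a lookup against Emp.db are all assumptions for illustration.

```python
import re

# Format from the example: literal "OR" followed by exactly 8 digits.
EMP_NUM_PATTERN = re.compile(r"OR\d{8}")

# Assumed look-alike substitutions; a real product would have its own tables.
TO_LETTER = str.maketrans({"0": "O"})
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8"})


def validate_zone(raw, known_emp_nums):
    """Return (value, status) for one OCR zone result.

    known_emp_nums is a set standing in for the EMP_NUM values in Emp.db.
    """
    text = raw.strip().upper().replace(" ", "")
    if len(text) == 10:
        # Force changes: letters in the prefix, digits in the remainder.
        text = text[:2].translate(TO_LETTER) + text[2:].translate(TO_DIGIT)
    if not EMP_NUM_PATTERN.fullmatch(text):
        return text, "review: bad format"
    if text not in known_emp_nums:
        return text, "review: unknown employee"
    return text, "accepted"
```

A reading such as "0R1234S678" would be forced to "OR12345678" and accepted if that number is on file; anything that still fails the format or the lookup is flagged for an operator rather than silently passed through.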

For the best results, look for OCR software that uses multiple OCR engines to evaluate the data.  OCR engines are like anything else: some are better at some things than others.  For the example above, I would want software that defaults the "OR" and then uses an OCR engine that is extremely good at numeric characters to read the subsequent 8 digits.  I would also consider whether the data being read is machine print or hand print, and whether it is constrained or unconstrained.
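
As a rough illustration of how several engines can be combined, the sketch below takes the readings of one zone from different engines and votes character by character.  The engines themselves are not shown, and the function name and the min_agreement threshold are assumptions; commercial products use their own (usually confidence-weighted) voting schemes.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Combine same-length readings of one zone, character by character.

    Returns (text, doubtful_positions). If the readings are empty or
    differ in length, returns (None, []) so the zone can go to review.
    """
    if not readings or len({len(r) for r in readings}) != 1:
        return None, []
    result = []
    doubtful = []
    for pos, chars in enumerate(zip(*readings)):
        # Most common character at this position across all engines.
        char, count = Counter(chars).most_common(1)[0]
        result.append(char)
        if count < min_agreement:
            doubtful.append(pos)  # too little agreement; flag for an operator
    return "".join(result), doubtful
```

With three readings of the numeric part, say "12345678", "12845678" and "12345673", each disputed position is settled two to one and the result is "12345678" with nothing flagged.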

For the very best performance, I would redesign the timesheet and optimize it for data capture.  The design would include anchors, constrained print, barcodes (3of9 for form recognition, PDF417 for data capture), automated form recognition, voting OCR engines, auto-calculation and comparison, and the ability to submit electronic timesheet data.  One tool that I used in a past life that was extremely good at this type of work was Cardiff (now HP) Teleform.  Properly done, 6000 timesheets could be processed in a couple of hours, including all corrections and updates.  Most of that time would be spent waiting for the processing to complete.  Data not meeting the defined confidence parameters could be corrected efficiently using ribbon editing.  Extracted data can be used to name the document for storage, pushed to your payroll application, and used to validate that you have received all of the required timesheets for the pay period.
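
The last check, confirming that every required timesheet has arrived, is simple once the employee numbers have been captured.  A minimal sketch, assuming the expected employee numbers for the pay period and the captured ones are available as plain lists (the function and key names are made up for illustration):

```python
from collections import Counter


def reconcile(expected_emp_nums, captured_emp_nums):
    """Compare captured timesheets against the expected list for a pay period."""
    expected = set(expected_emp_nums)
    counts = Counter(captured_emp_nums)
    return {
        # Expected employees with no timesheet captured.
        "missing": sorted(expected - set(counts)),
        # Captured numbers that are not on the expected list.
        "unexpected": sorted(set(counts) - expected),
        # Employee numbers captured more than once.
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }
```

The "missing" list is what you would chase before the payroll run; "unexpected" and "duplicates" are worth a second look because they often point to a misread number.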

Bill Roach, CRM

Opinions are my own and not those of my employer or any other individual or entity.

