
We are getting close to implementing an imaging program and we, too, have
decided that PDF is the best format for us. The deciding factor for us was
the ability to bookmark places within a large document for quick access to
the exact point. We had decided against OCR, not only because of the
"garbage" issue, but because the terms we would search for appear repeatedly
in the same document. As a result, we'd have dozens of hits for a given term
and would have to wade, page by page, through them to get to the spot we
wanted to see. One product, Laserfiche, did give a list of the hits in
context (showing 5-10 words on either side of the selected term), which was
better than just going to the first instance in the document, then hitting
"next" until you found your spot. However, our users still found that to be
too much to see. In addition, some of the phraseology we use isn't the exact
wording used in the documents (e.g., although we call it the "radius clause,"
that exact term doesn't appear anywhere in our lease documents). We
have discussed adding hidden text or OCR solely to make that additional
functionality available to those who may want to use it; the jury's still
out on that issue, though.
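For anyone curious how a hit list "in context" like Laserfiche's works, here is a minimal sketch of a keyword-in-context listing. It assumes plain OCR text, single-word search terms, and a hypothetical `window` of words on each side; it is not Laserfiche's actual implementation. Note that it can only find words that literally appear, which is exactly our "radius clause" problem.

```python
import string


def keyword_in_context(text: str, term: str, window: int = 5) -> list[str]:
    """List every occurrence of a single-word `term` in `text`, with up to
    `window` words of context on either side (hypothetical helper)."""
    words = text.split()
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Case-insensitive match that ignores punctuation stuck to the word.
        if word.strip(string.punctuation).lower() == target:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append(" ".join(words[start:end]))
    return snippets


lease = ("The tenant shall not operate a competing store. "
         "The tenant agrees that the tenant pays all rent.")
for snippet in keyword_in_context(lease, "tenant", window=2):
    print(snippet)
```

A term that never appears in the text, such as "radius" here, simply returns an empty list.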

Nolene Sherman
[log in to unmask]

Visit OCARMA's Website at <http://www.ocarma.org/>



-----Original Message-----
From: Mike Mackey [mailto:[log in to unmask]]
Sent: Friday, March 31, 2000 6:50 AM
To: [log in to unmask]
Subject: Re: IMAGING v. OCR (OR A COMBINATION)


I have found that using Adobe Acrobat Capture provides the best of both
worlds in this typical dilemma.

There are three kinds of PDF files: Standard, Image, and Image+Hidden Text.
Acrobat Capture can convert single- or multi-page TIFF files to any of these
formats.

The Image format is simply an image file inside a PDF wrapper.  It is about
the same size as the TIFF source file, and the only advantage to this format
is that you can view it easily with the free Acrobat Reader or Acrobat
browser plug-in.

The Image+Hidden Text format includes OCR information from the scanned
image, but displays the image, not the converted text.  A location map is
maintained in the background, allowing you to search for words and jump to
them in the document, even though you are viewing an image of the original
document.

The Standard PDF format is a word-processor-like representation of the
document based on OCR results and font matching.  The nice part about this
format is that Acrobat Capture will insert a graphic in any location that
can't be reliably recognized as text.  The result is a document that looks
very much like the original, although not exactly.  The file size is
typically quite a bit smaller than the original TIFF image, especially for
high-quality, legible images.
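The "insert a graphic where text can't be reliably recognized" behavior can be pictured as a per-region decision. This is a minimal sketch, assuming the OCR engine reports a confidence score per region; the `Region` type and the 0.9 threshold are hypothetical, and Acrobat Capture's real rules aren't described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """One block of the scanned page as returned by a hypothetical OCR step."""
    text: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def plan_page(regions: list[Region], threshold: float = 0.9) -> list[tuple[str, Optional[str]]]:
    """Decide, region by region, whether to emit recognized text or keep
    that piece of the original bitmap."""
    plan: list[tuple[str, Optional[str]]] = []
    for r in regions:
        if r.confidence >= threshold:
            # Reliable enough: emit real, font-matched text (small, searchable).
            plan.append(("text", r.text))
        else:
            # Unreliable: keep this part of the scan as an image instead.
            plan.append(("image", None))
    return plan


page = [Region("Radius", 0.97), Region("~#%", 0.41), Region("clause", 0.90)]
print(plan_page(page))
```

This also suggests why clean, legible scans produce the smallest files: more regions clear the threshold and are stored as compact text rather than bitmaps.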



> -----Original Message-----
> From: Records Management Program [mailto:[log in to unmask]] On
> Behalf Of Love, Tom (NIP)
> Sent: Friday, March 31, 2000 7:21 AM
> To: [log in to unmask]
> Subject: Re: IMAGING v. OCR (OR A COMBINATION)
>
>
> Bob and Listers, I am very interested in this topic, especially on two
> fronts: Error rates for state of the art OCR, and some basic cost
> comparisons for the two types of systems.
>
> Tom B. Love, CRM
> National Immunization Program
> (404) 639-8093
>
>
> -----Original Message-----
> From: Robert A Ottaway [mailto:[log in to unmask]]
> Sent: Thursday, March 30, 2000 6:58 PM
> To: [log in to unmask]
> Subject: IMAGING v. OCR (OR A COMBINATION)
>
>
> I am interested in people's views on the merits of using Imaging or OCR
> software in electronic records systems.
>
> I have recently moved from a Local Government Office (in Queensland,
> Australia) where we were using Imaging within an electronic Records System
> called AusInfo. This was quite successful: we dispensed with the great
> majority of hard copy files and were operating an almost completely
> electronic system. Users were able to view complete images of Records from
> their desktops.
>
> At the State Government Office (Main Roads) where I now work, we use a
> system called Texpress. Documents are scanned in, OCR software is used, and
> the documents are converted to text files. Searching for text is very
> effective, but we need to continue to use all hard copy files.
>
> One advantage of OCR is that the complete text of the document is
> searchable. We are not relying on Records Officers to summarize the content
> of the document, so we need not fear missing key words, etc. Converting to
> text also means we need only a fraction of the disk space that imaging
> would require.
> Some disadvantages of OCR are the time involved and the fact that complete
> accuracy is not guaranteed.
>
> Has anyone got any statistics or comments on the relative pros and cons of
> each method of capturing documents?
> I believe the ideal system would incorporate both. Do you agree?
> Does anyone know where I can see a system here in Queensland that uses both
> OCR and Imaging?
>
> Bob Ottaway
> Business Development Officer (Support Services)
> Main Roads North Coast-Hinterland District
> Gympie, Queensland, Australia.
>
>
>
>
>
>
> *************************************************************
> Opinions contained in this e-mail do not necessarily reflect
> the opinions of the Queensland Department of Main Roads, or
> of Queensland Transport. If you have received this electronic
> mail message in error, please immediately notify the sender
> and delete the message from your computer.
>