Sunday, July 15, 2007

Converting a Scanned Image to Text

The exact way to scan an image into Adobe Acrobat depends on which version of Acrobat you are using. With earlier versions you go into the File menu and select Import then Scan. With more recent versions you go into the File menu and select Create PDF then From Scanner. You can then select the device driver to use for the scan and start scanning the page. You can keep scanning as many pages as you want, and they will load one below the other. Just let Acrobat know when you are finished. Save the scanned image at this point so that you do not have to rescan if your computer crashes during the following process.

The way to convert the text in your scanned image from a picture of the text into actual text (and hence reduce the size of the resulting file) also differs slightly between versions. With earlier versions you go into the Tools menu and select Paper Capture then Capture Pages, while with later versions you go into the Document menu and select Paper Capture then Start Capture. This runs the character recognition process, which attempts to convert everything on the pages that you select into text.

Acrobat cannot do a perfect job of converting your text, so anything it cannot properly identify is left as a graphic. To convert these to text you go into Paper Capture again and this time select Show Capture Suspects. This highlights all of the sections of your document that Acrobat thinks are text but was unable to convert. You can then go into Paper Capture a third time and select Show First Suspect or Find First OCR Suspect (depending on which version of Acrobat you are using) to step through these one at a time and correct them manually.
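The idea of a "suspect" can be pictured with a small Python sketch. This is only an illustration of the general concept, not how Acrobat works internally: the `RecognizedWord` type, the confidence scores, and the `SUSPECT_THRESHOLD` value are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical cut-off, chosen only for this illustration.
SUSPECT_THRESHOLD = 0.80


@dataclass
class RecognizedWord:
    text: str          # the recognizer's best guess
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def split_suspects(words, threshold=SUSPECT_THRESHOLD):
    """Return (accepted, suspects).

    Words at or above the threshold are accepted as text; the rest are
    queued for manual review, one at a time.
    """
    accepted = [w for w in words if w.confidence >= threshold]
    suspects = [w for w in words if w.confidence < threshold]
    return accepted, suspects


# Made-up sample data for one scanned line.
page = [
    RecognizedWord("Converting", 0.98),
    RecognizedWord("Scarmed", 0.41),  # low-confidence guess
    RecognizedWord("Image", 0.95),
]
accepted, suspects = split_suspects(page)
```

Stepping through `suspects` in order corresponds loosely to stepping through the highlighted sections and correcting each one by hand.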

You should now save your document again (it should be much smaller than before).

If you want to transfer the text from your Acrobat document into some other program, you can do so one page at a time by using Edit then Select All (or CTRL-A) followed by Edit then Copy (or CTRL-C). This copies the text from the current page (unfortunately without the formatting) to the clipboard, from which you can paste it into the program of your choice.
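Because the formatting is lost, the pasted text may arrive with a line break at the end of every printed line. If that happens, a short script can tidy it up. The following Python sketch is one possible approach, not part of Acrobat; the `reflow` function name is made up for the example, and it assumes that blank lines separate paragraphs and that a hyphen at the end of a line splits a single word.

```python
import re


def reflow(pasted: str) -> str:
    """Rejoin hard-wrapped lines into paragraphs.

    Assumes blank lines separate paragraphs and that a line ending in a
    hyphen continues a word on the next line.
    """
    paragraphs = re.split(r"\n\s*\n", pasted.strip())
    cleaned = []
    for para in paragraphs:
        # Join words split across a line break: "con-\nvert" -> "convert".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining line breaks and runs of spaces into one space.
        para = re.sub(r"\s+", " ", para).strip()
        cleaned.append(para)
    return "\n\n".join(cleaned)
```

For example, `reflow("This will con-\nvert the\ntext.")` gives a single line of text. Note that a genuinely hyphenated word that happens to break at the end of a line will lose its hyphen, so the result is worth a quick read through.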


Copied from: http://www.felgall.com/dtpac1.htm
