Are you a user of CONTENTdm 5.x? Have you fooled around with the OCR feature for TIFF images? If you have not, or if you have and found it frustrating, here are some tips. First, it works best on text set in basic, easy-to-read fonts; the larger the better. As with most OCR software, the smaller the font, the more likely there will be mistakes in the OCR text. The same goes for fancy fonts. We scanned a yearbook from 1901 that used a font similar to Old English, and we had to make corrections on almost every line. But even with good, clear text there are bound to be issues, and sometimes an image is interpreted as text, so a string of strange characters gets entered into the transcript field.
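If you want a rough way to spot those strings of strange characters before you start proofreading, here is a minimal Python sketch. It is not part of CONTENTdm and does not use any CONTENTdm API; it assumes you have copied a page's transcript into a plain text string. The function names, the set of "expected" characters, and the 0.3 threshold are all my own assumptions for illustration, and the sample text is made up.

```python
import string

# Characters we expect in ordinary English yearbook text (an assumption;
# widen this set if your material uses accents or other symbols).
EXPECTED = set(string.ascii_letters + string.digits + " .,;:'\"!?()-&")


def garbled_ratio(line: str) -> float:
    """Fraction of characters in the stripped line that fall outside EXPECTED."""
    text = line.strip()
    if not text:
        return 0.0
    unexpected = sum(1 for ch in text if ch not in EXPECTED)
    return unexpected / len(text)


def flag_lines(transcript: str, threshold: float = 0.3):
    """Return (line_number, line) pairs whose garbled ratio exceeds threshold."""
    return [
        (number, line)
        for number, line in enumerate(transcript.splitlines(), start=1)
        if garbled_ratio(line) > threshold
    ]


# Made-up transcript: the middle line imitates an image misread as text.
sample = "Class of 1901\n~~|^ #@ }{ *_ ~\nPresident of the Senior Class"
for number, line in flag_lines(sample):
    print(number, line)
```

On the sample above, only the second line is flagged. This is just a heuristic for deciding where to look first; it will not catch ordinary misspellings, so you still need to read each page.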
Here is the BIG TIP!! If you are building a compound object of many pages, such as a yearbook, and using the OCR feature, edit the transcript field for each page while still in the CONTENTdm Project Client, BEFORE uploading the object and its files. This method is much faster than editing the transcript fields once the object has been uploaded. I found this out the hard way: I uploaded about 5 yearbooks and then had my students find each page in the web administrator module. This is very inefficient, as you have to first search for the page, then open and edit it, and then close it, which makes for a slow process on every page. A much quicker way is to do it right in the CONTENTdm Project Client, after you have built the object but before you upload it. There you can edit one page right after another.