New Quarter – Tons of Work Ahead

March 8, 2010

We begin our Spring Quarter at Rose-Hulman today, and what a way to start it. I started with an 8 AM lightning-round orientation for a senior design class. I was given 15 minutes and had to speak fast and to the point. While it was the quickest class I've ever done, it certainly helped wake me up. Now I am ready to take on the day, the week, and the new quarter. We've got our work cut out for us.

We'll have training for our new ILS system (Millennium by Innovative), which will be implemented this summer. We are currently planning events for National Library Week in April. We are having some kind of party on the last day of the week and are also going to do one of those “READ” poster campaigns. I got it started by creating one of myself. We will be doing some major weeding of the collection to make room for possible library “modifications.” We are moving forward with our library liaison program with each of the departments. I am not sure how that will go, but it's definitely a worthwhile experiment.

I hope to finish scanning all of the Modulus yearbooks and finally be done with that. That all depends on how many hours my student worker can work; there are about 15 more books to do. I will be evaluating some EBSCO products. I am about to submit a final draft of an ASEE conference paper that has been “accepted pending changes.” I have to help plan the IOLUG (Indiana Online Users Group) spring meeting, where I will be presenting on using mobile devices to access commercial databases. I am getting an iPhone tomorrow to start preparing for that. I will also have to plan for the ASEE conference in June. And of course, I will have to keep updating those AtoZ records. OK, now I've scared the crap out of myself writing all that down. Time to get to work. CHOP CHOP!


Digitization of Modulus (student yearbook) Now Complete Through 1980

February 4, 2010

Rose-Hulman Modulus Yearbooks

The digitization of the Modulus, the student yearbook for Rose-Hulman Institute of Technology, is moving along swiftly thanks to our diligent student worker. Elizabeth is producing high-quality scans very quickly.

We have scanned and put online all yearbooks through 1980. The 1982, 1985, and 1987 yearbooks are already online as well. 1981 and 1983 are scanned, and 1984 is in the works. We hope to have the 1980s finished by the end of the month and the remaining 10 completed by the end of the school year.

We can work much faster than in the past because we now make only one scan per page: a high-resolution TIFF, from which CONTENTdm's OCR feature does a decent job of extracting the text.

http://www.rose-hulman.edu/Archives/modulus.html


UPDATE on “Editing OCR Transcript Field in CONTENTdm”

December 15, 2009

http://thisthatotherthing.wordpress.com/2009/12/15/update-on-edit…d-in-contentdm/
To update an earlier post titled “Editing OCR Transcript Field in CONTENTdm,” I can confirm that this process is at least 10 times faster! It is faster because within the CONTENTdm Project Client you can go from one page to the next VERY quickly. With normal fonts, most pages do not need editing at all. If you wait until after you have uploaded a compound object and then edit each page in the web administrator module, you spend lots of time waiting for each page to open and close, far more time than you actually spend editing. I just uploaded and checked the OCR of our 1948 yearbook, and it took no time at all.


Editing OCR Transcript Field in CONTENTdm

October 22, 2009

Are you a user of CONTENTdm 5.x? Have you fooled around with the OCR feature for TIFF images? Whether you haven't tried it yet or have tried it and found it frustrating, here are some tips. First, it works best with text in basic, easy-to-read fonts; the larger the better. As with most OCR software, the smaller the font, the more likely there will be mistakes in the OCR text. The same goes for fancy fonts. We scanned a yearbook from 1901 that used a font similar to Old English, and we had to make corrections on almost every line. But even with good, clear text, there are bound to be issues. Sometimes images are interpreted as text, and a string of strange characters ends up in the transcript field.

Here is the BIG TIP!! If you are building a compound object of many pages, such as a yearbook, and using the OCR feature, edit the transcript field for each page while still in the CONTENTdm Project Client, BEFORE uploading the object and its files. This is much faster than editing the transcript fields after upload. I found this out the hard way. I uploaded about 5 yearbooks and then had my students find each page in the web administrator module. This is very inefficient, because for every page you have to search for it, open and edit it, and then close it, and each of those steps is slow. It is much quicker to do it right in the CONTENTdm Project Client, after you have built the object but before you upload it; you can edit one page right after another.
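Since most pages with normal fonts need no editing, one way to decide which pages are worth a closer look is to flag transcripts where images were read as strings of strange characters. Below is a minimal Python sketch of that idea. It is a rough heuristic, not a CONTENTdm feature: the set of "expected" characters, the 0.2 threshold, and the function names are hypothetical choices, and it assumes you already have each page's transcript text as a string.

```python
import string

# Characters treated as normal in yearbook text (an assumption; accented
# letters, for example, would be counted as unusual here).
EXPECTED = set(string.ascii_letters + string.digits + ".,;:'\"!?()-&")


def noise_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall outside EXPECTED."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    unusual = sum(1 for c in chars if c not in EXPECTED)
    return unusual / len(chars)


def pages_to_review(transcripts: dict[str, str], threshold: float = 0.2) -> list[str]:
    """Return page names whose OCR text looks noisy enough to check by hand."""
    return [page for page, text in transcripts.items()
            if noise_ratio(text) > threshold]


pages = {
    "page001": "Class of 1948 Officers",
    "page002": "~|^}{@#%*\\/<>",  # the kind of junk an image can turn into
}
print(pages_to_review(pages))  # ['page002']
```

The threshold is a trade-off: set it too low and ordinary pages with lots of punctuation get flagged, too high and short bursts of junk slip through.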


Everything you need to know about USB 3.0

October 8, 2009

http://thisthatotherthing.wordpress.com/2009/10/08/everything-you-need-to-know-about-usb-3-0/
This article explains the ins and outs of the new USB 3.0 specification. For those who don't know the difference between versions 1 and 2, version 2 was 40 times as fast as version 1. Yes, VERY fast. That is why FireWire lost some of its wind. Well, USB 3.0 is 10 times faster than USB 2.0. Let's put this into perspective.

The new specification is rated 10 times faster than USB 2.0, which has a maximum transfer speed of 480Mbps.
In comparison, USB 3.0 has a theoretical peak throughput of 5Gbps. This means that USB 3.0 is capable of transferring a 25GB file in approximately 70 seconds.
If that doesn’t warrant a shout of “whoosh!” then what does? In contrast, USB 2.0 would take around 14 minutes to perform the same task. And you’d be twiddling your thumbs for around 9 hours if you used USB 1.1.
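To check those numbers, here is a small Python sketch that computes transfer time at each version's nominal peak rate (12 Mbps for USB 1.1, 480 Mbps for USB 2.0, 5 Gbps for USB 3.0), treating 25GB as 25 × 10^9 bytes (an assumption). At full peak speed the times come out to about 40 seconds, 7 minutes and 4.6 hours. The article's figures are roughly one and three-quarters to two times those, which is consistent with allowing for real-world overhead, though that is my reading and not something the quote states. The `efficiency` parameter is a hypothetical knob for modeling that.

```python
# Nominal peak signaling rates in bits per second.
RATES_BPS = {
    "USB 1.1": 12e6,
    "USB 2.0": 480e6,
    "USB 3.0": 5e9,
}


def transfer_seconds(size_bytes: float, rate_bps: float, efficiency: float = 1.0) -> float:
    """Seconds to move size_bytes at rate_bps, scaled by an assumed efficiency (0 < efficiency <= 1)."""
    return size_bytes * 8 / (rate_bps * efficiency)


file_size = 25e9  # 25 GB in decimal units (an assumption)
for name, rate in RATES_BPS.items():
    t = transfer_seconds(file_size, rate)
    print(f"{name}: {t:,.0f} s ({t / 60:,.1f} min)")
# USB 1.1: 16,667 s (277.8 min)
# USB 2.0: 417 s (6.9 min)
# USB 3.0: 40 s (0.7 min)
```

Passing `efficiency=0.5` doubles every time, which lands close to the article's 14 minutes and 9 hours.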

I can see this coming in handy for our Digital Archives project when I have to transfer folders of huge TIFF images from the workstations my students work on to my PC for upload into CONTENTdm. It will also come in handy when I back up my data at home onto an external hard drive. I'll be building a new PC next February or March, so I'll have to make sure I get a motherboard that supports USB 3.0. For the full article, go to http://www.techradar.com/news/computing/everything-you-need-to-know-about-usb-3-0-638185


Digital Archives display at Homecoming a Huge Success

September 27, 2009
Homecoming 2009 Digital Archive Tent

Yesterday (Saturday, Sept. 26th), Rachel (my library director) and I had a booth at Homecoming to show off our Digital Archives project to anyone interested in learning more about it. Our target audience was alumni, and they loved it. Most were not aware it existed and were fascinated. Many, especially the older alumni, were eager to tell stories, which I enjoyed. The most fascinating story came from a 1949 grad who started as a freshman in 1946, right out of World War II, on the GI Bill. Back then the only dorm was Deming Hall, which housed mostly freshmen. He said he hardly slept that first night because he was awakened all night by the screams of former soldiers suffering from nightmares and flashbacks. There were a dozen or so of them, and eventually space had to be made for them in the basement so the others could sleep. I don't know if that is documented anywhere in our Archives, or how many people living today know about it, but to me it is an incredible story and a part of Rose-Hulman history most people don't know about.

All of our traffic came before the game, at a nice, steady pace for two people. The Digital Archives is a very valuable asset to the Institute and one we need to do a better job of promoting. We would definitely like to do this again next year, and maybe include something library-related as well.


Modulus Yearbooks being uploaded again

September 14, 2009

I am finally at the point where I can start uploading yearbooks again. Last week I uploaded 1905, 1907, 1940, and 1975. Still to come in the next few weeks are 1901, 1909, 1913, 1928, 1936, and Oct 1943. Our goal is to have every yearbook through 1945 completed by Homecoming on Sept 26th for our portion of the Wabash Valley Visions and Voices Extravaganza. Once we are finished through 1945, we will continue decade by decade, filling in the missing years chronologically. The time-consuming part is that I need to do some page sorting before and after each book is uploaded. The completed yearbooks can be viewed at http://www.rose-hulman.edu/Archives/modulus.html


CONTENTdm and LOTS of Patience

September 10, 2009

If you are a user of CONTENTdm and have recently switched to version 5, you have no doubt discovered the trials and tribulations of trying to replicate routine processes that worked under older versions. Version 5 is riddled with bugs, and the more complex your projects are, the more problems you have run into. Since its release this past spring, OCLC has released two updates that I am aware of, and another is expected this fall. At Rose-Hulman, we do not operate our own server; instead we piggyback on Indiana State University's collaborative project, the Wabash Valley Visions and Voices, a digital memory project dedicated to documenting and preserving the region's history and heritage in print, pictures, and sound in the Wabash Valley region of west central Indiana and east central Illinois.

Our biggest undertaking over the last several years has been to digitize our entire collection of student yearbooks. With about two part-time student workers at any time during the school year, and myself conducting quality control, it is a slow process. To make it even more time-consuming, the process consists of scanning each page three times: once for a pdf, once for a master tiff, and once for OCR to produce text files, which then needed to be formatted with break tags for better viewing in page and text layout. I know there are better ways of doing this, but after weighing scan quality, file sizes, and display options, I chose the long and tedious route.
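The break-tag step for the OCR text files is easy to script. Below is a minimal Python sketch that appends a `<br>` tag to every line of each .txt file in a folder so the original line breaks survive HTML display. The exact tag format, the UTF-8 encoding, and the folder layout are assumptions for illustration; the functions `add_break_tags` and `convert_folder` are hypothetical and not part of CONTENTdm.

```python
from pathlib import Path


def add_break_tags(text: str) -> str:
    """Append <br> to every line (blank lines included) so line breaks show in HTML."""
    return "\n".join(line + "<br>" for line in text.splitlines())


def convert_folder(src: Path, dest: Path) -> int:
    """Write a tagged copy of every .txt file in src into dest; return the count.

    Assumes the OCR output is UTF-8 plain text (an assumption).
    """
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for txt in sorted(src.glob("*.txt")):
        tagged = add_break_tags(txt.read_text(encoding="utf-8"))
        (dest / txt.name).write_text(tagged, encoding="utf-8")
        count += 1
    return count


print(add_break_tags("ROSE POLYTECHNIC\nClass of 1948"))
# ROSE POLYTECHNIC<br>
# Class of 1948<br>
```

Writing to a separate destination folder keeps the untagged OCR output intact in case the files need to be regenerated.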

Although a slow process, this has worked out quite well for us until the release of version 5. While I have been faced with countless roadblocks, newly discovered bugs, and lots of hair pulling and teeth gnashing, I may just have discovered a better, faster approach to digitizing yearbooks, one that may allow us to finish the rest of them within the year or shortly thereafter.

First, let me point out a few of the roadblocks I have run into. When creating simple document compound objects built from pdf files, I am no longer able to import transcript files. It used to work and should still work, but it does not; the text simply is not there. (The pdf files were not scanned with the text embedded in them; I have tried that in the past and was not happy with the OCR results.) I've also recently tried uploading the tiff images and telling CONTENTdm to use the pdf files as display images, but then it places the pages out of order. In fact, telling it to use any external display images places the pages out of order. External text files can only be imported with jpg or tiff images, not pdf. So I finally accepted the idea of using the tiff images, but I was still having problems with pages being out of order. What I discovered is that with version 5, if you are going to use transcript files, you have to have a transcript file for every single page, not just for the pages whose text you want included. So the solution was to create a transcript file for every page that didn't have one before. A simple file containing just the word “the” (a stop word) did the trick. That one file had to be saved as a copy for every page that lacked a transcript file.
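Copying the “the” file by hand for every page without a transcript gets tedious for a whole yearbook. Below is a minimal Python sketch that does it automatically. It assumes, hypothetically, that the images and transcripts sit in one folder and that each transcript shares its image's base name (page012.tif pairs with page012.txt); check how your own CONTENTdm import actually matches text files to images before relying on this. The function name and the `.tif` default are also illustrative.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "the"  # a stop word, so the filler text adds nothing searchable


def fill_missing_transcripts(folder: Path, image_ext: str = ".tif") -> list[str]:
    """Create a placeholder .txt next to every image that has no transcript.

    Assumes each transcript shares its image's base name, e.g.
    page012.tif -> page012.txt. Returns the names of the files created.
    """
    created = []
    for image in sorted(folder.glob(f"*{image_ext}")):
        transcript = image.with_suffix(".txt")
        if not transcript.exists():
            transcript.write_text(PLACEHOLDER, encoding="utf-8")
            created.append(transcript.name)
    return created


# Example: three scanned pages, only the middle one has real OCR text.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("p001.tif", "p002.tif", "p003.tif"):
        (folder / name).touch()
    (folder / "p002.txt").write_text("Class officers", encoding="utf-8")
    print(fill_missing_transcripts(folder))  # ['p001.txt', 'p003.txt']
```

Existing transcripts are never overwritten, so it is safe to run the script again after adding more OCR text.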

The good news? Yes, there is some good news in all of this. For the remaining yearbooks, we may only need to create master tiff images and then use CONTENTdm's built-in OCR feature. My initial tests show that clear text in simple fonts works well, whereas fancy fonts or text that is too small result in lots of errors. So for newer yearbooks, we may be able to simply create a tiff image and OCR the text on import. For some of the older yearbooks with fancy fonts, we may still need to create external OCR transcript files. So far I am encouraged; the true test will come over the next few weeks as I start importing these books.

