Sunday, October 23, 2011

Volunteer Opportunity: California Digital Newspaper Collection


Here's another volunteer opportunity for those of you who like to work from home in your pajamas: the California Digital Newspaper Collection (CDNC) is looking for a bit of assistance. Please see the following notice to find out how you can help with this worthwhile project.

You just never know what you might find for your personal research while you're working!

*   *   *

The California Digital Newspaper Collection (CDNC) is pleased to announce the implementation of User Text Correction in its archive.

The CDNC is the largest freely accessible archive of California newspapers. The collection contains nearly 475,000 pages - and growing - ranging from 1846 to the present. It can be searched at http://cdnc.ucr.edu. The project is managed and hosted by the Center for Bibliographical Studies and Research (CBSR) at the University of California, Riverside.

User Text Correction (UTC) allows individual users of the CDNC to correct computer-generated text. When newspapers are digitized from microfilm or paper originals, optical character recognition (OCR) software is used to generate searchable text. This OCR text, however, is often imperfect, particularly for older newspapers. By correcting it, users improve the CDNC by making more of the text searchable for everyone else. To our knowledge, the CDNC is the first digital newspaper archive in the US to offer user text correction. To learn more, see the help section of the CDNC.

The CDNC is over six years old and has been supported in part both by the National Digital Newspaper Program, a joint effort by the National Endowment for the Humanities and the Library of Congress, and by the Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, administered in California by the State Librarian. The CDNC has also worked with local institutions around the state to digitize their newspapers, and has started a project to collect current PDFs from California publishers. Please contact us at cbsrinfo@ucr.edu for more information on both projects.

The User Text Correction tool is part of the Veridian software used to host the CDNC. Veridian is developed by DL Consulting and is used by a number of prominent libraries around the world, including the National Library of New Zealand, the Singapore National Library, Princeton, and Cornell. The CDNC is the first archive to make UTC available and has worked closely with DL Consulting on its development, including beta testing by a number of CDNC users.

CDNC users have already corrected thousands of lines of text. Help us make the CDNC a better archive for all, and experience for yourself how fun - and addictive - correcting OCR text can be.

Copyright © San Luis Obispo County Genealogical Society
