Monday, April 2, 2012

Gap Between Public Expectations and Archival Practice: The 1940 U.S. Census

On the day of this writing, the U.S. National Archives and Records Administration (NARA) finds itself facing bad press for the simple reason that it misunderstood the power of its own archival records. The non-indexed 1940 U.S. Census, which records those who survived the Great Depression and provides a snapshot of life the year before Pearl Harbor, was released on Monday, April 2, 2012. However, the system NARA used crashed because of the huge number of hits and the lack of any planning or contingency system for the Internet traffic. FOX News reported that "Nearly 2 million people flocked to the site in just the first few hours after the Archives posted a searchable database of materials from the 1940 national head count" (http://www.foxnews.com/us/2012/04/02/21-million-still-alive-from-140-census/).

NARA's website and records typically have not received heavy traffic, something I would argue results from the limited range of digital records available online, apart from newspapers and some standard historical items from different time periods. Alexa, a website traffic reporting service, gives the following report on the traffic NARA's website has been accustomed to over the past three months: "Archives.gov is ranked #13,659 in the world according to the three-month Alexa traffic rankings. About 45% of visits to it are bounces (one pageview only). The site's visitors view 4.2 unique pages each day on average. The fraction of visits to the site referred by search engines is about 19%. The time spent in a typical visit to Archives.gov is about four minutes, with 38 seconds spent on each pageview." Most of the visitors to NARA's site are women over the age of 65, which those of us in the archives profession take to mean genealogists for the most part.

As NARA releases the 1940 census in the most wired period of world history, it has simply overlooked a major tenet taught in any basic digital preservation or digital curation graduate archives course: plan for all possible digital access needs and database system requirements, and anticipate the volume of users who could attempt to access the data. NARA is not used to the kind of traffic that entertainment companies such as NBC, Hulu, Amazon, eBay, or Apple handle, and that unfamiliarity is hurting its reputation with the public. What is ironic is that the National Archives is one of the standard bearers for institutions throughout the U.S. on how to model their own digital repositories and how to manage data storage. As NARA calls in a third-party vendor to fix problems with records that until yesterday were maintained as "confidential due to legal privacy restrictions," one has to wonder about the ability of archival institutions to keep pace in the digital world. Most archives debate metadata schemas while failing to plan for proper storage environments, or they use pre-formatted archival management systems with poor search engines to manage their online content. In a world where 12-year-olds can write apps that are used by millions of people, NARA's gaffe is in showing itself out of step with the digital world.

In a poorly structured approach to access, NARA selected the 5-year-old firm Inflection, which runs the Archives.com website, along with Familysearch.org, pay sites for genealogical research, to host the 1940 Census, because NARA does not have the digital repository space or server space to manage the collections. A lot of this has to do with the failure of NARA's 10-year program to develop a national digital repository (see earlier post on national digital repositories on this blog). Interestingly enough, Inflection also runs a search company where individuals can pay to locate current records for individuals. A report on Inflection's Archives.com executive VP included the following stunning comment: "John Spottiswood, executive vice president of site host Archives.com, touted worldwide availability of the massive database to millions of family researchers: 'We just hope not all at the same time.' He may not have gotten his wish" (http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-DC&month=1204&week=a&msg=KsW7eK%2By6%2BeEXO1bri3DcA&user=&pw=). For one of the biggest records unveilings in over half a century, the company NARA is using hopes that not everyone accesses the records at once!

Of deeper concern for young archival professionals who have become used to the proliferation of "cloud computing" talk in digital preservation and digital repositories is that, with cloud storage systems, the material is literally just in a cloud. The technology is too new for the large and vital data that many institutions wish to store in such systems. Many digital curators have raised concerns over cloud storage because the cloud systems used by different agencies are incompatible with one another. NARA relied on Archives.com's services for this census records launch, but Archives.com uses the cloud computing system of Amazon.com, a for-profit public merchandise vendor. Amazon was one of the first public cloud computing services in the world, but in its first few launches, Amazon's music cloud failed. Now, "Spottiswood said it might take time for the Amazon cloud system the site is using to accommodate all users. About three hours after the launch, the Archives blog advised: 'We are working with Amazon to get the site up to speed'" (http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-DC&month=1204&week=a&msg=KsW7eK%2By6%2BeEXO1bri3DcA&user=&pw=).

Somehow, as a young archivist, I fail to trust Amazon.com to back up the records from one of the country's most important and most anticipated census releases. Why, with all the resources invested by NARA and encouraged by government institutions such as IMLS through grants focused on digitization projects, is the U.S. government so frail when it comes to digital storage of historic records? Currently, the U.S. seems to spend more money and server space on monitoring Americans for terrorist activities than on its heritage. One of the founding principles of a National Archives is to make the records easily and readily accessible to the public. I'm shocked that such a high digital curation standard bearer as NARA is not practicing what it preaches. This situation does not bode well for future digital repository systems, nor does it offer any confidence that a majority of the country's digital records will be accessible as we move deeper into the second decade of the 21st century.

Sunday, February 12, 2012

I Demand A QR! QR Codes and the Archives, Part 2

Well, after a long delay, I offer my thoughts on and approaches to using QR codes in processing and in increasing access to archival collections. After three months of research, I located virtually nothing regarding the application of QR code technology to the archival workflow. The challenges of doing such a thing are immense, and the heart of realizing this new virtual dream in QR codes is the URL. Because QR codes require either constant updating as URLs change (for example, in the university library environment, where university IT departments can change sites during redesign processes) or a stable base URL from which to develop QR codes for archival materials, few mass processing experiments have occurred in this realm. Let's ponder, for a moment, how in a perfect world a QR code could work in archival processing: let's say you have a three-box collection of family letters, and you are processing the collection in an EAD-formatted finding aid. Rather than choose to process at item level, you arrange the letters by creator and then by year (possibly including month), and group those years within folders. After the completed finding aid is ingested into an online EAD finding aid repository at the institution, you print QR codes on sheets of acid-free paper (regular Georgia-Pacific printer/copier white paper is acid-free and could be used to mitigate increased processing expenses) corresponding to each folder in the EAD finding aid folder list, and place the QR code in the front of the folder for use by patrons and archives staff. With the added dimension of QR codes connected to the EAD folder list, a processor could attach tags, extra information learned during processing about the letters in the folder, or cross-references to similar content in other folders.
The possibility here is of adding a social and scholarly element to EAD finding aids, not currently possible with paper finding aids or PDF finding aids typically found in most archives.
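
To make the stable base URL idea concrete, here is a minimal sketch of how one URL per folder might be generated for the three-box example above. Everything named here is an assumption for illustration: the base address, the collection identifier, and the box/folder path pattern are hypothetical, not any institution's actual scheme. Turning each URL into a printable QR image would require a separate QR library, which is not shown.

```python
from typing import Dict, List
from urllib.parse import quote

# Hypothetical base URL for the institution's online finding aid repository.
BASE_URL = "https://archives.example.edu/findingaids"


def folder_url(collection_id: str, box: int, folder: int) -> str:
    """Build one stable URL for a single folder in the finding aid's folder list."""
    return f"{BASE_URL}/{quote(collection_id)}/box-{box:02d}/folder-{folder:03d}"


def folder_urls(collection_id: str, folders_per_box: Dict[int, int]) -> List[str]:
    """Return a URL for every folder, ordered by box and then by folder number."""
    return [
        folder_url(collection_id, box, folder)
        for box in sorted(folders_per_box)
        for folder in range(1, folders_per_box[box] + 1)
    ]


# Example: a three-box collection with 3, 2, and 4 folders (made-up counts).
urls = folder_urls("smith-family-letters", {1: 3, 2: 2, 3: 4})
```

As long as the base address never changes, each printed code keeps pointing at the right folder entry even if the finding aid's description is revised.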

Beyond this, a processor could attach a QR code on a label to the collection boxes, which leads a patron to the box list or finding aid for that particular box/collection, without having to print out paper finding aids or locate the finding aid in a single master finding aid binder (especially when more than one individual needs access to the finding aid binder). With a simple camera phone, archives can expand tremendously the ability of the patron to freely explore the descriptive content for the collection prior to even exploring individual items within the collection. Applying tags to EAD finding aids (structured tags corresponding to common archival descriptive language/subject terms) will allow users, researchers, and staff to, with the snap of a camera phone, explore which other collections will have similar materials, which collection folders match specific items or folders in another collection, and possibly assist archivists at an institution in figuring out what portions of a collection they have not made EAD-compatible. The archives could purchase several inexpensive, security-tagged smart phones with limited internal network/Internet capabilities that can be used to explore the archives' collections. If processing of collections is the point at which both EAD concepts and QR codes are applied, then in a 5-10 year period an archives could greatly expand the scope of its social networking outreach and its relevance to younger generations, keep in step with current digital researching trends, and save the time it would take years down the road to attempt to make its collections web-relevant or "e-discoverable."
Instead of forcing archives staff to know or look up keywords and subject information on subjects they may not know, the staff could use the inter-connectedness of QR-coded EAD finding aids to locate ties between materials in the way that extensively cross-referenced 3"x5" card catalogs previously did in archives (though several major archives still utilize these systems, such as the Filson Historical Society of Louisville, Kentucky). Time will be saved, connections will be found, standardized archival terminology will be maintained, and users will feel like they are on a unique, self-developed treasure hunt for historical records.

Much the same way that genealogists use family trees to help trace family history or connections among individuals, QR codes could become the virtual branches connecting collections in archival processing. I would also add that temporary QR codes could be printed during the records survey and inventory period, assigned to boxes or piles of material (using a bookmarking system with the QR code), and used along the way to ensure all records or boxes have been investigated (or that you are not accidentally going over the same materials again and adding them to the inventory twice). Automatic lists could be generated to match the QR codes in a way that allows an archivist to continue using the same QR code and merely change the title or name corresponding to it. This process would speed up the creation of EAD finding aids, though new QR codes would be needed at the point of conversion to EAD for the finding aid. QR codes used in this manner would require each institution to have a closed network space in which archivists can create unique URLs to apply at will during the inventory or processing phase. A drawback of this approach would be the added waste of paper and ink as temporary QR code sheets are created.

As QR code scanner/reader systems become better developed and more robust, they are gaining the ability to track the usage of a code. A New York-based company called BeQRious offers the following abilities with QR codes that correspond quite nicely with the needs of archives in processing: "Our code management dashboard lists all your QR codes for you, what campaigns they belong in, the content they resolve to, whether they are active or not, statistics and an option to mail them. You could even delete or edit your QR code from this interface. You could also specify to view all QR codes for a certain group or campaign. If you are not sure which QR code you’re looking for, you could search for it" (http://beqrious.com/qr-code-tracking/, emphasis added). Dynamic QR codes allow a user to change the destination URL while keeping the same QR code--an option that would greatly benefit archives and save the resources otherwise spent reprinting or re-creating codes to match the archival collections: "Dynamic QR codes allow you to edit the code's destination at any time. These dynamic codes are a great solution to someone who wants to experiment with QR codes without having to constantly re-create these cute little squares!" (http://trakqr.com/).
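
The idea behind a dynamic code can be sketched in a few lines: the printed QR code holds a permanent short identifier, and a lookup table decides where that identifier currently points. The sketch below is my own illustration under that assumption; it is not how BeQRious or TrakQR actually implement their services, and the class name, the short codes, and the URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DynamicCodeRegistry:
    """Maps a permanent short code (the value printed in the QR) to an editable URL."""

    targets: Dict[str, str] = field(default_factory=dict)
    scans: Dict[str, int] = field(default_factory=dict)

    def register(self, code: str, url: str) -> None:
        """Create a new code pointing at its first destination."""
        self.targets[code] = url
        self.scans.setdefault(code, 0)

    def retarget(self, code: str, new_url: str) -> None:
        """Point an existing code somewhere new; the printed code is unchanged."""
        if code not in self.targets:
            raise KeyError(f"unknown code: {code}")
        self.targets[code] = new_url

    def resolve(self, code: str) -> str:
        """Look up the current destination and count the scan for usage tracking."""
        url = self.targets[code]
        self.scans[code] += 1
        return url


# Example: a folder's finding aid moves after a website redesign.
registry = DynamicCodeRegistry()
registry.register("f001", "https://old.example.edu/findingaids/f001")
first = registry.resolve("f001")
registry.retarget("f001", "https://new.example.edu/ead/f001")
second = registry.resolve("f001")
```

The scan counter stands in for the usage statistics described above; a real service would also need persistent storage and a web endpoint that issues the redirect.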

A fun option that could serve outreach, professional communication, internal institutional development and programming, and similar situations would be to print a QR code book containing sheets of QR codes matching archival collection finding aids or collection abstracts, and possibly collection folder QR codes as well. This book or binder could be carried by the archivist or archives manager to meetings, where individuals can use their smart phones to look at information simply by passing around the binder. It would allow for interaction and compact advertisement of archival holdings in such small gatherings, giving individuals the option to explore your holdings in more depth if they so desire and at their own pace, while giving others equal opportunity to explore the same materials in their own time.

While this all sounds great, the reality is that this is not feasible given the state of archives in the U.S. as of now. Meissner and Greene, in their famous article on "More Product, Less Process," noted that a 1998 Association of Research Libraries study of its member institutions' special collections found that roughly one-third of collections are unprocessed. With the limits in staff, technological training, IT staff and network systems, as well as economic factors, having archival collections prepared in such a way to be able to utilize QR codes may not be realistic (http://ahc.uwyo.edu/documents/faculty/greene/papers/Greene-Meissner.pdf). EAD is still mostly utilized by major institutions for electronic finding aids--many small and medium-sized institutions have not developed metadata schemas or electronic finding aid formats (there is still a heavy reliance on PDF and Word document finding aids, if there are finding aids at all). Also, institutional leadership may resist the application of QR code technology in the same way that other more advanced technological tools are being resisted: ageism, lack of exposure or technological understanding, and resistance to change can all play a part here. A simpler challenge to using QR codes in processing is the smart phone itself. With the challenges archival institutions face regarding copyright laws, intellectual property protection, and similar privacy concerns, allowing patrons and staff alike to use camera-based phones in an archival setting is a major concern. Not only could images be taken of materials restricted by donors or containing private personal information (such as telephone numbers and Social Security numbers), but replication of images by a patron and posting to social networking sites could harm the revenue archives rely on from charges for photo reproduction/scanned images.
As mentioned previously, though, an archives could have closed-system smart phones that will allow users the benefits of QR codes while maintaining the protection of archival collections. There are solutions, but they demand planning and consistent implementation by all staff (including student workers and graduate students). Students who are users especially will have a hard time understanding or accepting why they can use the archives' smart phones but not their own for the QR codes, when their phones are more familiar and easier to use.

Sunday, January 1, 2012

Online Collection Spotlight: Rubenstein Rare Book and Manuscript Library's "Civil War Women" Materials

A wonderful small online collection of archival material related to U.S. Civil War women is presented by Duke University's David M. Rubenstein Rare Book and Manuscript Library. Called "Civil War Women," the collection of papers and diaries by three Civil War era women from the South--both Union and Confederate women--gives users and Civil War buffs a view of the involvement of women beyond the typical charitable groups involved in the Civil War effort. The collection documents have been both scanned and transcribed, so that users can read the text and still see the original materials. The collections show a schoolgirl describing soldier occupation of Gallatin, Tennessee; a wife of a Union Army recruiter; and a female Confederate Civil War spy in Washington, D.C.

Though the collection is small in content, the simple formatting of the transcriptions and the mix of ages of the women and their experiences in the war make this a very nice collection website for use by libraries for researchers, by scholars, and by cultural institutions looking for multiple perspectives to give visitors during the 150th anniversary of the U.S. Civil War. I've seen a lot of these websites done by families whose ancestors were involved in the Civil War, and the website formatting was very similar. What Duke has done is maintain a simple family-styled format (in keeping with what many more localized Southern researchers are used to encountering) that makes unique materials widely available. I like the site and its efforts. Check it out: http://library.duke.edu/rubenstein/collections/digitized/civil-war-women/

Active Digital Archival Depots: State Archives of Netherlands e-Depot

In 2007, the Dutch government and the Dutch State Archives took an approach towards active digital archival preservation and widespread access by the development of what they termed an "e-Depot." Rather than acting as many U.S. organizations have, the Dutch wanted to begin some sort of storage and distribution center for digital archival materials to allow ingestion of the materials and access across departments/to the general public. U.S. organizations have this "catch-up" mentality, which is trying to get paper documents into digital formats while managing digital materials currently being created at the same time, with the result that we are constantly trying to get caught up with technology and the volume of digital materials. The Dutch State Archives has shifted its focus from this mentality towards "the realisation of a fully-fledged digital depot, whereby the ongoing accessibility of digital archives can be ensured, whereby digital archival records can be more effectively delivered to a wide audience, and whereby the transfer of digital archives from government departments to the State Archives Service can be made more efficient. The digital depot will enable the State Archives Service to accept and manage digital archival material" (http://en.nationaalarchief.nl/information-management-and-creation-of-archives/sustainable-management-of-digital-archiva-4).

A centrally-accessible repository for all--whether government, citizen, or archivist. Now that sounds extremely democratic to me, much more so than the current state of U.S. digital archival systems. A lot of this obviously has to do with the differences between Dutch government/society and that of the U.S. However, President Obama recently issued a memo showing that he wants the U.S. to develop plans by April 2012 for a more centralized digital repository that will be managed by the National Archives and Records Administration (even though budget cuts by the President have cut NARA jobs or programs for the coming year). Chief U.S. Government Records Officer Paul Wester "said new laws and regulations may be needed to move the process of creating a more unified electronic records system forward" (http://www.computerworld.com/s/article/9222248/Obama_wants_feds_to_digitize_all_records). But as Computerworld, the group that conducted the interview with Wester, noted, NARA has officially ended a 10-year project to create an online electronic records repository that would be accessible to all citizens. Heck, historic records aren't even available yet through any centralized database, let alone current records. If it's taken 10 years for a project to fall through, what hope under the U.S.'s current approach to electronic records management will the future citizens and government of this country have when they look back to the 1990s-2010s? We will have no records, and our citizens will not have access to records the Constitution gives them a right to, due not to cover-ups or issues of government redaction but to an inability to keep up with technology (the very technology the U.S. government promotes for archives to begin using through grants and state programs for which little funding is available).
We have more money allocated for developing new technology for the military than we do for our nation's internal records access, preservation, and security.

The Dutch State Archives Digital Depot became operational in 2010, and now the nation has a means to preserve electronic records from this point forward, giving the Dutch time to catch up on preserving records in older formats/outdated electronic formats. The National Library of The Netherlands (KB) has been working with the State Archives by storing the nation's scientific and scholarly publications in the e-Depot: "Next to its national deposit collection, the e-Depot contains the digital archive of the Dutch academic institutional repositories, the Dutch web archive from 2008 onwards and the master archive of national digitisation projects" (http://www.openplanetsfoundation.org/members/national-library-netherlands). The KB's goal for its collections within the e-Depot is to grow from 15 terabytes of articles as of 2008 to 700 terabytes by 2013. The Dutch e-Depot has become the digital storage center for electronic journal publications, and it is no wonder that most scholarly journal publishers in Britain and the Netherlands have used the Dutch State Archives to store the world's past and new electronic scholarly research.

Why has the U.S. lagged so far in its appreciation for managing records digitally? Why have the Dutch moved so far ahead of the world when the U.S. basically developed the personal computer and so much of the world's technology out of Silicon Valley? One issue is that the U.S. is a relatively young nation that has not come to appreciate the importance of national and state records as have other nations that have faced wars, territorial divisions, records destruction, and other issues such as the Dutch have faced. Between the 1600s and World War II, the Netherlands suffered enormously from conflicts and territorial divisions that made information about the country and its population more vital in the national consciousness than it is in the U.S. The U.S. is losing many of its records through simple neglect, poor security, and a lack of focus on the benefits of information. Ironically, in the new information age, we are now worried about information loss, but only as it applies to digital information--the current information medium. The U.S. still lacks an appreciation for the preservation of past information. U.S. businesses have been the biggest contributors to this national information loss as mergers and business demands outweigh preserving identity. Companies and state agencies affect society so much, employ the great majority of U.S. citizens, and play a large part in the development of societal interactions. The companies' and agencies' records shed light on these developments, but many of these records are gone now or in such a state of disrepair that rebuilding an organizational or cultural identity will lead to incomplete views or knowledge about specific portions of the U.S.'s development. It is a shame that the information is being lost, seen as nice to save but with never enough time to go back and manage all the past records of the company or government.

I believe this to be the genius of the Dutch system: start preserving now, no matter how inadequately given technology changes, and avoid having to worry 20 years from now about the time it will take to go back over the digital records. In the U.S., as I learned while working at a county records center and archives, the general population and even government officials do not understand or know what information is available, how to utilize it, where it came from, or how to access it. It does little good to have records digitized without context, which is what many government digitization projects are doing now: digitize this collection or agency, and put it together with other agencies' records. What the Dutch have done is state that all records created within a country--whether by the government or by citizens themselves (researchers, businessmen, etc.)--are important and should be associated together in a structured system. These records define the nation and the people within the nation. It's also important for the U.S. to remember that if a centralized digital repository were begun today, we would not face as great a cost in trying to preserve and ingest records in old formats down the road. Even though we are in a recession, I believe the U.S. government needs to take an approach to its records like the one the government took during the Depression with the WPA to preserve historical information, lands, artistic accomplishments, and other such things that would give a nation suffering financially some pride in itself and the informational basis to rebuild after the troubled times.

I do not believe the U.S. government will be up to speed with the Dutch until at least 2020, by which time billions of documents will be in outdated formats, and--much like an issue I face right now at my current institution with U-matic video cassettes that document the organization's advertising approaches--the cost to transfer records further down the road greatly increases, and the work becomes more challenging as the equipment disappears. Even the National Archives of Estonia, a former Soviet Republic as of 1989, is in the middle of finishing a project for their national digital archival repository, similar to the Dutch in some ways (http://riigi.arhiiv.ee/en/digital-archive-development/&i=6). Centralized digital records depositories must be developed in the U.S. (whether centralized at the state or national level). The two closest U.S. state systems I can recall that have some centralization of their digital records are in Ohio (OhioLink) and in California (Online Archive of California). Even these, though, are not of the scale being attempted by the Dutch. If the U.S. cannot agree on metadata standards and infrastructure soon, there will be few digital records to describe or store. Although there are information professionals and digital archivists working on this issue, there is too much division in the nation's information systems to allow for a centralized digital depot right now.