On the day of this writing, the U.S.
National Archives and Records Administration has found itself facing bad press
for a simple mistake: underestimating the power of its own archival
records. The non-indexed 1940 U.S. Census, which records information about
those who survived the Great Depression and provides a snapshot of life the
year before Pearl Harbor, was released on Monday, April 2, 2012.
However, the system that NARA used crashed under the huge number of hits, with
no plan or contingency system in place for the Internet traffic
flow. FOX News reported that "Nearly 2 million people flocked to the site
in just the first few hours after the Archives posted a searchable database of
materials from the 1940 national head count"
(http://www.foxnews.com/us/2012/04/02/21-million-still-alive-from-140-census/).
NARA's website and records typically
have not received heavy traffic, something I would argue is due to the
limited range of digital records available online, apart from newspapers and some
standard historical items from different time periods. Alexa, a website
traffic reporting service, gives the following report for the past 3 months on
NARA's website and the regular amount of traffic it is accustomed to:
"Archives.gov is ranked #13,659 in the world according to the three-month
Alexa traffic rankings. About 45% of visits to it are bounces (one pageview
only). The site's visitors view 4.2 unique pages each day on average. The
fraction of visits to the site referred by search engines is about 19%. The
time spent in a typical visit to Archives.gov is about four minutes, with 38
seconds spent on each pageview." Most of the visitors to NARA's site are
women over the age of 65, which those of us in the archives
profession take to mean genealogists for the most part.
As NARA releases the 1940 census
in the most wired period of world history, it has simply overlooked a major
tenet taught in any basic digital preservation or digital curation graduate
archives course: plan for all possible digital access needs and database system
requirements, and anticipate the volume of users who could attempt to access
the data. NARA is not used to the kind
of traffic that companies such as NBC, Hulu, Amazon, eBay,
or Apple handle, and that unfamiliarity is hurting its reputation with the
public. What is ironic is that the National Archives is one of the standard
bearers for institutions throughout the U.S. on how to model their own digital
repositories and how to manage data storage. As NARA calls in a third-party
vendor to fix problems with records that until yesterday were maintained as
"confidential due to legal privacy restrictions," one has to wonder
about the ability of archival institutions to keep pace in the digital world.
Most archives debate metadata schemas while failing to plan for proper storage
environments or use pre-formatted archival management systems with poor search
engines to manage their online content. In a world where 12-year-olds can write
apps that are used by millions of people, NARA's gaffe shows the agency to be
out of step with the digital world.
In a poorly planned approach to
access, NARA selected the 5-year-old firm Inflection, which runs the
Archives.com website, along with Familysearch.org, pay sites for genealogical
research, to host the 1940 Census, because NARA does not have the digital
repository space or server space to manage the collections. Much of this stems
from the failure of NARA's 10-year program to develop a national digital
repository (see earlier post on national digital repositories on this blog).
Interestingly enough, Inflection also runs a search service where customers
can pay to locate current records on individuals. Inflection's Archives.com
executive VP made the following stunning comment: "John Spottiswood,
executive vice president of site host Archives.com, touted worldwide
availability of the massive database to millions of family researchers: 'We
just hope not all at the same time.' He may not have gotten his wish"
(http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-DC&month=1204&week=a&msg=KsW7eK%2By6%2BeEXO1bri3DcA&user=&pw=).
For one of the biggest records unveilings in over half a century, the company NARA
is using hopes that not everyone accesses the records at once!
Of deeper concern for young archival
professionals who have grown used to the proliferation of "cloud
computing" talk in digital preservation and digital repositories: one
of the greatest problems with cloud storage systems is that the data is literally
just in a cloud. The technology is too new for the large and vital data that
many institutions wish to store in such systems. Many digital curators have
raised concerns over cloud storage because the cloud systems used by different
agencies are incompatible with one another. NARA relied on Archives.com's services
for this census records launch, but Archives.com uses the cloud computing
system of Amazon.com, a for-profit public merchandise vendor. Amazon was one of
the first public cloud computing providers in the world, but in its first few
launches, Amazon's music cloud failed. Now, "Spottiswood said it might
take time for the Amazon cloud system the site is using to accommodate all
users. About three hours after the launch, the Archives blog advised: 'We are
working with Amazon to get the site up to speed'"
(http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-DC&month=1204&week=a&msg=KsW7eK%2By6%2BeEXO1bri3DcA&user=&pw=).
Somehow, as a young archivist, I fail to
trust Amazon.com to back up the records of one of the country's most
important and most anticipated census releases. Why, with all the resources invested
by NARA and encouraged by government institutions such as IMLS with grants
focused on digitization projects, is the U.S. government so frail when it comes
to digital storage of historic records? Currently, the U.S. seems to devote more
money and server space to monitoring Americans for terrorist activities than
to its heritage. One of the founding principles of a National Archives is to
make the records easily and readily accessible to the public. I'm shocked that
such a high digital curation standard bearer as NARA is not practicing what it
preaches. This situation does not bode well for future digital repository
systems, nor does it offer any confidence that a majority of the country's
digital records will be accessible as we move deeper into the second decade of
the 21st century.
There is a difference between doing a poor job on storage/backup of digital records and having your web site crash because it got too much traffic. Yeah, it sucks that the site could not support the number of visitors on Day 1 of the 1940 census being available. But, on the other hand, I have got to applaud them for wanting to make that available, freely and digitally, on the very first day that they were legally able to do so.
People want everything, and they want it online, and they want it free, and they want it now. You can't always have everything the way you want it. I have heard people in project management say, "Cost, schedule, or quality. Pick the 2 that are most important to you. I'll do what I can about the third."
I must agree that they should have known that Day 1 access to the 1940 census (for free) would draw a crowd. They either grossly underestimated the size of the crowd or they did what they could (bandwidth-wise) and hoped for the best (which was apparently not good enough in this case).