Friday, September 28, 2007

Why Delete?

Yesterday, I attended a discussion at the Wilson Library posing the question "Storage is Cheap: Why Select?" Here's the premise:

Storage media for digital information are extremely cheap and getting
exponentially cheaper over time. The price of a terabyte of hard drive
space is a few hundred dollars, and in a decade it will be less than a
dollar. The cost of the expertise of well-trained information
professionals, on the other hand, is quite high and likely to increase
over time. So shouldn't we stop worrying about selection and just capture
and keep as much material as possible?


There are some drawbacks to keeping everything, including the cost of storage and maintenance (file formats and media change, for example), moral and ethical questions (do I want this propaganda or hate speech to be available to the public?), and the impact of preserving institutional records on liability.

The main point of contention was the value of selection, where the archivist's choice of what to preserve is in itself a valuable data point. In essence, the archivist is applying the values and understanding of the material at the time of selection, which can preserve the context of a collection for future viewers. Another key point is that, even if something is preserved, it may not be findable. A Google search that returns 5000 hits is great, but if the item shows up as hit 4999, it's not findable.

Fine, but I came away thinking that the game really has changed, and we're relying too much on the present to see how the future is developing. Paul Jones really hit the key point. It's preserved, it is not going away, even if you think you have deleted it...deal with it. There are backups, things leak to the web, or are documented through other channels.

I wonder how much impact an individual archive can have on our understanding of an event, time, or place in the read/write web era! An archive/archivist seems like a throwback idea. This relates to Paul's point, but I see the future of archives as distributed. Storage will be cheap enough to keep everything, search algorithms will improve, and the cost of preserving media will continue to decline (but free the formats!). We will throw the data up on the web in widely distributed formats, and the power of (buzzword points) the long tail, collective wisdom, and the value added by participation will turn the web into a huge, searchable, participatory archive of everything. I notice that I've departed from pinning preservation on traditional institutions, because this can be seen more broadly, but I can still see great value in the editing, selection, and context provided by an archivist to a narrow, specialized collection of data.

There are also so many new sources of data, and I wonder how these play into the idea of archives and collections. How about personal archives, life-blogs, uploaded media, records of digital communications, and the coming deluge of data from sensors in the environment? Nobody in the present can imagine how all of these data sources may be used in combination by a future researcher or viewer. Given the emerging participatory web, the way that people use and link information will itself provide context and assist in findability, creating spontaneous collections that have nothing to do with the original intent when the data was first stored.

So that's a pretty confused picture, and what is surprising to conclude is that being a librarian sounds pretty cool.