Tuesday, February 19, 2002

Archiving and the Transient Nature of the Web


In the as-yet brief history of the World Wide Web, surfing the web has been a transient experience. Webmasters update pages, modify layouts, and add or delete content at the urging of any passing whim: some new technology you just gotta use, "oh, I got bored with the current fonts", a freshly whipped-up color scheme. Websites have always been in a constant state of flux. Tomorrow, this site may be different, or it may not exist at all.

[Obviously, these remarks pertain more to personal websites than corporate ones.]

Now, if you visit the Internet Archive, you can view web pages that bear no resemblance to their current incarnations, or that no longer exist at all.

It becomes possible to examine the evolution of a site or an aggregate of sites, and to trace the growing use of a technology, both within the Web as a society of its own and in the Web as a technology affecting the larger society. It becomes possible to conduct web archaeology, and to ponder questions like:

How have users changed as they've built websites?
How has the technical ability required to build and maintain a website changed, and what impact has that had on technophobic communities?
How have new technologies affected the ability to build and manage online communities? Does that threaten more traditional geopolitical community boundaries?

With actual data.

The interesting thing about the introduction of a large-scale Internet archive is not that it does for the Web what Usenet archives have done all along, but that it exposes the difference between website development and newsgroup or mailing list posting.

Posting to a list or newsgroup has always involved, on the part of the user, the relinquishment of control over the final disposition of the post. A user submits a post to another party [automated, moderated, whatever], which then distributes it to anyone who has signed up to receive it. It gets copied and sent around the world in the blink of an eye. In contrast, putting up a web page does not require giving up any control; the author retains that control and uses it. I delete page X, or replace it with Y, or rewrite it, and those pages sit on my account or my server, and I'm the only one who touches 'em.

What does it mean, then, to have a system that copies and stores those pages, and provides a stop-motion film of the evolutionary process?

