How the Wayback Machine works
Wed, 23 Jan 2002 10:03:51 -0800
D'oh -- just realized this was /.'ed (Slashdotted) today. It's still a good article. :-)
> -----Original Message-----
> From: firstname.lastname@example.org [mailto:email@example.com]On Behalf Of Jim
> Sent: Wednesday, January 23, 2002 9:58 AM
> To: FoRK
> Subject: How the Wayback Machine works
> A link I picked up from Dave's Scripting News:
> Some quotes:
> Brewster Kahle: In the Wayback Machine, currently there are 10 billion Web
> pages, collected over five years. That amounts to 100 terabytes, which is
> 100 million megabytes. So if a book is a megabyte, which is about what it
> is, and the Library of Congress has 20 million books, that's 20 terabytes.
> This is 100 terabytes. At that size, this is the largest database ever
> built. It's larger than Walmart's, American Express', the IRS. It's the
> largest database ever built. And it's receiving queries -- because every
> page request when people are surfing around is a query to this database --
> at the rate of 200 queries per second. It's a fairly fast database engine.
> And it's built on commodity PCs, so we can do this cost-effectively. It's
> just using clusters of Linux machines and FreeBSD machines.
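Kahle's storage numbers hold up as a back-of-envelope check. Here is a quick sketch that uses only his quoted figures and the decimal units he uses ("100 terabytes, which is 100 million megabytes"). The variable names are mine. Treating 200 queries/sec as a steady around-the-clock rate is an assumption on my part; he doesn't say whether that's a peak or an average.

```python
# Back-of-envelope check of Kahle's storage figures.
# Decimal units, as he uses them: 1 TB = 1,000,000 MB, 1 MB = 1,000 KB.
MB_PER_TB = 1_000_000
KB_PER_MB = 1_000
SECONDS_PER_DAY = 86_400

pages = 10_000_000_000   # "10 billion Web pages"
archive_tb = 100         # "100 terabytes"
book_mb = 1              # "if a book is a megabyte"
loc_books = 20_000_000   # "the Library of Congress has 20 million books"
queries_per_sec = 200    # "200 queries per second"

# Average stored size per archived page, in KB.
kb_per_page = archive_tb * MB_PER_TB * KB_PER_MB / pages

# Library of Congress size in TB under the 1 MB-per-book assumption.
loc_tb = loc_books * book_mb / MB_PER_TB

# How many Libraries of Congress fit in the archive.
ratio = archive_tb / loc_tb

# Daily query volume, ASSUMING the 200 q/s rate is sustained all day.
queries_per_day = queries_per_sec * SECONDS_PER_DAY

print(f"{kb_per_page:.0f} KB/page, LoC = {loc_tb:.0f} TB, "
      f"archive = {ratio:.0f}x LoC, {queries_per_day:,} queries/day")
```

So his numbers work out to about 10 KB per page and five Libraries of Congress. If the rate really is sustained, that's roughly 17 million queries a day.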
> Koman: How many machines?
> Kahle: Three hundred, we may be up to 400 machines now. When we first came
> out, we didn't architect it for the load we wound up with, so we had to
> throw another 20 to 30 machines at serving the index.
> We can buy 100 TBs with 250 CPUs to work on it, all on a high-speed switch
> with redundancy built in. Something has changed by using these modern
> constructs that are heavily used at Google, Hotmail, here, Transmeta.
> There's a whole sector of companies that are more cost-constrained than, say,
> banks, that just buy Oracle and Sun and EMC.
> So if all books are 20 TBs, and 20 TBs are $80,000, that's the Library of
> Congress. Then something big has changed. All music? It's tiny. It looks
> like there're only one million records that have been produced over the last
> century. That's tiny. All movies? All theatrical releases have been
> estimated at 100,000, and most of those from India. If you take all the rest
> of ephemeral films, that's on the order of a couple hundred thousand. It's
> just not that big. It allows you to start thinking about the whole thing.
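The cost claim is just as easy to sanity-check. This is a rough sketch using only his "20 TBs are $80,000" and "100 TBs with 250 CPUs" figures, in decimal units. It's his 2002 pricing, not anything I've verified. The per-CPU figure assumes the storage is spread evenly across the CPUs, which he doesn't actually say.

```python
# Rough cost and load arithmetic from Kahle's quoted 2002 figures.
GB_PER_TB = 1_000  # decimal units

loc_tb = 20            # "all books are 20 TBs"
loc_cost_usd = 80_000  # "20 TBs are $80,000"
archive_tb = 100       # "100 TBs ..."
cpus = 250             # "... with 250 CPUs to work on it"

cost_per_tb = loc_cost_usd / loc_tb          # dollars per terabyte
cost_per_gb = cost_per_tb / GB_PER_TB        # dollars per gigabyte
# ASSUMPTION: storage divided evenly across all CPUs.
gb_per_cpu = archive_tb * GB_PER_TB / cpus   # GB each CPU is responsible for

print(f"${cost_per_tb:,.0f}/TB, ${cost_per_gb:.2f}/GB, "
      f"{gb_per_cpu:.0f} GB per CPU")
```

That comes to about $4,000 a terabyte, or $4 a gigabyte, with each CPU minding something like 400 GB. That's the "something big has changed" point in numbers.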
> - Jim