Friday, October 22, 2010

System sizes at the high end.

Here's a very impressive writeup of the recent Hadoop World conference in New York City, from David Menninger of Ventana Research.

Menninger notes that Hadoop installations are much larger than you might think:

How big is “big data”? In his opening remarks, Mike shared some statistics from a survey of attendees. The average Hadoop cluster among respondents was 66 nodes and 114 terabytes of data. However there is quite a range. The largest in the survey responses was a cluster of 1,300 nodes and more than 2 petabytes of data. (Presenters from eBay blew this away, describing their production cluster of 8,500 nodes and 16 petabytes of storage.) Over 60 percent of respondents had 10 terabytes or less, and half were running 10 nodes or less.

(In the above quote, "Mike" is Mike Olson of Cloudera.)
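As a quick back-of-envelope check on those figures (my own rough arithmetic, not anything from the survey itself), the average cluster, the largest respondent, and the eBay cluster all work out to somewhere around 1.5 to 2 terabytes of storage per node:

    # Rough per-node storage, using the numbers quoted above.
    # Petabyte figures are approximated as decimal terabytes.
    clusters = [
        ("survey average", 66, 114),        # nodes, terabytes
        ("largest respondent", 1300, 2000), # "more than 2 petabytes"
        ("eBay production", 8500, 16000),   # 16 petabytes
    ]
    for name, nodes, terabytes in clusters:
        print("%s: ~%.1f TB per node" % (name, terabytes / float(nodes)))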

Curt Monash has been keeping track of some of these stupendous database installations, and shares some of what he's learned in this note.

At my day job, we had an internal presentation the other day from one of our larger customers, who reported that they've constructed a single node with 3 terabytes of RAM-SAN as a cache ahead of their main database disk array.

Our customer didn't think that was particularly large; they were just noting that it was plenty large enough for their needs at the moment.
