Thursday, January 19, 2012

A scattered collection of database and filesystem news

These things aren't fundamentally related, but they are all interesting in rather similar ways, so I'm noting them all in a single post. Each certainly deserves a much more detailed analysis, but who has the time?

First, some filesystem notes:

  • Microsoft are getting a lot of attention for their announcement about ReFS. Two points from the announcement that particularly struck me:
    Optimize for extreme scale. Use scalable structures for everything. Don’t assume that disk-checking algorithms, in particular, can scale to the size of the entire file system.

    Never take the file system offline. Assume that in the event of corruptions, it is advantageous to isolate the fault while allowing access to the rest of the volume. This is done while salvaging the maximum amount of data possible, all done live.

    Paul Thurrott says that the new filesystem will be in the server edition of Windows only (at least for now).

    By the way, does one say "Ree Eff Ess"? Or does one say "reffs"?

  • Many people remarked that ReFS seemed to be "bringing ZFS to Windows", so it's interesting to see this recent work on bringing ZFS to Mac OS X. They seem to have spent a lot of time on their web page, but it's hard to find much about the underlying technology. But they're apparently a small young company, so let's give them time.

Now, to complement your filesystems news, here's some database news:

  • First and most important, don't miss all the talk about Amazon's new DynamoDB. There is a huge amount to read about it; it will take a while for us all to digest it and understand it. But, as usual with Amazon, there is a lot of "meat" there!
  • And on another, related, front, I see that Google have open-sourced their leveldb library. This, too, is a "NoSQL-style" database:
    • Keys and values are arbitrary byte arrays.
    • Data is stored sorted by key.
    • Callers can provide a custom comparison function to override the sort order.
    • The basic operations are Put(key,value), Get(key), Delete(key).
    • Multiple changes can be made in one atomic batch.
    • Users can create a transient snapshot to get a consistent view of data.
    • Forward and backward iteration is supported over the data.
    • Data is automatically compressed using the Snappy compression library.
    There is additional information to read about leveldb in their documentation pages.
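
To make that feature list a little more concrete, here is a minimal Python sketch of the same set of operations. To be clear, this is not leveldb and not its API: the class `ToySortedStore` and all of its method names are made up for illustration, everything lives in memory, and the custom comparison function and Snappy compression are left out entirely.

```python
import bisect


class ToySortedStore:
    """Hypothetical in-memory toy mimicking the operations listed above.

    This is NOT leveldb: nothing is persisted or compressed, and keys are
    ordered by Python's default bytes comparison.
    """

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        if key in self._data:
            del self._data[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def write_batch(self, ops) -> None:
        """Apply ("put", key, value) / ("delete", key) tuples all-or-nothing."""
        ops = list(ops)
        # Validate everything first so a malformed op leaves the store untouched.
        for op in ops:
            if not ((op[0] == "put" and len(op) == 3)
                    or (op[0] == "delete" and len(op) == 2)):
                raise ValueError(f"unknown batch operation: {op!r}")
        for op in ops:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def snapshot(self) -> "ToySortedStore":
        """Return an independent copy; later writes to self do not affect it."""
        snap = ToySortedStore()
        snap._keys = list(self._keys)
        snap._data = dict(self._data)
        return snap

    def items(self, reverse: bool = False):
        """Yield (key, value) pairs in key order, forward or backward."""
        keys = list(reversed(self._keys)) if reverse else list(self._keys)
        for key in keys:
            yield key, self._data[key]


store = ToySortedStore()
store.put(b"banana", b"yellow")
store.put(b"apple", b"red")
snap = store.snapshot()
store.write_batch([("delete", b"apple"), ("put", b"cherry", b"dark red")])

for key, value in store.items():
    print(key, value)
print(snap.get(b"apple"))  # the snapshot still sees the deleted key
```

The snapshot here is a full copy, which is only reasonable for a toy; I would assume a real store avoids copying the data, but the leveldb documentation is the place to check how it actually works.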

I hope to find time to plow through much of this material; there sure is a lot to learn about in the world!
