At 2pm Sage Weil spent over an hour talking about Ceph, the distributed
storage system. This is the first time I’ve had a serious look at it,
and this was one informative session. The Ceph project’s goals are to
provide a pure open-source storage framework that can run on anything
and stay resilient while doing it, and to disrupt the storage market
along the way.
Right now Ceph has support for a variety of access methods:
- Object access through a REST gateway
- Object access through a library, librados
- Block access
- File-level access through Ceph FS
Because of this, it’s a very attractive system for an IaaS storage foundation.
And by all reports, it’s definitely being used like that.
Another of its goals is to require as little manual configuration as possible, which is very important when building a rapidly scaling system. Hand in hand with that is keeping disruption as low as possible when nodes are added or removed. Central to all of this is the CRUSH algorithm.
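To get a feel for why computed placement keeps node changes cheap, here is a minimal sketch. It is not CRUSH itself; it uses rendezvous (highest-random-weight) hashing as a simpler stand-in with the same flavor: any client can compute where an object lives from its name and the node list, with no central lookup table. The node names (`osd.0` and so on), object names, and function names are all made up for illustration.

```python
import hashlib

def score(obj_name, node):
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj_name}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(obj_name, nodes, replicas=2):
    """Pick the `replicas` highest-scoring nodes for this object.

    Stand-in for CRUSH, not the real algorithm: placement is computed
    from the name and the node list alone.
    """
    ranked = sorted(nodes, key=lambda n: score(obj_name, n), reverse=True)
    return ranked[:replicas]

def moved_fraction(objects, before, after):
    """Fraction of objects whose primary node differs between two node lists."""
    moved = sum(1 for o in objects if place(o, before, 1) != place(o, after, 1))
    return moved / len(objects)

# Hypothetical cluster: ten nodes, then an eleventh is added.
objects = [f"obj-{i}" for i in range(10000)]
nodes = [f"osd.{i}" for i in range(10)]
grown = nodes + ["osd.10"]

frac = moved_fraction(objects, nodes, grown)
print(f"objects relocated after adding one node: {frac:.1%}")
```

With this scheme the only objects that move are the ones the new node now wins, so roughly one in eleven relocates here, where a naive `hash(name) % node_count` would reshuffle nearly everything. As I understand it, the real CRUSH algorithm goes further and takes device weights and failure-domain hierarchy into account, which this sketch ignores.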
Ceph is built around an object-store basis, rather than a file-store basis. This is for a few reasons:
- Names are in a simple flat namespace
- More scalable than a file-system
- The ‘kajillion file directory’ problem doesn’t exist
- Much easier to parallelize
- Variable size objects
- Access through a simple API with rich semantics
And yet they’re still providing traditional file-level access, and even block-level access through these objects.
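As a rough illustration of how block access can sit on top of a flat object namespace, the sketch below cuts a virtual disk into fixed-size chunks and stores each chunk as one named object. This is a toy model under my own assumptions, not Ceph's implementation: `ObjectStore`, `BlockImage`, the object naming scheme, and the tiny 4-byte object size are all hypothetical, and the "store" is just an in-memory dict.

```python
class ObjectStore:
    """In-memory stand-in for a flat object store: name -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = bytes(data)

    def get(self, name):
        return self._objects.get(name)

    def names(self):
        return sorted(self._objects)


class BlockImage:
    """Hypothetical virtual disk striped across fixed-size objects."""

    def __init__(self, store, image_name, object_size=4):
        self.store = store
        self.image_name = image_name
        self.object_size = object_size

    def _object_name(self, index):
        # Flat namespace: the chunk index is simply encoded in the name.
        return f"{self.image_name}.{index:08x}"

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.object_size)
            chunk = data[pos:pos + self.object_size - within]
            name = self._object_name(index)
            # Read-modify-write the one object this chunk falls into.
            current = bytearray(self.store.get(name) or bytes(self.object_size))
            current[within:within + len(chunk)] = chunk
            self.store.put(name, current)
            pos += len(chunk)

    def read(self, offset, length):
        out = bytearray()
        while len(out) < length:
            index, within = divmod(offset + len(out), self.object_size)
            take = min(self.object_size - within, length - len(out))
            # Objects that were never written read back as zeros.
            current = self.store.get(self._object_name(index)) or bytes(self.object_size)
            out += current[within:within + take]
        return bytes(out)


store = ObjectStore()
image = BlockImage(store, "vm-disk", object_size=4)
image.write(2, b"hello world")
print(store.names())
print(image.read(2, 11))
```

Each chunk is an independent object, so reads and writes to different parts of the disk can land on different nodes, which is where the "much easier to parallelize" point above pays off. Unwritten regions cost nothing because no object exists for them.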
Ceph is looking to put out some appliances in the near future, to give integrators an even more minimal configuration option. They’re looking to improve their integration with Samba and Ganesha NFS, hoping to provide direct Ceph access from the respective layers. Also on the list is direct Hadoop integration, and geographic replication.
An audience member raised the question of encryption support. Ceph supports encryption in the authentication layer, but doesn’t provide any at the storage layers. This is a rather complex problem, given the difficulties in key distribution and known problems with dm-crypt. It is being talked about, but no final decisions have been made yet.
This is an evolving project and definitely has some large systems behind it. A GFS clone it ain’t. It’s more than that.