Thursday, November 15, 2007

IBM Blue Cloud Announcement



On the news of IBM's Blue Cloud, I'd like to mention Dana Gardner's thorough analysis of the announcement. Let's highlight his summation of IBM providing an infrastructure (BladeCenter, Linux, cloud software) that would be a shared/dedicated services-based SOLUTION, provisioned by Tivoli products. A services solution vs. just selling IT inventory assets.
"And so large enterprises will need not just make decisions about technology platform, supplier, and computing models. They will need to make bigger decisions based on broad partnerships that produce services ecologies in niches and industries. For an enterprise to adopt a Blue Cloud approach is not just to pick a vendor -- they are picking much more. The businesses and services and hosting all become mingled. It becomes more about revenue sharing than just a supplier contract."
This is beyond the scope of the announcement, but why not provision strategic solutions, and the adjoining back office, just as one would an externally managed web site? It is quite a play on the old outsourcing model.

A relevant business case for Blue Cloud would require massive data/processing services to drive future mashup-style social solutions, which need huge parallel processing on the back end with matching data streams. These requirements will most likely call for a greenfield infrastructure. Blue Cloud comprises unique application services that initially do NOT seem to map to back-office requirements. But if scenarios do start to arise from such a marriage, why not run an entire enterprise on a services-based platform with such huge processing and data-handling capabilities?

Though not as sexy, leveraging underutilized hardware could improve the initial ROI, and IBM has more than one platform option for this scenario. Tivoli can readily provision SUSE Linux on the IBM hardware platforms for which SUSE is certified: BladeCenter and xSeries (natively), and zSeries, iSeries, and pSeries (virtually).

Deeper into the Woods
Part of the Cloud offering is Hadoop, an open-source framework for distributed parallel data processing, created by Doug Cutting and now heavily sponsored by Yahoo. With its own distributed file system (HDFS), it implements the MapReduce programming model pioneered by Google. Yahoo is currently using Hadoop to process 10,000 research jobs a week on roughly 10,000 servers, with its largest cluster at 1,600 nodes managing 1 petabyte of user data. Ubuntu may be the GNU/Linux distro, but this is not verified. Read Tim O'Reilly on the subject.
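To make the MapReduce model concrete, here is a toy word-count sketch in Python. This is purely illustrative, not Hadoop's actual Java API: the map phase emits intermediate (key, value) pairs, a "shuffle" groups values by key, and the reduce phase collapses each group, exactly the pattern a Hadoop job expresses.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit an intermediate (word, 1) pair for each word seen.
    for word in text.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: collapse all counts emitted for one word into a total.
    return (word, sum(counts))

def run_job(docs):
    # "Shuffle": group intermediate values by key, as the framework
    # would do between its map and reduce phases.
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_phase(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

docs = {1: "the cloud", 2: "the blue cloud"}
print(run_job(docs))  # {'the': 2, 'cloud': 2, 'blue': 1}
```

The appeal of the model is that map_phase and reduce_phase are all a programmer writes; everything inside run_job is what a framework like Hadoop supplies, at cluster scale.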

An interesting point about the Google programming model: it relaxes a few POSIX requirements to enable streaming access to distributed file system data, splitting files on arbitrary byte boundaries. More importantly, it represents Google's advances in abstraction, allowing programmers to express a simple computation while the library hides the messy details of parallelization, fault tolerance, data distribution, and load balancing.
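The byte-boundary point deserves a sketch. When a file is split at an arbitrary offset, one reader may land mid-record; the usual convention is that a reader skips forward to the next record delimiter while the preceding reader finishes the record it started, so every record is processed exactly once. The following is an illustrative Python sketch of that convention (not HDFS code), using newline-delimited records:

```python
def read_split(data: bytes, start: int, end: int):
    # Yield complete newline-delimited records for the byte range
    # [start, end). A split that begins mid-record skips ahead to the
    # next newline (the previous split owns that record); a split may
    # read past `end` to finish the record it started.
    pos = start
    if start > 0:
        nl = data.find(b"\n", start)
        if nl == -1:
            return
        pos = nl + 1
    while pos < end:
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]
            return
        yield data[pos:nl]
        pos = nl + 1

data = b"alpha\nbravo\ncharlie\n"
# Split at an arbitrary offset (byte 8, mid-way through "bravo"):
part1 = list(read_split(data, 0, 8))          # [b'alpha', b'bravo']
part2 = list(read_split(data, 8, len(data)))  # [b'charlie']
```

Every record lands in exactly one split, which is why the framework can cut files at any byte offset without coordinating with the writer.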

So in a quasi-joint offering, Google, Yahoo, open source, and IBM are now indirectly combining their technologies to provide the world a services-based cloud solution that can handle extremely large processing requirements and be provisioned without the corporate and academic worlds having to build out their own infrastructure. HP, Sun, Oracle, Amazon, et al. are heading in the same direction, so the race is on.
