Wednesday, December 7, 2011

small data, BIG data on clouds

I have been talking about databases on the cloud for quite some time now. In earlier blog posts I've covered how cloud-o-nomics can accelerate testing and development of database solutions, how to get databases up and running quickly on public clouds, and how to deploy enterprise-class database workloads in-house on private clouds.

In a recent Chat with the Lab webinar, experts in data management and cloud technologies - Leon Katsnelson (@katsnelson) from IBM and Uri Budnik (@uribudnik) from RightScale - reviewed several options for running databases (specifically DB2) in both public and private clouds, and even mentioned no-cost options for doing so. They also presented an option for test-driving next-generation database technology in the cloud, with free credits thrown in by Amazon.

But the thrust of this webinar was "Big Data". It's not surprising that analysts like IDC rate cloud computing and big data among the hottest technologies for 2012. Each is hot on its own, but combine the two and even more magic happens. When IBM talks about big data, you hear about the three Vs that characterize this space - Volume, Variety, and Velocity (learn more about these when you watch the webcast recording below). The presenters in this webcast went further and talked about a fourth V: the Value that cloud economics unleashes for big data.

The logic behind it is actually quite simple. Big data sets involve such large volumes of data that processing them in parallel, using paradigms like MapReduce, can require dozens, hundreds, or even thousands of servers. In a traditional data center, that means a capital-intensive cluster purchase that is hard to justify when hardware utilization is low, e.g. running a Hadoop job for only a few hours a day. On the cloud, by contrast, you can start up 100 servers at 30 cents each per hour, so an hour-long big data job costs about $30, and once the job finishes you can shut the servers down without incurring further costs.
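The cost argument above is simple arithmetic; here is a minimal back-of-the-envelope sketch. The $0.30/hour rate is the illustrative figure from the post, not a quote from any particular provider:

```python
def cluster_cost(servers: int, rate_per_server_hour: float, hours: float) -> float:
    """Total rental cost of `servers` machines for `hours` at a per-server hourly rate."""
    return servers * rate_per_server_hour * hours

# 100 servers at $0.30/hour for a one-hour Hadoop job: about $30
print(f"${cluster_cost(100, 0.30, 1):.2f}")

# Compare: even a modest owned cluster runs 24 hours a day whether used or not,
# which is where the pay-per-use economics come from.
print(f"${cluster_cost(100, 0.30, 24):.2f}")  # cost if left running all day
```

The point of the model is that cost scales with actual usage hours, so a job that needs massive parallelism for a short burst is dramatically cheaper on rented capacity than on owned hardware.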

The speakers also talked about BigInsights (IBM's Hadoop-powered solution for big data) and showed how to set up a Hadoop cluster in minutes on the IBM or Amazon cloud using pre-built, pre-configured BigInsights cloud images and server templates. And if you are interested in big data but don't yet have the skills, or want to learn how to quickly run Hadoop clusters on the cloud, you can take free online courses at Big Data University.

One question that came up during this webcast was how to get your big data sets loaded into the cloud. So when you watch the recording, be sure to listen to the question-and-answer segment towards the end.

Below is the recording of this webinar titled Leveraging Clouds for Small and Big Data...

If you want to be informed about new big data webinars from IBM, do sign up for the Big Data Insights Newsletter.

The forecast for dataville is partly cloudy.