
Vishnu Nanduri, PHD


wate-R


vishnunanduri

18 Feb 2017


For all of us who have hit the proverbial “R” wall due to memory size limitations, H2O is a welcome relief. H2O (www.h2o.ai) is an open-source, in-memory, distributed machine learning platform.

H2O’s core code is written in Java. Inside H2O, a Distributed Key/Value store is used to access and reference data, models, objects, etc., across all nodes and machines. The algorithms are implemented on top of H2O’s distributed Map/Reduce framework and utilize the Java Fork/Join framework for multi-threading. [see: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html]
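To give a feel for the Map/Reduce idea mentioned above, here is a toy sketch in base R. It is only an illustration of the pattern, not H2O's actual Java implementation: the four chunks stand in for data spread across four nodes, and the variable names are made up for this example.

```r
# Toy illustration of the map/reduce idea in base R (not H2O's actual code).
# Each "node" holds one chunk of the data and returns a small partial summary;
# the reduce step combines the summaries without moving the raw rows around.

set.seed(42)
x <- rnorm(1e5)

# Split the vector into 4 chunks, standing in for data spread across 4 nodes.
chunks <- split(x, cut(seq_along(x), breaks = 4, labels = FALSE))

# Map: every chunk reports only its sum and its row count.
partials <- lapply(chunks, function(chunk) c(sum = sum(chunk), n = length(chunk)))

# Reduce: add the partial summaries together pairwise.
total <- Reduce(`+`, partials)

distributed_mean <- unname(total["sum"] / total["n"])

# Compare with the single-machine answer (equal up to floating-point rounding).
all.equal(distributed_mean, mean(x))
```

The point of the pattern is that only the tiny per-chunk summaries travel between nodes, which is why this style of computation scales to data that does not fit in one R session.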

The biggest advantage I found was the ease of switching back and forth between what is called an H2O frame and the R data frame. The moment we switch to an H2O frame, the code runs on the H2O cluster that we set up. Setting up the H2O cluster, even on your own laptop, is a breeze. The commands to invoke H2O from within RStudio are very straightforward; see the tutorial: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/howto/Connecting_RStudio_to_Sparkling_Water.md
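As a rough sketch of that round trip, the snippet below starts a local cluster and moves a data frame into H2O and back. It assumes the h2o package is installed and Java is available on the machine; the built-in iris data set and the object names (iris_hex, iris_back) are stand-ins chosen for this example.

```r
# Assumes install.packages("h2o") has been run and Java is installed.
library(h2o)

# Start (or connect to) a local H2O cluster; nthreads = -1 uses all cores.
h2o.init(nthreads = -1)

# R data frame -> H2O frame: the data now lives in the cluster's memory.
iris_hex <- as.h2o(iris)

# Operations on an H2O frame run on the cluster, not in the R session.
h2o.describe(iris_hex)
mean_sepal <- h2o.mean(iris_hex$Sepal.Length)

# H2O frame -> R data frame: pull the (ideally small) result back into R.
iris_back <- as.data.frame(iris_hex)

class(iris_hex)   # an H2OFrame handle pointing at data in the cluster
class(iris_back)  # an ordinary R data.frame

# h2o.shutdown(prompt = FALSE)  # uncomment to stop the local cluster when done
```

On large data it usually makes sense to do the heavy lifting on the H2O frame and convert only summaries or samples back to R, since as.data.frame() copies everything into the R session's memory.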

You can quickly get started with machine learning in H2O within RStudio with this easy-to-use tutorial: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/RBooklet.pdf

H2O does many things that R does: transformations, aggregations, etc. It also claims to have a rapidly expanding library for machine learning. The documentation is easy to follow, which is a big plus. Some of the world’s largest firms are quoted on H2O’s website as users of the product. H2O also includes an interesting suite of tools with cool-sounding names:

Base H2O
Sparkling Water (combining Spark and H2O…nice wordplay)
Steam (end-to-end AI engine to streamline deployment of apps)
Deep Water (state-of-the-art deep learning models in H2O)

I ran a random forest model with 500 trees on 1.8 million records, and it ran pretty quickly on my laptop. Obviously, the real computational power can be harnessed and experienced only when it is run on a large cluster with several nodes. The H2O billion-row machine learning benchmark for solving a logistic regression problem is said to take ~35 seconds on 16 EC2 nodes, and the performance supposedly gets better as more nodes are added (see: http://www.stat.berkeley.edu/~ledell/docs/h2o_hpccon_oct2015.pdf for a detailed performance assessment).
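For readers who want to try something similar, here is a minimal sketch of a 500-tree random forest in H2O from R. It is not the exact code or data from my run: the built-in iris data set stands in for the 1.8 million records, and the 80/20 split and seed values are arbitrary choices for this example.

```r
# Assumes the h2o package is installed and Java is available.
library(h2o)
h2o.init(nthreads = -1)

# Stand-in data: the built-in iris set instead of the 1.8 million records.
data_hex <- as.h2o(iris)

# Hold out roughly 20% of the rows for validation.
splits <- h2o.splitFrame(data_hex, ratios = 0.8, seed = 1234)
train <- splits[[1]]
valid <- splits[[2]]

# Fit a random forest with 500 trees; training runs on the H2O cluster.
rf <- h2o.randomForest(
  x = setdiff(names(iris), "Species"),  # predictor columns
  y = "Species",                        # response column
  training_frame = train,
  validation_frame = valid,
  ntrees = 500,
  seed = 1234
)

# Metrics computed on the validation frame.
h2o.performance(rf, valid = TRUE)
```

The same call works unchanged whether h2o.init() connects to a laptop or to a multi-node cluster, which is what makes it easy to prototype locally and scale up later.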

All in all, H2O is a great alternative to try out when you need to crunch extremely large datasets that R alone cannot handle.
