Distributed R

From Wikipedia Quality
Jump to: navigation, search

Distributed R is an open source, high-performance platform for the R language. It splits tasks between multiple processing nodes to reduce execution time and analyze large data sets. Distributed R enhances R by adding distributed data structures, parallelism primitives to run functions on distributed data, a task scheduler, and multiple data loaders. It is mostly used to implement distributed versions of machine learning tasks. Distributed R is written in C++ and R, and retains the familiar look and feel of R. As of February 2015, Hewlett-Packard (HP) provides enterprise support for Distributed R with proprietary additions such as a fast data loader from the Vertica database.