Libraries for Category 'Batch Processing'

General
Sisyphus: MapReduce implemented in Clojure.
gantry: Operations support and deployment in Clojure. Inspired by crane, capistrano, and fabric.

This is a work in progress.
lemur: Lemur is a tool to launch Hadoop jobs locally or on EMR, based on a configuration file referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions (aka hooks), and zero or more "steps". A step is Amazon's name for a task or job submitted to the cluster. Lemur reads your jobdef; at the end of the jobdef, you execute (fire! ...) to make things happen. Keep in mind that the jobdef is an interpreted clj file, so you can insert arbitrary Clojure code to be executed anywhere in the file.
clj-spark: A Clojure API for the Spark Project.

It handles many of the initial problems, such as serializing anonymous functions, converting back and forth between Scala Tuples and Clojure seqs, and converting RDDs to PairRDDs.

Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can run on top of the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager (“standalone mode”).