Running ADAM on Slurm

For groups with access to an HPC cluster managed by Slurm, with a number of compute nodes offering local and/or network-attached storage, it is possible to spin up a temporary Spark cluster for use by ADAM.

The full I/O bandwidth benefits of Spark processing are likely best realized through a set of co-located compute/storage nodes. However, depending on your network setup, you may find Spark deployed on HPC to be a workable solution for testing, or even for production at scale, especially for applications that perform multiple in-memory transformations and thus benefit from Spark's in-memory processing model.

Follow the primary instructions for installing ADAM into $ADAM_HOME. This will most likely be a location on a shared disk accessible to all nodes, but could also be a consistent location on each machine.
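
For example, on a cluster with a shared filesystem you might set $ADAM_HOME as below before submitting jobs; the install path is illustrative and will differ at your site:

# Illustrative shared-disk location; substitute your site's layout
export ADAM_HOME=/shared/apps/adam

# Sanity check: the wrapper scripts should be visible from any node
ls $ADAM_HOME/bin/adam-shell $ADAM_HOME/bin/adam-submit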

Start Spark cluster

A Spark cluster can be started as a multi-node job in Slurm by creating a job file run.cmd such as the one below:

#!/bin/bash

#SBATCH --partition=multinode
#SBATCH --job-name=spark-multi-node
#SBATCH --exclusive

# Number of separate nodes reserved for the Spark cluster
#SBATCH --nodes=2
#SBATCH --cpus-per-task=12

# Number of execution slots
#SBATCH --ntasks=2

#SBATCH --time=05:00:00
#SBATCH --mem=248g

# If your sysadmin has installed Spark as a module
module load spark

# If Spark is not installed as a module, you will need to specify the absolute path to
# $SPARK_HOME/bin/start-spark, where $SPARK_HOME is on shared disk or at a consistent location
start-spark

# start-spark sets $MASTER to the Spark master URL; echo it so the address
# lands in the Slurm output file
echo $MASTER

# Keep the job alive so the cluster persists until the time limit or scancel
sleep infinity

Submit the job file to Slurm:

sbatch run.cmd

This will start a Spark cluster containing two nodes that persists for five hours, unless you kill it sooner. The file slurm-<jobid>.out created in the submission directory (Slurm's default output file, where <jobid> is the ID reported by sbatch) will contain the line produced by echo $MASTER above, indicating the address of the Spark master to which your application or adam-shell should connect, such as spark://somehostname:7077
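
To pull that address out of the output file directly, a grep along these lines should work (<jobid> is a placeholder for the ID sbatch printed):

# Print the master URL recorded by echo $MASTER
grep 'spark://' slurm-<jobid>.out

If you are done before the time limit expires, scancel <jobid> tears the cluster down early.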

Start adam-shell

Your system administrator will probably prefer that you launch adam-shell, or start an application, from a compute node rather than from the head node you log in to. If your site provides it, you can request an interactive node with:

sinteractive
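
sinteractive is a site-specific convenience wrapper; if your cluster does not provide it, a plain srun should also get you an interactive shell on a compute node (the partition name below matches the job script above and may differ at your site):

# Request an interactive shell on a compute node
srun --partition=multinode --pty bash -i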

Start an adam-shell:

$ADAM_HOME/bin/adam-shell --master spark://hostnamefromslurmdotout:7077
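
adam-shell forwards these flags to the underlying spark-shell, so you can also size the shell to your Slurm reservation. A sketch with illustrative figures, not a recommendation:

# Sizing flags are passed through to spark-shell; values here are examples only
$ADAM_HOME/bin/adam-shell \
  --master spark://hostnamefromslurmdotout:7077 \
  --executor-memory 64g \
  --total-executor-cores 24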

or run a batch job with adam-submit:

$ADAM_HOME/bin/adam-submit --master spark://hostnamefromslurmdotout:7077
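
adam-submit separates Spark arguments from ADAM arguments with --. As a concrete sketch, the following converts a BAM to Parquet with ADAM's transformAlignments command; the input and output paths are placeholders:

# Spark args before --, ADAM command and its args after
$ADAM_HOME/bin/adam-submit \
  --master spark://hostnamefromslurmdotout:7077 \
  -- \
  transformAlignments /shared/data/sample.bam /shared/data/sample.alignments.adam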

You should be able to connect to the Spark master web UI at http://hostnamefromslurmdotout:8080, and to the per-application UI on port 4040 of the node where your shell or driver is running. However, you may need to ask your system administrator to open the required ports.
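
If the ports stay closed to your workstation, an SSH tunnel through the login node is a common workaround; the user and host names below are placeholders:

# Forward the master UI to your local machine, then browse to http://localhost:8080
ssh -N -L 8080:hostnamefromslurmdotout:8080 you@cluster-login-node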