Using SLURM to run jobs at Keck Center

All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SLURM. User accounts not complying with this policy will be suspended.

All jobs must be submitted from w01.keck2.ucsd.edu using the sbatch command:

sbatch script.slurm

where script.slurm is your SLURM script. For examples see below.

Use this guide to migrate from SGE to SLURM: https://srcc.stanford.edu/sge-slurm-conversion

How to submit a job

Log in (using ssh) to w01.keck2.ucsd.edu.
Create all necessary input files for your job.
Create a SLURM script which is used to submit the job to the job queue manager.
Submit your job using this command: sbatch script.slurm, where script.slurm is the filename of your SLURM script.
Check on progress of your job in the queue: squeue -u <userid> or just squeue
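The submit and check steps above can be chained in a small helper. This is only a sketch: it assumes the --parsable option of sbatch (which prints just the job ID) behaves on w01 as it does in standard SLURM, and submit_job is a hypothetical name, not a SLURM command.

```bash
# submit_job is a hypothetical helper name, not part of SLURM.
# Submit a script with sbatch and print the numeric job ID.
submit_job() {
    local script=$1 out
    if [ ! -f "$script" ]; then
        echo "No such SLURM script: $script" >&2
        return 1
    fi
    # --parsable makes sbatch print "jobid" (or "jobid;cluster") only.
    out=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix.
    echo "${out%%;*}"
}

# Usage (on w01):
#   jobid=$(submit_job script.slurm) && squeue -j "$jobid"
```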

Make sure that your SLURM script is in the correct Unix file format. You can verify that with this command:

$ file script.slurm 
script.slurm: Bourne-Again shell script, ASCII text executable

If you get something like this:

$ file script.slurm 
script.slurm: Bourne-Again shell script, ASCII text executable, with CRLF line terminators

then you likely created your SLURM script on a Windows machine and you need to convert it on w01 to Unix file format with this command: dos2unix script.slurm.
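If dos2unix is not at hand, the same check and conversion can be sketched with grep and sed. The helper names has_crlf and to_unix are hypothetical, and the snippet assumes bash (for the $'\r' quoting).

```bash
# has_crlf and to_unix are hypothetical helper names.
# Succeed if the file contains at least one carriage return.
has_crlf() {
    grep -q $'\r' "$1"
}

# Strip a trailing carriage return from every line.
# The file is rewritten in place through a temporary copy,
# so its permissions are preserved.
to_unix() {
    local tmp rc
    tmp=$(mktemp) || return 1
    sed $'s/\r$//' "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage:
#   has_crlf script.slurm && to_unix script.slurm
```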

Keck Center scheduling policies

All jobs must be submitted to the SLURM queue manager. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.
Using more than 8 processors per job is not allowed.

Available partitions (queues)

Partition	Max wall clock time	Max number of CPUs	Max number of nodes
cpu	5 days	8	1
unlimited	no limit	8	1
gpu	5 days	1	1

Partition (queue) limits

The following limits are imposed on all jobs:

max wall-clock time is 5 days in the cpu partition and 5 days in the gpu partition (subject to change, use sinfo to see the current limit)
max number of processors (cores) per job is 8
max number of nodes (workstations) per job is 1
max number of running jobs per user is 5. This is dynamically changed based on the cluster load. To see the current limit: sacctmgr list account -s format=User,MaxJobs
the 'unlimited' partition (queue) has no wall-clock time limit and is intended for long-running jobs for which the 5-day wall-clock limit is not sufficient. Please do not abuse this queue.

If you have any special requirements please email keck2-help@keck2.ucsd.edu
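The 8-core limit above can be checked before a script is submitted. The following is a rough sketch that only understands the "-n N" and "--ntasks=N" forms of the directive; requested_cores and check_core_limit are hypothetical helper names.

```bash
# requested_cores and check_core_limit are hypothetical helper names.
# Print the core count requested by "#SBATCH -n N" or "#SBATCH --ntasks=N"
# (prints 1 when the script has no such directive).
requested_cores() {
    local n
    n=$(sed -n -E 's/^#SBATCH[[:space:]]+(-n[[:space:]]*|--ntasks[= ])([0-9]+).*/\2/p' "$1" | tail -n 1)
    echo "${n:-1}"
}

# Fail if the script asks for more cores than the per-job maximum
# (8 by default, as stated in the limits above).
check_core_limit() {
    local max=${2:-8} n
    n=$(requested_cores "$1")
    if [ "$n" -gt "$max" ]; then
        echo "$1 requests $n cores; the limit is $max" >&2
        return 1
    fi
}

# Usage:
#   check_core_limit script.slurm && sbatch script.slurm
```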

Basic SLURM commands

Command	Example syntax	Meaning
sbatch	sbatch <jobscript>	Submit a batch job.
srun	srun --pty -t 0-0:5:0 -p cpu /bin/bash -i	Start an interactive session for five minutes in the cpu queue.
squeue	squeue -u <userid>	View status of your jobs in the queue. Only non-completed jobs will be shown.
scontrol	scontrol show job <jobid>	Look at a running job in detail. For more information about the job, add the -dd parameter.
scancel	scancel <jobid>	Cancel a job. scancel can also be used to kill job arrays or job steps.
scontrol	scontrol hold <jobid>	Hold a pending job so that it is not started.
scontrol	scontrol release <jobid>	Release a held job so that it can be scheduled again.
sacct	sacct -j <jobid>	Check job accounting data. Running sacct is most useful for completed jobs.
sinfo	sinfo	See node and partition information. Use the -N parameter to see information per node.

Useful SLURM commands

Request a node with 12GB of RAM (total): sbatch --mem=12G job_script. To see how much memory is currently available on the nodes: sinfo --Node -l

Request a node with 6GB of RAM per core (CPU): sbatch --mem-per-cpu=6G job_script.

Most of the Keck nodes have 24 GB of RAM (23936 MB), but two nodes (w16 and w17) have 32 GB of RAM (31977 MB). If your job needs more than 20GB of RAM (but less than 32GB), you can request one of these "high-memory" nodes with the following statements in your SLURM batch file:

#SBATCH --mem=30G               # request allocation of 30GB RAM for the job
#SBATCH --nodelist=w16          # send the job to w16 (or use w17); pick a node which has no jobs running
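The two request styles differ in how the total is computed: --mem is a total for the job on the node, while --mem-per-cpu is multiplied by the number of cores. A small worked sketch follows, assuming the node sizes quoted above (23936 and 31977) are in MB; fits_on_node is a hypothetical helper name.

```bash
# fits_on_node is a hypothetical helper name.
# Print the total memory (MB) implied by a --mem-per-cpu request and
# succeed only if it fits on a node (default: 23936 MB, a standard node).
fits_on_node() {
    local cores=$1 per_cpu_mb=$2 node_mb=${3:-23936}
    local total=$(( cores * per_cpu_mb ))
    echo "$total"
    [ "$total" -le "$node_mb" ]
}

# Usage: 3 cores at 6 GB (6144 MB) each fit on a standard node,
# 8 cores at 6 GB each do not, and 5 cores need a high-memory node:
#   fits_on_node 3 6144
#   fits_on_node 8 6144
#   fits_on_node 5 6144 31977
```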

Canceling jobs

scancel 1234	cancel job 1234
scancel -u myusername	cancel all my jobs
scancel -u myusername --state=running	cancel all my running jobs
scancel -u myusername --state=pending	cancel all my pending jobs

Example SLURM monitoring commands

squeue -u <userid>	list information about all non-completed jobs for a user, including job ids and what status they're in.
squeue -j <jobid>	list information for a single job
squeue -t RUNNING	list information for only running jobs
squeue -t PENDING	list information only for pending jobs
squeue -p cpu	list information for only jobs in cpu partition
squeue -p gpu -u <userid> -t RUNNING	list information for jobs in gpu partition that are currently running for a user
scontrol show job <jobid> -dd	show details for a running job, -dd requests more detail

sstat -j <jobid>.batch --format=JobID,MaxRSS,MaxVMSize,NTasks	show status information for a running job. Run sstat -e to list all fields that can be passed to --format.
sacct -j <jobid> --format=JobId,AllocCPUs,State,ReqMem,MaxRSS,Elapsed,TimeLimit,CPUTime,ReqTres	get statistics on a completed job. Run sacct -e to list all fields that can be passed to --format. To set the width of a field, append % and a number, for example --format=JobID%15 for 15 characters.
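The monitoring commands above can be combined to wait for a job and then look at its accounting data. This is a sketch that assumes squeue -h (no header) prints nothing once the job has left the queue; wait_for_job is a hypothetical helper name.

```bash
# wait_for_job is a hypothetical helper name.
# Poll squeue until the job no longer appears in the queue, then return.
wait_for_job() {
    local jobid=$1 interval=${2:-30}
    # -h suppresses the header, so the output is empty when the job is gone.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage: check every 60 seconds, then print accounting data.
#   wait_for_job 1234 60 && sacct -j 1234 --format=JobID,State,Elapsed,MaxRSS
```

A polling interval of 30 seconds or more keeps the load on the queue manager low.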

Best practices

All workstations have a very fast local hard drive mounted under /scratch. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in /scratch at the beginning of your job, copy your runtime (input) files there, change your working directory to it, and run your job from there. The SLURM example scripts below show how to do this.

Please note that old files (4 days and older) are regularly purged from /scratch.
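The scratch pattern described above can be wrapped in one function so that results are copied back even when the job command fails. This is a sketch only: run_in_scratch is a hypothetical helper name, and the scratch root is passed as the first argument (normally /scratch).

```bash
# run_in_scratch is a hypothetical helper name.
# Usage: run_in_scratch <scratch-root> <command> [args...]
run_in_scratch() {
    local root=$1; shift
    local start_dir scratch rc
    start_dir=$(pwd)
    # Create a randomly named directory under the scratch root.
    scratch=$(mktemp -d "${root}/${USER:-job}.XXXXXX") || return 1
    cp -a . "$scratch"/            # copy input files to the scratch directory
    cd "$scratch" || return 1
    "$@"                           # run the actual job command
    rc=$?
    cp -a . "$start_dir"/          # copy results back, even if the job failed
    cd "$start_dir" || return 1
    rm -rf "$scratch"              # clean up the scratch directory
    return $rc
}

# Usage inside a SLURM script:
#   run_in_scratch /scratch bash -c 'g16 < input.in > output.out 2>&1'
```

Unlike the example scripts, this sketch always removes the scratch directory, so keep the rm line commented out while debugging a job.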

Running CPU intensive jobs

CPU-intensive jobs must be submitted to the cpu partition (queue), which is the default queue. An example of a SLURM script for running CPU-intensive jobs (for example, Gaussian jobs) is below.

Gaussian

#!/bin/bash
#SBATCH -n 8                       # Request 8 cores
#SBATCH -t 0-01:30                 # Runtime in D-HH:MM format
#SBATCH -p cpu                     # Partition to run in
#SBATCH --mem=20G                  # Total memory (for all cores)
#SBATCH -o %j.out                  # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err                  # File to which STDERR will be written, including job ID
set -xv
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}" 
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition "
cwd=$(pwd)
# create a randomly named scratch directory and copy your files there
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH: ${SCRATCH}"
export GAUSS_SCRDIR=${SCRATCH}
# copy job files to $SCRATCH
cp -a * ${SCRATCH}
cd ${SCRATCH}

module load gaussian/16.B01-sse4

# start your g16 job (change the input/output file names for your job)
g16 < input.in >& output.out

# copy the results back to $HOME & cleanup
cp -a * ${cwd}
#rm -rf ${SCRATCH}

You can save this script to a file, for example gaussian.slurm and then submit it to the queue:

 sbatch gaussian.slurm

You can verify that the job is in the queue:

 squeue

Note: make sure your Gaussian input file also contains these statements so that the job really uses 8 CPUs:

 %nprocshared=8
 %mem=6GB
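A mismatch between the SLURM allocation and the Gaussian input is easy to overlook, so the check can be scripted. This is a minimal sketch; gaussian_nproc is a hypothetical helper name, and it only reads the first %nprocshared line of the input file.

```bash
# gaussian_nproc is a hypothetical helper name.
# Print the value of the first %nprocshared line in a Gaussian input file
# (prints nothing if the line is absent). The match is case-insensitive.
gaussian_nproc() {
    grep -i -m 1 '^[[:space:]]*%nprocshared=' "$1" |
        sed -E 's/^[^=]*=[[:space:]]*([0-9]+).*/\1/'
}

# Usage inside the SLURM script, before starting g16:
#   [ "$(gaussian_nproc input.in)" = "$SLURM_NPROCS" ] || echo "core count mismatch" >&2
```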

Orca MPI

This is an example of a SLURM submit script for running the MPI version of orca on 8 processors.

#!/bin/bash
#SBATCH -n 8                       # Request 8 cores
#SBATCH -t 0-00:05                 # Runtime in D-HH:MM format
#SBATCH -p cpu                     # Partition to run in
#SBATCH --mem=20G                  # Total memory (for all cores)
#SBATCH -o %j.out                  # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err                  # File to which STDERR will be written, including job ID
set -xv

echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}" 
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition "

# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
echo "Using SCRATCH directory: ${scratch_dir}"
current_dir=`pwd`
cp -a * $scratch_dir
cd $scratch_dir

module load orca/5.0.3

$ORCA_PATH/orca orca_input.inp > orca_output.out 

# copy all data back from the scratch directory
cp -a * $current_dir
rm -rf $scratch_dir

You also have to put this in your orca input file to tell the application to use 8 processors:

%pal nprocs 8 end

Please note that with older versions of Orca you have to load the appropriate MPI library to use it. This is a compatibility table between different Orca and MPI module versions:

orca/4.0.0	openmpi/2.0.1
orca/4.0.1	openmpi/2.0.2
orca/4.2.0	openmpi/3.1/3.1.4
orca/4.2.1	openmpi/3.1/3.1.4
orca/5.0.3	no MPI loading necessary, it is built in
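The compatibility table above can be encoded in the submit script so that the matching MPI module is loaded automatically. This is a sketch; mpi_module_for_orca is a hypothetical helper name, and the mapping is copied from the table.

```bash
# mpi_module_for_orca is a hypothetical helper name.
# Print the MPI module that matches an Orca version (empty if none is needed).
mpi_module_for_orca() {
    case $1 in
        4.0.0)        echo "openmpi/2.0.1" ;;
        4.0.1)        echo "openmpi/2.0.2" ;;
        4.2.0|4.2.1)  echo "openmpi/3.1/3.1.4" ;;
        5.0.3)        echo "" ;;    # MPI is built in, nothing to load
        *)            echo "unknown Orca version: $1" >&2; return 1 ;;
    esac
}

# Usage in a SLURM script:
#   mpi=$(mpi_module_for_orca 4.2.1) && [ -n "$mpi" ] && module load "$mpi"
#   module load orca/4.2.1
```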

OpenMolcas

The following SLURM script can be used to run OpenMolcas jobs on up to 8 CPUs. Please modify for your specific needs. Kindly contributed by Jeremy Hilgar.

#!/bin/bash
#SBATCH -p cpu
#SBATCH -n 8              # 8 CPUs
#SBATCH --mem=20000       # 20GB
#SBATCH --export=ALL
#SBATCH -t 5-00:00        # Runtime in D-HH:MM format

module purge
module load openmolcas/8.4-dev

mkdir /scratch/$SLURM_JOB_ID
mkdir -p ${PWD}/output

export MOLCAS_WORKDIR=/scratch/$SLURM_JOB_ID
export MOLCAS_OUTPUT=/scratch/$SLURM_JOB_ID
export MOLCAS_MEM=19000
# set project name to current directory name
export MOLCAS_PROJECT=${PWD##*/}

# some modules markedly benefit from parallelization
export MOLCAS_NPROCS=8

# set up integrals
pymolcas ${PWD}/seward.in -oe ${PWD}/output/seward.out -b 1

# perform scf calculations
pymolcas ${PWD}/rasscf.in -oe ${PWD}/output/rasscf.out -b 1

# some modules do not benefit from parallelization,
# so we change the corresponding environment variable before calling them
export MOLCAS_NPROCS=1

# calculate spin-orbit interaction matrix elements
pymolcas ${PWD}/rassi.in -oe ${PWD}/output/rassi.out -b 1

# calculate magnetic properties
pymolcas ${PWD}/single_aniso.in -oe ${PWD}/output/single_aniso.out -b 1

# copy output magnetic properties file for further analysis (poly_aniso)
cp -a /scratch/$SLURM_JOB_ID/${PWD##*/}/ANISOINPUT /scratch/$SLURM_JOB_ID/${PWD##*/}/POLYFILE ${PWD}/output

Running GPU jobs

All GPU jobs must be submitted to the gpu partition (queue) and must request the gpu consumable resource. Use the following statements in your SLURM script to accomplish that:

#SBATCH -p gpu
#SBATCH --gres=gpu:1

Amber script for running GPU jobs

The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.

#!/bin/bash
#SBATCH -n 1                       # Request 1 core
#SBATCH -t 0-00:05                 # Runtime in D-HH:MM format
#SBATCH -p gpu                     # Partition to run in
#SBATCH --gres=gpu:1
#SBATCH --mem=8024                 # Memory total in MB (for all cores)
#SBATCH -o %j.out                  # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err                  # File to which STDERR will be written, including job ID
set -xv

module load amber/20

export CUDA_VISIBLE_DEVICES=0

echo Running on host `hostname`
echo "Job id: ${SLURM_JOB_ID}"
echo Time is `date`
echo Current directory is `pwd`

cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp -a * $SCRATCH

# start your job in $SCRATCH
cd $SCRATCH
pmemd.cuda -O -i md.in -o md.out -p md.top -c md.rst -r md2.rst -x md.netcdf

# copy your results back to $HOME & cleanup
cp -a * $cwd
#rm -rf $SCRATCH

How to report SLURM issues

If you have problems submitting your SLURM jobs you can email keck-help@keck2.ucsd.edu for assistance. Please include the following information in your email:

your job's SLURM ID ($SLURM_JOB_ID)
attach your SLURM submission script
cut and paste (no screenshots) the error message you are getting
attach job's .out and .err files
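The items above can be gathered into a single text file before writing the email. This is a sketch that assumes the %j.out / %j.err file naming used in the example scripts; collect_report is a hypothetical helper name.

```bash
# collect_report is a hypothetical helper name.
# Usage: collect_report <jobid> <slurm-script>
# Writes the job ID, the submission script, and the job's .out/.err files
# into slurm_report_<jobid>.txt and prints that file name.
collect_report() {
    local jobid=$1 script=$2 report="slurm_report_$1.txt" f
    {
        echo "SLURM job ID: $jobid"
        for f in "$script" "$jobid.out" "$jobid.err"; do
            echo "===== $f ====="
            if [ -f "$f" ]; then cat "$f"; else echo "(file not found)"; fi
        done
    } > "$report"
    echo "$report"
}

# Usage (in the directory the job was submitted from):
#   collect_report 1234 script.slurm
```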
