====== Using SLURM to run jobs at Keck Center ======

All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SLURM. User accounts not complying with this policy will be suspended.

All jobs must be submitted from ''w01.keck2.ucsd.edu'' using the ''sbatch'' command:

<code>
sbatch script.slurm
</code>

where ''script.slurm'' is your SLURM script. For examples see below.

Use this guide to migrate from SGE to SLURM: https://srcc.stanford.edu/sge-slurm-conversion

===== How to submit a job =====

  * Login (using ssh) to ''w01.keck2.ucsd.edu''.
  * Create all necessary input files for your job.
  * Create a SLURM script which is used to submit the job to the job queue manager.
  * Submit your job using this command: ''sbatch script.slurm'', where ''script.slurm'' is the filename of your SLURM script.
  * Check on the progress of your job in the queue: ''squeue -u <username>'' or just ''squeue''.

Make sure that your SLURM script is in the correct Unix file format. You can verify that with this command:

<code>
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable
</code>

If you get something like this:

<code>
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
</code>

then you likely created your SLURM script on a Windows machine and you need to convert it on ''w01'' to Unix file format with this command: ''dos2unix script.slurm''.

===== Keck Center scheduling policies =====

  * All jobs must be submitted to the SLURM queue manager. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.
  * Using more than 8 processors per job is not allowed.

===== Available partitions (queues) =====

^ Partition ^ Max wall clock time ^ Max number of CPUs ^ Max number of nodes ^
| cpu | 5 days | 8 | 1 |
| unlimited | no limit | 8 | 1 |
| gpu | 5 days | 1 | 1 |

===== Partition (queue) limits =====

The following limits are imposed on all jobs:

  * The maximum wall-clock time is 5 days in the cpu partition and 5 days in the gpu partition (subject to change, use ''sinfo'' to see the current limit).
  * The maximum number of processors (cores) per job is 8.
  * The maximum number of nodes (workstations) per job is 1.
  * The maximum number of running jobs per user is 5. This is dynamically changed based on the cluster load. To see the current limit: ''sacctmgr list account -s format=User,MaxJobs''
  * The 'unlimited' partition (queue) has no wall clock time limit and is configured for long-running jobs for which the 5-day wall-clock limit is not sufficient. Please do not abuse this queue. If you have any special requirements, please contact the Keck Center administrators by email.

===== Basic SLURM commands =====

^ Command ^ Example syntax ^ Meaning ^
| sbatch | sbatch <script.slurm> | Submit a batch job. |
| srun | %%srun --pty -t 0-0:5:0 -p cpu /bin/bash -i%% | Start an interactive session for five minutes in the cpu queue. |
| squeue | squeue -u <username> | View the status of your jobs in the queue. Only non-completed jobs will be shown. |
| scontrol | scontrol show job <jobid> | Look at a running job in detail. For more information about the job, add the -dd option. |
| scancel | scancel <jobid> | Cancel a job. scancel can also be used to kill job arrays or job steps. |
| scontrol | scontrol hold <jobid> | Hold (pause) a pending job. |
| scontrol | scontrol release <jobid> | Release a held job so it can run. |
| sacct | sacct -j <jobid> | Check job accounting data. Running sacct is most useful for completed jobs. |
| sinfo | sinfo | See node and partition information. Use the -N parameter to see information per node. |
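To tie the basic commands together, here is a minimal batch script and a typical submission session. This is only a sketch: the script name ''minimal.slurm'', the resource requests, and the job ID shown are placeholders, not Keck-specific values.

<code bash>
#!/bin/bash
#SBATCH -p cpu              # partition (queue) to submit to
#SBATCH -n 1                # number of cores
#SBATCH -t 0-00:10          # wall-clock limit in D-HH:MM format
#SBATCH -o %j.out           # STDOUT file (%j expands to the job ID)
#SBATCH -e %j.err           # STDERR file

echo "Running on $(hostname) as job ${SLURM_JOB_ID}"
</code>

<code>
$ sbatch minimal.slurm
Submitted batch job 12345
$ squeue -u $USER          # the job shows up as PD (pending) or R (running)
$ sacct -j 12345           # accounting information once the job has finished
</code>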
===== SLURM useful commands =====

  * Request a node with 12 GB of RAM (total): ''%%sbatch --mem=12G job_script%%''. To see how much memory is currently available on the nodes: ''%%sinfo --Node -l%%''
  * Request a node with 6 GB of RAM per core (CPU): ''%%sbatch --mem-per-cpu=6G job_script%%''.
  * Most of the Keck nodes have 24 GB of RAM (23936 MB), but there are two nodes which have 32 GB (31977 MB) of RAM (nodes w16 and w17). If your job needs more than 20 GB of RAM (but less than 32 GB), you can request one of the "high-memory" nodes with the following statements in your SLURM batch file:

<code bash>
#SBATCH --mem=30G           # request allocation of 30 GB RAM for the job
#SBATCH --nodelist=w16      # send the job to w16 (or w17); pick the node which has no jobs running
</code>

  * Canceling jobs:

| scancel 1234 | cancel job 1234 |
| scancel -u myusername | cancel all my jobs |
| %%scancel -u myusername --state=running%% | cancel all my running jobs |
| %%scancel -u myusername --state=pending%% | cancel all my pending jobs |

===== Example SLURM monitoring commands =====

| squeue -u <username> | List information about all non-completed jobs for a user, including job IDs and their status. |
| squeue -j <jobid> | List information for a single job. |
| squeue -t RUNNING | List information for running jobs only. |
| squeue -t PENDING | List information for pending jobs only. |
| squeue -p cpu | List information for jobs in the cpu partition only. |
| squeue -p gpu -u <username> -t RUNNING | List information for a user's jobs in the gpu partition that are currently running. |
| scontrol -dd show job <jobid> | Show details for a running job; -dd requests more detail. |
| %%sstat -j <jobid>.batch --format=JobID,MaxRSS,MaxVMSize,NTasks%% | Show status information for a running job. You can list all the fields you can specify with the %%--format%% parameter by running ''sstat -e''. |
| %%sacct -j <jobid> --format=JobID,AllocCPUs,State,ReqMem,MaxRSS,Elapsed,TimeLimit,CPUTime,ReqTres%% | Get statistics on a completed job. You can list all the fields you can specify with the %%--format%% parameter by running ''sacct -e''. You can specify the width of a field with % and a number, for example %%--format=JobID%15%% for 15 characters. |

===== Best practices =====

All workstations have a very fast local hard drive mounted under ''/scratch''. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in ''/scratch'' at the beginning of your job, copy your runtime (input) files there, change your working directory, and run your job from there. The SLURM example scripts below show how this can be achieved with a few lines of shell code. Please note that old files (4 days and older) are regularly purged from ''/scratch''.
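Stripped of any application-specific setup, the scratch workflow described above looks roughly like the sketch below; the program name and input/output files are placeholders to be replaced with your own.

<code bash>
#!/bin/bash
#SBATCH -p cpu
#SBATCH -n 1
#SBATCH -t 0-01:00

cwd=$(pwd)                                    # remember the submission directory
scratch=$(mktemp -d /scratch/${USER}.XXXXXX)  # private temporary directory on the fast local drive

cp -a * ${scratch}                            # copy input files to /scratch
cd ${scratch}                                 # run the job from the local drive

./my_program input.dat > output.dat           # placeholder for your actual program

cp -a * ${cwd}                                # copy results back to the submission directory
rm -rf ${scratch}                             # clean up /scratch
</code>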
===== Running CPU intensive jobs =====

To run a CPU intensive job, it must be submitted to the ''cpu'' partition (queue), which is the default queue. An example of a SLURM script for running CPU intensive jobs (for example, Gaussian jobs) is below.

==== Gaussian ====

<code bash>
#!/bin/bash
#SBATCH -n 8                # Request 8 cores
#SBATCH -t 0-01:30          # Runtime in D-HH:MM format
#SBATCH -p cpu              # Partition to run in
#SBATCH --mem=20G           # Total memory for the job (all cores)
#SBATCH -o %j.out           # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err           # File to which STDERR will be written, including job ID

set -xv

echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition"

cwd=$(pwd)

# create a randomly named scratch directory and copy your files there
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH: ${SCRATCH}"
export GAUSS_SCRDIR=${SCRATCH}      # Gaussian scratch directory

# copy job files to $SCRATCH
cp -a * ${SCRATCH}
cd ${SCRATCH}

module load gaussian/16.B01-sse4

# start your g16 job (change the input/output file names for your job)
g16 < input.in >& output.out

# copy the results back to $HOME & cleanup
cp -a * ${cwd}
#rm -rf ${SCRATCH}                  # uncomment to remove the scratch directory when done
</code>

You can save this script to a file, for example ''gaussian.slurm'', and then submit it to the queue:

<code>
sbatch gaussian.slurm
</code>

You can verify that the job is in the queue:

<code>
squeue
</code>

Note: make sure you also have these statements in your Gaussian input file so that Gaussian actually uses the allocated 8 CPUs; adjust ''%%%mem%%'' so that it stays below the memory requested from SLURM:

<code>
%nprocshared=8
%mem=6GB
</code>

==== Orca MPI ====

This is an example of a SLURM submit script for running the MPI version of Orca on 8 processors.

<code bash>
#!/bin/bash
#SBATCH -n 8                # Request 8 cores
#SBATCH -t 0-00:05          # Runtime in D-HH:MM format
#SBATCH -p cpu              # Partition to run in
#SBATCH --mem=20G           # Total memory for the job (all cores)
#SBATCH -o %j.out           # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err           # File to which STDERR will be written, including job ID

set -xv

echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition"

# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH directory: ${scratch_dir}"
current_dir=$(pwd)
cp -a * $scratch_dir
cd $scratch_dir

module load orca/5.0.3

$ORCA_PATH/orca orca_input.inp > orca_output.out

# copy all data back from the scratch directory
cp -a * $current_dir
rm -rf $scratch_dir
</code>

You also have to put this in your Orca input file to tell the application to use 8 processors:

<code>
%pal nprocs 8 end
</code>

Please note that with older versions of Orca you have to load the appropriate MPI library module in order to use it. This is a compatibility table between the different Orca and MPI module versions:

^ Orca module ^ MPI module ^
| orca/4.0.0 | openmpi/2.0.1 |
| orca/4.0.1 | openmpi/2.0.2 |
| orca/4.2.0 | openmpi/3.1/3.1.4 |
| orca/4.2.1 | openmpi/3.1/3.1.4 |
| orca/5.0.3 | no MPI module necessary; MPI support is built in |
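For example, to run one of the older Orca versions you would load its matching MPI module first. This is only a sketch based on the table above; check ''module avail'' for the exact module names currently installed:

<code bash>
module load openmpi/3.1/3.1.4   # MPI version matching orca/4.2.1 (see table above)
module load orca/4.2.1
</code>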
==== OpenMolcas ====

The following SLURM script can be used to run [[https://gitlab.com/Molcas/OpenMolcas/-/wikis/home|OpenMolcas]] jobs on up to 8 CPUs. Please modify it for your specific needs. Kindly contributed by Jeremy Hilgar.

<code bash>
#!/bin/bash
#SBATCH -p cpu
#SBATCH -n 8                # 8 CPUs
#SBATCH --mem=20000         # 20 GB
#SBATCH --export=ALL
#SBATCH -t 5-00:00          # Runtime in D-HH:MM format

module purge
module load openmolcas/8.4-dev

mkdir /scratch/$SLURM_JOB_ID
mkdir -p ${PWD}/output

export MOLCAS_WORKDIR=/scratch/$SLURM_JOB_ID
export MOLCAS_OUTPUT=/scratch/$SLURM_JOB_ID
export MOLCAS_MEM=19000

# set project name to current directory name
export MOLCAS_PROJECT=${PWD##*/}

# some modules markedly benefit from parallelization
export MOLCAS_NPROCS=8

# set up integrals
pymolcas ${PWD}/seward.in -oe ${PWD}/output/seward.out -b 1

# perform scf calculations
pymolcas ${PWD}/rasscf.in -oe ${PWD}/output/rasscf.out -b 1

# some modules do not benefit from parallelization,
# so we change the corresponding environment variable before calling them
export MOLCAS_NPROCS=1

# calculate spin-orbit interaction matrix elements
pymolcas ${PWD}/rassi.in -oe ${PWD}/output/rassi.out -b 1

# calculate magnetic properties
pymolcas ${PWD}/single_aniso.in -oe ${PWD}/output/single_aniso.out -b 1

# copy output magnetic properties files for further analysis (poly_aniso)
cp -a /scratch/$SLURM_JOB_ID/${PWD##*/}/ANISOINPUT /scratch/$SLURM_JOB_ID/${PWD##*/}/POLYFILE ${PWD}/output
</code>

===== Running GPU jobs =====

All GPU jobs must be submitted to the ''gpu'' partition (queue) and must request the ''gpu'' consumable resource. Use the following statements in your SLURM script to accomplish that:

<code bash>
#SBATCH -p gpu
#SBATCH --gres=gpu:1
</code>

==== Amber script for running GPU jobs ====

The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.

<code bash>
#!/bin/bash
#SBATCH -n 1                # Request 1 core
#SBATCH -t 0-00:05          # Runtime in D-HH:MM format
#SBATCH -p gpu              # Partition to run in
#SBATCH --gres=gpu:1        # Request 1 GPU
#SBATCH --mem=8024          # Memory total in MB (for all cores)
#SBATCH -o %j.out           # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err           # File to which STDERR will be written, including job ID

set -xv

module load amber/20
export CUDA_VISIBLE_DEVICES=0

echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Current directory is $(pwd)

cwd=$(pwd)

# create a randomly named scratch directory
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo SCRATCH: $SCRATCH

# copy job files to $SCRATCH
cp -a * $SCRATCH

# start your job in $SCRATCH
cd $SCRATCH
pmemd.cuda -O -i md.in -o md.out -p md.top -c md.rst -r md2.rst -x md.netcdf

# copy your results back to $HOME & cleanup
cp -a * $cwd
#rm -rf $SCRATCH            # uncomment to remove the scratch directory when done
</code>

===== How to report SLURM issues =====

If you have problems submitting your SLURM jobs, you can email the Keck Center administrators for assistance. Please include the following information in your email:

  * your job's SLURM ID (''$SLURM_JOB_ID'')
  * your SLURM submission script (as an attachment)
  * the error message you are getting, cut and pasted as text (no screenshots)
  * the job's ''.out'' and ''.err'' files (as attachments)
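If you no longer have the job ID at hand, a query along the following lines can recover it together with the job's final state (a sketch; adjust the ''%%--starttime%%'' value to cover the period when the job ran):

<code bash>
sacct -u $USER --starttime=now-7days --format=JobID,JobName,Partition,State,Elapsed,ExitCode
</code>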