All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SLURM. User accounts not complying with this policy will be suspended.
All jobs must be submitted from w01.keck2.ucsd.edu using the sbatch command:
sbatch script.slurm
where script.slurm is your SLURM script. For examples see below.
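If you submit jobs from your own scripts, it helps to keep the job ID that sbatch assigns. The following is a minimal bash sketch for use on w01; submit_job is a hypothetical helper name. It relies on sbatch --parsable, which prints only the job ID, optionally followed by ;cluster.

```shell
#!/bin/bash
# Hypothetical helper: submit a SLURM script and print only its numeric job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster".
submit_job() {
    local script=$1 out
    if [[ ! -f $script ]]; then
        echo "no such script: $script" >&2
        return 1
    fi
    out=$(sbatch --parsable "$script") || return 1
    echo "${out%%;*}"   # strip the optional ";cluster" suffix
}
```

For example, jobid=$(submit_job script.slurm) && squeue -j "$jobid" submits a script and then shows only that job.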
Use this guide to migrate from SGE to SLURM: https://srcc.stanford.edu/sge-slurm-conversion
To submit and monitor a job:
1. Log in to w01.keck2.ucsd.edu.
2. Submit the job with sbatch script.slurm, where script.slurm is the filename of your SLURM script.
3. Check its status with squeue -u <userid> or just squeue.

Make sure that your SLURM script is in the correct Unix file format. You can verify that with this command:
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable
If you get something like this:
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
then you likely created your SLURM script on a Windows machine and you need to convert it on w01 to Unix file format with this command: dos2unix script.slurm.
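If dos2unix is not at hand, the same check and conversion can be done with standard tools. A minimal sketch, assuming GNU grep and GNU sed as found on Linux; fix_line_endings is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: detect Windows (CRLF) line endings in a SLURM script
# and strip the trailing carriage returns in place.
fix_line_endings() {
    local script=$1
    if grep -q $'\r$' "$script"; then
        echo "$script has CRLF line endings; converting" >&2
        sed -i 's/\r$//' "$script"   # GNU sed understands \r
    else
        echo "$script already uses Unix line endings" >&2
    fi
}
```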
| Partition | Max wall clock time | Max number of CPUs | Max number of nodes |
|---|---|---|---|
| cpu | 5 days | 8 | 1 |
| unlimited | no limit | 8 | 1 |
| gpu | 5 days | 1 | 1 |
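To check a requested run time against the wall clock limits in the table before submitting, the value given to -t (for example 0-01:30 in D-HH:MM format) can be converted to minutes. A bash sketch, assuming the limits in the table above (5 days for cpu and gpu, no limit for unlimited); slurm_time_to_minutes and fits_partition are hypothetical names:

```shell
#!/bin/bash
# Sketch: convert a SLURM time string to minutes and compare it with the
# partition limits from the table (cpu/gpu: 5 days; unlimited: none).
# Accepted forms, as documented for sbatch --time:
#   M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_minutes() {
    local t=$1 d=0 h=0 m=0 s=0
    local -a p
    if [[ $t == *-* ]]; then
        d=${t%%-*}
        IFS=: read -r -a p <<< "${t#*-}"
        h=${p[0]:-0}; m=${p[1]:-0}; s=${p[2]:-0}
    else
        IFS=: read -r -a p <<< "$t"
        case ${#p[@]} in
            1) m=${p[0]} ;;
            2) m=${p[0]}; s=${p[1]} ;;
            3) h=${p[0]}; m=${p[1]}; s=${p[2]} ;;
            *) echo "unrecognised time: $1" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10, so "08" is not read as an invalid octal number;
    # leftover seconds are rounded up to a whole minute
    echo $(( 10#$d * 1440 + 10#$h * 60 + 10#$m + (10#$s > 0 ? 1 : 0) ))
}

# Succeed if the requested time fits the partition's wall clock limit.
fits_partition() {
    local partition=$1 minutes
    minutes=$(slurm_time_to_minutes "$2") || return 1
    case $partition in
        cpu|gpu)   (( minutes <= 5 * 1440 )) ;;
        unlimited) return 0 ;;
        *) echo "unknown partition: $partition" >&2; return 1 ;;
    esac
}
```

The limits are hard-coded from the table, so compare them with the output of sinfo if they change.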
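The following limits are imposed on all jobs:
- Maximum wall clock time per job, as listed per partition in the table above (run sinfo to see the current limit).
- Maximum number of jobs per user (run sacctmgr list account -s format=User,MaxJobs to see the current limit).

If you have any special requirements, please email keck2-help@keck2.ucsd.edu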
| Command | Example syntax | Meaning |
|---|---|---|
| sbatch | sbatch <jobscript> | Submit a batch job. |
| srun | srun --pty -t 0-0:5:0 -p cpu /bin/bash -i | Start an interactive session for five minutes in the cpu queue. |
| squeue | squeue -u <userid> | View status of your jobs in the queue. Only non-completed jobs will be shown. |
| scontrol | scontrol show job <jobid> | Look at a running job in detail. For more information about the job, add the -dd parameter. |
| scancel | scancel <jobid> | Cancel a job. scancel can also be used to kill job arrays or job steps. |
| scontrol | scontrol hold <jobid> | Hold a pending job so that it is not started. |
| scontrol | scontrol release <jobid> | Release a held job so that it can be scheduled again. |
| sacct | sacct -j <jobid> | Check job accounting data. Running sacct is most useful for completed jobs. |
| sinfo | sinfo | See node and partition information. Use the -N parameter to see information per node. |
To request a total amount of memory for a job, use --mem, for example: sbatch --mem=12G job_script. To request memory per CPU instead, use --mem-per-cpu, for example: sbatch --mem-per-cpu=6G job_script. To see the memory of each node, run: sinfo --Node -l
The same options can be set in your SLURM script:
#SBATCH --mem=30G # request allocation of 30 GB RAM for the job
#SBATCH --nodelist=w16 # send the job to w16 (or use w17); pick a node which has no jobs running
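--mem and --mem-per-cpu describe the same request differently: the total is the per-CPU size multiplied by the number of CPUs (-n). The bash sketch below does that arithmetic; mem_to_mb and total_mem_mb are hypothetical names, and it assumes SLURM's 1024-based K/M/G/T suffixes with a bare number meaning megabytes.

```shell
#!/bin/bash
# Sketch: convert a SLURM memory size (K, M, G or T suffix; a bare number
# means megabytes, SLURM's default unit) to whole megabytes.
mem_to_mb() {
    local v=$1 n=${1%[KkMmGgTt]} unit=${1##*[0-9]}
    case ${unit^^} in
        K)    echo $(( n / 1024 )) ;;
        ""|M) echo "$n" ;;
        G)    echo $(( n * 1024 )) ;;
        T)    echo $(( n * 1024 * 1024 )) ;;
        *)    echo "unrecognised size: $v" >&2; return 1 ;;
    esac
}

# Total memory a job gets with --mem-per-cpu: per-CPU size times number of CPUs.
total_mem_mb() {
    local per_cpu ncpus=$2
    per_cpu=$(mem_to_mb "$1") || return 1
    echo $(( per_cpu * ncpus ))
}
```

For instance, total_mem_mb 6G 8 gives 49152 MB (48 GB), more than twice the --mem=20G used in the example scripts below.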
| Command | Meaning |
|---|---|
| scancel 1234 | Cancel job 1234. |
| scancel -u myusername | Cancel all my jobs. |
| scancel -u myusername --state=running | Cancel all my running jobs. |
| scancel -u myusername --state=pending | Cancel all my pending jobs. |
| Command | Meaning |
|---|---|
| squeue -u <userid> | List all non-completed jobs of a user, including job IDs and their state. |
| squeue -j <jobid> | List information for a single job. |
| squeue -t RUNNING | List only running jobs. |
| squeue -t PENDING | List only pending jobs. |
| squeue -p cpu | List only jobs in the cpu partition. |
| squeue -p gpu -u <userid> -t RUNNING | List a user's running jobs in the gpu partition. |
| scontrol -dd show job <jobid> | Show details for a running job; -dd requests more detail. |
| sstat -j <jobid>.batch --format=JobID,MaxRSS,MaxVMSize,NTasks | Show status information for a running job. Run sstat -e to list all fields accepted by --format. |
| sacct -j <jobid> --format=JobID,AllocCPUs,State,ReqMem,MaxRSS,Elapsed,TimeLimit,CPUTime,ReqTRES | Get statistics on a completed job. Run sacct -e to list all fields accepted by --format; set the width of a field with % and a number, for example --format=JobID%15 for 15 characters. |
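sacct output can also be post-processed. The sketch below is one possible approach; failed_jobs is a hypothetical name, and the field order is an assumption given in the comment. It reads pipe-separated output (-P) without a header (-n), limited to job allocations (-X), and prints every job whose state is not COMPLETED, RUNNING or PENDING (for example FAILED, TIMEOUT or CANCELLED).

```shell
#!/bin/bash
# Sketch: list jobs that did not finish with state COMPLETED.
# Expects sacct output with fields JobID|JobName|State|Elapsed, e.g.:
#   sacct -u "$USER" -X -n -P --format=JobID,JobName,State,Elapsed | failed_jobs
failed_jobs() {
    awk -F'|' '
        $3 != "COMPLETED" && $3 != "RUNNING" && $3 != "PENDING" {
            printf "%-12s %-20s %-15s %s\n", $1, $2, $3, $4
        }'
}
```

By default sacct only reports jobs since midnight of the current day; add -S with an earlier date to look further back.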
All workstations have a very fast local hard drive mounted under /scratch. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in /scratch at the beginning of your job, copy your runtime (input) files there, change into that directory, and run your job from there. Please see the SLURM example scripts below for how this can be done.
Please note that old files (4 days and older) are regularly purged from /scratch.
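The scratch pattern used in the example scripts can be wrapped in a reusable function. This is a minimal sketch, not a required procedure: run_in_scratch and the SCRATCH_BASE override are hypothetical names, and unlike the examples it removes the scratch copy, but only after the results were copied back successfully.

```shell
#!/bin/bash
# Sketch: run a command inside a private scratch directory, copy everything
# back to the submission directory and delete the scratch copy afterwards.
# SCRATCH_BASE is a hypothetical override; it defaults to /scratch.
run_in_scratch() {
    local base=${SCRATCH_BASE:-/scratch} workdir scratch status=0
    workdir=$(pwd)
    scratch=$(mktemp -d "${base}/${USER:-$(id -un)}.XXXXXX") || return 1
    cp -a ./. "$scratch"/ || return 1
    # run in a subshell so the caller's working directory is unchanged
    ( cd "$scratch" && "$@" ) || status=$?
    # remove the scratch copy only if the results were copied back
    if cp -a "$scratch"/. "$workdir"/; then
        rm -rf "$scratch"
    else
        echo "copy back failed; results left in $scratch" >&2
    fi
    return "$status"
}
```

In a job script this could be used as run_in_scratch sh -c 'g16 < input.in > output.out 2>&1'.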
To run a CPU-intensive job, submit it to the cpu partition (queue), which is the default partition. An example SLURM script for running CPU-intensive jobs (for example, Gaussian jobs) is below.
#!/bin/bash
#SBATCH -n 8 # Request 8 cores
#SBATCH -t 0-01:30 # Runtime in D-HH:MM format
#SBATCH -p cpu # Partition to run in
#SBATCH --mem=20G # Total memory for the job (all cores)
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err # File to which STDERR will be written, including job ID
set -xv
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition "
cwd=$(pwd)
# create a randomly named scratch directory and copy your files there
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH: ${SCRATCH}"
export GAUSS_SCRDIR=${SCRATCH} # Gaussian writes its scratch files here
# copy job files to $SCRATCH
cp -a * ${SCRATCH}
cd ${SCRATCH}
module load gaussian/16.B01-sse4
# start your g16 job (change the input/output file names for your job)
g16 < input.in >& output.out
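# copy the results back to the submission directory
cp -a * ${cwd}
# uncomment to remove the scratch directory (old files are purged from /scratch regularly anyway)
#rm -rf ${SCRATCH}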
You can save this script to a file, for example gaussian.slurm and then submit it to the queue:
sbatch gaussian.slurm
You can verify that the job is in the queue:
squeue
Note: make sure your Gaussian input file also contains these Link 0 lines, so that Gaussian really uses 8 CPUs and limits its working memory (%mem) to less than the 20 GB requested from SLURM:
%nprocshared=8
%mem=6GB
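To keep %nprocshared consistent with the -n value in the SLURM script, the input file could be adjusted at run time. A bash sketch, assuming GNU sed and that SLURM_NTASKS (set by sbatch from -n) is available in the job; sync_nprocshared is a hypothetical name, and the older %nproc spelling is not handled.

```shell
#!/bin/bash
# Sketch: make %nprocshared in a Gaussian input match the CPUs SLURM allocated.
# Uses SLURM_NTASKS (set by sbatch from -n); falls back to 1 outside a job.
sync_nprocshared() {
    local input=$1 n=${SLURM_NTASKS:-1}
    if grep -qi '^%nprocshared=' "$input"; then
        # replace the existing line (GNU sed: I = case-insensitive match)
        sed -i "s/^%nprocshared=.*/%nprocshared=${n}/I" "$input"
    else
        # no such line yet: insert it as the first line of the file
        sed -i "1i %nprocshared=${n}" "$input"
    fi
}
```

It would be called in the job script after cd ${SCRATCH} and before g16, for example sync_nprocshared input.in.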
This is an example of a SLURM submit script for running the MPI version of ORCA on 8 processors.
#!/bin/bash
#SBATCH -n 8 # Request 8 cores
#SBATCH -t 0-00:05 # Runtime in D-HH:MM format
#SBATCH -p cpu # Partition to run in
#SBATCH --mem=20G # Total memory for the job (all cores)
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err # File to which STDERR will be written, including job ID
set -xv
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition "
# create a scratch directory on the local SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
echo "Using SCRATCH directory: ${scratch_dir}"
current_dir=`pwd`
cp -a * $scratch_dir
cd $scratch_dir
module load orca/5.0.3
$ORCA_PATH/orca orca_input.inp > orca_output.out
# copy all data back from the scratch directory
cp -a * $current_dir
rm -rf $scratch_dir
You also have to put this in your orca input file to tell the application to use 8 processors:
%pal nprocs 8 end
Please note that with older versions of Orca you have to load the appropriate MPI library to use it. This is a compatibility table between different Orca and MPI module versions:
| ORCA module | MPI module |
|---|---|
| orca/4.0.0 | openmpi/2.0.1 |
| orca/4.0.1 | openmpi/2.0.2 |
| orca/4.2.0 | openmpi/3.1/3.1.4 |
| orca/4.2.1 | openmpi/3.1/3.1.4 |
| orca/5.0.3 | no MPI loading necessary, it is built in |
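The compatibility table can be encoded in the job script so the matching MPI module is always loaded. A sketch, assuming the module command used in the example scripts; load_orca is a hypothetical name, and loading MPI before ORCA is an assumption.

```shell
#!/bin/bash
# Sketch: load an ORCA module together with the MPI module listed for it
# in the compatibility table. Assumes the environment-modules `module` command.
load_orca() {
    local orca=$1 mpi
    case $orca in
        orca/4.0.0) mpi=openmpi/2.0.1 ;;
        orca/4.0.1) mpi=openmpi/2.0.2 ;;
        orca/4.2.0|orca/4.2.1) mpi=openmpi/3.1/3.1.4 ;;
        orca/5.0.3) mpi="" ;;   # MPI is built in
        *) echo "no MPI mapping known for $orca" >&2; return 1 ;;
    esac
    if [[ -n $mpi ]]; then
        module load "$mpi" || return 1
    fi
    module load "$orca"
}
```

For example, load_orca orca/4.2.1 would replace the module load line of the ORCA script when using that older version.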
The following SLURM script can be used to run OpenMolcas jobs on up to 8 CPUs. Please modify for your specific needs. Kindly contributed by Jeremy Hilgar.
#!/bin/bash
#SBATCH -p cpu
#SBATCH -n 8 # 8 CPUs
#SBATCH --mem=20000 # 20000 MB (about 20 GB)
#SBATCH --export=ALL
#SBATCH -t 5-00:00 # Runtime in D-HH:MM format
module purge
module load openmolcas/8.4-dev
mkdir /scratch/$SLURM_JOB_ID
mkdir -p ${PWD}/output
export MOLCAS_WORKDIR=/scratch/$SLURM_JOB_ID
export MOLCAS_OUTPUT=/scratch/$SLURM_JOB_ID
export MOLCAS_MEM=19000
# set project name to current directory name
export MOLCAS_PROJECT=${PWD##*/}
# some modules markedly benefit from parallelization
export MOLCAS_NPROCS=8
# set up integrals
pymolcas ${PWD}/seward.in -oe ${PWD}/output/seward.out -b 1
# perform scf calculations
pymolcas ${PWD}/rasscf.in -oe ${PWD}/output/rasscf.out -b 1
# some modules do not benefit from parallelization,
# so we change the corresponding environment variable before calling them
export MOLCAS_NPROCS=1
# calculate spin-orbit interaction matrix elements
pymolcas ${PWD}/rassi.in -oe ${PWD}/output/rassi.out -b 1
# calculate magnetic properties
pymolcas ${PWD}/single_aniso.in -oe ${PWD}/output/single_aniso.out -b 1
# copy output magnetic properties file for further analysis (poly_aniso)
cp -a /scratch/$SLURM_JOB_ID/${PWD##*/}/ANISOINPUT /scratch/$SLURM_JOB_ID/${PWD##*/}/POLYFILE ${PWD}/output
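Instead of hard-coding MOLCAS_MEM and MOLCAS_NPROCS, they can be derived from the allocation. A sketch, assuming sbatch sets SLURM_MEM_PER_NODE (in MB) when --mem is given and SLURM_NTASKS from -n; set_molcas_resources is a hypothetical name, and the 1000 MB default headroom mirrors the example above (20000 MB requested, MOLCAS_MEM=19000).

```shell
#!/bin/bash
# Sketch: derive MOLCAS_MEM and MOLCAS_NPROCS from the SLURM allocation.
# SLURM_MEM_PER_NODE (MB) is set when --mem is used; the headroom (MB) keeps
# Molcas below the SLURM memory limit.
set_molcas_resources() {
    local headroom=${1:-1000}
    if [[ -z ${SLURM_MEM_PER_NODE:-} ]]; then
        echo "SLURM_MEM_PER_NODE is not set (was --mem given?)" >&2
        return 1
    fi
    export MOLCAS_MEM=$(( SLURM_MEM_PER_NODE - headroom ))
    export MOLCAS_NPROCS=${SLURM_NTASKS:-1}
}
```

It would be called before the first pymolcas step; the later export MOLCAS_NPROCS=1 for the modules that do not benefit from parallelization still applies.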
All GPU jobs must be submitted to the gpu partition (queue) and request a GPU as a generic consumable resource (GRES). Use the following statements in your SLURM script to accomplish that:
#SBATCH -p gpu
#SBATCH --gres=gpu:1
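To fail early when a job script runs without a GPU allocation, you can test CUDA_VISIBLE_DEVICES, which SLURM normally sets for jobs that request --gres=gpu. A sketch; require_gpu is a hypothetical name.

```shell
#!/bin/bash
# Sketch: stop early if the job was started without a GPU allocation.
# SLURM normally sets CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu:N.
require_gpu() {
    if [[ -z ${CUDA_VISIBLE_DEVICES:-} ]]; then
        echo "No GPU allocated; did you use -p gpu and --gres=gpu:1?" >&2
        return 1
    fi
    echo "Using GPU(s): ${CUDA_VISIBLE_DEVICES}"
}
```

In a job script, call it as require_gpu || exit 1 before starting the GPU program.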
The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.
#!/bin/bash
#SBATCH -n 1 # Request 1 core
#SBATCH -t 0-00:05 # Runtime in D-HH:MM format
#SBATCH -p gpu # Partition to run in
#SBATCH --gres=gpu:1
#SBATCH --mem=8024 # Memory total in MB (for all cores)
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err # File to which STDERR will be written, including job ID
set -xv
module load amber/20
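# SLURM sets CUDA_VISIBLE_DEVICES to the GPU allocated with --gres; overriding it may select the wrong GPU
echo CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES}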
echo Running on host `hostname`
echo "Job id: ${SLURM_JOB_ID}"
echo Time is `date`
echo Current directory is `pwd`
cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp -a * $SCRATCH
# start your job in $SCRATCH
cd $SCRATCH
pmemd.cuda -O -i md.in -o md.out -p md.top -c md.rst -r md2.rst -x md.netcdf
# copy your results back to the submission directory (uncomment the rm line to clean up $SCRATCH)
cp -a * $cwd
#rm -rf $SCRATCH
If you have problems submitting your SLURM jobs you can email keck-help@keck2.ucsd.edu for assistance. Please include the following information in your email: