All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SLURM. User accounts not complying with this policy will be suspended.
All jobs must be submitted from w01.keck2.ucsd.edu using the sbatch command:
sbatch script.slurm
where script.slurm is your SLURM script. For examples, see below.
Use this guide to migrate from SGE to SLURM: https://srcc.stanford.edu/sge-slurm-conversion
To check the status of your jobs, run squeue -u <userid> or just squeue.
Make sure that your SLURM script is in the correct Unix file format. You can verify that with this command:
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable
If you get something like this:
$ file script.slurm
script.slurm: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
then you likely created your SLURM script on a Windows machine, and you need to convert it to Unix file format on w01 with this command:
dos2unix script.slurm
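If you often edit scripts on Windows, the check and the conversion can be combined in a small bash function. This is a sketch, not a site-provided tool: the name fix_line_endings is hypothetical, it assumes bash and GNU sed (as on Linux hosts such as w01), and it only falls back to sed when dos2unix is not installed.

```shell
# Detect Windows (CRLF) line endings in a script and convert the file in place.
# Uses dos2unix when available, otherwise GNU sed to strip the trailing carriage return.
fix_line_endings() {
    local f="$1"
    if grep -q $'\r' "$f"; then
        echo "$f: CRLF line terminators found, converting to Unix format"
        if command -v dos2unix >/dev/null 2>&1; then
            dos2unix -q "$f"
        else
            sed -i $'s/\r$//' "$f"
        fi
    else
        echo "$f: already in Unix format"
    fi
}
```

For example, run fix_line_endings script.slurm before sbatch script.slurm.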
Partition | Max wall clock time | Max number of CPUs | Max number of nodes |
---|---|---|---|
cpu | 5 days | 8 | 1 |
unlimited | no limit | 8 | 1 |
gpu | 5 days | 1 | 1 |
The limits in the table above apply to all jobs. To see the current partition limits, run:
sinfo
To see the current maximum number of jobs per user, run:
sacctmgr list account -s format=User,MaxJobs
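Before submitting, you may want to check that a job's -t request fits the 5-day maximum wall clock time of the cpu and gpu partitions (the unlimited partition has no limit). A minimal bash sketch, assuming the D-HH:MM format used in the example scripts on this page; other SLURM time formats are rejected, and both function names are hypothetical:

```shell
# Convert a SLURM time request in D-HH:MM format (e.g. "0-01:30") to minutes.
# Only this one format is handled; anything else is rejected.
slurm_dhhmm_to_minutes() {
    if [[ ! $1 =~ ^([0-9]+)-([0-9]{1,2}):([0-9]{2})$ ]]; then
        echo "unsupported time format: $1 (expected D-HH:MM)" >&2
        return 1
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal
    echo $(( 10#${BASH_REMATCH[1]} * 1440 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
}

# Succeed if the request is within the 5-day limit of the cpu and gpu partitions.
fits_5_day_limit() {
    local minutes
    minutes=$(slurm_dhhmm_to_minutes "$1") || return 2
    (( minutes <= 5 * 1440 ))
}
```

For example, fits_5_day_limit 0-01:30 && sbatch gaussian.slurm submits the job only if its time request is acceptable.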
If you have any special requirements please email keck2-help@keck2.ucsd.edu
Command | Example syntax | Meaning |
---|---|---|
sbatch | sbatch <jobscript> | Submit a batch job. |
srun | srun --pty -t 0-0:5:0 -p cpu /bin/bash -i | Start an interactive session for five minutes in the cpu queue. |
squeue | squeue -u <userid> | View status of your jobs in the queue. Only non-completed jobs will be shown. |
scontrol | scontrol show job <jobid> | Look at a running job in detail. For more information about the job, add the -dd parameter. |
scancel | scancel <jobid> | Cancel a job. scancel can also be used to kill job arrays or job steps. |
scontrol | scontrol hold <jobid> | Hold a pending job so that it is not started. |
scontrol | scontrol release <jobid> | Release a held job so that it can be started. |
sacct | sacct -j <jobid> | Check job accounting data. Running sacct is most useful for completed jobs. |
sinfo | sinfo | See node and partition information. Use the -N parameter to see information per node. |
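To request a specific amount of memory for your job, use the --mem option:
sbatch --mem=12G job_script
To see how much memory each node has (MEMORY column, in MB), run:
sinfo --Node -l
Alternatively, request memory per allocated CPU with --mem-per-cpu; the job then gets this amount multiplied by the number of CPUs it requests:
sbatch --mem-per-cpu=6G job_script
The same options can be set in your SLURM script. For example, to request 30 GB of RAM and send the job to a specific node:
#SBATCH --mem=30G        # request allocation of 30GB RAM for the job
#SBATCH --nodelist=w16   # request the job to be sent to w16 (or w17); pick a node which has no jobs running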
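Inside a running job, the memory SLURM granted can be read from the environment, which is handy for passing a matching value to the application (the OpenMolcas example below, for instance, sets MOLCAS_MEM=19000 for a --mem=20000 job). A minimal bash sketch: it assumes SLURM exports SLURM_MEM_PER_NODE (for --mem) or SLURM_MEM_PER_CPU (for --mem-per-cpu) as plain numbers in MB, and one CPU per task as in the -n N examples on this page; job_mem_mb is a hypothetical name.

```shell
# Print the memory (in MB) allocated to the current job on this node.
# Assumes one CPU per task, so the per-CPU value is multiplied by SLURM_NTASKS.
job_mem_mb() {
    if [[ -n ${SLURM_MEM_PER_NODE:-} ]]; then
        echo "$SLURM_MEM_PER_NODE"
    elif [[ -n ${SLURM_MEM_PER_CPU:-} ]]; then
        echo $(( SLURM_MEM_PER_CPU * ${SLURM_NTASKS:-1} ))
    else
        echo "no memory request found in the environment" >&2
        return 1
    fi
}
```

For example, export MOLCAS_MEM=$(( $(job_mem_mb) * 95 / 100 )) gives 19000 for a --mem=20000 job, leaving some headroom below the SLURM limit.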
Command | Meaning |
---|---|
scancel 1234 | cancel job 1234 |
scancel -u myusername | cancel all my jobs |
scancel -u myusername --state=running | cancel all my running jobs |
scancel -u myusername --state=pending | cancel all my pending jobs |
squeue -u <userid> | list information about all non-completed jobs for a user, including job ids and what status they're in. |
squeue -j <jobid> | list information for a single job |
squeue -t RUNNING | list information for only running jobs |
squeue -t PENDING | list information only for pending jobs |
squeue -p cpu | list information for only jobs in cpu partition |
squeue -p gpu -u <userid> -t RUNNING | list information for jobs in gpu partition that are currently running for a user |
scontrol show job <jobid> -dd | show details for a running job, -dd requests more detail |
sstat -j <jobid>.batch --format JobID,MaxRSS,MaxVMSize,NTasks | show status information for a running job. To list all the fields you can specify with the --format parameter, run sstat -e. |
sacct -j <jobid> --format=JobId,AllocCPUs,State,ReqMem,MaxRSS,Elapsed,TimeLimit,CPUTime,ReqTres | get statistics on a completed job. To list all the fields you can specify with the --format parameter, run sacct -e. You can set the width of a field with % and a number, for example --format=JobID%15 for 15 characters. |
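To compare a job's peak memory use (MaxRSS) with what it requested (ReqMem), it helps to convert the sacct/sstat values to one unit. A sketch, assuming MaxRSS is printed with a K, M, G or T suffix in 1024-based multiples (for example 524288K); rss_to_mb is a hypothetical name.

```shell
# Convert a MaxRSS value as shown by sacct or sstat (e.g. "524288K", "1.50G")
# to whole megabytes, assuming 1024-based K/M/G/T suffixes.
# An empty value or "0" (no recorded usage) is reported as 0.
rss_to_mb() {
    awk -v v="$1" 'BEGIN {
        if (v == "" || v == "0") { print 0; exit 0 }
        if (v !~ /^[0-9.]+[KMGT]$/) { print "unsupported MaxRSS value: " v > "/dev/stderr"; exit 1 }
        n = v + 0                      # numeric prefix, e.g. 1.50 from "1.50G"
        u = substr(v, length(v), 1)    # unit suffix
        if (u == "K") n /= 1024
        else if (u == "G") n *= 1024
        else if (u == "T") n *= 1024 * 1024
        printf "%d\n", n + 0.5         # round to the nearest MB
    }'
}
```

For example, sacct -j <jobid> --format=JobID,MaxRSS --noheader --parsable2 prints one JobID|MaxRSS line per job step, and the second field can be passed to rss_to_mb.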
All workstations have a very fast local hard drive mounted under /scratch. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in /scratch at the beginning of your job, copy your runtime (input) files there, change your working directory to it, and run your job from there. The SLURM example scripts below show how to do this.
Please note that old files (4 days and older) are regularly purged from /scratch.
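The scratch workflow described above can be wrapped in two bash functions. This is a sketch rather than a site-provided tool: setup_scratch, finish_scratch and the SCRATCH_BASE override (default /scratch, useful for trying the functions outside a job) are hypothetical names. Unlike the example scripts below, it also copies hidden files (./.) and deletes the scratch directory at the end.

```shell
# Create a private scratch directory, copy the job's input files into it and cd there.
# Sets SCRATCH (the scratch directory) and SUBMIT_DIR (where the job started).
setup_scratch() {
    local base=${SCRATCH_BASE:-/scratch}
    SUBMIT_DIR=$(pwd)
    SCRATCH=$(mktemp -d "${base}/${USER:-$(id -un)}.XXXXXX") || return 1
    cp -a ./. "$SCRATCH"/ || return 1   # "./." also copies hidden files
    cd "$SCRATCH" || return 1
}

# Copy everything back to the submission directory, then remove the scratch directory.
finish_scratch() {
    cp -a "$SCRATCH"/. "$SUBMIT_DIR"/ || return 1   # keep scratch if the copy failed
    cd "$SUBMIT_DIR" || return 1
    [[ -n ${SCRATCH:-} && -d $SCRATCH ]] && rm -rf -- "$SCRATCH"
}
```

In a job script you would call setup_scratch before starting the calculation and finish_scratch after it.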
To run a CPU-intensive job, submit it to the cpu partition (queue), which is the default queue. An example of a SLURM script for running CPU-intensive jobs (for example, Gaussian jobs) is below.
#!/bin/bash
#SBATCH -n 8              # Request 8 cores
#SBATCH -t 0-01:30        # Runtime in D-HH:MM format
#SBATCH -p cpu            # Partition to run in
#SBATCH --mem=20G         # Total memory for the job (all cores)
#SBATCH -o %j.out         # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err         # File to which STDERR will be written, including job ID

set -xv
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition"

cwd=$(pwd)
# create a randomly named scratch directory and copy your files there
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH: ${SCRATCH}"
# tell Gaussian to write its scratch files there as well
export GAUSS_SCRDIR=${SCRATCH}
# copy job files to $SCRATCH
cp -a * "${SCRATCH}"
cd "${SCRATCH}"

module load gaussian/16.B01-sse4
# start your g16 job (change the input/output file names for your job)
g16 < input.in >& output.out

# copy the results back to the submission directory & clean up
cp -a * "${cwd}"
#rm -rf ${SCRATCH}
You can save this script to a file, for example gaussian.slurm, and then submit it to the queue:
sbatch gaussian.slurm
You can verify that the job is in the queue:
squeue
Note: make sure you also have these statements in your Gaussian input file so that you are really using 8 CPUs:
%nprocshared=8
%mem=6GB
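Because the CPU count appears in two places (#SBATCH -n and %nprocshared), the two can drift apart. The following bash function is a hypothetical sketch that compares them at the start of a job; it assumes SLURM_NTASKS holds the -n value and that the Link 0 line is written as %nprocshared=N, in any letter case.

```shell
# Compare the %nprocshared value in a Gaussian input file with the number of
# tasks SLURM allocated (SLURM_NTASKS, set from "#SBATCH -n").
check_nprocshared() {
    local input=$1 want=${SLURM_NTASKS:-1} have
    have=$(grep -i -m1 -o -E '^%nprocshared=[0-9]+' "$input" | cut -d= -f2)
    if [[ -z $have ]]; then
        echo "warning: no %nprocshared line in $input" >&2
        return 1
    elif [[ $have != "$want" ]]; then
        echo "warning: $input asks for $have CPUs, but the job has $want" >&2
        return 1
    fi
    echo "$input: %nprocshared=$have matches the $want allocated CPUs"
}
```

Placing check_nprocshared input.in || exit 1 before the g16 line would stop a mismatched job before it starts computing.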
This is an example of a SLURM submit script for running the MPI version of orca on 8 processors.
#!/bin/bash
#SBATCH -n 8              # Request 8 cores
#SBATCH -t 0-00:05        # Runtime in D-HH:MM format
#SBATCH -p cpu            # Partition to run in
#SBATCH --mem=20G         # Total memory for the job (all cores)
#SBATCH -o %j.out         # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err         # File to which STDERR will be written, including job ID

set -xv
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Directory is $(pwd)
echo "This job has allocated $SLURM_NPROCS processors in $SLURM_JOB_PARTITION partition"

# create a scratch directory on the local SSD and copy all runtime data there
export scratch_dir=$(mktemp -d /scratch/${USER}.XXXXXX)
echo "Using SCRATCH directory: ${scratch_dir}"
current_dir=$(pwd)
cp -a * "$scratch_dir"
cd "$scratch_dir"

module load orca/5.0.3
$ORCA_PATH/orca orca_input.inp > orca_output.out

# copy all data back from the scratch directory
cp -a * "$current_dir"
rm -rf "$scratch_dir"
You also have to put this in your orca input file to tell the application to use 8 processors:
%pal nprocs 8 end
Please note that with older versions of Orca you have to load the appropriate MPI library to use it. This is a compatibility table between different Orca and MPI module versions:
ORCA module | MPI module to load |
---|---|
orca/4.0.0 | openmpi/2.0.1 |
orca/4.0.1 | openmpi/2.0.2 |
orca/4.2.0 | openmpi/3.1/3.1.4 |
orca/4.2.1 | openmpi/3.1/3.1.4 |
orca/5.0.3 | no MPI loading necessary, it is built in |
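The table above can also be expressed as a lookup in a job script, so that the matching MPI module is chosen automatically. A sketch; orca_mpi_module is a hypothetical helper and covers only the versions listed in the table.

```shell
# Print the MPI module that goes with a given ORCA module, following the
# compatibility table. Prints nothing for orca/5.0.3 (MPI is built in) and
# fails for versions that are not in the table.
orca_mpi_module() {
    case "$1" in
        orca/4.0.0) echo "openmpi/2.0.1" ;;
        orca/4.0.1) echo "openmpi/2.0.2" ;;
        orca/4.2.0|orca/4.2.1) echo "openmpi/3.1/3.1.4" ;;
        orca/5.0.3) ;;
        *) echo "unknown ORCA module: $1" >&2; return 1 ;;
    esac
}
```

In a job script you could then run mpi=$(orca_mpi_module orca/4.2.1) and, if mpi is non-empty, load it with module load "$mpi" along with module load orca/4.2.1.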
The following SLURM script can be used to run OpenMolcas jobs on up to 8 CPUs. Please modify for your specific needs. Kindly contributed by Jeremy Hilgar.
#!/bin/bash
#SBATCH -p cpu
#SBATCH -n 8              # 8 CPUs
#SBATCH --mem=20000       # 20GB
#SBATCH --export=ALL
#SBATCH -t 5-00:00        # Runtime in D-HH:MM format

module purge
module load openmolcas/8.4-dev

mkdir /scratch/$SLURM_JOB_ID
mkdir -p ${PWD}/output
export MOLCAS_WORKDIR=/scratch/$SLURM_JOB_ID
export MOLCAS_OUTPUT=/scratch/$SLURM_JOB_ID
export MOLCAS_MEM=19000
# set project name to current directory name
export MOLCAS_PROJECT=${PWD##*/}

# some modules markedly benefit from parallelization
export MOLCAS_NPROCS=8
# set up integrals
pymolcas ${PWD}/seward.in -oe ${PWD}/output/seward.out -b 1
# perform scf calculations
pymolcas ${PWD}/rasscf.in -oe ${PWD}/output/rasscf.out -b 1

# some modules do not benefit from parallelization,
# so we change the corresponding environment variable before calling them
export MOLCAS_NPROCS=1
# calculate spin-orbit interaction matrix elements
pymolcas ${PWD}/rassi.in -oe ${PWD}/output/rassi.out -b 1
# calculate magnetic properties
pymolcas ${PWD}/single_aniso.in -oe ${PWD}/output/single_aniso.out -b 1

# copy output magnetic properties file for further analysis (poly_aniso)
cp -a /scratch/$SLURM_JOB_ID/${PWD##*/}/ANISOINPUT /scratch/$SLURM_JOB_ID/${PWD##*/}/POLYFILE ${PWD}/output
All GPU jobs must be submitted to the gpu partition (queue) and request the gpu consumable resource. Use the following statements in your SLURM script to accomplish that:
#SBATCH -p gpu
#SBATCH --gres=gpu:1
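To fail fast when these lines are missing, a job script can check whether a GPU was actually assigned before starting the calculation. A sketch, assuming (as is usual for --gres=gpu allocations) that SLURM sets CUDA_VISIBLE_DEVICES for the job; require_gpu is a hypothetical name. If your script sets CUDA_VISIBLE_DEVICES itself, call the check before that line.

```shell
# Stop the job early if SLURM did not assign a GPU, e.g. because
# "#SBATCH --gres=gpu:1" is missing from the script.
require_gpu() {
    if [[ -z ${CUDA_VISIBLE_DEVICES:-} ]]; then
        echo "error: no GPU assigned; use '#SBATCH -p gpu' and '#SBATCH --gres=gpu:1'" >&2
        return 1
    fi
    echo "GPU(s) assigned to this job: $CUDA_VISIBLE_DEVICES"
}
```

For example, require_gpu || exit 1 near the top of the job script ends the job immediately instead of letting a GPU program fail later.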
The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.
#!/bin/bash
#SBATCH -n 1              # Request 1 core
#SBATCH -t 0-00:05        # Runtime in D-HH:MM format
#SBATCH -p gpu            # Partition to run in
#SBATCH --gres=gpu:1      # Request 1 GPU
#SBATCH --mem=8024        # Memory total in MB (for all cores)
#SBATCH -o %j.out         # File to which STDOUT will be written, including job ID
#SBATCH -e %j.err         # File to which STDERR will be written, including job ID

set -xv
module load amber/20
export CUDA_VISIBLE_DEVICES=0
echo Running on host $(hostname)
echo "Job id: ${SLURM_JOB_ID}"
echo Time is $(date)
echo Current directory is $(pwd)

cwd=$(pwd)
# create a randomly named scratch directory
export SCRATCH=$(mktemp -d /scratch/${USER}.XXXXXX)
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp -a * "$SCRATCH"
# start your job in $SCRATCH
cd "$SCRATCH"
pmemd.cuda -O -i md.in -o md.out -p md.top -c md.rst -r md2.rst -x md.netcdf

# copy your results back to the submission directory & clean up
cp -a * "$cwd"
#rm -rf $SCRATCH
If you have problems submitting your SLURM jobs you can email keck-help@keck2.ucsd.edu for assistance. Please include the following information in your email: