Running jobs on GPUs
Keck II workstations w01-w10 and w13-w15.keck2.ucsd.edu each have an NVidia GTX680 GPU installed. These can be used to run computationally intensive jobs.
All jobs must be submitted through the SGE queue manager. All rogue jobs will be terminated and user accounts not adhering to this policy will be suspended.
How to submit a job
- Create all necessary input files for your job.
- Create an SGE script which is used to submit the job to the job queue manager.
- Submit your job using this command:
qsub script.sge
where script.sge is the filename of your SGE script.
- Check on the progress of your job in the queue:
qstat -f
More information about SGE can be found here.
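For reference, this is a minimal sketch of a generic SGE script. The job name, output file names, and the ./my_program command are placeholders; the full GPU examples below show the recommended structure for real jobs.
#!/bin/bash
#$ -cwd              # run from the directory the job was submitted from
#$ -q gpu.q          # submit to the GPU queue
#$ -V                # export the current environment to the job
#$ -N my_job         # job name (placeholder)
#$ -S /bin/bash      # interpret the script with bash
#$ -e sge.err        # file for standard error
#$ -o sge.out        # file for standard output

# replace with the command(s) you actually want to run
./my_program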
KeckII scheduling policies
All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.
Queue limits
The following limits are imposed on all jobs:
- max wall-clock time is 48 hrs
- max number of processors per user is 16, although this limit is adjusted dynamically based on load. To see the current limit, run:
qconf -srqs
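You can also state the wall-clock requirement explicitly in your script with a standard SGE resource request. Whether the queue enforces the 48-hour limit through this resource by default is an assumption, so treat this as an optional sketch:
#$ -l h_rt=48:00:00   # request the maximum 48-hour wall-clock time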
If you have any special requirements please email keck-help@keck2.ucsd.edu
Best practices
All workstations have a very fast solid-state drive (SSD) mounted under /scratch. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in /scratch at the beginning of your job, copy your runtime (input) files there, change your working directory, and run your job from there, as shown in the sketch below and in the SGE example scripts.
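In isolation, the scratch-directory pattern used in the example scripts looks like this (the application command is a placeholder):
# create a private scratch directory on the SSD
scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`

# stage input files and run from the fast local disk
cp * $scratch_dir
cd $scratch_dir
# ... run your application here ...

# copy results back and remove the scratch directory
cp * $current_dir
rm -rf $scratch_dir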
Example SGE scripts
These are example SGE scripts for running the most common applications on the GPUs.
Amber
The optimal AMBER job configuration for KeckII is to use 1 CPU and 1 GPU per run.
#!/bin/bash
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N AMBER_job
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

myrun=my_simulation_name

module load nvidia
module load amber

export CUDA_VISIBLE_DEVICES=0

# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

$AMBERHOME/bin/pmemd.cuda -O -i $myrun.in -o $myrun.out -r $myrun.rst \
  -x $myrun.nc -p $myrun.prmtop -c $myrun.rst

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
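Assuming the script above is saved as amber.sge (the filename is arbitrary), submit it with:
qsub amber.sge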
NAMD
Running NAMD on 2 CPUs and one GPU is the optimal configuration for a typical NAMD job on the KeckII workstations.
Running namd on 2 CPUs/1 GPU
#!/bin/bash
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N NAMD_job
#$ -pe orte-host 2
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

module load nvidia
module load namd-cuda

# create a scratch directory and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

# 2 CPUs/1 GPU
namd2 +idlepoll +p2 +devices 1 apoa1.namd >& apoa1-2.1.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
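As with the AMBER example, submit with qsub namd.sge (the filename is arbitrary). If you request more CPU cores (subject to the per-user limit above), keep the SGE slot count and the NAMD thread count in step; a hypothetical 4-CPU/1-GPU variant would change only these lines:
#$ -pe orte-host 4
namd2 +idlepoll +p4 +devices 1 apoa1.namd >& apoa1-4.1.out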