Running jobs on GPUs

Keck II workstations w01-w10 and w13-w15.keck2.ucsd.edu each have an NVIDIA GTX 680 GPU installed. These can be used to run computationally intensive jobs.

All jobs must be submitted through the SGE queue manager. Any rogue jobs will be terminated, and user accounts that do not adhere to this policy will be suspended.

How to submit a job

  • Create all necessary input files for your job.
  • Create an SGE script, which is used to submit the job to the queue manager.
  • Submit your job using this command: qsub script.sge where script.sge is the filename of your SGE script (a minimal example is shown below).
  • Check on the progress of your job in the queue: qstat -f

More information about SGE can be found here.
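
As a quick illustration, a minimal SGE script and submission session might look like the following sketch. The script name script.sge and job name test_job are placeholders; gpu.q is the GPU queue used in the full examples later on this page.

#!/bin/bash
# minimal SGE directives: run in the submission directory, use the GPU
# queue, name the job, and use bash as the interpreting shell
#$ -cwd
#$ -q gpu.q
#$ -N test_job
#$ -S /bin/bash

echo "Hello from $(hostname)"

Submit it and check on it with:

qsub script.sge
qstat -f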

KeckII scheduling policies

All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.

Queue limits

The following limits are imposed on all jobs:

  • max wall-clock time is 48 hrs
  • max number of processors per user is 16, although this limit is adjusted dynamically based on load. To see the current limit: qconf -srqs (an illustration follows this list)
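
Grid Engine stores such limits as resource quota sets. The exact rule configured on Keck II may differ, but a 16-slots-per-user quota would appear in the qconf -srqs output roughly like this (the name and description here are illustrative):

{
   name         max_slots_per_user
   description  "limit each user to 16 slots"
   enabled      TRUE
   limit        users {*} to slots=16
}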

If you have any special requirements, please email keck-help@keck2.ucsd.edu.

Best practices

All workstations have a very fast solid-state drive (SSD) mounted under /scratch. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in /scratch at the beginning of your job, copy your input files there, change your working directory to it, and run your job from there. The example SGE scripts below show how this can be achieved simply.
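
The scripts below copy results back and delete the scratch directory as their last two steps. As an optional refinement (not part of the original scripts), a bash trap can perform that copy-back and cleanup even when the job exits early with an error; note that a hard kill at the wall-clock limit cannot be trapped. A minimal sketch:

# create a scratch directory and register copy-back/cleanup to run on exit
scratch_dir=$(mktemp -d /scratch/${USER}.XXXXXX)
current_dir=$(pwd)
trap 'cp * "$current_dir"; rm -rf "$scratch_dir"' EXIT

cp * "$scratch_dir"
cd "$scratch_dir"

# ... run your application here ...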

Example SGE scripts

These are example SGE scripts for running the most common applications on the GPUs.

Amber

The optimal AMBER job configuration on Keck II is to use 1 CPU and 1 GPU per run.

#!/bin/bash
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N AMBER_job
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

myrun=my_simulation_name

module load nvidia
module load amber
export CUDA_VISIBLE_DEVICES=0   # run on GPU device 0

# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

# -i input, -o output, -p topology, -c starting coordinates; -r writes the
# final restart file (note: this overwrites $myrun.rst at the end of the run)
$AMBERHOME/bin/pmemd.cuda -O -i $myrun.in -o $myrun.out -r $myrun.rst \
 -x $myrun.nc -p $myrun.prmtop -c $myrun.rst

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
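
Assuming the script above is saved as amber.sge (the filename is arbitrary), it can be submitted and monitored like any other SGE job:

qsub amber.sge
qstat -f

While the job runs, you can also check GPU utilization on the assigned workstation with nvidia-smi.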

NAMD

Two CPUs and one GPU is the optimal configuration for a typical NAMD job on Keck II workstations.

Running NAMD on 2 CPUs/1 GPU

#!/bin/bash
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N NAMD_job
# request 2 CPU slots in the orte-host parallel environment
#$ -pe orte-host 2
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

module load nvidia
module load namd-cuda

# create a scratch directory and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

# 2 CPUs/1 GPU: +p2 starts two worker threads, +devices selects the GPU
# device to use, and +idlepoll keeps the threads polling the GPU for results
namd2 +idlepoll +p2 +devices 1 apoa1.namd >& apoa1-2.1.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
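
The apoa1.namd input used above is NAMD's standard ApoA1 benchmark; substitute your own configuration file. If you want to verify that two threads really is optimal for your system, one simple check (output file names here are arbitrary) is to run the same input with different +p values and compare the WallClock line that NAMD prints at the end of each log; remember to request a matching slot count with #$ -pe orte-host.

namd2 +idlepoll +p1 +devices 1 apoa1.namd >& apoa1-1.1.out
namd2 +idlepoll +p2 +devices 1 apoa1.namd >& apoa1-2.1.out
grep WallClock apoa1-*.out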