====== Running jobs on GPUs ======

The Keck II workstations are equipped with NVIDIA GPUs that can be used for computation. All GPU jobs are managed by the Sun Grid Engine (SGE) queue manager, as described below.

===== How to submit a job =====

  * Create all necessary input files for your job.
  * Create an SGE script which is used to submit the job to the job queue manager.
  * Submit your job using this command: ''qsub <your_script>''
  * Check on the progress of your job in the queue: ''qstat''

More information about SGE can be found in the Grid Engine documentation.
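
For example, a minimal submit-and-check session looks like this (the script name is a placeholder):

<code>
qsub my_job.sge     # submit the SGE script to the queue manager
qstat -u $USER      # list your queued and running jobs and their states
</code>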

===== KeckII scheduling policies =====

All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive, CPU-consuming jobs outside of the queue.

==== Queue limits ====

The following limits are imposed on all jobs:

  * Max wall-clock time is 48 hrs.
  * Max number of processors per user is 16, although this is dynamically adjusted based on the load. To see the currently active limits, use SGE's resource quota report: ''qquota''. A sketch of requesting resources within these limits follows this list.
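
As a sketch, these SGE directives request resources consistent with the limits above (''h_rt'' is SGE's hard wall-clock limit; the queue name and slot count mirror the example scripts below):

<code>
#$ -q gpu.q             # GPU queue used in the Amber example below
#$ -l h_rt=48:00:00     # hard wall-clock limit: 48 hours (the queue maximum)
#$ -pe orte-host 2      # request 2 slots in the orte-host parallel environment
</code>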

If you have any special requirements, please email the system administrators.

===== Best practices =====

All workstations have a very fast solid state drive (SSD) mounted locally (shown as ''/scratch'' in the examples below; the exact mount point is site-specific). I/O-intensive jobs run much faster from the SSD, so copy your input files to a scratch directory there, run the job from it, and copy the results back when it finishes, as the example scripts below demonstrate.

===== Example SGE scripts =====

These are example SGE scripts for running the most common applications on the GPUs.

==== Amber ====

The optimal AMBER job configuration for KeckII is to use 1 CPU and 1 GPU per run.

<code>
#!/bin/bash
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N AMBER_job
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

myrun=my_simulation_name

module load nvidia
module load amber
export CUDA_VISIBLE_DEVICES=0

# create a scratch directory on the SSD and copy all runtime data there
# (/scratch is an assumed mount point; adjust to your site's SSD path)
export scratch_dir=`mktemp -d /scratch/$USER.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

# run the GPU engine; file names follow the $myrun prefix
# (the -i/-o/-r arguments are reconstructed from that pattern)
$AMBERHOME/bin/pmemd.cuda -O -i $myrun.in -o $myrun.out -r $myrun.restrt \
    -x $myrun.nc -p $myrun.prmtop -c $myrun.rst

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
</code>
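
Save the script under any name (e.g. ''amber_job.sge'', a placeholder) and submit it with ''qsub amber_job.sge''. Because ''CUDA_VISIBLE_DEVICES=0'' is exported, ''pmemd.cuda'' only sees the first GPU in the workstation.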

==== NAMD ====

Two CPUs and one GPU is the optimal CPU/GPU configuration for a typical NAMD job on KeckII workstations.

=== Running namd on 2 CPUs/1 GPU ===

<code>
#!/bin/bash
#$ -cwd
#$ -q all.q
#$ -V
#$ -N NAMD_job
#$ -pe orte-host 2
#$ -S /bin/bash
#$ -e sge.err
#$ -o sge.out

module load nvidia
module load namd-cuda

# create a scratch directory on the SSD and copy all runtime data there
# (/scratch is an assumed mount point; adjust to your site's SSD path)
export scratch_dir=`mktemp -d /scratch/$USER.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

# 2 CPUs/1 GPU
namd2 +idlepoll +p2 +devices 1 apoa1.namd >& apoa1-2.1.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
</code>
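
Here ''+p2'' starts two worker threads to match the two slots requested with ''#$ -pe orte-host 2'', ''+devices'' takes a comma-separated list of CUDA device IDs for NAMD to use, and ''+idlepoll'' makes the threads poll the GPU instead of sleeping between work units, which the NAMD documentation recommends for CUDA builds.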