====== Using SGE to run jobs at Keck Center ======
  
  
  
<fc #FF0000>All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SGE.</fc> User accounts not complying with this policy will be suspended. Please see this [[http://ctbp.ucsd.edu/computing/wiki/introduction_to_sge_the_queuing_system_on_the_clusters|SGE How-To]] for more details on how to submit a job, along with examples for the most common scenarios.
  
All jobs must be submitted from ''w01.keck2.ucsd.edu'' using the ''qsub'' command:
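For illustration, a minimal submit script could look like this (a sketch only; the job name, the time request, and the program it runs are placeholders, not fixed values for this cluster):

<code>
#!/bin/bash
# run the job in the directory it was submitted from
#$ -cwd
# job name shown in the queue (placeholder)
#$ -N myjob
# requested wall-clock time (placeholder; must be within the cluster limit)
#$ -l h_rt=01:00:00

./my_program
</code>

and would be submitted with:

<code>
qsub myjob.sh
</code>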
  * Check on progress of your job in the queue: ''qstat -f''
  
===== Keck Center scheduling policies =====
  
All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.
The following limits are imposed on all jobs:
  
  * max wall-clock time is 48 hrs (subject to change; use ''qconf -sq main.q | grep h_rt'' to see the current limit). You request wall-clock time with the ''h_rt'' resource, as shown in the snippet after this list.
  * max number of processors per user is 8, although this limit is changed dynamically based on the cluster load. To see the current limit: ''qconf -srqs''
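For example, to request the full 48-hour slot you would put this in your submit script (illustrative; any value up to the current ''h_rt'' limit works):

<code>
#$ -l h_rt=48:00:00
</code>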
  
  
  
Please note that old files (4 days and older) are regularly purged from ''/scratch''.
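A common pattern, used in the Orca example below, is to create a private directory under ''/scratch'', run the job there, and copy the results back before the job ends. A minimal sketch of that pattern (the application call is a placeholder):

<code>
# create a private scratch directory and work there
scratch_dir=$(mktemp -d /scratch/${USER}.XXXXXX)
current_dir=$(pwd)
cp * "$scratch_dir"
cd "$scratch_dir"

# ... run your application here (placeholder) ...

# copy the results back and remove the scratch directory
cp * "$current_dir"
rm -rf "$scratch_dir"
</code>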
===== Setting up your account =====
  
chmod 640 authorized_keys
</code>
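If you have not generated a key pair yet, the usual OpenSSH sequence is shown below (a sketch, assuming this section's key setup uses the default RSA key file names; adapt if your setup differs):

<code>
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 640 authorized_keys
</code>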
  
===== Running CPU-intensive jobs =====
  
  qstat -f
===== Running parallel (MPI) jobs =====

If your application supports it, you can run up to 8 parallel processes per job. The workstations have 8 physical cores, so the maximum number of processors you can request is 8. <fc #FF0000>Do not over-subscribe the workstations.</fc>

You have to use the ''mpi'' SGE parallel environment by putting the following statement in your SGE submit script:

<code>
#$ -pe mpi 8
</code>

This requests 8 processors for your job. You also have to make a matching request in your application's input file; see the example below.


==== Orca MPI ====

This is an example of an SGE submit script for running the MPI version of Orca on 8 processors.
<code>
#!/bin/bash
#$ -cwd
#$ -N orca_job
#$ -m beas
#$ -pe mpi 8
#$ -l h_rt=60:00:00
#
# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

module load orca/3.0.3
module load openmpi/1.6.2
$ORCA_PATH/orca orca_input.inp > orca_output.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
</code>
You also have to put this in your Orca input file to tell the application to use 8 processors:

<code>
%pal nprocs 8 end
</code>

Please note that you have to load the appropriate MPI library to use Orca. This is a compatibility table between the Orca and OpenMPI module versions:
^ Orca module ^ OpenMPI module ^
| orca/4.0.0 | openmpi/2.0.1 |
| orca/3.0.3 | openmpi/1.6.2 |
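For example, to use the newer Orca build you would replace the two ''module load'' lines in the submit script above with the matching pair from the table:

<code>
module load orca/4.0.0
module load openmpi/2.0.1
</code>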
  
==== Amber MPI version ====
#$ -pe mpi 2
  
module load openmpi/2.0.1
module load amber/16
  
echo Running on host `hostname`
# glide docking driver script
#
# rok 2014.9.10
set -xv
export SCHRODINGER_TEMP_PROJECT=$SCRATCH
export SCHRODINGER_JOBDB2=$SCRATCH
export SCHRODINGER_TMPDIR=$SCRATCH
export SCHRODINGER_JOBDIR=$SCRATCH
export SCHRODINGER_BATCHID="$JOB_ID"
export SCHRODINGER_MAX_RETRIES=0
  
export DONE=""

function finish() {
    echo "$(basename $0) caught signal on line $1, command was: ${*:2}"
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/jobcontrol -abort all
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/utilities/jserver -info
    $SCHRODINGER/utilities/jserver -kill
    $SCHRODINGER/utilities/jserver -clean
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
    export DONE=1
}
trap 'finish $LINENO $BASH_COMMAND; exit' SIGHUP SIGINT SIGQUIT SIGTERM SIGUSR1

GLIDE_OPTS="-NJOBS $NSLOTS -HOST localhost:$NSLOTS -LOCAL -WAIT -max_retries 0 -SUBLOCAL"
  
  
cat > dock.in <<EOF
  
$SCHRODINGER/glide $GLIDE_OPTS dock.in
- 
-$SCHRODINGER/jobcontrol -list -children 
-$SCHRODINGER/jobcontrol -abort all 
-$SCHRODINGER/jobcontrol -list -children 
-$SCHRODINGER/utilities/jserver -info 
-$SCHRODINGER/utilities/jserver -kill 
-$SCHRODINGER/utilities/jserver -clean 
  
</code>
./run_docking.sh
  
# clean up the job, if it is still managed by Schrodinger job control
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/jobcontrol -abort all
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/utilities/jserver -info
$SCHRODINGER/utilities/jserver -kill
$SCHRODINGER/utilities/jserver -clean

if [ -z "$DONE" ] ; then
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
fi
#rm -rf $SCRATCH
</code>
==== Amber ====
  
The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.
  
<code>
#$ -l h_rt=12:00:00
  
module load cuda/7.5.18
module load amber/16
export CUDA_VISIBLE_DEVICES=0
  
<code>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
<code>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
#$ -l h_rt=48:00:00
  
module load namd-cuda/2.11
export CUDA_VISIBLE_DEVICES=0
  
==== Benchmarks ====
  
These are several GPU benchmarks for CUDA-enabled Amber and NAMD; they should help you estimate the Keck Center hardware performance.
  
  