====== Using SGE to run jobs at Keck Center ======
  
  
  
<fc #FF0000>All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SGE.</fc> User accounts not complying with this policy will be suspended. Please see this [[http://ctbp.ucsd.edu/computing/wiki/introduction_to_sge_the_queuing_system_on_the_clusters|SGE How-To]] for more details on how to submit a job, along with examples for the most common scenarios.
  
All jobs must be submitted from ''w01.keck2.ucsd.edu'' using the ''qsub'' command:
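For illustration, a minimal submit script could look like this (a sketch only; the job name, the time request, and the program it runs are placeholders, not fixed values for this cluster):

<code>
#!/bin/bash
# run the job in the directory it was submitted from
#$ -cwd
# job name shown in the queue (placeholder)
#$ -N myjob
# requested wall-clock time (placeholder; must be within the cluster limit)
#$ -l h_rt=01:00:00

./my_program
</code>

and would be submitted with:

<code>
qsub myjob.sh
</code>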
  * Check on progress of your job in the queue: ''qstat -f''
  
===== Keck Center scheduling policies =====
  
All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive CPU-consuming jobs outside of the queue.
The following limits are imposed on all jobs:
  
  * max wall-clock time is 48 hrs (subject to change; use ''qconf -sq main.q | grep h_rt'' to see the current limit). You request wall-clock time with the ''h_rt'' resource, as shown in the snippet after this list.
  * max number of processors per user is 8, although this limit is changed dynamically based on the cluster load. To see the current limit: ''qconf -srqs''
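For example, to request the full 48-hour slot you would put this in your submit script (illustrative; any value up to the current ''h_rt'' limit works):

<code>
#$ -l h_rt=48:00:00
</code>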
  
  
  
Please note that old files (4 days and older) are regularly purged from ''/scratch''.
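A common pattern, used in the Orca example below, is to create a private directory under ''/scratch'', run the job there, and copy the results back before the job ends. A minimal sketch of that pattern (the application call is a placeholder):

<code>
# create a private scratch directory and work there
scratch_dir=$(mktemp -d /scratch/${USER}.XXXXXX)
current_dir=$(pwd)
cp * "$scratch_dir"
cd "$scratch_dir"

# ... run your application here (placeholder) ...

# copy the results back and remove the scratch directory
cp * "$current_dir"
rm -rf "$scratch_dir"
</code>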
===== Setting up your account =====
  
chmod 640 authorized_keys
</code>
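If you have not generated a key pair yet, the usual OpenSSH sequence is shown below (a sketch, assuming this section's key setup uses the default RSA key file names; adapt if your setup differs):

<code>
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 640 authorized_keys
</code>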
  
===== Running CPU-intensive jobs =====
  
  qstat -f
===== Running parallel (MPI) jobs =====

If your application supports it, you can run up to 8 parallel processes per job. The workstations have 8 physical cores, so the maximum number of processors you can request is 8. <fc #FF0000>Do not over-subscribe the workstations.</fc>

You have to use the ''mpi'' SGE parallel environment by putting the following statement in your SGE submit script:

<code>
#$ -pe mpi 8
</code>

This requests 8 processors for your job. You also have to make a matching request in your application's input file; see the example below.


==== Orca MPI ====

This is an example of an SGE submit script for running the MPI version of Orca on 8 processors.
<code>
#!/bin/bash
#$ -cwd
#$ -N orca_job
#$ -m beas
#$ -pe mpi 8
#$ -l h_rt=60:00:00
#
# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

module load orca/3.0.3
module load openmpi/1.6.2
$ORCA_PATH/orca orca_input.inp > orca_output.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
</code>
You also have to put this in your Orca input file to tell the application to use 8 processors:

<code>
%pal nprocs 8 end
</code>

Please note that you have to load the appropriate MPI library to use Orca. This is a compatibility table between the Orca and OpenMPI module versions:
^ Orca module ^ OpenMPI module ^
| orca/4.0.0 | openmpi/2.0.1 |
| orca/3.0.3 | openmpi/1.6.2 |
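For example, to use the newer Orca build you would replace the two ''module load'' lines in the submit script above with the matching pair from the table:

<code>
module load orca/4.0.0
module load openmpi/2.0.1
</code>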
  
==== Amber MPI version ====
#$ -pe mpi 2
  
module load openmpi/2.0.1
module load amber/16
  
echo Running on host `hostname`
# glide docking driver script
#
# rok 2014.9.10
set -xv
export SCHRODINGER_TEMP_PROJECT=$SCRATCH
export SCHRODINGER_JOBDB2=$SCRATCH
export SCHRODINGER_TMPDIR=$SCRATCH
export SCHRODINGER_JOBDIR=$SCRATCH
export SCHRODINGER_BATCHID="$JOB_ID"
export SCHRODINGER_MAX_RETRIES=0
  
export DONE=""

function finish() {
    echo "$(basename $0) caught signal on line $1, command was: ${*:2}"
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/jobcontrol -abort all
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/utilities/jserver -info
    $SCHRODINGER/utilities/jserver -kill
    $SCHRODINGER/utilities/jserver -clean
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
    export DONE=1
}
trap 'finish $LINENO $BASH_COMMAND; exit' SIGHUP SIGINT SIGQUIT SIGTERM SIGUSR1

GLIDE_OPTS="-NJOBS $NSLOTS -HOST localhost:$NSLOTS -LOCAL -WAIT -max_retries 0 -SUBLOCAL"
  
  
cat > dock.in <<EOF
  
$SCHRODINGER/glide $GLIDE_OPTS dock.in
- 
-$SCHRODINGER/jobcontrol -list -children 
-$SCHRODINGER/jobcontrol -abort all 
-$SCHRODINGER/jobcontrol -list -children 
-$SCHRODINGER/utilities/jserver -info 
-$SCHRODINGER/utilities/jserver -kill 
-$SCHRODINGER/utilities/jserver -clean 
  
</code>
./run_docking.sh
  
# clean up the job, if it is still managed by Schrodinger job control
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/jobcontrol -abort all
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/utilities/jserver -info
$SCHRODINGER/utilities/jserver -kill
$SCHRODINGER/utilities/jserver -clean

if [ -z "$DONE" ] ; then
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
fi
#rm -rf $SCRATCH
</code>
==== Amber ====
  
The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.
  
<code>
#$ -l h_rt=12:00:00
  
module load cuda/7.5.18
module load amber/16
export CUDA_VISIBLE_DEVICES=0
  
<code>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
<code>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
#$ -l h_rt=48:00:00
  
module load namd-cuda/2.11
export CUDA_VISIBLE_DEVICES=0
  
==== Benchmarks ====
  
These are several GPU benchmarks for CUDA-enabled Amber and NAMD; they should help you estimate the Keck Center hardware performance.
  
  