====== Using SGE to run jobs at Keck Center ======

All non-interactive jobs on Keck Center workstations must be run through the job queue manager, SGE. User accounts not complying with this policy will be suspended. Please see this [[http://ctbp.ucsd.edu/computing/wiki/introduction_to_sge_the_queuing_system_on_the_clusters|SGE How-To]] for more details on how to submit a job, along with examples for the most common scenarios.

All jobs must be submitted from ''w01.keck2.ucsd.edu'' using the ''qsub'' command:

<code bash>
qsub script.sge
</code>

where ''script.sge'' is your SGE script. For examples, see below.

It is no longer necessary to log in to individual workstations to run any non-interactive jobs. In fact, remote logins to individual workstations will eventually be disabled. Any jobs running outside of the queue will be terminated.

===== How to submit a job =====

  * Create all necessary input files for your job.
  * Create an SGE script, which is used to submit the job to the job queue manager.
  * Submit your job using this command: ''qsub script.sge'', where ''script.sge'' is the filename of your SGE script.
  * Check on the progress of your job in the queue: ''qstat -f''

===== Keck Center scheduling policies =====

All jobs must be submitted to the SGE queue. It is strictly prohibited to run any non-interactive, CPU-consuming jobs outside of the queue.

==== Queue limits ====

The following limits are imposed on all jobs:

  * maximum wall-clock time is 48 hrs (subject to change; use ''qconf -sq main.q | grep h_rt'' to see the current limit)
  * maximum number of processors per user is 8, although this is dynamically adjusted based on the cluster load.
To see the current limit: ''qconf -srqs''

If you have any special requirements, please email the cluster administrators.

==== SGE useful commands ====

  * request a node with 12 GB of RAM: ''qsub -l mem_free=12G job_script''; to see how much memory is currently available on the nodes: ''qhost''
  * to see what jobs are running on the cluster: ''qstat -f -u \*''

===== Best practices =====

All workstations have a very fast local hard drive mounted under ''/scratch''. We strongly recommend using this drive for your jobs. The usual practice is to create a temporary directory in ''/scratch'' at the beginning of your job, copy your runtime (input) files there, change your working directory, and run your job from there. Please see the SGE example scripts below for how this can be achieved. Please note that old files (4 days and older) are regularly purged from ''/scratch''.

===== Setting up your account =====

Note: in order to submit jobs on the cluster, your account must be set up for password-less ssh login to the nodes. To do this, run the following on ''w01.keck2.ucsd.edu'':

<code bash>
cd $HOME
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
cd .ssh
touch authorized_keys
cat id_rsa.pub >> authorized_keys
chmod 640 authorized_keys
</code>

===== Running CPU intensive jobs =====

To run a CPU intensive job, submit it to the ''main.q'' queue, which is the default queue. An example of an SGE script for running CPU intensive jobs (for example, Gaussian jobs) is below. Please note that this job will run on 1 CPU only. If you want to run a multiple-CPU job, please see [[http://keck2.ucsd.edu/dokuwiki/doku.php/wiki:sge?&#gaussian_on_multiple_porcessors|this example]].
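The ''/scratch'' staging pattern described in the Best practices section can be sketched in isolation. This is a hedged, generic sketch, not a cluster script: the file names and the ''tr'' step are placeholders for your real inputs and computation, and ''${TMPDIR:-/tmp}'' stands in for ''/scratch'' so the sketch runs anywhere.

```shell
#!/bin/bash
# Generic sketch of the recommended scratch staging pattern.
# NOTE: file names and the `tr` step are placeholders; on the cluster,
# create the temporary directory under /scratch instead of ${TMPDIR:-/tmp}.
set -e

echo "hello" > input.txt                   # stand-in for your real input files

cwd=$(pwd)
scratch=$(mktemp -d "${TMPDIR:-/tmp}/${USER:-user}.XXXXXX")

cp -a input.txt "$scratch"                 # stage inputs onto the fast local disk
cd "$scratch"

tr 'a-z' 'A-Z' < input.txt > output.txt    # stand-in for the real computation

cp -a output.txt "$cwd"                    # copy results back to the submit directory
cd "$cwd"
rm -rf "$scratch"                          # clean up (old /scratch files are purged anyway)
```

The full SGE scripts on this page all follow this same stage-in / compute / stage-out shape.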
<code bash>
#!/bin/bash
#$ -cwd
#$ -q main.q
# name of your job
#$ -N cpu_test_job
#$ -m e
# name of SGE stdout and stderr files
#$ -e sge.err
#$ -o sge.out
# requesting 24hrs wall clock time (maximum is 48hrs)
#$ -l h_rt=24:00:00

echo Running on host `hostname`
echo "SGE job id: $JOB_ID"
echo Time is `date`
echo Directory is `pwd`
echo "This job has allocated $NSLOTS processor(s)"

cwd=`pwd`
# create a randomly named scratch directory and copy your files there
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
# copy job files to $SCRATCH
cp * $SCRATCH
cd $SCRATCH

module load gaussian
# start your g09 job (change the input/output file names for your job)
g09 < input.in >& output.out

# copy the results back to $HOME & cleanup
cp * $cwd
#rm -rf $SCRATCH
</code>

You can save this script to a file, for example ''gaussian.sge'', and then submit it to the queue:

<code bash>
qsub gaussian.sge
</code>

You can verify that the job is in the queue:

<code bash>
qstat -f
</code>

===== Running parallel (MPI) jobs =====

If your application supports it, you can run up to 8 parallel processes per job. The workstations have 8 physical cores, so the maximum requestable number of processors is 8. Do not over-subscribe the workstations. You have to use the ''mpi'' SGE parallel environment with the following statement in your SGE submit script:

<code bash>
#$ -pe mpi 8
</code>

This requests 8 processors for your job. You also have to make a matching request in your application's input file. See the examples below.

==== Orca MPI ====

This is an example of an SGE submit script for running the MPI version of Orca on 8 processors.
<code bash>
#!/bin/bash
#$ -cwd
#$ -N orca_job
#$ -m beas
#$ -pe mpi 8
# requesting 48hrs wall clock time (48 hrs is the maximum)
#$ -l h_rt=48:00:00
#
# create a scratch directory on the SSD and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
current_dir=`pwd`
cp * $scratch_dir
cd $scratch_dir

module load orca/3.0.3
module load openmpi/1.6.2

$ORCA_PATH/orca orca_input.inp > orca_output.out

# copy all data back from the scratch directory
cp * $current_dir
rm -rf $scratch_dir
</code>

You also have to put this in your Orca input file to tell the application to use 8 processors:

<code>
%pal nprocs 8 end
</code>

Please note that you have to load the appropriate MPI library to use Orca. This is a compatibility table between the Orca and MPI module versions:

| orca/4.0.0 | openmpi/2.0.1 |
| orca/3.0.3 | openmpi/1.6.2 |

==== Amber MPI version ====

''sander'' or ''pmemd'' can also be run in parallel on up to 8 CPUs. This is an example SGE script:

<code bash>
#!/bin/bash
# set -xv
#$ -cwd
#
#$ -N amber_mpi_job
#$ -e sge.err
#$ -o sge.out
# requesting the CPU queue
#$ -q main.q
# requesting 12hrs wall clock time (48 hrs is max)
#$ -l h_rt=12:00:00
# requesting two processors
#$ -pe mpi 2

module load openmpi/2.0.1
module load amber/16

echo Running on host `hostname`
echo "SGE job id: $JOB_ID"
echo Time is `date`
echo Directory is `pwd`
echo This job has allocated $NSLOTS processors.

cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp * $SCRATCH
# start your job in $SCRATCH
cd $SCRATCH

mpirun -v -np $NSLOTS pmemd.MPI -O -i md.in -o md.out -p md.top -c md.rst

# copy your results back to $HOME & cleanup
cp * $cwd
#rm -rf $SCRATCH
</code>

==== Gaussian on multiple processors ====

This is an example SGE script to run a g09 job on 8 processors. To request 8 CPUs you have to place this statement in your SGE script file: ''#$ -pe mpi 8''. This value must be the same as the ''%NProcShared'' value in your Gaussian input file (''%NProcShared=8'').
<code bash>
#!/bin/bash
##set -xv
#$ -cwd
#$ -q main.q
# name of your job
#$ -N g09_job
#$ -m e
# name of SGE stdout and stderr files
#$ -e sge.err
#$ -o sge.out
# requesting 24hrs wall clock time (maximum is 48hrs)
#$ -l h_rt=24:00:00
# requesting 8 CPUs (this must be equal to %NProcShared in your Gaussian input file)
#$ -pe mpi 8

echo Running on host `hostname`
echo "SGE job id: $JOB_ID"
echo Time is `date`
echo Directory is `pwd`
echo This job has allocated $NSLOTS processors.

cwd=`pwd`
# create a randomly named scratch directory and copy your files there
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
# copy job files to $SCRATCH
cp * $SCRATCH
cd $SCRATCH

module load gaussian
# start your g09 job (change the input/output file names for your job)
g09 < input_file >& output_file

# copy the results back to $HOME & cleanup
cp * $cwd
#rm -rf $SCRATCH
</code>

==== Glide docking job on 4 CPUs ====

To run a Schrodinger Glide docking job on 4 CPUs, first create this driver script, named ''run_docking.sh'', with execute permissions (''chmod +x run_docking.sh''):

<code bash>
#!/bin/bash
#
# glide docking driver script
#
# rok 2014.9.10

set -xv

export SCHRODINGER_TEMP_PROJECT=$SCRATCH
export MAESTRO_TEMP_LOCATION=$SCRATCH
export SCHRODINGER_JOBDB2=$SCRATCH
export SCHRODINGER_TMPDIR=$SCRATCH
export SCHRODINGER_JOBDIR=$SCRATCH
export SCHRODINGER_BATCHID="$JOB_ID"
export SCHRODINGER_TASKID="$SGE_TASK_ID"
export SCHRODINGER_QUEUE_TMPDIR=$SCRATCH
export SCHRODINGER_MAX_RETRIES=0
export DONE=""

function finish() {
    echo "$(basename $0) caught error on line : $1 command was: $2"
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/jobcontrol -abort all
    $SCHRODINGER/jobcontrol -list -children
    $SCHRODINGER/utilities/jserver -info
    $SCHRODINGER/utilities/jserver -kill
    $SCHRODINGER/utilities/jserver -clean
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
    export DONE=1
}

trap 'finish $LINENO $BASH_COMMAND; exit' SIGHUP SIGINT SIGQUIT SIGTERM SIGUSR1

GLIDE_OPTS="-NJOBS $NSLOTS -HOST localhost:$NSLOTS -LOCAL -WAIT -max_retries 0 -SUBLOCAL"

cat > dock.in <
</code>

Modify the paths as necessary, then submit this job via SGE using this SGE script:

<code bash>
#!/bin/bash
# set -xv
#$ -cwd
#
#$ -N glide_dock
#$ -e sge.err
#$ -o sge.out
# requesting 4 CPUs
#$ -q main.q
#$ -pe mpi 4
# requesting 24hrs wall clock time (48 hrs is max)
#$ -l h_rt=24:00:00

echo Running on host `hostname`
echo "SGE job id: $JOB_ID"
echo Time is `date`
echo Current directory is `pwd`
echo This job has allocated $NSLOTS processor/s
export NSLOTS

module load schrodinger

cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp -a * $SCRATCH
# start your job in $SCRATCH
cd $SCRATCH

./run_docking.sh

# clean up the job, if it is still managed by Schrodinger job control
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/jobcontrol -abort all
$SCHRODINGER/jobcontrol -list -children
$SCHRODINGER/utilities/jserver -info
$SCHRODINGER/utilities/jserver -kill
$SCHRODINGER/utilities/jserver -clean

if [ -z "$DONE" ] ; then
    # copy your results back to a new directory in $HOME & cleanup
    outdir=$cwd.Results.$JOB_ID
    mkdir $outdir
    cp -a * $outdir
fi
#rm -rf $SCRATCH
</code>

NOTE: this job will use 4 license tokens from a shared license pool. Since this is a shared resource among several groups, please be considerate of other users and conservative with token usage.

===== Running GPU jobs =====

All GPU jobs must be submitted to the ''gpu.q'' queue. Use the following statement in your SGE script to accomplish that:

<code bash>
#$ -q gpu.q
</code>

Examples of SGE scripts for running GPU jobs follow.

==== Amber ====

The optimal AMBER job configuration for Keck II is to use 1 CPU and 1 GPU per run.
<code bash>
#!/bin/bash
# set -xv
#$ -cwd
#
#$ -N GPU_test_job
#$ -e sge.err
#$ -o sge.out
# requesting the GPU queue
#$ -q gpu.q
# requesting 12hrs wall clock time (48 hrs is max)
#$ -l h_rt=12:00:00

module load cuda/7.5.18
module load amber/16

export CUDA_VISIBLE_DEVICES=0

echo Running on host `hostname`
echo "SGE job id: $JOB_ID"
echo Time is `date`
echo Current directory is `pwd`

cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
echo SCRATCH: $SCRATCH
# copy job files to $SCRATCH
cp * $SCRATCH
# start your job in $SCRATCH
cd $SCRATCH

pmemd.cuda -O -i md.in -o md.out -p md.top -c md.rst -r md2.rst -x md.netcdf

# copy your results back to $HOME & cleanup
cp * $cwd
#rm -rf $SCRATCH
</code>

==== Gromacs ====

Running a Gromacs job on 2 CPUs and 1 GPU:

<code bash>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -l h_rt=1:00:00
#$ -N NAMD_GMX_461
#$ -M email@ucsd.edu
#$ -m abe
#$ -S /bin/bash

echo -n "Running on: "
hostname

# create a scratch directory and copy all runtime data there
export scratch_dir=`mktemp -d /scratch/${USER}.XXXXXX`
echo "Scratch directory: " $scratch_dir
current_dir=`pwd`
echo "Current directory: " $current_dir
cp -rpv * $scratch_dir
cd $scratch_dir

module load gromacs-cuda
export CUDA_VISIBLE_DEVICES=0

mdrun -v -nt 2 -pin on -maxh 1 -gpu_id 0 -o traj1.trr -x traj1.xtc

# copy all data back from the scratch directory
rsync -av * $current_dir
rm -rf $scratch_dir
</code>

==== NAMD ====

Running NAMD on 2 CPUs and one GPU; this is the optimal CPU/GPU count for a typical NAMD job on Keck II workstations.
<code bash>
#!/bin/bash
set -xv
#$ -cwd
#$ -q gpu.q
#$ -V
#$ -N NAMD_job
#$ -e sge.err
#$ -o sge.out
# requesting 48hrs wall clock time (48 hrs is max)
#$ -l h_rt=48:00:00

module load namd-cuda/2.11
export CUDA_VISIBLE_DEVICES=0

cwd=`pwd`
# create a randomly named scratch directory
export SCRATCH=`mktemp -d /scratch/${USER}.XXXXXX`
# copy job files to $SCRATCH
cp * $SCRATCH
# start your job in $SCRATCH
cd $SCRATCH

# 2 CPUs/1 GPU
namd2 +idlepoll +p2 +devices 0 apoa1.namd >& apoa1-2.1.out

# copy all data back from the scratch directory & cleanup
cp * $cwd
rm -rf $SCRATCH
</code>

==== Benchmarks ====

These are several GPU benchmarks for CUDA-enabled Amber and NAMD which should help you estimate the Keck Center hardware performance.

=== Amber12 (pmemd.cuda built with Intel 12.1.5, CUDA 4.2.9) (1 CPU, 1 GPU) ===

  * JAC_production_NPT 60.26 ns/day
  * JAC_production_NVE 74.55 ns/day

JAC/DHFR [[http://ambermd.org/gpus/benchmarks.htm|system]]: 23,558 atoms, PME

=== Gromacs (gromacs 4.6.1 built with gcc and CUDA 5.0) (8 CPUs, 1 GPU) ===

128 DPPC lipids, 5841 water molecules (~45 waters/lipid), CHARMM36 force field, TIP3P water molecules. Total: 34,163 atoms.

Performance: 31.62 ns/day

Courtesy of Jesper Sørensen

=== NAMD 2.9 (CUDA-enabled namd2 built with Intel 12.1.5, CUDA 4.2.9) ===

  * ApoA1 (2 CPUs, 1 GPU) 1.46 ns/day
  * ApoA1 (4 CPUs, 1 GPU) 1.84 ns/day

ApoA1 [[https://www-s.ks.uiuc.edu/Research/namd/performance.html|system]]: 92,224 atoms, 12 A cutoff + PME every 4 steps, periodic
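The benchmark rates above translate directly into wall-clock estimates, which is useful for checking a planned run against the 48-hr queue limit. A small sketch of the arithmetic; the 100 ns target length is an arbitrary example value, while the rate is the Amber12 JAC_production_NPT figure from the table:

```shell
#!/bin/bash
# Estimate wall-clock hours for a run from a benchmark throughput.
# 60.26 ns/day is the Amber12 JAC_production_NPT rate listed above;
# the 100 ns target is an arbitrary example, not a recommendation.
rate_ns_per_day=60.26
target_ns=100

hours=$(awk -v r="$rate_ns_per_day" -v n="$target_ns" 'BEGIN { printf "%.1f", n * 24 / r }')
echo "~${hours} h of wall clock needed"   # compare against the 48 h queue limit
```

If the estimate exceeds the wall-clock limit, plan the run as a chain of restart jobs, each within the limit.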