HOW-TO: Sun GridEngine (SGE) on the KeckII Cluster


Introduction to SGE
Writing and Submitting Batch Jobs
Monitoring and Controlling Jobs
Site Scheduling Policies
Sample SGE scripts

Introduction to SGE

Sun Grid Engine (SGE) provides a set of programs that let the user submit and delete jobs, check job status, and obtain information about available queues and parallel environments. For the typical user, knowledge of the following basic commands is sufficient to get started with Grid Engine and maintain full control of his or her jobs:

qconf   Shows (-s) configurations and access permissions. For example, qconf -sql lists all available queues.
qdel    Deletes jobs; regular users may delete only their own jobs.
qhost   Displays status information about Sun Grid Engine execution hosts.
qmod    Modifies the status of your jobs (e.g., suspend/resume).
qmon    Provides the X-windows GUI interface.
qstat   Provides a status listing of all jobs and queues associated with the cluster.
qsub    Is the user interface for submitting a job to Grid Engine.
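
For example, a first look at the cluster might go like this (a quick sketch; the exact output depends on the site configuration):

qconf -sql     # list the names of all available queues
qhost          # show the execution hosts, their load, and memory
qstat -f       # show the full status of all queues and jobs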


Writing and Submitting Batch Jobs

To run a job with Grid Engine you have to submit it from the command line or from the GUI. First, write a batch script file that contains all the commands and environment requests that you want for the job. If, for example, test.sh is the name of the script file, use the command qsub to submit the job:

qsub test.sh

If the submission is successful, you will see this message:

your job 1 ("test.sh") has been submitted.

After that, you can monitor the status of your job with the command qstat or with the GUI QMON.

When the job is finished you will have two output files called "test.sh.o1" and "test.sh.e1", holding the standard output and standard error of job 1, respectively.
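
For reference, test.sh can be as simple as the following sketch (the commands themselves are illustrative):

#!/bin/csh
# print the execution host and the current time
hostname
date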

A Grid Engine batch script contains, in addition to normal UNIX commands, special comment lines marked by the leading prefix "#$".

The first line of the batch file is

#!/bin/csh

which is the default shell interpreter for Grid Engine. But you can force Grid Engine to use your preferred shell interpreter (bash, for example) by adding this line to your script file:

#$ -S /bin/bash

To tell Grid Engine to run the job from the current working directory, add this line:

#$ -cwd

If you want to pass an environment variable VAR (or a comma-separated list of variables), use the -v option:

#$ -v VAR

(Using #$ -V instead passes all variables listed by env.)

To redirect the standard output and standard error, insert the full path names of the respective files:

#$ -o <path_name>

#$ -e <path_name>

The #$ prefix accepts the same options as qsub itself, so check the qsub man page for the complete list.

Here is a sample serial script; modify it to fit your case:


#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
#$ -M myemail
#$ -e error_file
#$ -o output_file
date
sleep 10
date

Insert your email address after the "#$ -M" statement, and insert the full path names of the files to which you want to redirect standard output and standard error after the "#$ -o" and "#$ -e" statements, respectively.

Note that qsub accepts shell scripts only, not binary executables, and that the script itself must be executable. If it is not, run the command

chmod u+rwx serial.sh

After that, submit the job by typing:

qsub serial.sh

The same options can also be given directly on the command line:

qsub -cwd -v VAR -o /home/user -e /home/user serial.sh

An example of a parallel job using 2 processors:


#!/bin/bash
#
#$ -pe mpi 2
#$ -cwd
#$ -j y
#$ -S /bin/bash
#

/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
  -nolocal /opt/hpl/gnu/bin/xhp

To submit this parallel job:

qsub test.sh

Note: In order to submit jobs on the cluster, your account must be set up for password-less ssh login to the nodes. To do this, perform the following on erikson.ucsd.edu:

cd $HOME
ssh-keygen -t rsa1 -N ""  -f $HOME/.ssh/identity
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
cd .ssh
touch authorized_keys authorized_keys2
cat identity.pub >> authorized_keys
cat id_rsa.pub id_dsa.pub >> authorized_keys2
chmod 640 authorized_keys authorized_keys2
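
To verify the setup, try logging into one of the compute nodes without a password (the node name below is only a placeholder; pick a real one from qhost):

ssh <node_name> hostname

If this prints the node's hostname without asking for a password, the keys are set up correctly.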

Monitoring and Controlling Jobs

After submitting your job to Grid Engine, you can track its status using the qstat command, the GUI QMON, or email notification.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

  • qstat: Displays a list of all jobs, with no queue status information.

  • qstat -u hpc1***: Displays a list of all jobs belonging to user hpc1***.

  • qstat -f: Gives full information about jobs and queues.

  • qstat -j [job_id]: Gives the reason why a pending job (if any) is not being scheduled.

You can refer to the man pages for a complete description of all the options of the qstat command.
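
For illustration, plain qstat output looks roughly like this (the job names, IDs, and dates are made up, and the exact columns vary with the SGE version):

job-ID  prior   name       user     state  submit/start at      queue   slots
------------------------------------------------------------------------------
    231 0.55500 test.sh    hpc1***  r      01/15/2005 10:20:33  all.q   2
    232 0.00000 serial.sh  hpc1***  qw     01/15/2005 10:21:02          1

Here r means the job is running and qw means it is queued and waiting.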

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that email be sent and the -M option to specify the address where it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email: you can be notified at the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s), as in the sample script lines above.

From the command line you can use the same options, for example:

qsub -M myaddress@work -m be job.sh

How do I control my jobs?

Based on the displayed status of the job, you can control it with the following actions:

  • Modify a job: As a user, you have certain rights that apply exclusively to your jobs. The Grid Engine command line used is qmod. Check the man pages for the options that you are allowed to use.

  • Suspend or resume a job: This uses the UNIX kill mechanism and applies only to running jobs. In practice you type

    qmod -s job_id to suspend, or qmod -r job_id to resume (where job_id is given by qstat or qsub).

  • Delete a job: You can delete a job that is running or spooled in the queue by using the qdel command:

    qdel job_id (where job_id is given by qstat or qsub).
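
A typical session might look like this (the job ID 42 is illustrative):

qsub test.sh     # your job 42 ("test.sh") has been submitted
qstat -j 42      # check why the job is still pending, if it is
qmod -s 42       # suspend the running job
qdel 42          # delete the job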

Monitoring and controlling with QMON

You can also use the GUI QMON, which provides a convenient window dialog specifically designed for monitoring and controlling jobs; the buttons are self-explanatory.


For further information, see the SGE User's Guide (available in PDF and HTML).

Site Scheduling Policies

Note: these policies may change at any time; please check this page for updates.

To see the current SGE queue settings, execute q_settings on erikson.ucsd.edu (the cluster frontend).

  • The maximum wall clock time is set to 48 hours (2 days). The default is 30 minutes; you can change this limit with:
    #$ -l h_rt=XX:00:00
    
  • Maximum number of processors per user is 48.
  • There is one node (2 processors, 4GB RAM) dedicated to debugging runs. The hard wall clock time limit there is 30 minutes. To use these two processors, just request at most 30 minutes of wall clock time and your job will be scheduled there:
    #$ -l h_rt=00:30:00
    

    You can also access this node through the qrsh facility. Just submit the following command: qrsh -l h_rt=00:29:59 and you should immediately get a prompt on the debugging node. You can use the node as you would the frontend, with the addition that you can run interactive jobs there for up to 30 minutes.
  • Parallel jobs (i.e., those requesting more than 1 CPU) have higher priority than serial, single-CPU jobs. So if there are several serial jobs in the queue and a parallel job is submitted, the parallel job will most likely skip the queued serial jobs and be scheduled ahead of them. This policy is set up to encourage parallel job submission on the cluster.
  • Jobs requiring a large amount of memory can request a large-memory node with this statement:

    #$ -l mem_free=1G

    This will guarantee that the job is sent to one of the 2GB RAM nodes. To check how much free memory is available per node, use

    qhost -F mem_free

    and to see the list of nodes that will be considered for a large-memory job:
    qhost -l mem_free=1G
    

  • There is also a parallel environment, mpi-uni, which guarantees that only one processor per node is assigned, leaving the other processor idle. This can be used for jobs that require large memory or heavy IO resources. It can be requested, for example, with (see the sketch after this list):
    #$ -pe mpi-uni 2
    
  • Part of the cluster, or even all nodes, can be reserved for a user or group if there is a justified need; this is determined on a case-by-case basis. If you would like to reserve any of the Keck II resources, please contact keck-help.
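
The following sketch combines the mpi-uni and large-memory requests in one script (the executable name a.out and the time limit are illustrative):

#!/bin/csh -f
#$ -cwd
# one MPI process per node, 2 slots total
#$ -pe mpi-uni 2
# only consider nodes with at least 1GB of free memory
#$ -l mem_free=1G
# requesting 4hrs wall clock time
#$ -l h_rt=4:00:00

/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
    -nolocal $HOME/a.out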

Note: If you are generating and submitting several SGE scripts on the fly, please make sure the individual scripts are submitted with at least a 10-15 second pause between them (e.g., use sleep 20 in your submitting loop, as sketched below). Submitting many SGE scripts at the same time puts a significant strain on SGE resources.
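
A minimal csh submission loop with such a pause might look like this (the job*.sh script names are illustrative):

# submit job1.sh, job2.sh, ... with a 20-second pause between them
foreach f (job*.sh)
    qsub $f
    sleep 20
end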

If you have any questions or concerns about these policies, please contact keck-help (keck-help @ keck2.ucsd.edu).

Sample SGE scripts

  1. An example of a simple APBS serial job.

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N serial_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    
    /soft/linux/pkg/apbs/bin/apbs inputfile >& outputfile
    
    


  2. An example script for running the executable a.out in parallel on 8 CPUs. (Note: for your executable to run in parallel, it must be compiled with a parallel library such as MPICH, LAM/MPI, or PVM.) This script shows file staging, i.e., using the fast local filesystem /scratch on the compute node in order to eliminate I/O bottlenecks.

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N parallel_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 8
    # requesting 10hrs wall clock time
    #$ -l h_rt=10:00:00
    #
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    set orig_dir=`pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    # copy input and support files to a temporary directory on compute node
    set temp_dir=/scratch/`whoami`.$$
    mkdir $temp_dir
    cp input_file support_file $temp_dir
    cd $temp_dir
    
    /opt/mpich/intel/bin/mpirun -v -machinefile $TMPDIR/machines \
               -nolocal -np $NSLOTS $HOME/a.out ./input_file >& output_file
    
    # copy files back and clean up
    cp * $orig_dir
    rm -rf $temp_dir
    
    


  3. An example SGE script for Amber users (parallel run, 4 CPUs, with the input file generated on the fly):

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N amber_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 4
    # requesting 6hrs wall clock time
    #$ -l h_rt=6:00:00
    #
    setenv MPI_MAX_CLUSTER_SIZE 2
    
    # export all environment variables to SGE 
    #$ -V
    
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    set in=./mdin
    set out=./mdout
    set crd=./inpcrd.equil
    
    cat <<eof > $in
     short md, nve ensemble
     &cntrl
       ntx=7, irest=1,
       ntc=2, ntf=2, tol=0.0000001,
       nstlim=1000,
       ntpr=10, ntwr=10000,
       dt=0.001, vlimit=10.0,
       cut=9.,
       ntt=0, temp0=300.,
     &end
     &ewald
      a=62.23, b=62.23, c=62.23,
      nfft1=64,nfft2=64,nfft3=64,
      skinnb=2.,
     &end
    eof
    
    set sander=/soft/linux/pkg/amber8/exe.parallel/sander
    set mpirun=/opt/mpich/intel/bin/mpirun
    
    # needs prmtop and inpcrd.equil files
    
    $mpirun -v -machinefile $TMPDIR/machines -nolocal -np $NSLOTS \
       $sander -O -i $in -c $crd -o $out < /dev/null
    
    /bin/rm -f $in restrt
    
    

    Please note that if you are running parallel amber8 you must include the following in your .cshrc:

    # Set P4_GLOBMEMSIZE environment variable used to reserve memory in bytes
    # for communication with shared memory on dual nodes
    # (optimum/minimum size may need experimentation)
    setenv P4_GLOBMEMSIZE 32000000
    


  4. An example SGE script for an APBS job (parallel run, 8 CPUs, running an example input file included in the APBS distribution, /soft/linux/src/apbs-0.3.1/examples/actin-dimer):

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N apbs-PARALLEL
    #$ -e apbs-PARALLEL.errout
    #$ -o apbs-PARALLEL.errout
    #
    # requesting 8 processors
    #$ -pe mpi 8
    
    echo -n "Running on: "
    hostname
    
    setenv APBSBIN_PARALLEL /soft/linux/pkg/apbs/bin/apbs-icc-parallel
    setenv MPIRUN /opt/mpich/intel/bin/mpirun
    
    echo "Starting apbs-PARALLEL calculation ..."  
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np 8 -nolocal \
        $APBSBIN_PARALLEL apbs-PARALLEL.in >& apbs-PARALLEL.out
    
    echo "Done."
    
    


  5. An example SGE script for a parallel CHARMM job (4 processors):

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N charmm-test
    #$ -e charmm-test.errout
    #$ -o charmm-test.errout
    #
    # requesting 4 processors
    #$ -pe mpi 4
    # requesting 2hrs wall clock time
    #$ -l h_rt=2:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    setenv CHARMM /soft/linux/pkg/c31a1/bin/charmm.parallel.092204
    setenv MPIRUN /soft/linux/pkg/mpich-1.2.6/intel/bin/mpirun
    
    echo "Starting CHARMM calculation (using $NSLOTS processors)"
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS -nolocal \
        $CHARMM < mbcodyn.inp > mbcodyn.out
    
    echo "Done."
    
    


  6. An example SGE script for a parallel NAMD job (8 processors):

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N namd-job
    #$ -e namd-job.errout
    #$ -o namd-job.errout
    #
    # requesting 8 processors
    #$ -pe mpi 8
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    /soft/linux/pkg/NAMD/namd2.sh namd_input_file > namd2.log
    
    echo "Done."
    
    


  7. An example SGE script for a serial Gaussian03 job:

    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N g03_test_job
    #$ -m e
    #$ -e g03_sge.err
    #$ -o g03_sge.out
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    
    setenv g03root /soft/linux/pkg
    #setenv GAUSS_SCRDIR /scratch
    source $g03root/g03/bsd/g03.login
    
    echo -n "Running on: "
    hostname
    
    g03 < test001.com > test001.out
    
    echo "Done."
    
    
    


Please direct any questions or comments to keck-help @ keck2.ucsd.edu.