Cluster computing, getting started
A few notes on using the research cluster to run large numbers of jobs. The
cluster's machines are located in both CS and ECE, so you need an account in
each department. Contact one of the faculty (Lebeck or Sorin) to have these
accounts created and added to the group dukearch.
- First read the qsub man page ("man qsub"). If man reports that the page is
not found, you need to tell it where the SGE man pages live. On the CS
research machine nicl.cs.duke.edu, type "man -M /usr/research/admin/sge/current/man
qsub" to find the qsub man page. To avoid typing the path every time, add that
directory to your $MANPATH environment variable.
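A minimal sketch of that $MANPATH change for an sh or bash login shell (for example in ~/.profile), assuming only the man page directory given above; csh users would do the equivalent with setenv in ~/.cshrc. Because some man implementations search only $MANPATH once it is set, the sketch seeds an empty MANPATH from the manpath command where that command exists, and it skips the append if the directory is already present.

```sh
#!/bin/sh
# Add the SGE man pages to MANPATH (sh/bash users, e.g. in ~/.profile).
SGE_MAN=/usr/research/admin/sge/current/man

# If MANPATH is unset or empty, start from the system's default search path
# (when a manpath command is available) so ordinary man pages still work.
if [ -z "${MANPATH:-}" ] && command -v manpath >/dev/null 2>&1; then
    MANPATH=$(manpath 2>/dev/null)
fi

case ":${MANPATH:-}:" in
    *":$SGE_MAN:"*) ;;                            # already present, leave as-is
    *) MANPATH=${MANPATH:+$MANPATH:}$SGE_MAN ;;   # append; ':' only if non-empty
esac
export MANPATH
```

After logging in again, plain "man qsub" should then find the page.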
- Any directory that your batch jobs write output to must be under
/usr/research/arch or /usr/research/arch1, must belong to group dukearch, and
must be group-writable. To set this up, use the following commands:
chgrp dukearch <directory>
chmod g+w <directory>
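The same steps can be wrapped in a small sh function. This is a sketch: the function name prepare_outdir and the example directory are hypothetical. It also warns when the directory is outside the two trees mentioned above. Only the final directory gets the new group and permission; any parent directories that mkdir -p creates are left with your defaults.

```sh
#!/bin/sh
# prepare_outdir DIR [GROUP]   (hypothetical helper)
# Create DIR if needed, then give it group GROUP (default: dukearch)
# and group write permission, as batch-job output directories require.
prepare_outdir() {
    dir=$1
    group=${2:-dukearch}

    # Output is expected under one of these two trees; warn otherwise.
    case $dir in
        /usr/research/arch/*|/usr/research/arch1/*) ;;
        *) echo "warning: $dir is not under /usr/research/arch or /usr/research/arch1" >&2 ;;
    esac

    mkdir -p "$dir" || return 1
    chgrp "$group" "$dir" || return 1
    chmod g+w "$dir" || return 1
}

# Example (hypothetical directory name):
#   prepare_outdir /usr/research/arch/$USER/hello_out
```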
- To submit jobs you must be logged in to either nicl.cs.duke.edu with your
CS research account or platypus.ee.duke.edu with your ECE account.
- Jobs are submitted with qsub, which takes a script file as its argument;
type "qsub -help" to see the list of available options. The -cwd option is
especially useful: it tells qsub to run the job in the directory you submit it
from (the "current working directory"), so binaries and input files named
with relative paths are found there and output files are written there.
- To submit many jobs, most people write a submission script in whatever
scripting language they prefer (csh, perl, etc.). The csh script below submits
10 instances of a job to the cluster, named hello_0 through hello_9.
#!/bin/csh
# Example: submit 10 "hello world" jobs, named hello_0 ... hello_9
foreach i (0 1 2 3 4 5 6 7 8 9)
    # -cwd: run in the current directory; -N: give the job a name
    qsub -cwd -N hello_$i run_hello
end
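For comparison, a sketch of the same loop in sh, with the job count, script, and name prefix as arguments. The function name submit_jobs and the QSUB override variable are hypothetical conveniences, not part of SGE: setting QSUB to another command (for example echo) shows what would be submitted without actually submitting anything.

```sh
#!/bin/sh
# submit_jobs N SCRIPT [PREFIX]   (hypothetical helper)
# Submit N copies of SCRIPT named PREFIX_0 .. PREFIX_(N-1);
# PREFIX defaults to "hello". QSUB defaults to the real qsub.
submit_jobs() {
    n=$1
    script=$2
    prefix=${3:-hello}
    i=0
    while [ "$i" -lt "$n" ]; do
        ${QSUB:-qsub} -cwd -N "${prefix}_$i" "$script" || return 1
        i=$((i + 1))
    done
}

# Example: submit hello_0 .. hello_9, each running run_hello
#   submit_jobs 10 run_hello
```

The loop stops at the first failed qsub, so a problem such as a missing script does not produce ten identical error messages.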
- Each job runs the script file "run_hello", which resides in the current
working directory. Its contents are:
#
# run the hello binary located in the job's working directory
./hello
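- hello is a Linux/x86 binary in the current working directory. For a quick
test, replace the hello line with "date", which just prints the date and time
to standard output. With -cwd, SGE writes each job's standard output to a file
named <job name>.o<job id> (for example hello_0.o<job id>) in that directory.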
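SGE's qsub can also read options embedded in the job script itself: lines beginning with #$ (the default directive prefix) are treated as qsub arguments, while the shell ignores them as comments. Below is a sketch of run_hello written this way in sh, using the date test above, so that a plain "qsub run_hello" behaves like "qsub -cwd run_hello". The -S line, which selects the shell that runs the job, is an assumption included because the default shell depends on the queue configuration; check the qsub man page for your installation.

```sh
#!/bin/sh
#$ -cwd
#$ -S /bin/sh
# run_hello (sketch): qsub reads the "#$" lines above as options
# (-cwd: run in the submission directory; -S: interpret with /bin/sh).
# To sh itself they are ordinary comments.

# Print the date and time; under SGE this lands in the job's .o file.
date
```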