Moab
Adaptive Computing's Moab [1] job scheduler was installed in July 2013. Moab is an advanced scheduling and management system. Moab supplies additional end-user commands [2], described below.
Submitting Jobs with msub
A job is created by submitting an executable script to the Moab Workload Manager with msub [3]. The msub documentation describes a variety of command line arguments for requesting resources, declaring the job name, specifying the priority or destination queue, defining the mail options, etc. The script contains the commands that will be executed on the compute node assigned by Moab/TORQUE for the job. For jobs that request multiple nodes, the script runs on a single node and should contain the commands necessary to utilize all the processors assigned to the job. An example of an MPI job script is given below. Job scripts can also contain PBS directives that replace the msub command line arguments.
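For example, assuming a job script named myscript.sh (a placeholder name), a basic submission with a job name, a walltime limit, and mail notification might look like:
- msub -N myjob -l walltime=01:00:00 -m e -M username@mtech.edu myscript.sh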
Requesting Resources
There are 22 compute nodes with 32 processors per node in the cluster. If no resources are requested, a single processor on one node will be assigned. Use the -l flag to request resources [4]. For example, "msub -l nodes=4" will allocate 1 processor on each of four nodes for the job, because the default is to assign 1 processor per node requested. To request all the processors on a node, use ppn=32 (i.e., msub -l nodes=4:ppn=32). Other resources that are often requested include memory size and walltime.
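Multiple resources can be combined in a single -l request, separated by commas. As a sketch, again assuming a script named myscript.sh:
- msub -l nodes=4:ppn=32,mem=16gb,walltime=02:00:00 myscript.sh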
Examples
Interactive Job
To run a program interactively on a compute node:
- msub -I
If you want to request a specific node, use the -l option with the resource request:
- msub -I -l nodes=n9
(Note: Moab simply calls TORQUE's qsub -I for interactive jobs. Moab is currently experiencing communication problems with TORQUE, so using qsub directly is acceptable.)
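For example, to request an entire node for an hour-long interactive session (using qsub directly, per the note above):
- qsub -I -l nodes=1:ppn=32,walltime=01:00:00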
Script without PBS directives
A script does not require PBS directives. For instance, a simple testjob script to print the host name and ping the management node would contain:
- #!/bin/sh
- hostname
- ping -c 30 scyld
To request 2 nodes and 4 processors per node with a mail message when the job ends, the command line would look like:
- msub testjob -l nodes=2:ppn=4 -m e -M username@mtech.edu
An output file will be created containing the hostname of the node the script ran on and the output from the 30 pings of the management node.
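By default, TORQUE writes a job's standard output to a file named <jobname>.o<jobid> (and standard error to <jobname>.e<jobid>) in the directory the job was submitted from. For example, with a hypothetical job ID of 1234, the output above could be viewed with:
- cat testjob.o1234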
Script with PBS directives
Since scripts are normally submitted several times, it is more convenient to include the msub options in the script file as PBS directives. The previous testjob script would become:
- #!/bin/sh
- #PBS -l nodes=2:ppn=4
- #PBS -N PingJob
- #PBS -m e
- #PBS -M username@mtech.edu
- #PBS -l walltime=00:01:00
- cd $PBS_O_WORKDIR
- hostname
- pwd
- ping -c 30 scyld
The job is now simply submitted with:
- msub testjob
Another example uses R to read and write file data:
- #!/bin/sh
- #PBS -l nodes=1:ppn=32
- #PBS -N RJob
- #PBS -m e
- #PBS -M username@mtech.edu
- #PBS -l walltime=00:01:00
- cd $PBS_O_WORKDIR
- module load R/3.1.0
- R < parLapply_test.R > parLapply_test.output --no-save
If your job is submitted from one directory but its data and programs reside in another directory, the working directory can be specified in the script with the -d flag:
- #!/bin/sh
- #PBS -l nodes=2:ppn=4
- #PBS -N PingJob
- #PBS -d /home/mtech/username/working_dir
- #PBS -m e
- #PBS -M username@mtech.edu
- #PBS -l walltime=00:01:00
- hostname
- ping -c 30 scyld
Memory Resources
To allocate the correct amount of memory for a job, the user should specify how much memory the job will need. This can be done on the command line or with a PBS directive:
- #PBS -l mem=16gb
The above will allocate 16 GB for the job, to be split among the processes or tasks assigned to the job. If one node with ppn=4 is requested, then each process will get 4 GB. If only one processor (ppn=1) is requested, then it gets all 16 GB. Moab will assign the job to a node that has at least 16 GB free. A hard limit factor of 1.1 is set, so if a process exceeds its requested amount by more than 10% for more than one minute, the job will be cancelled.
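For example, the following pair of directives corresponds to the 4 GB-per-process case described above:
- #PBS -l nodes=1:ppn=4
- #PBS -l mem=16gb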
There are 5 nodes with 128 GB of memory. These nodes can be accessed by requesting the memnode feature:
- #PBS -l feature=memnode
Script for MPI job
Applications that use MPI require slightly more sophisticated scripts that set the shell and MPI version, identify the compute nodes allocated for the job, and initiate the mpd daemons on the assigned compute nodes. An example for MPICH2:
- #!/bin/bash
- #PBS -l nodes=4:ppn=32
- #PBS -N MPIJob
- #PBS -d /home/mtech/username
- #PBS -S /bin/bash
- #PBS -m e
- #PBS -M username@mtech.edu
- #PBS -l walltime=00:10:00
- # Build a unique host list for the mpd daemons from the nodes TORQUE assigned
- MPDHOSTS=mpd.hosts.$PBS_JOBID
- sort -u $PBS_NODEFILE > $MPDHOSTS
- # Count the unique nodes and the total processor slots allocated
- NODES=`cat $MPDHOSTS | wc -l`
- NPROCS=`cat $PBS_NODEFILE | wc -l`
- echo "NODES=$NODES"
- echo "NPROCS=$NPROCS"
- module load mpich2/gnu
- mpirun -np $NPROCS --hostfile $MPDHOSTS mympiapp
- # Clean up the temporary host file
- rm $MPDHOSTS
InfiniBand with OpenMPI
By default, the 1 GigE Ethernet network is used. To have OpenMPI use the InfiniBand network instead, include --mca btl openib,sm,self:
- module load openmpi/gnu
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
- mpirun --mca btl openib,sm,self -np $NPROCS --hostfile $MPDHOSTS mympiapp
Monitoring jobs with showq and checkjob
showq will show the status of your jobs and the number of nodes in use. For more details, including the nodes assigned, use showq -r.
To get information on an individual job, use the checkjob command [5]. checkjob -v gives more verbose information about the job.
To check the status and availability of all nodes, use mdiag -n.
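A typical monitoring session might look like the following (the job ID 1234 is a placeholder):
- showq
- showq -r
- checkjob -v 1234
- mdiag -n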
Canceling Jobs
To terminate a job that is currently running or waiting in the queue, use the mjobctl -c [6] command. The canceljob [7] command can also be used, but it is deprecated.
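For example, to cancel a job with a hypothetical ID of 1234:
- mjobctl -c 1234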
Admin Notes
Some default parameters can be set in both TORQUE and Moab; the Moab settings take precedence.
To view, set, and unset parameters in TORQUE for the batch queue:
- qmgr -c "list queue batch"
- qmgr -c "set queue batch resources_default.walltime=3600"
- qmgr -c "unset queue batch resources_default.walltime"
To set the same default in Moab, edit moab.cfg:
- CLASSCFG[DEFAULT] DEFAULT.WCLIMIT=3600
Changing the default memory allocation was unsuccessful in Moab. In TORQUE:
- qmgr -c "set queue batch resources_default.mem=4gb"
will set the total memory allocation for a job. The amount assigned to each process will be its proportional share of the total; if ppn=4, each process gets 1 GB in this example. Note that resources_assigned.mem = 4294967296b will be set automatically. Setting these values in TORQUE does not appear to enforce any memory restrictions on jobs.