Moab
From Montana Tech High Performance Computing
Adaptive Computing's Moab [http://docs.adaptivecomputing.com/mwm/help.htm] job scheduler was installed in July 2013. Moab is an advanced scheduling and management system, and it supplies additional end-user commands [http://docs.adaptivecomputing.com/mwm/help.htm#topics/moabCommands/user-cmds.html%3FTocPath%3D4.0%20Scheduler%20Commands|_____5], described below.

==Submitting Jobs with msub==
A job is created by submitting an executable script to the Moab Workload Manager with '''msub''' [http://docs.adaptivecomputing.com/mwm/help.htm#commands/msub.html]. The msub documentation describes a variety of command-line arguments for requesting resources, declaring the job name, specifying the priority or destination queue, defining the mail options, etc. The script contains the commands that will be executed on the compute node that Moab/TORQUE assigns to the job. For jobs that request multiple nodes, the script runs on a single node and should contain the commands necessary to utilize all the processors assigned to the job. An example of an MPI job script is given below. Job scripts can contain PBS directives that replace the msub command-line arguments.
===Requesting Resources===
There are 22 compute nodes with 32 processors per node in the cluster. If no resources are requested, a single processor on one node will be assigned. Use the -l flag to request resources [http://www.adaptivecomputing.com/resources/docs/torque/4-1-4/help.htm#topics/2-jobs/requestingRes.htm]. For example, <code>msub -l nodes=4</code> will allocate one processor on each of four nodes, because the default is to assign one processor per requested node. To request all the processors on a node, use ppn=32 (i.e., <code>msub -l nodes=4:ppn=32</code>). Other commonly requested resources are memory size and walltime.
===Examples===
====Interactive Job====
To run a program interactively on a compute node:
: <code>msub -I</code>
If you want to request a specific node, use the <code>-l</code> option with the resource request:
: <code>msub -I -l nodes=n9</code>
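An interactive session accepts the same resource requests as a batch job. For example, to hold one full node for two hours (illustrative values):
: <code>msub -I -l nodes=1:ppn=32,walltime=02:00:00</code>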
− | |||
− | |||
====Script without PBS directives====
A script does not require PBS directives. For instance, a simple testjob script to print the host name and ping the management node would contain:
: <code style=display:block>#!/bin/sh<br>hostname<br>ping -c 30 scyld</code>
− | |||
− | |||
To request 2 nodes and 4 processors per node, with a mail message when the job ends, the command line would look like:
: <code>msub testjob -l nodes=2:ppn=4 -m e -M username@mtech.edu</code>
An output file will be created containing the hostname the script ran on and the output from 30 pings of the management node.
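By default, TORQUE writes the standard output and standard error of the job to the submission directory as testjob.oNNN and testjob.eNNN, where NNN is the numeric job ID (standard TORQUE naming; the exact names depend on the job name and ID).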
====Script with PBS directives====
Since scripts are normally submitted several times, it is more convenient to include the msub options in the script file as PBS directives. The previous testjob script would become:
: <code style=display:block>#!/bin/sh<br>#PBS -l nodes=2:ppn=4<br>#PBS -N PingJob<br>#PBS -m e<br>#PBS -M username@mtech.edu<br>#PBS -l walltime=00:01:00<br>cd $PBS_O_WORKDIR<br>hostname<br>pwd<br>ping -c 30 scyld</code>
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
The job is now simply submitted with:
: <code>msub testjob</code>
Another example uses R to read and write file data:
: <code style=display:block>#!/bin/sh<br>#PBS -l nodes=1:ppn=32<br>#PBS -N RJob<br>#PBS -m e<br>#PBS -M username@mtech.edu<br>#PBS -l walltime=00:01:00<br><br>cd $PBS_O_WORKDIR<br>module load R/3.1.0<br>R < parLapply_test.R > parLapply_test.output --no-save</code>
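If the R module provides it, <code>Rscript</code> offers an equivalent invocation without input redirection or <code>--no-save</code> (a sketch, assuming Rscript is on the module's path):
: <code>Rscript parLapply_test.R > parLapply_test.output</code>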
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
If your job is submitted from one directory but its data and programs are in another, the working directory can be specified in the script with the -d flag:
: <code style=display:block>#!/bin/sh<br>#PBS -l nodes=2:ppn=4<br>#PBS -N PingJob<br>#PBS -d /home/mtech/username/working_dir<br>#PBS -m e<br>#PBS -M username@mtech.edu<br>#PBS -l walltime=00:01:00<br><br>hostname<br>ping -c 30 scyld</code>
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
====Memory Resources====
To allocate the correct amount of memory for a job, a user should specify how much memory the job will need. This can be done on the command line or with a PBS directive:
: <code>#PBS -l mem=16gb</code>
The above will allocate 16 GB for the job, to be split by the number of processes or tasks assigned to it. If one node with ppn=4 is requested, each process will get 4 GB. If only one processor (ppn=1) is requested, it will get all 16 GB. Moab will assign the job to a node that has at least 16 GB free. A hard-limit factor of 1.1 is set, so a process that exceeds its requested amount by more than 10% for more than one minute will be cancelled.
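TORQUE also accepts <code>pmem</code> to state the per-process memory requirement directly instead of dividing a total. A sketch equivalent to the ppn=4 split above:
: <code>#PBS -l nodes=1:ppn=4,pmem=4gb</code>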
There are 5 nodes with 128 GB of memory. These nodes can be accessed by requesting the memnode feature:
: <code>#PBS -l feature=memnode</code>
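The feature request combines with ordinary resource requests; multiple -l options are aggregated. For example, one full large-memory node for an hour (illustrative values):
: <code>msub -l nodes=1:ppn=32,walltime=01:00:00 -l feature=memnode testjob</code>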
====Script for MPI job====
Applications that use MPI require slightly more sophisticated scripts that set the shell and MPI version, identify the compute nodes allocated for the job, and initiate the mpd daemons on the assigned compute nodes. An example for MPICH2:
: <code style=display:block>#!/bin/bash<br>#PBS -l nodes=4:ppn=32<br>#PBS -N MPIJob<br>#PBS -d /home/mtech/username<br>#PBS -S /bin/bash<br>#PBS -m e<br>#PBS -M username@mtech.edu<br>#PBS -l walltime=00:10:00<br><br>MPDHOSTS=mpd.hosts.$PBS_JOBID<br>sort -u $PBS_NODEFILE > $MPDHOSTS<br>NODES=`cat $MPDHOSTS | wc -l`<br>NPROCS=`cat $PBS_NODEFILE | wc -l`<br>echo "NODES=$NODES"<br>echo "NPROCS=$NPROCS"<br>module load mpich2/gnu<br>mpirun -np $NPROCS --hostfile $MPDHOSTS mympiapp<br>rm $MPDHOSTS</code>
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
====InfiniBand with OpenMPI====
By default, the 1 Gb Ethernet network is used. To direct OpenMPI to the InfiniBand network, include <code>--mca btl openib,sm,self</code>:
: <code>module load openmpi/gnu</code>
: <code>export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64</code>
: <code>mpirun --mca btl openib,sm,self -np $NPROCS --hostfile $MPDHOSTS mympiapp</code>
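In a complete job script these three lines replace the <code>module load mpich2/gnu</code> and <code>mpirun</code> steps of the MPICH2 example above; a sketch under the same assumptions (node counts, walltime, and the application name mympiapp are carried over from that example):
: <code style=display:block>#!/bin/bash<br>#PBS -l nodes=4:ppn=32<br>#PBS -N MPIJob<br>#PBS -S /bin/bash<br>#PBS -l walltime=00:10:00<br><br>cd $PBS_O_WORKDIR<br>MPDHOSTS=mpd.hosts.$PBS_JOBID<br>sort -u $PBS_NODEFILE > $MPDHOSTS<br>NPROCS=`cat $PBS_NODEFILE | wc -l`<br>module load openmpi/gnu<br>export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64<br>mpirun --mca btl openib,sm,self -np $NPROCS --hostfile $MPDHOSTS mympiapp<br>rm $MPDHOSTS</code>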
==Monitoring jobs with showq and checkjob==
<code>showq</code> will show the status of your jobs and the number of nodes in use. For more details, including the nodes assigned, use <code>showq -r</code>.
To get information on an individual job, use the <code>checkjob</code> command [http://docs.adaptivecomputing.com/mwm/help.htm#commands/checkjob.html%3FTocPath%3D4.0%20Scheduler%20Commands|Commands|_____1]. <code>checkjob -v</code> gives more verbose information on the job.
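Both commands take a job ID as reported by <code>showq</code>; for example, with a placeholder ID of 1234:
: <code>checkjob -v 1234</code>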
To check the status and availability of all nodes, use <code>mdiag -n</code>.
==Canceling Jobs==
To terminate a job that is currently running or in the queue, use the <code>mjobctl -c</code> [http://docs.adaptivecomputing.com/mwm/help.htm#commands/mjobctl.html#cancel] command. The <code>canceljob</code> command [http://docs.adaptivecomputing.com/mwm/help.htm#commands/canceljob.html] can also be used, but it is deprecated.
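For example, to cancel job 1234 (a placeholder ID from <code>showq</code>):
: <code>mjobctl -c 1234</code>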
==Admin Notes==
Setting some default parameters can be done in both Torque and Moab; the Moab settings take precedence.
To view, set, and unset parameters set in Torque for the batch queue:
: <code>qmgr -c "list queue batch"</code>
: <code>qmgr -c "set queue batch resources_default.walltime=3600"</code>
: <code>qmgr -c "unset queue batch resources_default.walltime"</code>
Edit moab.cfg with:
: <code>CLASSCFG[DEFAULT] DEFAULT.WCLIMIT=3600</code>
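Edits to moab.cfg normally take effect only after the scheduler is recycled, which (assuming this Moab version behaves like the documented releases) can be done with:
: <code>mschedctl -R</code>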
Changing the default memory allocation was unsuccessful in Moab. In Torque:
: <code>qmgr -c "set queue batch resources_default.mem=4gb"</code>
will set the total memory allocation for a job. The amount assigned to each process will be its proportional share of the total; if ppn=4, each process gets 1 GB in this example. Note that resources_assigned.mem = 4294967296b will be set automatically. Setting these values in Torque does not appear to enforce any memory restrictions on jobs.