Scyld Clusterware - Admin info
From Montana Tech High Performance Computing
Latest revision as of 10:03, 18 September 2017
Power
bpctl (soft shutdown/restart)
This is the recommended way of restarting/shutting down nodes.
- Shutdown
bpctl -S all -P
- Restart
bpctl -S all -R
IPMI (hard shutdown, cold boot)
Not recommended; use only as a last resort (i.e., when servers won't come online).
- Get power status of nodes (a node being powered on does not necessarily mean it has booted)
for i in {0..19};do ipmitool -H n$i-ipmi -U admin -P ****** power status;done
- Force shutdown of all nodes (not recommended; use bpctl above if nodes are online and running. ONLY USE IF NODES ARE OFFLINE)
for i in {0..19};do ipmitool -H n$i-ipmi -U admin -P ****** power off;done
- Power on all nodes (You will need to do this if you used bpctl to shut a node down)
for i in {0..19};do ipmitool -H n$i-ipmi -U admin -P ****** power on;done
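The three loops above differ only in the power subcommand, so they can be folded into one helper. The following is a minimal sketch, not part of the original procedure: the function name ipmi_power_all and the NODE_COUNT and DRY_RUN variables are hypothetical, and it assumes the same n<i>-ipmi host names and admin user as above. It uses ipmitool's -E option, which reads the password from the IPMI_PASSWORD environment variable, so the password does not appear on the command line.

```shell
#!/bin/sh
# Sketch: run one IPMI power subcommand (status, on, off) against every node.
# Assumptions (not from the original page): NODE_COUNT, DRY_RUN and the
# function name are illustrative. With -E, ipmitool takes the password from
# the IPMI_PASSWORD environment variable.
NODE_COUNT="${NODE_COUNT:-20}"

ipmi_power_all() (
    action="$1"
    case "$action" in
        status|on|off) ;;
        *) echo "usage: ipmi_power_all status|on|off" >&2; exit 2 ;;
    esac
    i=0
    while [ "$i" -lt "$NODE_COUNT" ]; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: print the command instead of contacting the node.
            echo "ipmitool -H n$i-ipmi -U admin -E power $action"
        else
            ipmitool -H "n$i-ipmi" -U admin -E power "$action"
        fi
        i=$((i + 1))
    done
)
```

Running DRY_RUN=1 ipmi_power_all status prints the commands for all nodes without touching any of them, which is a reasonable check before issuing a real power off.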
Logs
Logs from all compute nodes are collected on the management node in /var/log/messages
System Event Logs (SEL) can be collected locally or remotely with ipmitool:
ipmitool sel save /tmp/sel.save
ipmitool -H n0-ipmi -U admin -P ****** sel save /tmp/sel.n0
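To collect the SEL from every node rather than just n0, the remote command above can be looped in the same way as the power commands. This is only a sketch under assumptions: the function name sel_save_all, the dated output directory, and the NODE_COUNT and DRY_RUN variables are hypothetical, and the password is expected in the IPMI_PASSWORD environment variable (ipmitool's -E option).

```shell
#!/bin/sh
# Sketch: save each node's System Event Log into one directory.
# Assumptions (not from the original page): the default directory name,
# NODE_COUNT, DRY_RUN and the function name are illustrative.
NODE_COUNT="${NODE_COUNT:-20}"

sel_save_all() (
    out_dir="${1:-/tmp/sel-$(date +%Y%m%d)}"
    # Only create the directory when commands will really be run.
    [ -n "${DRY_RUN:-}" ] || mkdir -p "$out_dir" || exit 1
    i=0
    while [ "$i" -lt "$NODE_COUNT" ]; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: print the command instead of contacting the node.
            echo "ipmitool -H n$i-ipmi -U admin -E sel save $out_dir/sel.n$i"
        else
            ipmitool -H "n$i-ipmi" -U admin -E sel save "$out_dir/sel.n$i"
        fi
        i=$((i + 1))
    done
)
```

Called with no argument, the function writes one sel.n<i> file per node under a directory named after the current date.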
Torque server and scheduler logs are stored in subdirectories of /var/spool/torque
Updating
Scyld Software
It is a good idea to check the latest release notes at http://www.penguincomputing.com/files/scyld-docs/CW6/ReleaseNotes.pdf
Certain packages should NOT be upgraded from CentOS or EPEL repositories. So far, these include:
- beobootutils
- beoconfig
- beoconfig-devel
- beoconfig-libs
- beosi
- bproc
- bproc-devel
- bproc-libs
- bproc-python
- kernel
- kernel-devel
- kernel-firmware
- kernel-headers
- kmod-aacraid
- kmod-bproc
- kmod-filecache
- kmod-igb
- kmod-task_packer
- nodescripts
- openmpi-scyld
- openmpi-scyld-gnu
- openmpi-scyld-intel
- openmpi-scyld-pgi
- scyld-doc
- scyld-doc-HTML
- scyld-doc-HTTPD
- scyld-doc-PDF
- scyld-doc-indexhtml
- scyld-release
- beonss-kickbackclient
- beostat-sendstats
- scyld-insight
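One possible way to enforce the list above is a per-repository exclude= line, which yum honors only for the repository section it appears in. The sketch below just prints such a line from the package list; the function name scyld_exclude_line is hypothetical, and the names of the CentOS and EPEL .repo files on this cluster are not given here, so placing the line is left to the administrator.

```shell
#!/bin/sh
# Sketch: print a yum "exclude=" line covering the Scyld-managed packages.
# The package names are copied from the list above; the function name is
# illustrative.
SCYLD_PACKAGES="
beobootutils beoconfig beoconfig-devel beoconfig-libs beosi
bproc bproc-devel bproc-libs bproc-python
kernel kernel-devel kernel-firmware kernel-headers
kmod-aacraid kmod-bproc kmod-filecache kmod-igb kmod-task_packer
nodescripts
openmpi-scyld openmpi-scyld-gnu openmpi-scyld-intel openmpi-scyld-pgi
scyld-doc scyld-doc-HTML scyld-doc-HTTPD scyld-doc-PDF scyld-doc-indexhtml
scyld-release beonss-kickbackclient beostat-sendstats scyld-insight
"

scyld_exclude_line() {
    # The unquoted expansion collapses newlines into single spaces.
    printf 'exclude=%s\n' "$(echo $SCYLD_PACKAGES)"
}
```

The printed line would go under each CentOS or EPEL repository section, not under the Scyld repository, so that Scyld's own updates to these packages still apply.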
YUM
To find a specific program, use yum search <searchterms>
Installing Packages
- Install a package:
yum install <packagename>
Updating Packages
When upgrading to a new version of Scyld, read and follow the release notes first - http://www.penguincomputing.com/services-support/documentation/
- To list all updates:
yum list updates
- To list updated packages from a specific repository (e.g., Fedora-EPEL)
yum --disablerepo "*" --enablerepo "Fedora-EPEL" list updates
- To run a full system update
yum update
- To update a specific package
yum install <packagename>
Removing Packages
- Remove a package:
yum remove <packagename>
How to actually do an update
yum update
If the kernel was updated:
bpctl -S all -P
shutdown -r now
(sometimes this will need to be entered twice)
If the kernel was not updated:
bpctl -S all -R
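The choice between the two branches above depends on whether the kernel changed. A minimal sketch of that decision follows, assuming the installed kernel packages are compared before and after the update with rpm -q kernel; the function name post_update_steps is hypothetical, and it only prints the commands to run rather than running them.

```shell
#!/bin/sh
# Sketch: pick the post-update steps by comparing the installed kernel
# package list from before and after "yum update".
# The function name is illustrative; it prints commands, it does not run them.
post_update_steps() {
    before="$1"
    after="$2"
    if [ "$before" != "$after" ]; then
        # Kernel changed: power off the nodes, then reboot the master.
        echo "bpctl -S all -P"
        echo "shutdown -r now"
    else
        # Kernel unchanged: restarting the nodes is enough.
        echo "bpctl -S all -R"
    fi
}

# Example wiring (commented out so the sketch has no side effects):
#   before=$(rpm -q kernel)
#   yum update
#   after=$(rpm -q kernel)
#   post_update_steps "$before" "$after"
```

Printing the steps rather than executing them keeps a human in the loop before any node is shut down.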
InfiniBand Problems after an update
If an update modifies /etc/beowulf/init.d/15openib and 16ipoib, two changes may be required:
- 15openib, line 45: change Infiniband to Mellanox
- 16ipoib, line 38: change Infiniband to Mellanox
Then reboot all nodes.
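The two substitutions described above can be scripted with sed. This is a sketch only: the function name patch_line is hypothetical, and the line numbers may differ between Scyld releases, so it is worth printing the target line first (for example with sed -n 45p) before changing it. A backup copy of the file is kept with a .bak suffix.

```shell
#!/bin/sh
# Sketch: replace "Infiniband" with "Mellanox" on one line of a file,
# keeping the original as <file>.bak. The function name is illustrative.
patch_line() {
    file="$1"
    line="$2"
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the file from the backup, substituting only on the given line.
    sed "${line}s/Infiniband/Mellanox/" "$file.bak" > "$file"
}

# Intended use (commented out so the sketch has no side effects):
#   patch_line /etc/beowulf/init.d/15openib 45
#   patch_line /etc/beowulf/init.d/16ipoib 38
```

Lines other than the one named are left untouched, even if they also contain the word Infiniband.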