Reservation Policy for the Gigabit Cluster (cl01, cl02,..., cl32)
Overview
The cl computer cluster, installed in late 2002, consists of 32 nodes with 2.4GHz Pentium 4 processors, 1GB RAM (2GB in cl29-cl32), and two 40GB EIDE disk drives. Nodes are connected to a Foundry FastIron switch using Gigabit Ethernet connections. As configured, the switch is non-blocking.
This cluster is available to all approved account holders. Unlike previous clusters, the cl cluster uses a reservation scheduling system. This allows users who need dedicated nodes to reserve them exclusively for one or more days, preventing anyone else from using the reserved nodes and inadvertently affecting test results. For users interested in compute cycles only, any non-reserved node may be used (although we recommend that compute-cycle users start reservations at higher numbers, e.g., cl32, cl31, etc., and work down sequentially; lower-numbered cl systems should then be used for timing studies).
How it works
Login access to nodes in the cl cluster is controlled by the /etc/passwd file on each node (NIS is not used). By default, anyone with approved access has their account information in these /etc/passwd files and can log in. When a reservation is about to begin, the passwd file is updated so that only the user(s) specified in the reservation request have the ability to log in. When the reservation ends, the passwd file reverts to the "full" version.
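The passwd switch described above can be sketched in a few lines of Python. This is only an illustration, not the script the Lab actually runs: the function name `restrict_passwd` is hypothetical, and keeping system accounts (UID at or below `system_uid_max`) in the reduced file is an assumption, since the policy only says that other users lose the ability to log in.

```python
from typing import Iterable


def restrict_passwd(full_passwd: str,
                    reserved_users: Iterable[str],
                    system_uid_max: int = 499) -> str:
    """Return passwd text reduced to system accounts plus the reserved users.

    ASSUMPTION: accounts with UID <= system_uid_max (root, daemons) are kept
    so the node keeps working; the threshold itself is a guess.
    """
    allowed = set(reserved_users)
    kept = []
    for line in full_passwd.splitlines():
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        name, uid = fields[0], fields[2]
        is_system = uid.isdigit() and int(uid) <= system_uid_max
        if is_system or name in allowed:
            kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    full = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "alice:x:1001:100:Alice:/home/alice:/bin/bash\n"
        "bob:x:1002:100:Bob:/home/bob:/bin/bash\n"
    )
    # Only root and alice remain while alice holds the reservation.
    print(restrict_passwd(full, ["alice"]), end="")
```

When the reservation ends, the saved "full" file would simply be copied back into place.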
Currently, reservations begin at 12:00AM and end at 11:59PM. A cron job therefore runs once a day (at 11:59PM) and updates the passwd files, releasing nodes whose reservations have ended that day and reserving nodes for the next (and perhaps following) day(s) according to the reservation schedule.
Note that when a reservation begins, each node being reserved is rebooted. This ensures that the reservation begins with a "clean" environment on each node.
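The nightly decision made by the cron job might look roughly like the following Python sketch. The `Reservation` record and the `plan_nightly_update` function are hypothetical names introduced for illustration; the real job and its data format are not described on this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record: who holds which nodes, from begin to end inclusive."""
    users: Tuple[str, ...]
    begin: date
    end: date
    nodes: Tuple[str, ...]


def plan_nightly_update(reservations: Sequence[Reservation], today: date):
    """Decide what the 11:59PM run should do for the coming day.

    Returns (release, reserve, reboot):
      release: nodes whose reservation ended today and are free tomorrow
      reserve: node -> users allowed to log in tomorrow
      reboot:  nodes whose reservation begins tomorrow
    """
    tomorrow = today + timedelta(days=1)
    reserve: Dict[str, Tuple[str, ...]] = {}
    reboot: List[str] = []
    for res in reservations:
        if res.begin <= tomorrow <= res.end:
            for node in res.nodes:
                reserve[node] = res.users
                if res.begin == tomorrow:
                    reboot.append(node)  # reboot only when a reservation starts
    release = [
        node
        for res in reservations if res.end == today
        for node in res.nodes if node not in reserve
    ]
    return sorted(release), reserve, sorted(reboot)


if __name__ == "__main__":
    schedule = [
        Reservation(("alice",), date(2008, 1, 6), date(2008, 1, 7), ("cl01", "cl02")),
        Reservation(("bob",), date(2008, 1, 8), date(2008, 1, 9), ("cl02",)),
    ]
    print(plan_nightly_update(schedule, date(2008, 1, 7)))
```

In this sketch a reservation that simply continues into the next day does not trigger a reboot, which matches the statement that nodes are rebooted when a reservation begins.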
Web Front-end
Future reservations are made and current reservations viewed via a web page interface. These are accessed from the main cluster page at http://www.cs.arizona.edu/computing/cluster. This page is password-protected and provides links to the reservations pages.
Gigabit Cluster (cl) Reservations - this page shows reservation status for all nodes for the next 7 days (beginning with today) in a table format. When a reservation exists for a node on a particular date, the username(s) associated with that reservation appear in the appropriate cell of the table. Links for 'Prev 7 days' and 'Next 7 days' allow for scrolling through the calendar. There is also a link to the page for making reservations.
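As a rough illustration of the 7-day view, the Python sketch below builds one row per node and one cell per day. The tuple layout for a reservation and the name `week_table` are assumptions made for this example; the actual page is generated by the Lab's own web code.

```python
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (usernames, begin date, end date inclusive, nodes)
Reservation = Tuple[Tuple[str, ...], date, date, Tuple[str, ...]]

# cl01 .. cl32, as named in the page title
ALL_NODES = [f"cl{n:02d}" for n in range(1, 33)]


def week_table(reservations: Sequence[Reservation],
               start: date,
               nodes: Sequence[str] = ALL_NODES,
               days: int = 7) -> Dict[str, List[str]]:
    """Map each node to a list of cells, one per day starting at `start`.

    A cell holds the reserving usernames, or "" when the node is free.
    """
    table = {node: [""] * days for node in nodes}
    for users, begin, end, res_nodes in reservations:
        for offset in range(days):
            day = start + timedelta(days=offset)
            if begin <= day <= end:
                for node in res_nodes:
                    if node in table:
                        table[node][offset] = ", ".join(users)
    return table


if __name__ == "__main__":
    booked = [(("alice", "bob"), date(2008, 1, 8), date(2008, 1, 9), ("cl01",))]
    for node, cells in list(week_table(booked, date(2008, 1, 7)).items())[:2]:
        print(node, cells)
```

Scrolling with 'Prev 7 days' or 'Next 7 days' would then amount to calling the same function with `start` shifted by seven days in either direction.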
Gigabit Cluster (cl) Reservation Request - this is the page where reservations are scheduled. A reservation request consists of:
| username(s) | - | one or more usernames |
| begin date | - | date reservation is to begin |
| end date | - | date reservation is to end |
| node(s) | - | nodes to be reserved |
When the reservation request is submitted, it is checked for validity and for conflicts with previously scheduled requests (a link to the view reservations page is provided to aid in the scheduling process). Once the request is validated, a page is displayed to show the details of the reservation (also with a link to the 'view' page -- for visual confirmation).
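The validity and conflict checks could be sketched in Python as follows. The function name, the tuple layout, and the error messages are all hypothetical; the only rule taken directly from this page is that a reservation cannot begin today, because the cron job does not run until 11:59PM.

```python
from datetime import date
from typing import List, Sequence, Tuple

# Hypothetical record layout: (usernames, begin date, end date inclusive, nodes)
Reservation = Tuple[Tuple[str, ...], date, date, Tuple[str, ...]]

VALID_NODES = {f"cl{n:02d}" for n in range(1, 33)}  # cl01 .. cl32


def validate_request(users: Sequence[str], begin: date, end: date,
                     nodes: Sequence[str],
                     existing: Sequence[Reservation],
                     today: date) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = []
    if not users:
        errors.append("at least one username is required")
    if not nodes:
        errors.append("at least one node is required")
    unknown = sorted(set(nodes) - VALID_NODES)
    if unknown:
        errors.append("unknown node(s): " + ", ".join(unknown))
    if end < begin:
        errors.append("end date is before begin date")
    if begin <= today:
        errors.append("reservations cannot begin today or earlier")
    for o_users, o_begin, o_end, o_nodes in existing:
        # Two inclusive date ranges overlap when each starts before the other ends.
        dates_overlap = begin <= o_end and o_begin <= end
        shared = sorted(set(nodes) & set(o_nodes))
        if dates_overlap and shared:
            errors.append("conflict on " + ", ".join(shared)
                          + " with reservation for " + ", ".join(o_users))
    return errors


if __name__ == "__main__":
    held = [(("alice",), date(2008, 1, 10), date(2008, 1, 12), ("cl01", "cl02"))]
    print(validate_request(("bob",), date(2008, 1, 12), date(2008, 1, 13),
                           ("cl02", "cl03"), held, today=date(2008, 1, 7)))
```

Returning every problem at once, rather than stopping at the first one, lets the requester fix the whole form in a single pass.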
Questions, Special Handling
Obviously this reservation system does not cover everything. It is expected that shortcomings will be identified and addressed as the system is used. Some come to mind immediately:
Q. How do I become an approved user?
A. A faculty member can sponsor themselves and/or others to become
approved cluster users by sending email to lab*. With your approval your
name is added to the web interface pull-down list for reservations and you
receive, via email, the username/password for accessing the web interface.
Q. How can I see who is using each node?
A. From vochelle, run
gsh cl_cluster w
This will show who's on each node and the load average.
Q. Why can't I reserve a node today and start using it
immediately?
A. Since the cron job doesn't run until 11:59pm, the reservation page
doesn't allow you to reserve for today. Also, there may be compute cycle
users on the desired node. If absolutely necessary, this could be handled
manually by the Lab staff.
Q. What if I'm through with my work early - how do I remove my
nodes from the reservation schedule?
A. Send email to the mail alias lab*; we will remove them
manually.
Q. What if I need to extend my reservation? I don't want the node to
reboot and kill my running jobs.
A. Provided no one else has reserved the node for the next day, we
can extend the reservation. This will prevent someone else from reserving the
node, which would cause a reboot. Send email to the mail alias lab*.
Q. What if I need to change a previously scheduled reservation?
There's no way to delete it.
A. In order to prevent one user from deleting another's reservations, the
'delete' function was purposely omitted. Send email to the mail alias
lab*.
A. Send email to the mail alias lab*.
Q. Why can't I access files in a shared subdirectory of another
user's home directory using "~" references?
A. Because a reservation reduces the /etc/passwd file to only the
reservation holder, "~" references to other users cannot be resolved. However,
the automount maps are always in place, so full paths (e.g., /home/username/...)
will work. Often
the better alternative is to have a project directory set up under /cs (rather
than sharing within /home).
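For scripts that must run on a reserved node, one possible workaround is to fall back to the full path when a "~username" reference cannot be resolved. The Python sketch below is only an illustration: `resolve_user_path` is a hypothetical helper, and the `/home` root is taken from the example path in the answer above.

```python
import os.path


def resolve_user_path(path: str, home_root: str = "/home") -> str:
    """Expand "~username/..." and fall back to home_root/username/... if the
    user is missing from the passwd database (as on a reserved node)."""
    expanded = os.path.expanduser(path)
    if not expanded.startswith("~"):
        return expanded  # already resolved, or never a "~" reference
    # expanduser() returns the path unchanged when the lookup fails.
    user, _, rest = expanded[1:].partition("/")
    if not user:
        return expanded  # bare "~" that could not be resolved; leave as-is
    if rest:
        return os.path.join(home_root, user, rest)
    return os.path.join(home_root, user)


if __name__ == "__main__":
    # "no_such_user_zz9" stands in for a user absent from /etc/passwd.
    print(resolve_user_path("~no_such_user_zz9/project/data.txt"))
```

The fallback relies on the automount maps being in place, as the answer above notes; a shared project directory under /cs avoids the problem entirely.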
Q. How do I share comments or discuss issues with other cluster
users?
A. Send mail to the mail alias cl_cluster*. All cluster users will receive
a copy.
* @cs.arizona.edu
Last updated Monday, 07-Jan-2008 09:32:28 MST, by John Luiten