The University of Arizona

Reservation Policy for the Gigabit Cluster (cl01, cl02,..., cl32)

Overview

The cl computer cluster, installed in late 2002, comprises 32 nodes with 2.4GHz Pentium 4 processors, 1GB of RAM (2GB in cl29-cl32), and two 40GB EIDE disk drives.  Nodes are connected to a Foundry FastIron switch using Gigabit Ethernet connections.  As configured, the switch is non-blocking.

Use of this cluster is available to all approved account holders.  Unlike previous clusters, the cl cluster uses a reservation scheduling system.  This allows users who need dedicated nodes to reserve them exclusively for one or more days, preventing anyone else from using the reserved nodes and inadvertently affecting test results.  Users interested only in compute cycles may use any non-reserved node.  We recommend that compute-cycle users start with the highest-numbered nodes (cl32, cl31, etc.) and work down sequentially, leaving the lower-numbered nodes for timing studies.

How it works

Login access to nodes in the cl cluster is controlled by the /etc/passwd file on each node (NIS is not used).  By default, anyone with approved access has their account information in these /etc/passwd files and can log in.  When a reservation is about to begin, the passwd file is updated so that only the user(s) specified in the reservation request have the ability to log in.  When the reservation ends, the passwd file reverts to the "full" version.

Currently, reservations begin at 12:00AM and end at 11:59PM.  A cron job therefore runs once a day (at 11:59PM) and updates the passwd files, releasing nodes whose reservations end that day and reserving nodes for the next (and perhaps following) day(s) according to the reservation schedule.

Note that when a reservation begins, each node being reserved is rebooted. This ensures that the reservation begins with a "clean" environment on each node.
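
The nightly update described above can be sketched in Python. This is only an illustration of the scheduling logic, not the actual cron script: the Reservation record and the plan_nightly_update function are hypothetical names, and the sketch assumes that a node whose reservation continues into the next day is neither released nor rebooted.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record mirroring the fields of a reservation request."""
    users: tuple   # usernames allowed to log in
    begin: date    # first reserved day
    end: date      # last reserved day (inclusive)
    nodes: tuple   # node names, e.g. ("cl01", "cl02")


def plan_nightly_update(reservations, today):
    """Plan the 11:59PM run on `today`.

    Returns (release, reserve): `release` is a sorted list of nodes that go
    back to the "full" passwd file; `reserve` maps each node whose
    reservation begins tomorrow to the users allowed to log in (these are
    the nodes that would be rebooted).
    """
    tomorrow = today + timedelta(days=1)
    held = set()   # nodes covered by any reservation tomorrow
    reserve = {}   # nodes whose reservation starts tomorrow
    for r in reservations:
        if r.begin <= tomorrow <= r.end:
            held.update(r.nodes)
            if r.begin == tomorrow:
                for node in r.nodes:
                    reserve[node] = r.users
    release = set()
    for r in reservations:
        if r.end == today:
            # Release only nodes that nobody holds tomorrow.
            release.update(n for n in r.nodes if n not in held)
    return sorted(release), reserve
```

For example, if one reservation for cl01 and cl02 ends today and another reservation for cl02 begins tomorrow, the sketch releases cl01 and hands cl02 directly to the new reservation holder.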

Web Front-end

Future reservations are made and current reservations viewed via a web page interface. These are accessed from the main cluster page at http://www.cs.arizona.edu/computing/cluster. This page is password-protected and provides links to the reservations pages.

Gigabit Cluster (cl) Reservations - this page shows reservation status for all nodes for the next 7 days (beginning with today) in a table format.  When a reservation exists for a node on a particular date, the username(s) associated with that reservation appear in the appropriate cell of the table.  Links for 'Prev 7 days' and 'Next 7 days' allow for scrolling through the calendar.  There is also a link to the page for making reservations.

Gigabit Cluster (cl) Reservation Request - this is the page where reservations are scheduled. A reservation request consists of:

username(s) - one or more usernames
begin date - date reservation is to begin
end date - date reservation is to end
node(s) - nodes to be reserved

When the reservation request is submitted, it is checked for validity and for conflicts with previously scheduled requests (a link to the view reservations page is provided to aid in the scheduling process).  Once the request is validated, a page is displayed to show the details of the reservation (also with a link to the 'view' page -- for visual confirmation).
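
The conflict check can be illustrated with a short Python sketch. The exact validation rules used by the web page are not documented here, so the functions below are assumptions: is_valid only checks that the request names at least one user and node and that it starts after today, and conflicts treats two reservations as clashing when their date ranges overlap and they share at least one node. The dictionary layout and both function names are hypothetical.

```python
from datetime import date


def is_valid(request, today):
    """Basic sanity checks (assumed): non-empty users and nodes, and a date
    range that starts after today and does not end before it begins."""
    return (bool(request["users"]) and bool(request["nodes"])
            and today < request["begin"] <= request["end"])


def conflicts(request, existing):
    """Return the existing reservations that clash with `request`.

    Each reservation is a dict with 'begin' and 'end' dates (inclusive)
    and a 'nodes' set of node names.
    """
    clashes = []
    for r in existing:
        dates_overlap = (request["begin"] <= r["end"]
                         and r["begin"] <= request["end"])
        shared_nodes = request["nodes"] & r["nodes"]
        if dates_overlap and shared_nodes:
            clashes.append(r)
    return clashes
```

A request would be accepted only when is_valid returns True and conflicts returns an empty list.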

Questions, Special Handling

Obviously this reservation system does not cover everything.  It is expected that shortcomings will be identified and addressed as the system is used.  Some come to mind immediately:

Q.  How do I become an approved user?
A.  A faculty member can sponsor themselves and/or others to become approved cluster users by sending email to lab*. Once you are approved, your name is added to the web interface pull-down list for reservations and you receive, via email, the username/password for accessing the web interface.

Q.  I'm only concerned with compute cycles, but I'd like to pick nodes that aren't already bogged down with other processes.   How do I know which ones are busy?
A.  From vochelle, run
gsh cl_cluster w

This will show who's on each node and the load average.

Q. Why can't I reserve a node today and start using it immediately?
A. Since the cron job doesn't run until 11:59PM, the reservation page doesn't allow you to reserve for today. Also, there may be compute-cycle users on the desired node. If absolutely necessary, this can be handled manually by the Lab staff.

Q. What if I'm through with my work early - how do I remove my nodes from the reservation schedule?
A. Send email to the mail alias lab* and we will remove them manually.

Q. What if I need to extend my reservation? I don't want the node to reboot and kill my running jobs.
A. Provided no one else has reserved the node for the next day, we can modify the reservation. This will prevent someone else from reserving the node, which would cause a reboot. Send email to the mail alias lab*.

Q. What if I need to change a previously scheduled reservation? There's no way to delete it.
A. In order to prevent one user from deleting another's reservations, the 'delete' function was purposely omitted. Send email to the mail alias lab*.

Q. There are users in the Reservations username(s) drop-down menu that shouldn't be -- or users are missing. How is this corrected?
A. Send email to the mail alias lab*.

Q. Why can't I access files in a shared subdirectory of another user's home directory using "~" references?
A. Because a reservation reduces the /etc/passwd file to only the reservation holder, the "~" cannot be resolved. However, the automount maps are always in place so full references (i.e., /home/username/...) will work. Often the better alternative is to have a project directory set up under /cs (rather than sharing within /home).
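
The same effect can be seen from a script. The Python sketch below is an illustration only: resolve_home and shared_path are hypothetical helper names, and the fallback assumes home directories are automounted under /home as described above. The standard pwd module reads the passwd database, so a user who is missing from the reduced /etc/passwd file cannot be looked up.

```python
import os.path
import pwd


def resolve_home(username):
    """Return the user's home directory from the passwd database,
    or None if the user is not listed there."""
    try:
        return pwd.getpwnam(username).pw_dir
    except KeyError:
        return None


def shared_path(username, relative):
    """Build a path into another user's home directory, falling back to
    the full /home/username form when the passwd lookup fails."""
    home = resolve_home(username)
    if home is None:
        home = "/home/" + username  # assumed automount location
    return os.path.join(home, relative)
```

On POSIX systems os.path.expanduser behaves the same way: it returns a "~username/..." path unchanged when the user is not in the passwd database, so scripts run during a reservation are safer using full paths.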

Q. How do I share comments or discuss issues with other cluster users?
A. Send mail to the mail alias cl_cluster*. All cluster users will receive a copy.

* @cs.arizona.edu


Last updated Monday, 07-Jan-2008 09:32:28 MST, by John Luiten