Penn Computing

ISC Network Services Downtime

If any of our services requires downtime, please try to follow the procedures below.

Scheduled Downtime

  1. Downtime checklist
    Design a downtime procedure that includes a checklist or plan of what you intend to do during the downtime, plus a backout procedure in case there are problems. The procedure should keep the time the service is completely unavailable to a minimum, so consider which steps can be carried out without taking the service down.
    Examples of plans:

  2. Determine downtime window
    Check the Services responsibility list for the service's normal downtime window, and coordinate with the service's primary contact to determine the downtime window actually needed. Most service changes should be scheduled for Tuesday through Thursday.
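The two scheduling rules in this procedure (a Tuesday through Thursday window here, and at least one week's notice in step 3) are easy to check mechanically. The Python sketch below is illustrative only: the function name `window_problems` is hypothetical, and the notification date in the example call is an assumed value, since the document does not say when the August 17, 2005 outage was announced.

```python
from datetime import date, datetime, timedelta
from typing import List

# datetime.weekday() counts Monday as 0, so Tuesday..Thursday are 1..3.
PREFERRED_WEEKDAYS = {1, 2, 3}
# Step 3 asks for at least one week between notification and downtime.
MIN_NOTICE = timedelta(days=7)


def window_problems(start: datetime, notify_on: date) -> List[str]:
    """Return the reasons a proposed downtime start breaks the guidelines (empty if none)."""
    problems = []
    if start.weekday() not in PREFERRED_WEEKDAYS:
        problems.append(f"{start:%A} is outside the Tuesday-Thursday preference")
    if start.date() - notify_on < MIN_NOTICE:
        problems.append("less than one week between notification and downtime")
    return problems


# The example outage (Wednesday, August 17, 2005, 5:00 am), assuming it was
# announced on August 10, exactly one week earlier.
print(window_problems(datetime(2005, 8, 17, 5, 0), date(2005, 8, 10)))  # []
```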

  3. Notification
    Notification of the scheduled downtime should go out at least one week before the downtime. One week's notice is not always possible, but we should aim for it. The following people should be notified:
    • Engineering
    • Network management
    • ProDesk
    • Primary users of the service
      Notify them through a mailing list associated with the service or, if it is a central service, by sending a note to the NOC so that they can send out an outage notice using the Outages app.

      Example of an outage note to be sent to the NOC:
      Subject: Outage for www.upenn.edu
      
      Date:                   August 17, 2005
      Start Time:             5:00 am
      Duration:               1 hour 
      Building(s) Affected:   Entire campus 
      Service(s) Affected:    www.upenn.edu 
      Description:            www.upenn.edu will be unavailable to all users on
                              August 17, 2005 from 5:00am to 6:00am while we 
                              upgrade hardware.  We do not anticipate 
                              that the server will be unavailable during this 
                              entire window but are reserving this outage in 
                              case of problems. 
      
    • Add an entry to the nt-dtime@zimbra.upenn.edu calendar. This is a public calendar: point your calendar client at "https://zimbra.upenn.edu/home/nt-dtime/Calendar" to view it, or open https://zimbra.upenn.edu/home/nt-dtime/Calendar?fmt=html for an HTML version. Only the people responsible for sending out announcements should have admin access to update this calendar.
      • Subject: - should contain the name of the server/service experiencing the downtime
      • Location: - should be either "Off-site", which indicates that staff will work remotely from home, or "On-site", which indicates that staff will be on campus
      • Attendees: - the invite list should include the staff members required for the work and any other interested parties who will be overseeing the work
      • Description: - a description of the work to be done. If possible, also include the outage message that was sent out.


  4. Application announcement
    If appropriate, applications associated with the service should warn users that the service will be unavailable. Many of our user service web sites have a status block; place an announcement there.
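How an application shows the warning depends on the application. As one hedged illustration, the Python function below (`status_block_message`, a hypothetical name) decides what a status block might say at a given moment. The seven-day lead time mirrors the one-week notification target and is an assumption, not a documented rule.

```python
from datetime import datetime, timedelta
from typing import Optional


def status_block_message(service: str, start: datetime, end: datetime,
                         now: datetime,
                         lead: timedelta = timedelta(days=7)) -> Optional[str]:
    """Text for a web site's status block, or None when nothing should be shown."""
    if now >= end:
        return None  # downtime is over; remove the announcement
    if now >= start:
        return f"{service} is unavailable for scheduled maintenance until {end:%H:%M}."
    if start - now <= lead:
        return (f"{service} will be unavailable on {start:%B} {start.day}, "
                f"{start.year} from {start:%H:%M} to {end:%H:%M}.")
    return None  # too early to announce (assumed lead time)
```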

  5. Suppress alarms
    Plan to suppress all monitoring alarms that may be associated with the service.
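Monitoring systems differ in how they silence alarms, so the following is only a sketch of the underlying idea, not the interface of any particular tool: record the planned window together with every check associated with the service, and drop alarms from those checks while the window is open. `SuppressionWindow`, `is_suppressed`, and the check names used when testing it are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SuppressionWindow:
    checks: FrozenSet[str]  # every monitoring check tied to the service
    start: datetime
    end: datetime


def is_suppressed(check: str, fired_at: datetime,
                  windows: Iterable[SuppressionWindow]) -> bool:
    """True if an alarm from `check` at `fired_at` falls inside a planned downtime."""
    return any(check in w.checks and w.start <= fired_at < w.end for w in windows)
```

The window is half-open, so an alarm that fires at or after the planned end is still delivered; if the work overruns, the window has to be extended deliberately rather than silently masking a failed change.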

  6. Test changes
    If appropriate, test any service changes once they are applied, and be prepared to respond to trouble reports about the service immediately. If problems arise, send updates to the notification list above. Afterwards, send internal staff a summary of the outage, whether the work succeeded or ran into problems.
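For a web service such as www.upenn.edu, a first post-change test can be as simple as confirming that its URLs answer again. This Python sketch uses only the standard library; `smoke_test` and `failed_checks` are hypothetical helpers, the 10-second timeout is an arbitrary choice, and a real test plan should also exercise whatever the downtime actually changed.

```python
import http.client
import urllib.request
from typing import Iterable, List


def smoke_test(url: str, timeout: float = 10.0) -> bool:
    """True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts
        # are all OSError subclasses; malformed responses raise HTTPException.
        return False


def failed_checks(urls: Iterable[str]) -> List[str]:
    """URLs that failed; these are what an update to the notification list would report."""
    return [u for u in urls if not smoke_test(u)]
```

For example, `failed_checks(["https://www.upenn.edu/"])` returns an empty list when the site responds normally.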

  7. Update documentation
    Update internal documentation in the source code repository or Wiki to reflect anything that changed as a result of the downtime. For example, if the downtime requires a failover to a system in another facility, update the ActiveSystemsLocation document on the Wiki.

Unscheduled Downtime

  1. Send notification following step 3 for scheduled downtime, but skip adding an entry to the downtime calendar. For an unscheduled downtime it may be easier to call ProDesk (215-573-4017) and the NOC (215-573-9631) than to send email.
  2. Add an announcement on the user service web site.

Information Systems and Computing
University of Pennsylvania