[Issues] Groningen Millipede cluster maintenance

Ewout Helmich helmich at astro.rug.nl
Thu Feb 13 12:23:50 CET 2014


Below I reproduce the email and login message regarding the upcoming 
maintenance on millipede.

Regards,
Ewout Helmich



---------------------- Login message on millipede ----------------------

Planned Maintenance:


- 13/2/2014: disable medium-queues (3 day queue)
- 15/2/2014: disable normal-queue (1 day queue)

- 17/02/2014: Tuning/maintenance of /data storage, maintenance IB switch
- 17/02/2014: /data_old read-only

---------------------------- Mail from CIT -----------------------------

Dear Millipede-user,

Next Monday (17 February) we will do our planned maintenance. There are
connection problems with the InfiniBand switch, which is one of the
most critical pieces of the cluster. One of the steps will be to
restart/power cycle this switch. HPC experts from Clustervision will be
on standby to help out in case of trouble.

Expected downtime for the cluster will be 4 hours, but after completion 
of the maintenance we will have to do some proper testing, to make sure 
that everything is stable again. If all goes well, millipede will be
usable at the end of the afternoon/evening on Monday.

Yesterday we made some progress in rebooting/reattaching a
storage node to the IB switch. So far this storage server has had a
healthy, stable connection, so it looks promising.

The worst-case scenario would be that the switch is really broken and has
to be replaced. That would mean extra downtime for the cluster (days).

We will keep you informed.

-- 
Kind regards, Ger Strikwerda

Computer operator
Rijksuniversiteit Groningen
Donald Smits Centrum voor Informatie Technologie
Unit Serverinfrastructuur

Zernikeborg
Nettelbosje 1
9747 AJ Groningen
Tel. 050 363 9276

"God is hard, God is fair, some men he gave brains,
  others he gave hair"

