[Issues] Groningen millipede cluster down

Ewout M. Helmich helmich at astro.rug.nl
Tue Sep 30 12:53:45 CEST 2014


Here's an updated status from CIT:

-------------------------------------------------------------------------------------------------------------------------------------

Dear millipede-user,

Update on the situation of millipede hpc-cluster:

- yesterday evening a decision was made to shut down all compute-nodes of 
millipede because of cooling problems at the datacenter. All running jobs 
have to be considered broken/killed/stopped.

- this morning the main headnode collapsed (kernel-panic) resulting in a 
downed /home and /cm/shared/apps. This also resulted in a downed 
login-node (because of no access to /home)

- head-node and login-node are operational again. You can login and 
connect to your data.

- we still have to wait for the green light from the cooling technicians. 
We first want to make sure the cooling system is working properly 
before we switch on the compute-nodes. We expect to restart the 
compute-nodes by the end of this afternoon/perhaps this evening/worst 
case tomorrow.

- all queues have therefore been taken offline.

Please do not submit new jobs, and do not run your jobs/code directly on 
the login-node.

--------------------------------------------------------------------------------------------------------------------------------------

Regards,
Ewout Helmich


On 09/30/2014 09:46 AM, Ewout M. Helmich wrote:
> Please note: the millipede cluster in Groningen is down at the moment. 
> See below the message from the CIT.
>
> Regards,
> Ewout Helmich
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------ 
>
>
> Dear millipede-user,
>
> Due to severe problems with the cooling systems of our datacenter, the 
> decision has been made to switch off the millipede cluster to prevent 
> possible downtime on other systems. Millipede is one of the biggest 
> consumers of the cooling system/biggest impact on the temperature.
>
> When the cooling problems are resolved, we will restart/re-enable the 
> millipede cluster.
>
> ---------------------------------------------------------------------------------------------------------------------------------------- 
>
> _______________________________________________
> Issues mailing list
> Issues at astro-wise.org
> http://mailman.astro-wise.org/mailman/listinfo/issues


