[Issues] Groningen millipede cluster down
Ewout M. Helmich
helmich at astro.rug.nl
Tue Sep 30 12:53:45 CEST 2014
Here's an updated status from CIT:
-------------------------------------------------------------------------------------------------------------------------------------
Dear millipede-user,
Update on the situation of millipede hpc-cluster:
- yesterday evening a decision was made to shut down all compute-nodes of
millipede because of cooling problems at the datacenter. All running jobs have
to be considered broken/killed/stopped.
- this morning the main head-node crashed (kernel panic), taking down
/home and /cm/shared/apps. This also took down the
login-node (because it had no access to /home).
- head-node and login-node are operational again. You can login and
connect to your data.
- we still have to wait for the green light from the cooling technicians;
we first want to make sure the cooling system is working properly
before we switch on the compute-nodes. We expect to restart the
compute-nodes by the end of this afternoon, perhaps this evening, or in the worst
case tomorrow.
- all queues have therefore been taken offline.
Please do not submit new jobs, and do not run your jobs/code directly on
the login-node.
--------------------------------------------------------------------------------------------------------------------------------------
Regards,
Ewout Helmich
On 09/30/2014 09:46 AM, Ewout M. Helmich wrote:
> Please note: the millipede cluster in Groningen is down at the moment.
> See below the message from the CIT.
>
> Regards,
> Ewout Helmich
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>
> Dear millipede-user,
>
> Due to severe problems with the cooling systems of our datacenter, the
> decision has been made to switch off the millipede cluster to prevent
> possible downtime on other systems. Millipede is one of the biggest
> consumers of the cooling system and has the biggest impact on the temperature.
>
> When the cooling problems are resolved, we will restart/re-enable the
> millipede cluster.
>
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> _______________________________________________
> Issues mailing list
> Issues at astro-wise.org
> http://mailman.astro-wise.org/mailman/listinfo/issues