[Issues] Groningen millipede cluster down

Ewout Helmich helmich at astro.rug.nl
Wed Oct 1 13:35:40 CEST 2014


The millipede is up and running again.

Regards,
Ewout

On 09/30/2014 12:53 PM, Ewout M. Helmich wrote:
> Here's an updated status from CIT:
>
> ------------------------------------------------------------------------------------------------------------------------------------- 
>
>
> Dear millipede-user,
>
> Update on the situation of millipede hpc-cluster:
>
> - yesterday evening a decision was made to shut down all compute-nodes
> of millipede because of cooling problems at the datacenter. All running jobs
> have to be considered broken/killed/stopped.
>
> - this morning the main headnode collapsed (kernel-panic) resulting in 
> a downed /home and /cm/shared/apps. This also resulted in a downed 
> login-node (because of no access to /home)
>
> - head-node and login-node are operational again. You can log in and
> connect to your data.
>
> - we still have to wait for the green light from the cooling
> technicians. We first want to make sure the cooling system is working
> properly before we switch on the compute-nodes. We expect to restart
> the compute-nodes by the end of this afternoon, perhaps this
> evening, or in the worst case tomorrow.
>
> - all queues have therefore been taken offline.
>
> Please do not submit new jobs, and do not run your jobs/code directly 
> on the login-node.
>
> -------------------------------------------------------------------------------------------------------------------------------------- 
>
>
> Regards,
> Ewout Helmich
>
>
> On 09/30/2014 09:46 AM, Ewout M. Helmich wrote:
>> Please note: the millipede cluster in Groningen is down at the
>> moment. See below the message from the CIT.
>>
>> Regards,
>> Ewout Helmich
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>> Dear millipede-user,
>>
>> Due to severe problems with the cooling systems of our datacenter,
>> the decision has been made to switch off the millipede cluster to prevent
>> possible downtime on other systems. Millipede is one of the biggest
>> consumers of the cooling system and has one of the biggest impacts on
>> the temperature.
>>
>> When the cooling problems are resolved, we will restart/re-enable the 
>> millipede cluster.
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------- 
>>
>> _______________________________________________
>> Issues mailing list
>> Issues at astro-wise.org
>> http://mailman.astro-wise.org/mailman/listinfo/issues


