[Issues] Groningen millipede cluster down
Ewout Helmich
helmich at astro.rug.nl
Wed Oct 1 13:35:40 CEST 2014
The millipede is up and running again.
Regards,
Ewout
On 09/30/2014 12:53 PM, Ewout M. Helmich wrote:
> Here's an updated status from CIT:
>
> -------------------------------------------------------------------------------------------------------------------------------------
>
>
> Dear millipede-user,
>
> Update on the situation of millipede hpc-cluster:
>
> - yesterday evening a decision was made to shut down all compute-nodes
> of millipede because of cooling problems at the datacenter. All running
> jobs have to be considered broken/killed/stopped.
>
> - this morning the main headnode collapsed (kernel-panic) resulting in
> a downed /home and /cm/shared/apps. This also resulted in a downed
> login-node (because of no access to /home)
>
> - head-node and login-node are operational again. You can login and
> connect to your data.
>
> - we still have to wait for the green light from the cooling
> technicians. We first want to make sure the cooling system is working
> properly before we switch on the compute-nodes. We expect to restart
> the compute-nodes by the end of this afternoon, perhaps this
> evening, or in the worst case tomorrow.
>
> - all queues have therefore been taken offline.
>
> Please do not submit new jobs, and do not run your jobs/code directly
> on the login-node.
>
> --------------------------------------------------------------------------------------------------------------------------------------
>
>
> Regards,
> Ewout Helmich
>
>
> On 09/30/2014 09:46 AM, Ewout M. Helmich wrote:
>> Please note: the millipede cluster in Groningen is down at the
>> moment. See below the message from the CIT.
>>
>> Regards,
>> Ewout Helmich
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> Dear millipede-user,
>>
>> Due to severe problems with the cooling systems of our datacenter,
>> the decision has been made to switch off the millipede cluster to
>> prevent possible downtime on other systems. Millipede is one of the
>> biggest consumers of the cooling system and has one of the biggest
>> impacts on the temperature.
>>
>> When the cooling problems are resolved, we will restart/re-enable the
>> millipede cluster.
>>
>> ----------------------------------------------------------------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Issues mailing list
>> Issues at astro-wise.org
>> http://mailman.astro-wise.org/mailman/listinfo/issues
>