[Issues] Peregrine cluster down
Ewout Helmich
helmich at astro.rug.nl
Wed Jul 4 10:25:40 CEST 2018
Please note that it is possible to use the DPU that is normally used
for coaddition. Select it at the awe-prompt:
awe> dpu.set_dpu_client('coadddpu.astro.target.astro-wise.org')
or update your configuration file (~/.awe/Environment.cfg) by adding (or
changing) the dpu_name option:
dpu_name : coadddpu.astro.target.astro-wise.org
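For reference, here is a minimal sketch of the relevant part of
~/.awe/Environment.cfg after the change. The [global] section header
and the other entry shown are assumptions for illustration only; keep
whatever your file already contains and just add or change dpu_name:

[global]
# Hypothetical existing entry; keep your own values here.
database_user : YOURUSER
# Point job submission at the coaddition DPU instead of the
# (currently unavailable) Peregrine DPU:
dpu_name : coadddpu.astro.target.astro-wise.org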
Follow the status of your jobs by selecting the DPU from the drop-down
menu on the webpage:
https://dpu.hpc.rug.astro-wise.org/
---
Of course it is also possible to use your local CPU, by replacing:
awe> dpu.run(...)
with
awe> lpu.run(...)
and removing options specific to the DPU, such as dpu_time.
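As a concrete illustration, here is a hypothetical before/after pair.
The task name and arguments are placeholders; substitute your own, and
drop only the DPU-specific options such as dpu_time:

awe> # Before: submitted to the DPU, with a DPU-specific time limit
awe> dpu.run('Reduce', instrument='OMEGACAM', date='2018-07-01', dpu_time=3600)
awe> # After: the same job runs on the local CPU; dpu_time is dropped
awe> lpu.run('Reduce', instrument='OMEGACAM', date='2018-07-01')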
Regards,
Ewout
On 07/03/2018 06:41 PM, Ewout Helmich wrote:
>
> Hi everyone,
>
> Just got this from the CIT.
>
> Kind regards,
>
> Ewout
>
> ---
>
> Dear Peregrine user,
>
>
> As you might have noticed, over the last week the Peregrine
> filesystems have crashed multiple times. At the moment it does not
> even seem possible to make /data available in a stable way.
> This is why we have decided to perform unscheduled maintenance and
> to start the upgrade of the storage environment on short notice.
> Fortunately, we have already done a lot of preparation, so we can
> minimize the downtime considerably by using temporary storage while
> we upgrade the original Peregrine storage systems.
>
> The maintenance will have the following consequences:
>
> * From now on until Friday 06-07, Peregrine will be unavailable. If
> we finish earlier, the system will be made available again sooner.
> * After the downtime Peregrine will be configured with temporary
> storage. This means that in the future we will plan scheduled
> downtime to switch back to the original upgraded storage.
> * All running jobs will unfortunately be lost. Waiting jobs will be
> suspended.
> * Since we don't have a copy of /scratch, this file system will be
> empty when we resume operations.
> * We will, however, provide read-only access to the old /scratch for
> one week to allow you to copy important data.
> * If possible, we will make the login nodes available for read-only
> access to the data. Some reboots will be necessary, however.
>
> We apologize for any inconvenience this unscheduled maintenance will
> cause.
>
> Kind regards,
>
> Fokke Dijkstra
> HPC-team
>
>