Friday, March 28, 2008

SouthGrid Update

The first stage of the HPC cluster is running LCG jobs, and is being correctly accounted for.

The HPC WNs have AMD 2218 cores at 2.6GHz; these are said to be rated at 1.745 kSI2K per core.
Currently GridPP can run a maximum of 32 jobs on this small stage 1 HPC cluster.
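For a rough sense of scale, GridPP's share of stage 1 can be expressed in kSI2K by multiplying the job limit by the quoted per-core rating. This is a minimal sketch that assumes one job occupies exactly one core (our assumption, not something stated above):

```python
# Rough capacity estimate for GridPP's share of the stage 1 HPC cluster.
# Assumption (not from the update itself): one job slot == one AMD 2218 core.
KSI2K_PER_CORE = 1.745  # quoted rating per 2.6GHz AMD 2218 core
GRIDPP_MAX_JOBS = 32    # current GridPP job limit on stage 1


def capacity_ksi2k(cores: int, ksi2k_per_core: float) -> float:
    """Total capacity in kSI2K for a number of identically rated cores."""
    return cores * ksi2k_per_core


if __name__ == "__main__":
    total = capacity_ksi2k(GRIDPP_MAX_JOBS, KSI2K_PER_CORE)
    print(f"{total:.2f} kSI2K")  # 55.84 kSI2K
```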

Santanu is continuing to work with LHCb to solve the remaining problems with running their code at Cambridge.
The WNs will be upgraded to SL4 within the next few weeks.

When over 100 of the 120 nodes in the BaBar cluster died after a power shutdown at the end of January, it was deemed not worth restoring the cluster. Two twin 1U servers have been bought to replace it, which will provide 32 cores and 78.4 kSI2K.
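Dividing the quoted total by the core count gives the per-core rating implied for the replacement servers. Their CPU model is not given here, so this is only the average implied by the two quoted figures:

```python
# Implied per-core rating of the replacement twin 1U servers,
# derived only from the two totals quoted in the update.
TOTAL_KSI2K = 78.4
TOTAL_CORES = 32


def per_core_rating(total_ksi2k: float, cores: int) -> float:
    """Average kSI2K per core for a batch of identically rated cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_ksi2k / cores


if __name__ == "__main__":
    rating = per_core_rating(TOTAL_KSI2K, TOTAL_CORES)
    print(f"{rating:.2f} kSI2K per core")  # 2.45 kSI2K per core
```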
The old eScience cluster is being set up as an SL4 CE and WN farm, as a template for the way the new University 'Blue Bear' HPC cluster will be driven. This cluster is made up of 31 dual 3GHz Xeons.
The main grid cluster (aka the atlas cluster) has been expanded to 60 cores.

The PPS is not being maintained at the moment.

Chris has got space tokens working at RALPP (after updating to dCache 1.8-12p6 from p4, and rebooting everything once the srmSpaceManager-enabled config files were in place).
The new hardware has been installed:
8 boxes, 16 nodes, 32 CPUs, giving 128 cores.

The CPUs are "E5410 @ 2.33GHz"; we are not sure of the kSI2K rating yet.
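The 128-core figure follows from the box/node/CPU counts, given that the Xeon E5410 is a quad-core part. Since the kSI2K rating is still unknown, this sketch leaves it as a parameter rather than guessing a value:

```python
# Core count for the new RALPP hardware, from the quoted figures:
# 8 boxes -> 16 nodes -> 32 CPUs -> 128 cores.
BOXES = 8
NODES_PER_BOX = 16 // BOXES   # 2 nodes per box
CPUS_PER_NODE = 32 // 16      # 2 sockets per node
CORES_PER_CPU = 4             # the Xeon E5410 is a quad-core CPU


def total_cores() -> int:
    """Total cores across all boxes."""
    return BOXES * NODES_PER_BOX * CPUS_PER_NODE * CORES_PER_CPU


def total_ksi2k(ksi2k_per_core: float) -> float:
    """Total capacity once a per-core rating is known (no rating is assumed here)."""
    return total_cores() * ksi2k_per_core


if __name__ == "__main__":
    print(total_cores())  # 128
```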

Running stable. WNs were updated to SL4 earlier this year.

Quotes to move the kit to Begbroke seem too high, so we are going to adopt a DIY approach.
Draining t2se01 is taking forever. The dpm-drain command sometimes terminates after only 20 minutes (~6GB of data transferred). We did, however, have a good run on the night of the 26th, which lasted over 10 hours.
Oddly, the error log files are often the same size, although not consistently so.

-rw-r--r-- 1 root root 22519 Feb 22 14:32 dpm-drain-errorlog-se01-1
-rw-r--r-- 1 root root 22519 Feb 22 15:56 dpm-drain-errorlog-se01-2
-rw-r--r-- 1 root root 22564 Feb 22 17:05 dpm-drain-errorlog-se01-3
-rw-r--r-- 1 root root 22519 Feb 22 18:20 dpm-drain-errorlog-se01-4
-rw-r--r-- 1 root root 22519 Feb 25 12:35 dpm-drain-errorlog-se01-5
-rw-r--r-- 1 root root 22519 Feb 25 13:04 dpm-drain-errorlog-se01-6
-rw-r--r-- 1 root root 22519 Feb 25 16:53 dpm-drain-errorlog-se01-7
-rw-r--r-- 1 root root 22519 Mar 26 13:51 dpm-drain-errorlog-se01-8
-rw-r--r-- 1 root root 22519 Mar 26 14:17 dpm-drain-errorlog-se01-9
-rw-r--r-- 1 root root 1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10
-rw-r--r-- 1 root root 567836 Mar 27 13:59 dpm-drain-errorlog-se01-11
-rw-r--r-- 1 root root 25241 Mar 27 14:57 dpm-drain-errorlog-se01-12
-rw-r--r-- 1 root root 22598 Mar 27 15:37 dpm-drain-errorlog-se01-13
-rw-r--r-- 1 root root 22598 Mar 27 16:22 dpm-drain-errorlog-se01-14
-rw-r--r-- 1 root root 22598 Mar 27 17:11 dpm-drain-errorlog-se01-15
-rw-r--r-- 1 root root 22598 Mar 27 21:55 dpm-drain-errorlog-se01-16
-rw-r--r-- 1 root root 22598 Mar 27 23:15 dpm-drain-errorlog-se01-17
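To see how often the sizes repeat, the listing above can be tallied by file size. This is a small sketch that parses the pasted `ls -l` output (the size is the fifth whitespace-separated field); it does not read the files themselves:

```python
from collections import Counter

# The directory listing above, pasted verbatim.
LISTING = """\
-rw-r--r-- 1 root root 22519 Feb 22 14:32 dpm-drain-errorlog-se01-1
-rw-r--r-- 1 root root 22519 Feb 22 15:56 dpm-drain-errorlog-se01-2
-rw-r--r-- 1 root root 22564 Feb 22 17:05 dpm-drain-errorlog-se01-3
-rw-r--r-- 1 root root 22519 Feb 22 18:20 dpm-drain-errorlog-se01-4
-rw-r--r-- 1 root root 22519 Feb 25 12:35 dpm-drain-errorlog-se01-5
-rw-r--r-- 1 root root 22519 Feb 25 13:04 dpm-drain-errorlog-se01-6
-rw-r--r-- 1 root root 22519 Feb 25 16:53 dpm-drain-errorlog-se01-7
-rw-r--r-- 1 root root 22519 Mar 26 13:51 dpm-drain-errorlog-se01-8
-rw-r--r-- 1 root root 22519 Mar 26 14:17 dpm-drain-errorlog-se01-9
-rw-r--r-- 1 root root 1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10
-rw-r--r-- 1 root root 567836 Mar 27 13:59 dpm-drain-errorlog-se01-11
-rw-r--r-- 1 root root 25241 Mar 27 14:57 dpm-drain-errorlog-se01-12
-rw-r--r-- 1 root root 22598 Mar 27 15:37 dpm-drain-errorlog-se01-13
-rw-r--r-- 1 root root 22598 Mar 27 16:22 dpm-drain-errorlog-se01-14
-rw-r--r-- 1 root root 22598 Mar 27 17:11 dpm-drain-errorlog-se01-15
-rw-r--r-- 1 root root 22598 Mar 27 21:55 dpm-drain-errorlog-se01-16
-rw-r--r-- 1 root root 22598 Mar 27 23:15 dpm-drain-errorlog-se01-17
"""


def size_counts(listing: str) -> Counter:
    """Count how many files share each size (5th field of `ls -l` output)."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 9:  # perms, links, owner, group, size, month, day, time, name
            counts[int(fields[4])] += 1
    return counts


if __name__ == "__main__":
    for size, n in size_counts(LISTING).most_common():
        print(f"{size:>8} bytes: {n} file(s)")
```

On this listing, 22519 bytes occurs 8 times and 22598 bytes 5 times, while the long 10-hour run on the 26th stands out at 1193287 bytes.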
