Thursday, October 26, 2006

A Torque security flaw was made public last week, and most SouthGrid sites shut down their queues on Friday night.
They were then patched over the next three days.
Cambridge was not affected as they use Condor.
The Sysman meeting at Cambridge was followed by the SouthGrid Technical meeting. Shared support was discussed and generally agreed, but it needs to be formalised and put in place at Birmingham. RALPPD were not there to discuss it.
Pete had another meeting with the Oxford Campus Network guys, who have been doing some tests and managed to get better throughput by tweaking the kernel. It is now thought that the change that happened on August 15th may have increased the latency between sites, and that this may be the cause of the reduced bandwidth.
Further tests will be done.
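
As a rough illustration of why extra latency could reduce bandwidth, and why kernel tuning can help: a single TCP stream cannot send more than one window of data per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below is only that arithmetic. The function names and the example figures are assumptions chosen for illustration, not measurements from the Oxford tests.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput (Mbit/s).

    At most one window of data can be in flight per round trip,
    so throughput <= window / RTT.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 * 1000 / rtt_ms / 1e6


def required_window_bytes(target_mbps: float, rtt_ms: float) -> int:
    """Window size (bytes) needed to sustain target_mbps at a given RTT.

    This is the bandwidth-delay product of the path.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return round(target_mbps * 1e6 * rtt_ms / 1000 / 8)


if __name__ == "__main__":
    # Hypothetical figures: a 64 KiB window at two different round-trip times.
    for rtt in (1.0, 10.0):
        limit = max_tcp_throughput_mbps(64 * 1024, rtt)
        print(f"RTT {rtt} ms: at most {limit:.1f} Mbit/s per stream")
    # Window needed to fill a 1 Gbit/s path at 10 ms RTT.
    print(required_window_bytes(1000, 10.0), "bytes")
```

With a fixed window, a tenfold increase in round-trip time cuts the ceiling tenfold. Raising the kernel's TCP buffer limits lifts the ceiling again, which would be consistent with the improvement seen after the kernel changes.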

Friday, October 13, 2006

SouthGrid has been busy taking part in the Service Challenge Throughput tests.
Oxford carried out tests on September 26th and 27th. The first, Oxford to RALPP, went OK although slowly. The second, RALPP to Oxford, was extremely disappointing, with practically zero bandwidth.

Further iperf tests have been carried out between servers both within Oxford and across SouthGrid. At the moment the main cause of the problem seems to be the installation of a new Campus Firewall on August 15th. See the Gridmon web site.

Work with David Wallom of Oxford Grid has continued in order to enable the NGS VO on Oxford's cluster.

Another worker node PSU failed two weeks ago and was replaced under warranty by Dell.