<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-33333025</id><updated>2011-10-17T09:43:50.621+01:00</updated><category term='torque nfs problem'/><category term='SL4'/><category term='dCache'/><category term='Atlas'/><category term='byChris'/><category term='gLite'/><category term='Space Tokens'/><title type='text'>SouthGrid</title><subtitle type='html'>SouthGrid is a regional Tier 2 centre for GridPP, and LCG, distributed between the Universities of Birmingham, Bristol, Cambridge, and Oxford, and the Rutherford Appleton Laboratory.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>87</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-33333025.post-4454443509143163859</id><published>2011-09-06T13:53:00.004+01:00</published><updated>2011-09-06T23:41:44.937+01:00</updated><title type='text'>Installing and 
Deploying a Cluster Publisher</title><content type='html'>As part of the battle to replace our LCG-CEs with CreamCEs, I realised that the reason one of our new CreamCEs was not getting many jobs was that it was not publishing a cluster/subcluster into the BDII (despite having a &lt;code&gt;/var/lib/bdii/gip/static-file-Cluster.ldif&lt;/code&gt; file), and so, I guess, it was not matching any resources.&lt;br /&gt;&lt;br /&gt;Since I eventually wanted to move to a stand-alone Cluster Publisher, I thought it would be easiest to push ahead and install that rather than install one on the CreamCE and remove it later.&lt;br /&gt;&lt;br /&gt;So, with a shiny new VM and certificate in hand, I plunged onwards.&lt;br /&gt;&lt;br /&gt;The first step was to define the cluster variables in site-info.def (or, in this case, a specific node file):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;cat /opt/glite/yaim/etc/nodes/heplnv146.pp.rl.ac.uk&lt;br /&gt;CE_HOST_heplnx206_pp_rl_ac_uk_CE_TYPE=cream&lt;br /&gt;CE_HOST_heplnx206_pp_rl_ac_uk_CE_InfoJobManager=pbs&lt;br /&gt;CE_HOST_heplnx206_pp_rl_ac_uk_QUEUES="grid"&lt;br /&gt;CE_HOST_heplnx207_pp_rl_ac_uk_CE_TYPE=cream&lt;br /&gt;CE_HOST_heplnx207_pp_rl_ac_uk_CE_InfoJobManager=pbs&lt;br /&gt;CE_HOST_heplnx207_pp_rl_ac_uk_QUEUES="grid"&lt;br /&gt;CLUSTER_HOST=heplnv146.pp.rl.ac.uk&lt;br /&gt;CLUSTERS=GRID&lt;br /&gt;CLUSTER_GRID_CLUSTER_UniqueID=grid.pp.rl.ac.uk&lt;br /&gt;CLUSTER_GRID_CLUSTER_Name=grid.pp.rl.ac.uk&lt;br /&gt;CLUSTER_GRID_SITE_UniqueID=UKI-SOUTHGRID-RALPP&lt;br /&gt;CLUSTER_GRID_CE_HOSTS="heplnx206.pp.rl.ac.uk heplnx207.pp.rl.ac.uk"&lt;br /&gt;CLUSTER_GRID_SUBCLUSTERS="GRID"&lt;br /&gt;SUBCLUSTER_GRID_SUBCLUSTER_UniqueID=grid.pp.rl.ac.uk&lt;br /&gt;SUBCLUSTER_GRID_HOST_ApplicationSoftwareRunTimeEnvironment="&lt;br /&gt;        LCG-2&lt;br /&gt;        LCG-2_1_0&lt;br /&gt;        LCG-2_1_1&lt;br /&gt;        LCG-2_2_0&lt;br /&gt;        LCG-2_3_0&lt;br /&gt;        LCG-2_3_1&lt;br /&gt;        LCG-2_4_0&lt;br /&gt;        
LCG-2_5_0&lt;br /&gt;        LCG-2_6_0&lt;br /&gt;        LCG-2_7_0&lt;br /&gt;        GLITE-3_0_0&lt;br /&gt;        RALPP&lt;br /&gt;        SOUTHHGRID&lt;br /&gt;        GRIDPP&lt;br /&gt;        R-GMA&lt;br /&gt;"&lt;br /&gt;SUBCLUSTER_GRID_HOST_ArchitectureSMPSize=4&lt;br /&gt;SUBCLUSTER_GRID_HOST_ArchitecturePlatformType=x86_64&lt;br /&gt;SUBCLUSTER_GRID_HOST_BenchmarkSF00=0&lt;br /&gt;SUBCLUSTER_GRID_HOST_BenchmarkSI00=2390&lt;br /&gt;SUBCLUSTER_GRID_HOST_MainMemoryRAMSize=2000&lt;br /&gt;SUBCLUSTER_GRID_HOST_MainMemoryVirtualSize=2000&lt;br /&gt;SUBCLUSTER_GRID_HOST_NetworkAdapterInboundIP=FALSE&lt;br /&gt;SUBCLUSTER_GRID_HOST_NetworkAdapterOutboundIP=TRUE&lt;br /&gt;SUBCLUSTER_GRID_HOST_OperatingSystemName=ScientificSL&lt;br /&gt;SUBCLUSTER_GRID_HOST_OperatingSystemRelease=5.4&lt;br /&gt;SUBCLUSTER_GRID_HOST_OperatingSystemVersion=Boron&lt;br /&gt;SUBCLUSTER_GRID_HOST_ProcessorClockSpeed=2300&lt;br /&gt;SUBCLUSTER_GRID_HOST_ProcessorModel=Xeon&lt;br /&gt;SUBCLUSTER_GRID_HOST_ProcessorOtherDescription='Cores=3.7656,Benchmark=9.56-HEP-SPEC06'&lt;br /&gt;SUBCLUSTER_GRID_HOST_ProcessorVendor=Intel&lt;br /&gt;SUBCLUSTER_GRID_SUBCLUSTER_Name=grid.pp.rl.ac.uk&lt;br /&gt;SUBCLUSTER_GRID_SUBCLUSTER_PhysicalCPUs=546&lt;br /&gt;SUBCLUSTER_GRID_SUBCLUSTER_LogicalCPUs=2056&lt;br /&gt;SUBCLUSTER_GRID_SUBCLUSTER_WNTmpDir=/scratch&lt;/pre&gt;&lt;br /&gt;Then it was a simple matter of installing the RPMs and running YAIM:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;yum install emi-cluster&lt;br /&gt;/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-CLUSTER&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;At that point we seemed to have a working system: the BDII was running and queryable, I could connect to the gridftp server, and it had set up experiment and cluster directories in &lt;code&gt;/opt/edg/var/info/&lt;/code&gt; and &lt;code&gt;/opt/glite/var/info/&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;Fine; the next step was to rsync the contents of those directories from the torque server to this node, which would then export them to the CEs. Well, actually, I synced them into &lt;code&gt;/export/gridtags&lt;/code&gt; and &lt;code&gt;/export/glitetags&lt;/code&gt; and symlinked the previous locations to those. cfengine had already set the node up as an NFS server for me, so exporting the new areas and updating the CEs to mount them from there was a matter of moments.&lt;br /&gt;&lt;br /&gt;A quick check of the resource BDII looked fine, so it was a simple matter to add the new source into the site BDII and tweak the &lt;code&gt;static-file-CE.ldif&lt;/code&gt; file on the CreamCE to assign it to the new cluster.&lt;br /&gt;&lt;br /&gt;One thing remained: when testing the gridftp server with uberftp* I'd noticed that I was not mapped to my usual pool account. That was not surprising, as I had not mounted the site gridmapdir, so it was using its local one. However, reasoning that the gridftp server was the same RPM as the one on the CreamCE, which was using Argus for authentication and mapping, I had a poke around on the CreamCE and in YAIM, then tried installing the &lt;code&gt;argus-gsi-pep-callout&lt;/code&gt; rpm and copying over &lt;code&gt;/etc/grid-security/gsi-authz.conf&lt;/code&gt; and &lt;code&gt;/etc/grid-security/gsi-pep-callout.conf&lt;/code&gt; from the CreamCE.&lt;br /&gt;&lt;br /&gt;Another quick test with uberftp and yes, I am mapped to my normal pool account, so it appears I have a Cluster Publisher with Argus integration working. 
That means the only things at the site not using Argus are the gLite CreamCE, which will soon be replaced by another EMI one, and dCache, which will get banning from Argus when I update to the next Golden Release.&lt;br /&gt;&lt;br /&gt;*&lt;code&gt;uberftp heplnv146.pp.rl.ac.uk "ls /etc"&lt;/code&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4454443509143163859?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4454443509143163859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4454443509143163859&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4454443509143163859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4454443509143163859'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2011/09/installing-and-deploying-cluster.html' title='Installing and Deploying a Cluster Publisher'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4779138750294857780</id><published>2011-07-22T10:11:00.002+01:00</published><updated>2011-07-22T10:41:02.292+01:00</updated><title type='text'>EMI CREAM</title><content type='html'>We have installed an EMI CreamCE at Oxford. It was quite straightforward and apparently everything was set up properly by YAIM, except that EMI CREAM uses the normal /etc/ and /usr/ directories instead of /opt/glite. 
It uses just one repository for all packages; there are no more separate TORQUE_* repositories. &lt;br /&gt;Jobs were running perfectly and all test jobs completed successfully, but the CE was only getting lhcbpilot jobs; looking more closely, it was the classic "GlueCEStateWaitingJobs: 444444" problem.&lt;br /&gt;&lt;br /&gt;Drilling through many layers of wrappers, it comes down to this issue:&lt;br /&gt;/sbin/runuser  -s /bin/sh ldap -c "diagnose -g --host=t2ce02.physics.ox.ac.uk"&lt;br /&gt;ERROR:    'diagnose' failed&lt;br /&gt;ERROR:    user 'ldap' is not authorized to execute command 'diagnose'&lt;br /&gt;&lt;br /&gt;I think this is one of the less documented parts of the EMI CreamCE. In gLite, the slapd and bdii-update processes were run by edguser, but with EMI they are run by the ldap user, so ldap must be listed as a Maui administrator to run diagnose. I edited the maui.cfg file:&lt;br /&gt;ADMIN3                  edginfo rgma edguser ldap&lt;br /&gt;&lt;br /&gt;That solved the problem. It arose because I was using our site-wide maui.cfg file instead of the default one created by YAIM. Just a heads-up if you are planning to install an EMI CreamCE.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4779138750294857780?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4779138750294857780/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4779138750294857780&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4779138750294857780'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4779138750294857780'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2011/07/emi-cream.html' title='EMI CREAM'/><author><name>Kashif 
Mohammad</name><uri>http://www.blogger.com/profile/15365806609014519714</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6030593384667598090</id><published>2011-03-11T18:36:00.000+01:00</published><updated>2011-03-11T18:37:38.128+01:00</updated><title type='text'>SAM to MyEGEE to finally MyEGI</title><content type='html'>I have updated gridppnagios to the latest release of WLCG Nagios. It is a major release in the sense that it no longer configures MyEGEE as the portal and replaces it with MyEGI. MyEGEE will remain until I drop the myegee DB from the gridppnagios machine, but don't trust it any more. I got two complaints about MyEGEE within a few hours of updating, so I can say that people are looking at it.&lt;br /&gt;The other main change is that the Nagios Configuration Generator (NCG) now uses the Aggregated Topology Provider (ATP) instead of SAMDB to configure Nagios. ATP is part of the ROC/NGI Nagios package; it aggregates information from GOCDB, the top BDII, VO feeds, etc., and is the single authoritative source of topology information. However, it is the central ATP (http://grid-monitoring.cern.ch/atp) that is used by all ROCs/NGIs for topology configuration, for the sake of uniformity and probably reliability. The old SAM infrastructure can now retire in peace.&lt;br /&gt;So, MyEGI: it is a kind of all-in-one portal (https://gridppnagios.physics.ox.ac.uk/myegi).&lt;br /&gt;It has Gridmap, metric status, history and so on. 
Aesthetically MyEGEE was better, but MyEGI has more functionality; if you are still not convinced, check the comparison of SAM, MyEGEE and MyEGI here (https://tomtools.cern.ch/confluence/display/SAM/MyEGI+vs+MyEGEE+vs+SAM+Portal).&lt;br /&gt;MyEGI has very good search options and an advanced filter, so you can refine your search and bookmark the resulting URL, for instance for the status of your site.&lt;br /&gt;I have just discovered two bugs; the irritating one is that the history bar shows dates one day ahead. So if you want to see the status on 11 March, check 12 March!&lt;br /&gt;Tickets have been opened and hopefully they will be fixed soon:&lt;br /&gt;https://tomtools.cern.ch/jira/browse/SAM-1325 &lt;br /&gt;https://tomtools.cern.ch/jira/browse/SAM-1326&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6030593384667598090?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6030593384667598090/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6030593384667598090&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6030593384667598090'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6030593384667598090'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2011/03/sam-to-myegee-to-finally-myegi.html' title='SAM to MyEGEE to finally MyEGI'/><author><name>Kashif Mohammad</name><uri>http://www.blogger.com/profile/15365806609014519714</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-2012929240380349075</id><published>2011-02-28T15:17:00.002+01:00</published><updated>2011-02-28T16:26:24.130+01:00</updated><title type='text'>Going through the Argus Valley</title><content type='html'>As an early-adopter site for Argus, Oxford received one of the first multi-user pilot jobs (MUPJ) from ATLAS using glexec through Argus, and it failed, although we had been passing the ops glexec tests for a long time. &lt;br /&gt;Our understanding was that Argus needed a policy authorizing pilots to switch to a normal user, so I had a policy like this to authorize pilots for glexec:&lt;br /&gt;&lt;pre&gt;resource "http://authz-interop.org/xacml/resource/resource-type/wn" {&lt;br /&gt;     obligation "http://glite.org/xacml/obligation/local-environment-map" {&lt;br /&gt;      }&lt;br /&gt;&lt;br /&gt;      action "http://glite.org/xacml/action/execute" {&lt;br /&gt;          rule permit { pfqan="/ops/Role=pilot" }&lt;br /&gt;          rule permit { pfqan="/atlas/Role=pilot" }&lt;br /&gt;          rule permit { pfqan="/cms/Role=pilot" }&lt;br /&gt;      }&lt;br /&gt; }&lt;/pre&gt;&lt;br /&gt;After discussion with Argus experts on the mailing list, it turned out that when the pilot framework asks glexec to switch from the pilot to the effective user, the LCMAPS PEP plugin sends the proxy of the effective user to the Argus server for authorization and mapping. So Argus must also have a policy that authorizes the effective user. I changed the policy to look like this:&lt;br /&gt;&lt;pre&gt;            rule permit {pfqan = "/atlas/Role=pilot" }&lt;br /&gt;            rule permit {pfqan = "/atlas/Role=lcgadmin" }&lt;br /&gt;            rule permit {pfqan = "/atlas/Role=production" }&lt;br /&gt;            rule permit {pfqan = "/atlas/" }&lt;/pre&gt;&lt;br /&gt;That solved the problem.  
Doesn't this look as if every ATLAS user is allowed to switch identity through glexec? As far as Argus is concerned, yes. But the glexec configuration is defined on the WN, and only groups whitelisted in /opt/glite/etc/glexec.conf are allowed to use glexec; any other user trying glexec will be shot down on the WN itself. By default only pilot users are whitelisted on the WN.&lt;br /&gt;So, in a nutshell, the policies on Argus should resemble those of the CE.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-2012929240380349075?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/2012929240380349075/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=2012929240380349075&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2012929240380349075'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2012929240380349075'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2011/02/going-through-argus-valley.html' title='Going through the Argus Valley'/><author><name>Kashif Mohammad</name><uri>http://www.blogger.com/profile/15365806609014519714</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5524030900042011843</id><published>2011-01-07T14:47:00.000+01:00</published><updated>2011-01-07T14:48:43.097+01:00</updated><title type='text'>glite-APEL Node</title><content type='html'>On Thursday 9th December we brought the new glite-APEL box online.&lt;br 
/&gt;&lt;br /&gt;The VM, hosted by t2delltest, had already been installed and Kashif had installed the cert.&lt;br /&gt;&lt;br /&gt;We ran APEL on all the CEs and t2torque02, and then one last time on t2mon02.&lt;br /&gt;&lt;br /&gt;We then reconfigured t2ce02 to point at the new APEL box, ran APEL on it, and saw new records created on the box. This was after sorting out some permissions issues: we needed to rerun YAIM with each CE (and t2torque02) set in the site-info.def file, and each run did the magic to allow that node to write to the DB (FQDNs should be used).&lt;br /&gt;We then changed the reference from t2mon02 to t2apel01 in the site-info.def file on pplxconfig, and it propagated round the other nodes.&lt;br /&gt;The first run that night failed due to a Java out-of-memory error.&lt;br /&gt;I tweaked the config file /opt/glite/etc/glite-apel-publisher/publisher-config-yaim.xml, reducing the value from the original 300000 to 150000.&lt;br /&gt;&lt;br /&gt;All APEL log files on all the CEs, t2torque02 and t2apel01 now appear to be good.&lt;br /&gt;Cristina can see records appearing at RAL.&lt;br /&gt;&lt;br /&gt;The old MySQL database from t2mon02 has been backed up in /data/sysadmin (pplxfs2).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5524030900042011843?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5524030900042011843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5524030900042011843&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5524030900042011843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5524030900042011843'/><link 
rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2011/01/glite-apel-node.html' title='glite-APEL Node'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1793506079114150306</id><published>2010-09-09T16:14:00.001+01:00</published><updated>2010-09-09T16:16:27.317+01:00</updated><title type='text'>Tracing a Grid Job (A recap)</title><content type='html'>&lt;div class="post-header"&gt;  &lt;/div&gt;  Just in case we forget how to trace a grid job, I record some steps below.&lt;br /&gt;&lt;br /&gt;For example, you discover via a&lt;a href="http://lxarda16.cern.ch/dashboard/request.py/latestresultssmry?siteSelect3=All%20Sites&amp;amp;serviceTypeSelect3=vo&amp;amp;sites=T3_UK_SGrid_Oxford&amp;amp;services=CE&amp;amp;services=SRMv2&amp;amp;tests=1301&amp;amp;tests=133&amp;amp;tests=111&amp;amp;tests=6&amp;amp;tests=1261&amp;amp;tests=76&amp;amp;tests=64&amp;amp;tests=20&amp;amp;tests=281&amp;amp;tests=882&amp;amp;tests=1321&amp;amp;exitStatus=all"&gt; CMS SAM page&lt;/a&gt; that you are failing some test (it could equally be any other SAM page, such as&lt;a 
href="http://dashb-lhcb-sam.cern.ch/dashboard/request.py/latestresultssmry?siteSelect3=500&amp;amp;serviceTypeSelect3=0&amp;amp;sites=LCG.Oxford.uk&amp;amp;services=CE&amp;amp;services=CREAMCE&amp;amp;services=FTS&amp;amp;services=LFC_C&amp;amp;services=LFC_L&amp;amp;services=RB&amp;amp;services=SRMv2&amp;amp;services=VOBOX&amp;amp;services=gRB&amp;amp;tests=398&amp;amp;tests=404&amp;amp;tests=405&amp;amp;tests=406&amp;amp;tests=403&amp;amp;tests=407&amp;amp;tests=37624&amp;amp;tests=399&amp;amp;tests=2&amp;amp;tests=5&amp;amp;tests=7&amp;amp;tests=14&amp;amp;tests=25&amp;amp;tests=37732&amp;amp;tests=37700&amp;amp;tests=37703&amp;amp;tests=37710&amp;amp;tests=37715&amp;amp;tests=37760&amp;amp;tests=51&amp;amp;tests=50&amp;amp;tests=37638&amp;amp;tests=37553&amp;amp;tests=37554&amp;amp;tests=37555&amp;amp;tests=37636&amp;amp;tests=37637&amp;amp;tests=37556&amp;amp;tests=37557&amp;amp;tests=37643&amp;amp;tests=37399&amp;amp;exitStatus=all&amp;amp;table=true%22"&gt; LHCb&lt;/a&gt;); you then click on the detailed output and see a reference to the job id, which on t2ce05 contains the string: sOFavxScVKU-GbSYaCmx-A&lt;br /&gt;On t2ce05,&lt;br /&gt;&lt;span style="font-style: italic;"&gt; grep sOFavxScVKU-GbSYaCmx-A /opt/edg/var/gatekeeper/grid-jobmap_20100906&lt;br /&gt;&lt;/span&gt;reveals the batch system job id: lrmsID=2998805.t2torque02.physics.ox.ac.uk&lt;br /&gt;On the batch server (t2torque02 in our case), either:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;tracejob 2998805&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;grep 2998805 /var/spool/pbs/server_logs/20100909&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The tracejob option is easier!&lt;br /&gt;&lt;br /&gt;This will let you know which worker node ran the job. 
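&lt;br /&gt;&lt;br /&gt;The first half of this recipe (job id fragment, then lrmsID, then Torque job number) is mechanical enough to script. The following is only a minimal POSIX shell sketch, not a site tool: it assumes, as in the example above, that the matching grid-jobmap line contains a token of the form &lt;code&gt;lrmsID=NNNN.server&lt;/code&gt;, and the helper names &lt;code&gt;lrms_id_from_line&lt;/code&gt; and &lt;code&gt;torque_number&lt;/code&gt; are hypothetical.&lt;br /&gt;&lt;pre&gt;#!/bin/sh
# Hypothetical helpers, not part of gLite or Torque: a sketch of the
# "grid job id to batch job id" steps described above.

# Print the value after "lrmsID=" in a grid-jobmap line, stopping at the
# next space or double quote (assumed delimiters). Prints nothing if the
# line has no lrmsID token; prints one id per line for multi-line input.
lrms_id_from_line() {
    printf '%s\n' "$1" | sed -n 's/.*lrmsID=\([^ "]*\).*/\1/p'
}

# Strip the server suffix so the number can be passed to tracejob,
# e.g. 2998805.t2torque02.physics.ox.ac.uk becomes 2998805.
torque_number() {
    printf '%s\n' "${1%%.*}"
}
&lt;/pre&gt;&lt;br /&gt;For example, after loading these functions into a shell on t2ce05, &lt;code&gt;lrms_id_from_line "$(grep sOFavxScVKU-GbSYaCmx-A /opt/edg/var/gatekeeper/grid-jobmap_20100906)"&lt;/code&gt; should print 2998805.t2torque02.physics.ox.ac.uk if the map line has that format, and on the batch server &lt;code&gt;tracejob "$(torque_number 2998805.t2torque02.physics.ox.ac.uk)"&lt;/code&gt; is the same as the &lt;code&gt;tracejob 2998805&lt;/code&gt; above.&lt;br /&gt;&lt;br /&gt;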
You can then look at that worker node to check for full disks, memory faults, segfaults in the log files, etc.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Now in reverse&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A job is misbehaving on your node and you need to see who is running it.&lt;br /&gt;The special case here is that it's an ATLAS pilot job, which does not have a normal grid job id.&lt;br /&gt;&lt;br /&gt;Get the PID from top and use&lt;br /&gt;pstree -H pid&lt;br /&gt;to highlight the process's parents.&lt;br /&gt;(Use pstree -A -H pid, which draws the tree with plain ASCII characters, if you are in a PuTTY window on Windows.)&lt;br /&gt;&lt;br /&gt;This reveals which PBS job it is,&lt;br /&gt;e.g. 3020508.t2torque02.physics.ox.ac.uk&lt;br /&gt;&lt;br /&gt;The job can be traced on the &lt;a href="http://panda.cern.ch:25980/server/pandamon/query?"&gt;panda monitor&lt;/a&gt;, using the search facility on the left-hand toolbar.&lt;br /&gt;This gives the job details, including the user's name. A GGUS ticket could then be raised against ATLAS asking for the user to be informed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1793506079114150306?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1793506079114150306/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1793506079114150306&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1793506079114150306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1793506079114150306'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2010/09/tracing-grid-job-recap.html' title='Tracing a Grid Job (A recap)'/><author><name>Pete 
Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3192290548303344439</id><published>2010-09-01T13:14:00.001+01:00</published><updated>2010-09-01T13:14:32.716+01:00</updated><title type='text'>APEL on ngsce-test</title><content type='html'>APEL was failing on ngsce-test with the following error:&lt;br /&gt;&lt;br /&gt;java.io.FileNotFoundException: /var/spool/pbs/server_priv/accounting/20090522 (Too many open files)&lt;br /&gt;&lt;br /&gt;The solution was to raise the open-file limit:&lt;br /&gt;ulimit -n 10240&lt;br /&gt;&lt;br /&gt;I've added this to the /opt/glite/bin/apel-pbs-log-parser script.&lt;br /&gt;&lt;br /&gt;A fix is in testing, so a new version of APEL will fix it.&lt;br /&gt;See GGUS ticket&lt;br /&gt;&lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=60674"&gt;https://gus.fzk.de/ws/ticket_info.php?ticket=60674&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3192290548303344439?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3192290548303344439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3192290548303344439&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3192290548303344439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3192290548303344439'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2010/09/apel-on-ngsce-test.html' 
title='APEL on ngsce-test'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7809965026465196693</id><published>2010-08-27T11:54:00.002+01:00</published><updated>2010-08-27T12:24:38.993+01:00</updated><title type='text'>Argus Server at Oxford</title><content type='html'>We finally managed to install an Argus server at Oxford, with a messy workaround. Installation and configuration were reasonably OK, and once the policy structure was clear, writing and loading policies was also easy. Details are here: http://www.gridpp.ac.uk/wiki/Oxford.&lt;br /&gt;&lt;br /&gt;The main issue was that the host certificate issued by the UK CA contains an "emailAddress" field; this was supposedly deprecated years ago, and most developers assume that there is no "emailAddress" in a host certificate. Still, it is a bug in Argus and hopefully will be resolved in the next release.&lt;br /&gt;So, the workaround:&lt;br /&gt;By default the pap-admin command uses the host certificate in /etc/grid-security/ when run as root, but since there is a problem with the host certificate, I copied my personal certificate proxy from the UI and ran pap-admin using that proxy. 
Then I added an ACE (access control entry) for the host DN:&lt;br /&gt;pap-admin ace "/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=t2argus02.physics.ox.ac.uk/OID.1.2.840.113549.1.9.1=lcg_manager@physics.ox.ac.uk" ALL&lt;br /&gt;This workaround was suggested by Andrea Ceccanti.&lt;br /&gt;&lt;br /&gt;The only issue is that if you want to restart the PAP service, you must first remove the ACE using the remove-ace command, restart PAP, and then add the ACE again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7809965026465196693?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7809965026465196693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7809965026465196693&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7809965026465196693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7809965026465196693'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2010/08/argus-server-at-oxford.html' title='Argus Server at Oxford'/><author><name>Kashif Mohammad</name><uri>http://www.blogger.com/profile/15365806609014519714</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8394038291116845184</id><published>2010-06-23T15:03:00.005+01:00</published><updated>2010-09-09T16:20:05.741+01:00</updated><title type='text'>Oxford's blanking panels</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/_pogTNV-B63A/TCIUZyyBh4I/AAAAAAAABEk/Uw8ZAOc2dmc/s1600/dsc_6099.jpg"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 320px; height: 214px;" src="http://1.bp.blogspot.com/_pogTNV-B63A/TCIUZyyBh4I/AAAAAAAABEk/Uw8ZAOc2dmc/s320/dsc_6099.jpg" alt="" id="BLOGGER_PHOTO_ID_5485969729451558786" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Having just read Stuart's ScotGrid blog post about cooling at the top of racks, I thought I'd let you know about the panels we use.&lt;br /&gt;&lt;br /&gt;We have been specifying that all empty rack slots should be filled by blanking panels since our 2007 purchase. Our suppliers used to fit metal blanking panels.&lt;br /&gt;&lt;br /&gt;These days they tend to supply the 1U APC plastic clip-in panels, as can be seen in the right-hand rack in the photo.&lt;br /&gt;These cost £25-£30 per pack of 10, but we managed to make a bulk purchase (a pack of 200) in 2008, which worked out at about £1.69 each.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AR8136BLK200"&gt;&lt;span style="text-decoration: underline;"&gt;http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AR8136BLK200&lt;/span&gt;&lt;/a&gt;&lt;a href="http://www.pcwb.co.uk/catalogue/item/APC8136?cidp=Froogle"&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8394038291116845184?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8394038291116845184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8394038291116845184&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8394038291116845184'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8394038291116845184'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2010/06/having-just-read-stuarts-scotgrid-blog.html' title='Oxford&apos;s blanking panels'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_pogTNV-B63A/TCIUZyyBh4I/AAAAAAAABEk/Uw8ZAOc2dmc/s72-c/dsc_6099.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5257875138843670170</id><published>2010-05-18T12:44:00.003+01:00</published><updated>2010-05-18T13:18:54.719+01:00</updated><title type='text'>Jobs with analysis role</title><content type='html'>It started with a ticket from dzero about job failures on the CREAM CE at Oxford. On investigation we found that these jobs were arriving with &lt;span class="solution"&gt; /dzero/users/Role=analysis/Capability=NULL and, unsurprisingly, lcmaps was failing with the error "no entry found for /dzero/users/Role=NULL/Capability=NULL".&lt;br /&gt;But jobs from the same user were running on the lcg-CE, so we investigated further. It turned out that the lcmaps-voms plugins were failing on the lcg-CE too. However, per the lcmaps policy, the lcmaps-poolaccount plugin runs after a VOMS plugin failure, and lcmaps-poolaccount uses the individual DN mappings from the grid-mapfile. 
So the lcg-CE was mapping jobs to the correct dzero pool account, but by the wrong route.&lt;br /&gt;The CREAM CE doesn't use edg-mkgridmap to create its grid-mapfile, so no individual DN mappings are defined in its grid-mapfile.&lt;br /&gt;The solution was quite easy. We just had to define MAP_WILDCARDS=yes in vo.d/dzero, and rerunning yaim created a slightly different grid-mapfile and groupmapfile with wildcards:&lt;br /&gt;&lt;br /&gt;"/dzero/Role=lcgadmin/Capability=NULL" dzerosgm&lt;br /&gt;"/dzero/Role=lcgadmin" dzerosgm&lt;br /&gt;"/dzero/Role=production/Capability=NULL" dzeroprd&lt;br /&gt;"/dzero/Role=production" dzeroprd&lt;br /&gt;"/dzero/*/Role=*" .dzero&lt;br /&gt;"/dzero/*" .dzero&lt;br /&gt;"/dzero/Role=NULL/Capability=NULL" .dzero&lt;br /&gt;"/dzero" .dzero&lt;br /&gt;&lt;br /&gt;So any job arriving with some other role is now mapped to the normal pool account.&lt;br /&gt;The issue was discussed in this Savannah ticket: https://savannah.cern.ch/bugs/index.php?26990&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5257875138843670170?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5257875138843670170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5257875138843670170&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5257875138843670170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5257875138843670170'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2010/05/jobs-with-analysis-role.html' title='Jobs with analysis role'/><author><name>Kashif 
Mohammad</name><uri>http://www.blogger.com/profile/15365806609014519714</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5588505910833921468</id><published>2009-11-07T12:25:00.004+01:00</published><updated>2009-11-07T13:10:37.918+01:00</updated><title type='text'>A week of upgrades for the RAL Tier 2 - Part 1 -The Network</title><content type='html'>Well, it has been a long week at the RAL Tier 2. We've finally had our much-postponed downtime to update our dCache installation (delayed once when one of the disk servers got a corrupt filesystem, then to avoid a CMS analysis test and finally to avoid an Atlas analysis test). The delays, however, did mean we could also include the long-planned network upgrade in the downtime - this was probably a good thing.&lt;br /&gt;&lt;br /&gt;So we had quite a programme of work for a five-day downtime:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Replace the PNFS namespace in dCache with Chimera&lt;/li&gt;&lt;li&gt;Update dCache from 1.9.1 to the "Golden Release" 1.9.5&lt;/li&gt;&lt;li&gt;Install a new network switch and set up a 10Gb/s link between the two halves of our farm&lt;/li&gt;&lt;/ol&gt;Indeed, heading into work on Friday with neither dCache nor the network working, I thought I would be extending the downtime into the next week. By lunchtime, however, things had improved and we were able to come out of the downtime on time at 5pm, although despite a full suite of "OK" SAM tests GridView still had us down until nearly eight o'clock.&lt;br /&gt;&lt;br /&gt;Taking the last of the upgrades first: before last week we had the two halves of our farm in two different rooms. Each half of the farm has its own Nortel 55XX network stack. 
Most of the storage is in the room known as Lab 8 in the R1 office building, with a 10Gb/s connection to site Router A, whilst most of the compute nodes are in the Atlas lower machine room, A5Lower, with a 2x1Gb/s connection to Site Router A. That 2x1Gb/s connection between the storage and compute nodes was our main bottleneck - it would regularly run at over 99% capacity for days during Atlas Hammercloud tests.&lt;br /&gt;&lt;br /&gt;The plan was to install a Nortel 5650 switch into the stack in A5Lower, then set up a direct 10Gb/s fibre link from there to Lab 8 - cutting out the 2x1Gb/s link and Router A. That sounded fairly trivial. When I went down with Networking on Thursday afternoon to set it up, I expected to be back within an hour to carry on struggling with our (at that time broken) dCache.&lt;br /&gt;&lt;br /&gt;Due to cabling issues we had to re-order the switches in the stack, and I also had to swap out a 5510 I had borrowed from the Tier 1 and replace it with a new one. So we broke up the current stack and tried to stack the 5650 with one of the 5510s. According to everything we had read, they should have seen each other. The 5650 should then have downloaded an updated version of the firmware and software to the older 5510, and the two should have joined together as a single switch. 
But ours did not talk to each other.&lt;br /&gt;&lt;br /&gt;Perhaps the software on the 5510s was too old, so we went to each switch in turn, gave it an IP address, downloaded new versions of the firmware and software, and restarted it.&lt;br /&gt;&lt;br /&gt;By the end of Thursday we were more or less back where we had started: a stack of 5510s, still without the 5650.&lt;br /&gt;&lt;br /&gt;On Friday morning Nick found a setting on the 5650 to allow "hybrid stack mode", and suddenly everything worked.&lt;br /&gt;&lt;br /&gt;We soon had all the correct VLANs set up, and the two halves of our network were talking over the new fast link.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5588505910833921468?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5588505910833921468/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5588505910833921468&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5588505910833921468'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5588505910833921468'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/11/week-of-upgrades-for-ral-tier-2-part-1.html' title='A week of upgrades for the RAL Tier 2 - Part 1 -The Network'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8849314625156380676</id><published>2009-10-20T09:33:00.003+01:00</published><updated>2009-10-21T13:21:52.997+01:00</updated><title type='text'>Backing up MySQL databases</title><content type='html'>Oxford have installed a simple script to back up the DPM MySQL database once a day at 6am.&lt;br /&gt;The script is loosely based on Glasgow's example &lt;a href="http://www.gridpp.ac.uk/wiki/MySQL_Backups"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To limit the backups to just seven file names, I've opted to name each dump after the day of the week rather than the date.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;[root@t2se01 ~]# cat /root/mysql-dump-pdg.pl&lt;br /&gt;#!/usr/bin/perl&lt;br /&gt;#&lt;br /&gt;# Loosely based on the Glasgow script but simplified.&lt;br /&gt;#&lt;br /&gt;# Use the day of the week only, as we want just seven unique file names, which will be overwritten,&lt;br /&gt;# thus limiting the total backup size.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;@weekDays = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);&lt;br /&gt;($second, $minute, $hour, $dayOfMonth, $month, $yearOffset, $dayOfWeek, $dayOfYear, $daylightSavings) = localtime();&lt;br /&gt;$theTime = "$weekDays[$dayOfWeek]";&lt;br /&gt;#print $theTime;&lt;br /&gt;&lt;br /&gt;$backup_dir="/var/lib/mysqldumps";&lt;br /&gt;$mysql_user="root";&lt;br /&gt;$mysql_pw_file="/root/mysql-pw";&lt;br /&gt;$keep_days=7; # not used: rotation comes from reusing the seven weekday file names&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Read mysql password&lt;br /&gt;open(PW, $mysql_pw_file) || die "Failed to open password file $mysql_pw_file: $!\n";&lt;br /&gt;$mysql_pw=&amp;lt;PW&amp;gt;;&lt;br /&gt;chomp $mysql_pw;&lt;br /&gt;close PW;&lt;br /&gt;&lt;br /&gt;# Dump the db now ('or' rather than '||', so that a failed chdir really dies)&lt;br /&gt;chdir $backup_dir or die "Failed to change to backup directory $backup_dir: $!\n";&lt;br /&gt;&lt;br /&gt;# The shell returns the status of the last command in the pipeline (gzip),&lt;br /&gt;# so a mysqldump failure may not be caught by the check below.&lt;br /&gt;system 
"/usr/bin/mysqldump --user=$mysql_user --password=$mysql_pw --opt --all-databases | gzip -c &gt; mysql-dump-$theTime.sql.gz";&lt;br /&gt;die "mysqldump pipeline failed with exit code " . ($? &gt;&gt; 8) . "\n" if $? != 0;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;This is run by &lt;span style="font-weight: bold;"&gt;/etc/cron.d/mysql-dump&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;PATH=/sbin:/bin:/usr/sbin:/usr/bin&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;0 6 * * * root /root/mysql-dump-pdg.pl&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So far it seems to work in testing!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8849314625156380676?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8849314625156380676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8849314625156380676&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8849314625156380676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8849314625156380676'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/10/backing-up-mysql-databases.html' title='Backing up MySQL databases'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5541911723638568023</id><published>2009-10-19T10:35:00.002+01:00</published><updated>2009-10-19T10:39:48.510+01:00</updated><title type='text'>Oxford Grid now SL5</title><content type='html'>All but one worker node on the Oxford Grid site has been reinstalled with SL5.&lt;br /&gt;Currently these are served by one CE, t2ce05, but more will be added shortly to provide resilience.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5541911723638568023?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5541911723638568023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5541911723638568023&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5541911723638568023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5541911723638568023'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/10/oxford-grid-now-sl5.html' title='Oxford Grid now SL5'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1484892950894989371</id><published>2009-10-14T10:49:00.004+01:00</published><updated>2009-10-20T10:10:57.845+01:00</updated><title type='text'>Quarterly Report DPM script</title><content type='html'>Each 
quarter we need to report on disk usage at our sites.&lt;br /&gt;This can be tricky, but the following script will help at DPM sites:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;#!/bin/bash&lt;br /&gt;# Write the usage of each top-level directory under the DPM home area to a dated file.&lt;br /&gt;# Change the /dpm/physics.ox.ac.uk path and the output file name for your site.&lt;br /&gt;&lt;br /&gt;DAY=$(date +%F)&lt;br /&gt;echo "$DAY"&lt;br /&gt;for zz in $(dpns-ls /dpm/physics.ox.ac.uk/home/); do&lt;br /&gt;dpns-du -z -s "/dpm/physics.ox.ac.uk/home/$zz" &gt;&gt; "Oxford-SE-Usage-$DAY"&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;You will need to modify it appropriately for your site.&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Added 20.10.09:&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;This makes use of the dpns-du command in the gridpp-dpm toolkit, available from:&lt;br /&gt;&lt;a href="http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/"&gt;http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Details of the other commands are on the &lt;a href="http://www.gridpp.ac.uk/wiki/DPM-admin-tools"&gt;wiki&lt;/a&gt;.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1484892950894989371?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1484892950894989371/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1484892950894989371&amp;isPopup=true' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1484892950894989371'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1484892950894989371'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/10/quarterly-report-dpm-script.html' title='Quarterly Report DPM script'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-83204316038716753</id><published>2009-03-05T10:47:00.004+01:00</published><updated>2009-03-05T10:53:06.127+01:00</updated><title type='text'>120 new cores for EFDA-JET</title><content type='html'>30 new Sun Fire X2200 M2 servers have been incorporated into the EFDA-JET site. Each has two dual-core Opteron 2218 processors, so this increases the number of worker node cores by 120 (30 x 2 x 2), to 254. 
Each node has 8GB RAM.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-83204316038716753?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/83204316038716753/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=83204316038716753&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/83204316038716753'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/83204316038716753'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/03/120-new-cores-for-efda-jet.html' title='120 new cores for EFDA-JET'/><author><name>David Robson  (EFDA-JET)</name><uri>http://www.blogger.com/profile/18400364515735153027</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8593612509908426582</id><published>2009-02-26T13:49:00.002+01:00</published><updated>2009-02-26T14:08:25.497+01:00</updated><title type='text'>CMS at Oxford</title><content type='html'>Oxford was failing a CE CMS SAM test with a warning, probably due to a permissions problem on the SE.&lt;br /&gt;The following commands illuminated things:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;This extract from /var/log/dpm/log&lt;br /&gt;02/26 10:49:30  3869,24 dpm_srv_proc_put: processing request c75ce541-b2cd-4bdc-bf8f-c86ecb0be6ed from /C=UK/O=eScience/OU=CLRC/L=RAL/CN=chris cms brew&lt;br /&gt;02/26 10:49:30  3869,24 dpm_srv_proc_put: calling Cns_stat&lt;br /&gt;02/26 10:49:30  
3869,24 dpm_srv_proc_put: calling Cns_creatx&lt;br /&gt;02/26 10:49:30  3869,24 dpm_srv_proc_put: srm://t2se01.physics.ox.ac.uk:8446/srm/managerv2?SFN=/dpm/physics.ox.ac.uk/home/cms/store/user/test/oneEvt.root: DPM_FAILED (Permission denied)&lt;br /&gt;02/26 10:49:30  3869,24 dpm_srv_proc_put: returns 0, status=DPM_FAILED (Permission denied)&lt;br /&gt;&lt;br /&gt;Shows the test file creation failing&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/&lt;br /&gt;drwxrwxr-x   1 24135    1399                      0 Jan 13 18:55 PhEDEx_Debug&lt;br /&gt;drwxrwxr-x   2 24135    3490                      0 Oct 13 12:15 PhEDEx_LoadTest07&lt;br /&gt;drwxrwxr-x   0 24135    1399                      0 Feb 26 12:20 brew&lt;br /&gt;drwxrwxr-x   2 24135    1399                      0 Jan 27 15:16 mc&lt;br /&gt;drwxrwxr-x   2 24351    3422                      0 Feb 06 18:57 unmerged&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 user&lt;br /&gt;[root@t2se01 dpm]# dpns-listgrpmap |grep 1399&lt;br /&gt;    1399 cms&lt;br /&gt;[root@t2se01 dpm]# dpns-listgrpmap |grep 3406&lt;br /&gt;    3406 cms/Role=lcgadmin&lt;br /&gt;[root@t2se01 dpm]# dpns-getacl /dpm/physics.ox.ac.uk/home/cms/store/&lt;br /&gt;# file: /dpm/physics.ox.ac.uk/home/cms/store/&lt;br /&gt;# owner: /C=UK/O=eScience/OU=CLRC/L=RAL/CN=chris cms brew&lt;br /&gt;# group: cms/Role=cmst1admin&lt;br /&gt;user::rwx&lt;br /&gt;group::rwx              #effective:rwx&lt;br /&gt;group:cms/Role=lcgadmin:rwx             #effective:rwx&lt;br /&gt;group:cms/Role=production:rwx           #effective:rwx&lt;br /&gt;mask::rwx&lt;br /&gt;other::r-x&lt;br /&gt;default:user::rwx&lt;br /&gt;default:group::rwx&lt;br /&gt;default:group:cms/Role=lcgadmin:rwx&lt;br /&gt;default:group:cms/Role=production:rwx&lt;br /&gt;default:mask::rwx&lt;br /&gt;default:other::r-x&lt;br /&gt;[root@t2se01 dpm]# dpns-getacl 
/dpm/physics.ox.ac.uk/home/cms/store/brew&lt;br /&gt;# file: /dpm/physics.ox.ac.uk/home/cms/store/brew&lt;br /&gt;# owner: /C=UK/O=eScience/OU=CLRC/L=RAL/CN=chris cms brew&lt;br /&gt;# group: cms&lt;br /&gt;user::rwx&lt;br /&gt;group::rwx              #effective:rwx&lt;br /&gt;group:cms/Role=lcgadmin:rwx             #effective:rwx&lt;br /&gt;group:cms/Role=production:rwx           #effective:rwx&lt;br /&gt;mask::rwx&lt;br /&gt;other::r-x&lt;br /&gt;default:user::rwx&lt;br /&gt;default:group::rwx&lt;br /&gt;default:group:cms/Role=lcgadmin:rwx&lt;br /&gt;default:group:cms/Role=production:rwx&lt;br /&gt;default:mask::rwx&lt;br /&gt;default:other::r-x&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/&lt;br /&gt;drwxrwxr-x   1 24135    1399                      0 Jan 13 18:55 PhEDEx_Debug&lt;br /&gt;drwxrwxr-x   2 24135    3490                      0 Oct 13 12:15 PhEDEx_LoadTest07&lt;br /&gt;drwxrwxr-x   0 24135    1399                      0 Feb 26 12:20 brew&lt;br /&gt;drwxrwxr-x   2 24135    1399                      0 Jan 27 15:16 mc&lt;br /&gt;drwxrwxr-x   2 24351    3422                      0 Feb 06 18:57 unmerged&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 user&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/user&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 test&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/user/test&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 SAM-t2se01.physics.ox.ac.uk&lt;br /&gt;[root@t2se01 dpm]# dpns-chgrp 1399 /dpm/physics.ox.ac.uk/home/cms/store/user&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/user&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 test&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/&lt;br /&gt;drwxrwxr-x   1 24135    1399                      0 Jan 
13 18:55 PhEDEx_Debug&lt;br /&gt;drwxrwxr-x   2 24135    3490                      0 Oct 13 12:15 PhEDEx_LoadTest07&lt;br /&gt;drwxrwxr-x   1 24135    1399                      0 Feb 26 12:33 brew&lt;br /&gt;drwxrwxr-x   2 24135    1399                      0 Jan 27 15:16 mc&lt;br /&gt;drwxrwxr-x   2 24351    3422                      0 Feb 06 18:57 unmerged&lt;br /&gt;drwxrwxr-x   1 24352    1399                      0 Jan 21 18:21 user&lt;br /&gt;[root@t2se01 dpm]# dpns-chgrp 1399 /dpm/physics.ox.ac.uk/home/cms/store/user/test&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/brew&lt;br /&gt;-rw-rw-r--   1 24135    1399                4788418 Feb 26 12:34 oneEvt.root&lt;br /&gt;[root@t2se01 dpm]# dpns-ls -l /dpm/physics.ox.ac.uk/home/cms/store/user/test&lt;br /&gt;drwxrwxr-x   1 24352    3406                      0 Jan 21 18:21 SAM-t2se01.physics.ox.ac.uk&lt;br /&gt;-rw-rw-r--   1 24135    1399                4788418 Feb 26 12:36 oneEvt.root&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8593612509908426582?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8593612509908426582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8593612509908426582&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8593612509908426582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8593612509908426582'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2009/02/cms-at-oxford.html' title='CMS at Oxford'/><author><name>Pete 
Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6933683899870303231</id><published>2008-12-19T10:50:00.003+01:00</published><updated>2008-12-19T11:13:33.367+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='torque nfs problem'/><title type='text'>Automount problems on torque server</title><content type='html'>We've been having a few problems with our Torque server randomly failing to automount disks.&lt;br /&gt;&lt;br /&gt;Most of the time the mounts succeeded, but occasionally they would fail with just:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Dec 19 08:05:06 heplnx201 kernel: RPC: error 5 connecting to server nfsserver&lt;br /&gt;Dec 19 08:05:06 heplnx201 automount[23438]: &gt;&gt; mount: nfsserver:/opt/ppd/mount: can't read superblock&lt;br /&gt;Dec 19 08:05:06 heplnx201 automount[23438]: mount(nfs): nfs: mount failure nfsserver:/opt/ppd/mount on /net/mount&lt;br /&gt;Dec 19 08:05:06 heplnx201 automount[23438]: failed to mount /net/mount&lt;br /&gt;Dec 19 08:05:07 heplnx201 kernel: RPC: Can't bind to reserved port (98).&lt;br /&gt;Dec 19 08:05:07 heplnx201 kernel: RPC: can't bind to reserved port.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;With the help of Google I found out that error 98 is EADDRINUSE (address already in use). In other words, the client is unable to find a free port in its port range from which to initiate the connection to the server.&lt;br /&gt;&lt;br /&gt;The culprit seems to be Torque. When I checked with &lt;span style="font-family: courier new;"&gt;netstat -a&lt;/span&gt;, it was using every single port from 600 to 1023, which exactly overlaps the NFS client port range of 600-1023.&lt;br /&gt;&lt;br /&gt;Here Google failed me and I was unable 
to find any way to limit the port range used by Torque.&lt;br /&gt;&lt;br /&gt;So for now I've taken the quick option of extending the NFS client port range down to port 300 with:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;echo 300 &gt; /proc/sys/sunrpc/min_resvport&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'd like to move the NFS client port range out of the privileged port range altogether. This should be possible, because the RFC says that the client SHOULD use a privileged port (below 1024) but MAY use a higher one. However, I'd like to test it a bit before I configure a major server like that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6933683899870303231?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6933683899870303231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6933683899870303231&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6933683899870303231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6933683899870303231'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/12/automount-problems-on-torque-server.html' title='Automount problems on torque server'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4960919526692576459</id><published>2008-12-19T10:41:00.002+01:00</published><updated>2008-12-19T10:44:53.780+01:00</updated><title type='text'>static-file-Cluster.ldif edit required post yaim at Oxford</title><content type='html'>Every time we run yaim at Oxford, we have to fix the number of CPUs in our cluster by hand.&lt;br /&gt;On t2ce02:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:85%;"&gt;diff static-file-Cluster.ldif-fixed /opt/glite/etc/gip/ldif/static-file-Cluster.ldif&lt;br /&gt;64c64&lt;br /&gt;&lt; GlueSubClusterPhysicalCPUs: 384&lt;br /&gt;---&lt;br /&gt;&gt; GlueSubClusterPhysicalCPUs: 2&lt;br /&gt;[root@t2ce02 ~]# cp static-file-Cluster.ldif-fixed /opt/glite/etc/gip/ldif/static-file-Cluster.ldif&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;On t2ce04:&lt;br /&gt;GlueSubClusterPhysicalCPUs needs to be 74. 
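&lt;br /&gt;Because yaim overwrites this file on every run, the hand edit could be scripted and run straight after yaim. What follows is only a minimal sketch. It assumes GNU sed and one GlueSubClusterPhysicalCPUs entry per file, and set_physical_cpus is a hypothetical helper name, not part of yaim or gLite:&lt;pre&gt;
```shell
#!/bin/sh
# Hypothetical helper (not part of yaim): set_physical_cpus FILE COUNT
# Rewrites every "GlueSubClusterPhysicalCPUs:" line in FILE to COUNT,
# keeping the previous version as FILE.bak (GNU sed -i.bak syntax).
set_physical_cpus() {
    ldif_file=$1
    cpu_count=$2
    sed -i.bak "s/^GlueSubClusterPhysicalCPUs: .*/GlueSubClusterPhysicalCPUs: $cpu_count/" "$ldif_file"
}

# Example for t2ce04, using the path from the t2ce02 diff above:
# set_physical_cpus /opt/glite/etc/gip/ldif/static-file-Cluster.ldif 74
```
&lt;/pre&gt;If the file describes more than one subcluster, this sketch would set all of them to the same value, so it is worth checking the result with a diff as on t2ce02.&lt;br /&gt;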
After the change the ldap query shows:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt; ldapsearch -x -H ldap://t2bdii01.physics.ox.ac.uk:2170 -b Mds-vo-name=UKI-SOUTHGRID-OX-HEP,o=grid|grep -i physicalcpu&lt;br /&gt;GlueSubClusterPhysicalCPUs: 74&lt;br /&gt;GlueSubClusterPhysicalCPUs: 384&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4960919526692576459?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4960919526692576459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4960919526692576459&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4960919526692576459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4960919526692576459'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/12/static-file-clusterldif-edit-required.html' title='static-file-Cluster.ldif edit required post yaim at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6393589651847241240</id><published>2008-12-16T16:30:00.008+01:00</published><updated>2008-12-17T14:12:26.697+01:00</updated><title type='text'>EFDA-JET Service nodes upgraded to glite 3.1</title><content type='html'>We 
upgraded our service nodes to Scientific Linux 4.7 and glite-3.1. The worker nodes had been upgraded earlier. The problems we had while upgrading to Scientific Linux 4.7 are listed below:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Storage Element&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;While installing the SE glite middleware (glite-SE_dpm_mysql), we hit a missing dependency on the perl-SOAP-Lite package:&lt;br /&gt;&lt;br /&gt;Error: Missing Dependency: perl-SOAP-Lite &gt;= 0.67 is needed by package&lt;br /&gt;gridview-wsclient-common&lt;br /&gt;&lt;br /&gt;Running&lt;br /&gt;&lt;br /&gt;# yum install perl-SOAP-Lite&lt;br /&gt;&lt;br /&gt;only installs perl-SOAP-Lite-0.65, which is older than the version needed.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;We therefore downloaded the perl-SOAP-Lite rpm from a different repository. We initially tried perl-SOAP-Lite-0.67.el4, but it failed to install because it needed MQSeries and other packages. We finally downloaded perl-SOAP-Lite-0.67-1.1.fc1.rf.noarch.rpm, which installed without any problems.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;When yaim configured the node, it reported the following error:&lt;br /&gt;&lt;br /&gt;sed: can't read /opt/bdii/etc/schemas: No such file or directory&lt;br /&gt;&lt;br /&gt;The file /opt/bdii/etc/schemas was missing. The fix is to copy the schemas.example file to schemas:&lt;br /&gt;&lt;br /&gt;# cp -i /opt/bdii/doc/schemas.example /opt/bdii/etc/schemas&lt;br /&gt;&lt;br /&gt;The first SAM test failed because lcg-lr was missing, so we installed lcg_util.&lt;br /&gt;This installed a newer version of lcg_util than the one on the other nodes. 
lcg_util was then updated on all the nodes.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Compute Element (&amp;amp; site BDII)&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We run the compute element service and the site BDII service on the same node.&lt;br /&gt;&lt;br /&gt;While installing the glite-BDII packages, we got the following dependency errors:&lt;br /&gt;&lt;br /&gt;Error: Missing Dependency: glite-info-provider-ldap = 1.1.0-1 is needed by package glite-BDII&lt;br /&gt;Error: Missing Dependency: glue-schema = 1.3.0-3 is needed by package glite-BDII&lt;br /&gt;Error: Missing Dependency: bdii = 3.9.1-5 is needed by package glite-BDII&lt;br /&gt;&lt;br /&gt;Installing the missing packages with yum pulls in newer versions, so the glite-BDII installation still fails: it needs these packages at exactly the versions listed above. We therefore installed these packages by hand. A GGUS ticket (Ticket-ID: 42456) suggested that this problem is fixed in the latest release (update 34).&lt;br /&gt;&lt;br /&gt;As with the SE install above, the schemas file was missing. 
The above fix was repeated here.&lt;br /&gt;&lt;br /&gt;When running yaim, we got the following errors:&lt;br /&gt;&lt;br /&gt;grep: a: No such file or directory&lt;br /&gt;grep: VO: No such file or directory&lt;br /&gt;grep: or: No such file or directory&lt;br /&gt;grep: a: No such file or directory&lt;br /&gt;grep: VOMS: No such file or directory&lt;br /&gt;grep: FQAN: No such file or directory&lt;br /&gt;grep: as: No such file or directory&lt;br /&gt;grep: an: No such file or directory&lt;br /&gt;grep: argument: No such file or directory&lt;br /&gt;qmgr: Syntax error - cannot locate attribute&lt;br /&gt;set queue lhcb acl_groups += /opt/glite/yaim/bin/yaim: supply a VO or a VOMS FQAN as an argument&lt;br /&gt;&lt;br /&gt;It looks as though the usage message printed by users_getvogroup was being captured as the function's output and passed on as arguments to grep and qmgr (note the qmgr line). To fix it, we edited the file /opt/glite/yaim/functions/utils/users_getvogroup and commented out the line&lt;br /&gt;&lt;br /&gt;#echo "$0: supply a VO or a VOMS FQAN as an argument"&lt;br /&gt;&lt;br /&gt;The Gstat web monitoring page reported that the SE service was missing ('SE missing in Gstat service'). 
To fix this problem, we edited the file /opt/bdii/etc/bdii-update.conf and added the following line for our SE:&lt;br /&gt;&lt;br /&gt;SE ldap://grid001.jet.efda.org:2170/mds-vo-name=resource,o=grid&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Mon Box&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When running yaim, we got the following errors:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Problem starting rgma-servicetool&lt;br /&gt;&lt;br /&gt;Starting rgma-servicetool:                                 [FAILED]&lt;br /&gt;For more details check /var/log/glite/rgma-servicetool.log&lt;br /&gt;Stopping rgma-gin:                                         [  OK  ]&lt;br /&gt;Starting rgma-gin:                                         [FAILED]&lt;br /&gt;&lt;br /&gt;We fixed this by adding the following to site-info.def, which points JAVA_LOCATION at the Sun 1.5 JRE on the MON box and at Java 1.4.2 everywhere else:&lt;br /&gt;&lt;br /&gt;HOSTNAME=`hostname`&lt;br /&gt;if [ "$HOSTNAME" == "$MON_HOST" ] ; then&lt;br /&gt;JAVA_LOCATION="/usr/lib/jvm/jre-1.5.0-sun"&lt;br /&gt;else&lt;br /&gt;JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"&lt;br /&gt;fi&lt;br /&gt;&lt;br /&gt;We had the same missing 'schemas' file problem here as well.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Networking &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;EFDA-JET has a slightly unusual set-up, as we are restricted to a small number of external IP addresses. All nodes are on the same LAN with private IP addresses, whilst the service nodes also have external addresses. In the hosts files on the service nodes, all service nodes are referenced by their external addresses, whilst on the worker nodes, the service nodes are referenced by their private addresses.&lt;br /&gt;&lt;br /&gt;This worked well for glite 3.0, but not for glite 3.1, where we saw clients on the worker nodes trying to contact the service nodes via their external addresses. It looks like glite 3.1 services pass IP addresses to clients for them to call back on later. 
The complete solution was to run iptables on the worker nodes and use NAT to translate outgoing connections addressed to the service nodes' external addresses into their corresponding internal addresses. This was done by adding the following to /etc/rc.local on the worker nodes:&lt;br /&gt;&lt;br /&gt;/sbin/service iptables start&lt;br /&gt;/sbin/iptables -A OUTPUT  -t nat -d &amp;lt;CE-ext-addr&amp;gt; -j DNAT \&lt;br /&gt;--to-destination &amp;lt;CE-int-addr&amp;gt;&lt;br /&gt;/sbin/iptables -A OUTPUT  -t nat -d &amp;lt;SE-ext-addr&amp;gt; -j DNAT \&lt;br /&gt;--to-destination &amp;lt;SE-int-addr&amp;gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6393589651847241240?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6393589651847241240/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6393589651847241240&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6393589651847241240'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6393589651847241240'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/12/efda-jet-services-nodes-upgraded-to.html' title='EFDA-JET Service nodes upgraded to glite 3.1'/><author><name>David Robson  (EFDA-JET)</name><uri>http://www.blogger.com/profile/18400364515735153027</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4605462625128128792</id><published>2008-12-04T17:22:00.003+01:00</published><updated>2008-12-04T17:57:40.837+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dCache'/><category scheme='http://www.blogger.com/atom/ns#' term='byChris'/><title type='text'>dCache Update</title><content type='html'>We updated dCache this morning to 1.9.0. That sounds like a major jump, but according to the release notes it is only a minor step up from the 1.8.0-15pX series of releases.&lt;br /&gt;&lt;br /&gt;The upgrade itself was trivial: just installing the new dcache-server rpm and running install.sh on all the nodes.&lt;br /&gt;&lt;br /&gt;We also took the opportunity to update PostgreSQL on the head node from 8.3.1 to 8.3.5 using rpms from &lt;a href="http://yum.pgsqlrpms.org/8.3/redhat/"&gt;pgsqlrpms.org&lt;/a&gt;. I'm hoping that I will now be able to use their prebuilt Slony-I rpm to set up master-slave mirroring of the databases from the dCache head node to a live mirror node.&lt;br /&gt;&lt;br /&gt;Finally, we updated the SL version of all the dCache nodes to SL4.6 from a mix of SL4.4, SL4.5 and SL4.6. 
We're now using the SL-Contrib xfs kernel modules on all nodes and, on the nodes with Areca RAID cards, the Areca drivers compiled into the 2.6.9-78 series of kernels rather than our own builds, and we have had no issues.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4605462625128128792?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4605462625128128792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4605462625128128792&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4605462625128128792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4605462625128128792'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/12/dcache-update.html' title='dCache Update'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4769913079172158510</id><published>2008-10-15T15:45:00.002+01:00</published><updated>2008-10-15T15:48:56.818+01:00</updated><title type='text'>Fix ACls on ATLASLOCALGROUPDISK at Oxford</title><content type='html'>&lt;span style="font-size:100%;"&gt;Today I ran Graeme's script to fix the ACLs on the ATLASLOCALGROUPDISK space token&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-size:85%;"&gt;.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;Should have done this a few weeks ago but 
..&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;There is nothing currntly stored here yet.&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;[root@t2se01 ~]# ./atlas-uk-local-dpm-token-fix.sh&lt;br /&gt;Debug: t2se01.physics.ox.ac.uk - physics.ox.ac.uk - atlaslocalgroupdisk&lt;br /&gt;Fixing permissions on /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk...&lt;br /&gt;Searching /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk...&lt;br /&gt;&lt;br /&gt;dpns-ls /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;shows nothing&lt;br /&gt;&lt;br /&gt; dpns-getacl /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;# file: /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;# owner: root&lt;br /&gt;# group: atlas/uk&lt;br /&gt;user::rwx&lt;br /&gt;group::rwx              #effective:rwx&lt;br /&gt;group:atlas/Role=lcgadmin:rwx           #effective:rwx&lt;br /&gt;group:atlas/Role=production:rwx         #effective:rwx&lt;br /&gt;group:atlas/uk:rwx              #effective:rwx&lt;br /&gt;mask::rwx&lt;br /&gt;other::r-x&lt;br /&gt;default:user::rwx&lt;br /&gt;default:group::rwx&lt;br /&gt;default:group:atlas/Role=lcgadmin:rwx&lt;br /&gt;default:group:atlas/Role=production:rwx&lt;br /&gt;default:group:atlas/uk:rwx&lt;br /&gt;default:mask::rwx&lt;br /&gt;default:other::r-x&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4769913079172158510?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4769913079172158510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4769913079172158510&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4769913079172158510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4769913079172158510'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/10/fix-acls-on-atlaslocalgroupdisk-at.html' title='Fix ACls on ATLASLOCALGROUPDISK at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5150546295964070728</id><published>2008-10-09T12:33:00.002+01:00</published><updated>2008-10-09T12:40:20.886+01:00</updated><title type='text'>SouthGrid update</title><content type='html'>The Birmingham site has suffered some reliability problems caused by site networking issues.&lt;br /&gt;Physics are working with central IS to resolve them.&lt;br /&gt;&lt;br /&gt;Bristol has been having problems with their SE. Despite work over the weekend to fsck all the partitions by hand, the array is still causing problems.&lt;br /&gt;&lt;br /&gt;Oxford has recently been having a strange Maui problem that means only about 60-70% of the available cores get allocated jobs. Manually qrun-ing the jobs makes them run OK.&lt;br /&gt;More recently, the maui process actually started crashing. Investigations are ongoing, although things seem a bit better just now.&lt;br /&gt;&lt;br /&gt;RalPPD have put the latest purchase of WNs into production, adding another 160 job slots worth 270 kSI2k and bringing us up to 1025 kSI2k in total. 
Further disk servers have arrived but will take a month to commission.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5150546295964070728?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5150546295964070728/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5150546295964070728&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5150546295964070728'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5150546295964070728'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/10/southgrid-update.html' title='SouthGrid update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3945631848518526394</id><published>2008-08-28T12:32:00.004+01:00</published><updated>2008-08-28T13:25:25.060+01:00</updated><title type='text'>Upgrade at RALPP</title><content type='html'>We had a downtime this morning to:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Upgrade the kernel on the Grid Service and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;dCache&lt;/span&gt; pool nodes&lt;/li&gt;&lt;li&gt;Install the latest &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;gLite&lt;/span&gt; updates on the service nodes&lt;/li&gt;&lt;li&gt;Upgrade &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;dCache&lt;/span&gt; to the most recent 
patch version&lt;/li&gt;&lt;/ol&gt;Upgrading &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;gLite&lt;/span&gt; and the kernel on the service nodes seems to have gone smoothly (still waiting for the SAM Admin jobs I submitted to get beyond "Waiting").&lt;br /&gt;&lt;br /&gt;However, I had a bit more fun upgrading the kernel on the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;dCache&lt;/span&gt; Pool nodes. This is supposed to be much easier now that the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;Areca&lt;/span&gt; drivers are in the stock &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;SL&lt;/span&gt;4 kernel and the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;xfs&lt;/span&gt; kernel modules are in the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;SL&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;Contrib&lt;/span&gt; yum repository, so we don't have to build our own &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10"&gt;rpms&lt;/span&gt; as we have in the past, and indeed both these parts worked fine. But five of the nodes with 3ware cards did not come back after I (remotely) &lt;span style="font-family: courier new;"&gt; shutdown -r &lt;/span&gt;&lt;span style="font-family: courier new;"&gt;now&lt;/span&gt;'d them. Of course, these are the nodes in the Atlas Center, so I had to walk across to find out what the problem was. They all seemed to have hung at the end of the shutdown, at "&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_11"&gt;Unmounting&lt;/span&gt; the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12"&gt;filesystems&lt;/span&gt;". All came back cleanly after I hit the reset buttons.&lt;br /&gt;&lt;br /&gt;The second problem (which had me worried for a time) was with one of the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13"&gt;Areca&lt;/span&gt; nodes. 
I was checking them to see if the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14"&gt;XFS&lt;/span&gt; kernel modules had installed correctly and that the raid partition was mounted and on this node it wasn't, but the kernel modules had installed correctly. Looking a bit harder I found that the whole device seemed to be missing. Connecting to the RAID Card web interface I find that instead of two RAID sets (system and data) it has the two system disks in a &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;RAIDo&lt;/span&gt; pair and 22 free disks (cue heart palpitations). Looking (in a rather panicked fashion) through the admin interface options I find "Rescue RAID set" and give it a go. After a reboot I connected to the web interface again and now I see both RAID sets. Phew! It's too early to start the celebrations though &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_16"&gt;becuase&lt;/span&gt; when I log in the partition isn't mounted and when I try by hand it complains that the Logical Volume isn't there. Uh oh, cue much googling and reading of man pages.&lt;br /&gt;&lt;br /&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;pvscan&lt;/span&gt; sees the physical volume, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_18"&gt;vgscan&lt;/span&gt; sees the volume group and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;lvscan&lt;/span&gt; sees the logical volume but it's "&lt;span style="font-family: courier new;"&gt;NOT Available&lt;/span&gt;". I tried &lt;span style="font-family: courier new;"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;vgscan&lt;/span&gt; --&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;mknodes&lt;/span&gt;&lt;/span&gt;, no that didn't work. 
I finally got it working with:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_22"&gt;vgchange&lt;/span&gt; --available y --verbose raid&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then I could mount the partition, and all the data appeared to be there.&lt;br /&gt;&lt;br /&gt;After all that, the upgrade of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;dCache&lt;/span&gt; was very simple: just a case of adding the latest rpm to my yum repository, running &lt;span style="font-family: courier new;"&gt;yum update &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_24"&gt;dcache&lt;/span&gt;-server&lt;/span&gt; and then &lt;span style="font-family: courier new;"&gt;/opt/d-cache/install/install.sh&lt;/span&gt;. The latter complained about some &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_25"&gt;deprecated&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_26"&gt;config&lt;/span&gt; file options I'll have to look at, but &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_27"&gt;dCache&lt;/span&gt; came up smoothly.&lt;br /&gt;&lt;br /&gt;I'd &lt;span style="font-family: courier new;"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_28"&gt;qsig&lt;/span&gt; -s STOP&lt;/span&gt;'d all the jobs whilst doing all this, obviously, and here's an &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_29"&gt;interesting&lt;/span&gt; plot of the network traffic into the Worker Nodes over the last day.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_IqJOLctNCx8/SLaXix6NCDI/AAAAAAAAAGg/30Wc2J_l_9M/s1600-h/network-graph.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_IqJOLctNCx8/SLaXix6NCDI/AAAAAAAAAGg/30Wc2J_l_9M/s320/network-graph.png" alt="" id="BLOGGER_PHOTO_ID_5239541840260958258" 
border="0" /&gt;&lt;/a&gt;As you can see once I restarted the jobs they more or less picked up without missing a beat. And yes, they are reading data at 600 MB/sec and the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_30"&gt;dCache&lt;/span&gt; is quite happily serving to them at that rate.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="file:///C:/DOCUME%7E1/cajb89/LOCALS%7E1/Temp/moz-screenshot.jpg" alt="" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3945631848518526394?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3945631848518526394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3945631848518526394&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3945631848518526394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3945631848518526394'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/08/upgrade-at-ralpp.html' title='Upgrade at RALPP'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_IqJOLctNCx8/SLaXix6NCDI/AAAAAAAAAGg/30Wc2J_l_9M/s72-c/network-graph.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7165251618167622557</id><published>2008-08-27T10:02:00.001+01:00</published><updated>2008-08-27T10:16:18.980+01:00</updated><title 
type='text'>More spacetokens at Oxford</title><content type='html'>Expanding the space tokens at Oxford showed that the dpm-updatespace command needs integer values, so for 4.5T use 4500G:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;/opt/lcg/bin/dpm-updatespace --token_desc ATLASMCDISK --gspace 4500G --lifetime Inf&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I used Graeme's script to set up the ATLASGROUPDISK permissions after the reservespace command:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;/opt/lcg/bin/dpm-reservespace --gspace 2T --lifetime Inf --group atlas/Role=production --token_desc ATLASGROUPDISK&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Graeme's script:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;[root@t2se01 ~]# more atlas-group-disk-dpm.sh&lt;br /&gt;#!/bin/bash&lt;br /&gt;&lt;br /&gt;DOMAIN=$(hostname -d)&lt;br /&gt;&lt;br /&gt;dpns-mkdir /dpm/$DOMAIN/home/atlas/atlasgroupdisk/&lt;br /&gt;dpns-chgrp atlas/Role=production /dpm/$DOMAIN/home/atlas/atlasgroupdisk/&lt;br /&gt;dpns-setacl -m d:g:atlas/Role=production:7,d:m:7 /dpm/$DOMAIN/home/atlas/atlasgroupdisk/&lt;br /&gt;&lt;br /&gt;for physgrp in exotics higgs susy beauty sm; do&lt;br /&gt;    dpns-entergrpmap --group atlas/phys-$physgrp/Role=production&lt;br /&gt;    dpns-mkdir /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp&lt;br /&gt;    dpns-chgrp atlas/phys-$physgrp/Role=production /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp&lt;br /&gt;    dpns-setacl -m d:g:atlas/phys-$physgrp/Role=production:7,d:m:7 /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;ATLASDATADISK space was increased to 15TB&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;dpm-updatespace --token_desc ATLASDATADISK --gspace 15T --lifetime 
Inf&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;ATLASLOCALGROUPDISK was created and set up:&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;/opt/lcg/bin/dpm-reservespace --gspace 1T --lifetime Inf --group atlas --token_desc ATLASLOCALGROUPDISK&lt;br /&gt;&lt;br /&gt;dpns-mkdir /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;&lt;br /&gt;dpns-chgrp atlas/uk /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;dpns-setacl -m d:g:atlas/uk:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;dpns-setacl -m g:atlas/uk:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlaslocalgroupdisk&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7165251618167622557?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7165251618167622557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7165251618167622557&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7165251618167622557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7165251618167622557'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/08/more-spacetokens-at-oxford.html' title='More spacetokens at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7581863353603798898</id><published>2008-08-20T16:42:00.005+01:00</published><updated>2008-08-20T16:49:32.087+01:00</updated><title type='text'>Brief Bristol Update</title><content type='html'>Brief Bristol update: new hardware to replace the HPC CE has been received &amp;amp; is being built. New hardware for the StoRM SE &amp;amp; gridftp nodes has been received; Dr Wakelin is building them.&lt;br /&gt;Our 50TB of new storage should be ready in September.&lt;br /&gt;&lt;br /&gt;New hardware to replace the MON box has been received and is being built. We will replace the small cluster WNs this fall (possibly increasing their number) &amp;amp; possibly also its CE &amp;amp; DPM SE.&lt;br /&gt;&lt;br /&gt;Both clusters are mostly stable, except for occasional gpfs timeouts on the HPC &amp;amp; recent intermittent problems with SCSI resets on the DPM SE.&lt;br /&gt;&lt;br /&gt;Delays are because Yves, Jon &amp;amp; Winnie are very busy with other very high priority work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7581863353603798898?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7581863353603798898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7581863353603798898&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7581863353603798898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7581863353603798898'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/08/brief-bristol-update.html' title='Brief Bristol 
Update'/><author><name>Winnie Lacesso</name><uri>http://www.blogger.com/profile/05432798830140675943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6496714113996632955</id><published>2008-08-18T12:16:00.005+01:00</published><updated>2008-08-28T15:06:15.033+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Space Tokens'/><category scheme='http://www.blogger.com/atom/ns#' term='dCache'/><category scheme='http://www.blogger.com/atom/ns#' term='Atlas'/><title type='text'>Setting up the Atlas Space Tokens on dCache</title><content type='html'>Well, the request from Atlas to have space tokens set up is quite complicated, but here's my first attempt at setting them up for dCache:&lt;br /&gt;&lt;br /&gt;They want to have different permissions on different space tokens. I think the only way to do that is to create different LinkGroups to associate with the space tokens. Here is the section from my LinkGroupAuthorization.conf file for Atlas now:&lt;br /&gt;&lt;pre&gt;LinkGroup atlas-link-group&lt;br /&gt;/atlas/Role=production&lt;br /&gt;&lt;br /&gt;LinkGroup atlas-group-link-group&lt;br /&gt;/atlas/Role=production&lt;br /&gt;/atlas/phys-exotics/Role=production&lt;br /&gt;/atlas/phys-higgs/Role=production&lt;br /&gt;/atlas/phys-susy/Role=production&lt;br /&gt;/atlas/phys-beauty/Role=production&lt;br /&gt;/atlas/phys-sm/Role=production&lt;br /&gt;&lt;br /&gt;LinkGroup atlas-user-link-group&lt;br /&gt;/atlas/Role=*&lt;br /&gt;&lt;br /&gt;LinkGroup atlas-localgroup-link-group&lt;br /&gt;/atlas/uk/Role=*&lt;br /&gt;&lt;/pre&gt;However, it appears a Link can only be associated with one LinkGroup, so we also have to create a Link for each of these. 
Luckily, it appears that a PoolGroup can be associated with multiple Links, so we don't have to split up the Atlas space (phew).&lt;br /&gt;&lt;br /&gt;So I created a bunch of Links and LinkGroups in the PoolManager like this:&lt;br /&gt;&lt;pre&gt;psu create link atlas-localgroup-link world-net atlas&lt;br /&gt;psu set link atlas-localgroup-link -readpref=20 -writepref=20 -cachepref=20 -p2ppref=-1&lt;br /&gt;psu add link atlas-localgroup-link atlas-pgroup&lt;br /&gt;psu add link atlas-localgroup-link atlas&lt;br /&gt;psu create linkGroup atlas-localgroup-link-group&lt;br /&gt;psu set linkGroup custodialAllowed atlas-localgroup-link-group false&lt;br /&gt;psu set linkGroup replicaAllowed atlas-localgroup-link-group true&lt;br /&gt;psu set linkGroup nearlineAllowed atlas-localgroup-link-group false&lt;br /&gt;psu set linkGroup outputAllowed atlas-localgroup-link-group false&lt;br /&gt;psu set linkGroup onlineAllowed atlas-localgroup-link-group true&lt;br /&gt;psu addto linkGroup atlas-localgroup-link-group atlas-localgroup-link&lt;br /&gt;&lt;/pre&gt;This is repeated for each of the other extra LinkGroups.&lt;br /&gt;&lt;br /&gt;Then it's just a case of creating the space tokens in the SrmSpaceManager:&lt;br /&gt;&lt;pre&gt;reserve -vog=/atlas -vor=NULL -acclat=ONLINE -retpol=REPLICA -desc=ATLASUSERDISK -lg=atlas-user-link-group 2500000000000 "-1"&lt;br /&gt;reserve -vog=/atlas/uk -vor=NULL -acclat=ONLINE -retpol=REPLICA -desc=ATLASLOCALGROUPDISK -lg=atlas-localgroup-link-group 9000000000000 "-1"&lt;br /&gt;reserve -vog=/atlas -vor=production -acclat=ONLINE -retpol=REPLICA -desc=ATLASGROUPDISK -lg=atlas-group-link-group 3000000000000 "-1"&lt;br /&gt;&lt;/pre&gt;&lt;span style="font-size:78%;"&gt;I'm not sure the last one will work as expected, as I don't know how the -vog=/atlas will map to the multiple VOMS groups in the LinkGroupAuthorization.conf file. 
But I've no idea how to specify multiple VOMS groups there.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OK, that should get us the space tokens, but Atlas are also requesting specific permissions on directories, and that's completely orthogonal to the space tokens. All I've got to play with there are the normal UNIX users and groups.&lt;br /&gt;&lt;br /&gt;So I started off by creating 6 extra groups and making each one the primary group for a single pool account (which is also in the main atlas group). I also added the atlasprd account to the physics groups, since they want it to have write access to the group areas. Here's the relevant bit from /etc/group; you can work out the changes to /etc/passwd yourselves.&lt;br /&gt;&lt;pre&gt;atlas:x:24259:atlas002,atlas003,atlas004,atlas005,atlas006,atlas007&lt;br /&gt;atl-exo:x:24358:atlasprd,atlas002&lt;br /&gt;atl-higg:x:24359:atlasprd,atlas003&lt;br /&gt;atl-susy:x:24360:atlasprd,atlas004&lt;br /&gt;atl-b:x:24361:atlasprd,atlas005&lt;br /&gt;atl-sm:x:24362:atlasprd,atlas006&lt;br /&gt;atl-uk:x:24365:atlas007&lt;br /&gt;&lt;/pre&gt;Now that I've got the users and groups set up, I can create the directories:&lt;br /&gt;&lt;pre&gt;mkdir /pnfs/pp.rl.ac.uk/data/atlas/atlaslocalgroupdisk&lt;br /&gt;chown atlas007:atl-uk /pnfs/pp.rl.ac.uk/data/atlas/atlaslocalgroupdisk&lt;br /&gt;chmod 755 /pnfs/pp.rl.ac.uk/data/atlas/atlaslocalgroupdisk&lt;br /&gt;&lt;/pre&gt;or&lt;br /&gt;&lt;pre&gt;[root@heplnx204 etc]# ls -l /pnfs/pp.rl.ac.uk/data/atlas/atlasgroupdisk/&lt;br /&gt;total 3&lt;br /&gt;drwxrwxr-x  1 atlas005 atl-b    512 Aug 18 13:21 phys-beauty&lt;br /&gt;drwxrwxr-x  1 atlas002 atl-exo  512 Aug 18 13:21 phys-exotics&lt;br /&gt;drwxrwxr-x  1 atlas003 atl-higg 512 Aug 18 13:21 phys-higgs&lt;br /&gt;drwxrwxr-x  1 atlas006 atl-sm   512 Aug 18 13:21 phys-sm&lt;br /&gt;drwxrwxr-x  1 atlas004 atl-susy 512 Aug 18 13:21 phys-susy&lt;br /&gt;&lt;/pre&gt;But now I have to make sure dCache maps the right VOMS credentials to the correct account:&lt;br 
/&gt;First off, in /etc/grid-security/storage-authzdb:&lt;br /&gt;&lt;pre&gt;authorize atlas001 read-write 37101 24259 / / /&lt;br /&gt;authorize atlas002 read-write 37102 24358 / / /&lt;br /&gt;authorize atlas003 read-write 37103 24359 / / /&lt;br /&gt;authorize atlas004 read-write 37104 24360 / / /&lt;br /&gt;authorize atlas005 read-write 37105 24361 / / /&lt;br /&gt;authorize atlas006 read-write 37106 24362 / / /&lt;br /&gt;authorize atlas007 read-write 37107 24365 / / /&lt;br /&gt;authorize atlasprd read-write 51000 24259 / / /&lt;br /&gt;&lt;/pre&gt;and in /etc/grid-security/grid-vorolemap:&lt;br /&gt;&lt;pre&gt;# Added role /alice/Role=production&lt;br /&gt;"*" "/alice/Role=production" aliceprd&lt;br /&gt;&lt;br /&gt;# Added role /atlas&lt;br /&gt;"*" "/atlas" atlas001&lt;br /&gt;"*" "/atlas/phys-exotics" atlas002&lt;br /&gt;"*" "/atlas/phys-higgs" atlas003&lt;br /&gt;"*" "/atlas/phys-susy" atlas004&lt;br /&gt;"*" "/atlas/phys-beauty" atlas005&lt;br /&gt;"*" "/atlas/phys-sm" atlas006&lt;br /&gt;"*" "/atlas/uk" atlas007&lt;br /&gt;&lt;br /&gt;# Added role /atlas/Role=lcgadmin&lt;br /&gt;"*" "/atlas/Role=lcgadmin" atlas001&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This has not been fully tested yet; in particular, it's not clear that the ATLASGROUPDISK space token will behave the way I expect.&lt;br /&gt;&lt;br /&gt;Oh, and doing this has once again made me realise that I don't really understand what Units and Links are and do in dCache, so I'm offering a beer to anyone who can explain this to me.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update on 28/08/08&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It looks like this doesn't fully work: dCache doesn't support secondary groups, so the &lt;span style="font-family: courier new;"&gt;atlasprd&lt;/span&gt; user, whose primary group is &lt;span style="font-family: courier new;"&gt;atlas&lt;/span&gt;, cannot write to the &lt;span style="font-family: courier new;"&gt;/pnfs/pp.rl.ac.uk/data/atlas/atlasgroupdisk/* 
&lt;/span&gt;areas, even though it is a secondary member of the groups that do have write access. I'm now waiting for feedback from Atlas on how they want the permissions configured in view of this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6496714113996632955?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6496714113996632955/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6496714113996632955&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6496714113996632955'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6496714113996632955'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/08/setting-up-atlas-space-tokens-on-dcache.html' title='Setting up the Atlas Space Tokens on dCache'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6349475103571659725</id><published>2008-07-25T09:59:00.000+01:00</published><updated>2008-07-25T10:00:28.172+01:00</updated><title type='text'>Adding multiple clusters to get different memory limit queues</title><content type='html'>I've been thinking about doing this for ages. 
The aim is to have different queues with different memory limits, to better direct jobs with higher memory requirements to nodes with more memory.&lt;br /&gt;&lt;br /&gt;The current method of doing this is to set up a separate queue for each level with the default memory requirement set, and then to publish separate clusters and subclusters for each of the queues in the information system.&lt;br /&gt;&lt;br /&gt;So I first created grid500, grid1000 and grid2000 queues with no memory limits, configured them normally using yaim so that they would accept jobs from all my supported VOs, and checked that job submission to them worked as expected.&lt;br /&gt;&lt;br /&gt;I then edited the &lt;span style="font-family:courier new;"&gt;static-file-Cluster.ldif&lt;/span&gt; file on the CE to add extra clusters and subclusters for each of the queues, and set the memory for each cluster to the memory for its queue. So, for example, for the grid500 queue I created a 500.pp.rl.ac.uk cluster like so:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;dn: GlueClusterUniqueID=500.pp.rl.ac.uk,mds-vo-name=resource,o=grid&lt;br /&gt;objectClass: GlueClusterTop&lt;br /&gt;objectClass: GlueCluster&lt;br /&gt;objectClass: GlueInformationService&lt;br /&gt;objectClass: GlueKey&lt;br /&gt;objectClass: GlueSchemaVersion&lt;br /&gt;GlueClusterName: 500.pp.rl.ac.uk&lt;br /&gt;GlueClusterService: heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500&lt;br /&gt;GlueClusterUniqueID: 500.pp.rl.ac.uk&lt;br /&gt;GlueForeignKey: GlueSiteUniqueID=UKI-SOUTHGRID-RALPP&lt;br /&gt;GlueForeignKey: GlueCEUniqueID=heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500&lt;br /&gt;GlueInformationServiceURL: ldap://heplnx206.pp.rl.ac.uk:2170/mds-vo-name=resource,o=grid&lt;br /&gt;GlueSchemaVersionMajor: 1&lt;br /&gt;GlueSchemaVersionMinor: 3&lt;br /&gt;&lt;br /&gt;dn: GlueSubClusterUniqueID=500.pp.rl.ac.uk, GlueClusterUniqueID=500.pp.rl.ac.uk,mds-vo-name=resource,o=grid&lt;br /&gt;objectClass: GlueClusterTop&lt;br /&gt;objectClass: 
GlueSubCluster&lt;br /&gt;objectClass: GlueHostApplicationSoftware&lt;br /&gt;objectClass: GlueHostArchitecture&lt;br /&gt;objectClass: GlueHostBenchmark&lt;br /&gt;objectClass: GlueHostMainMemory&lt;br /&gt;objectClass: GlueHostNetworkAdapter&lt;br /&gt;objectClass: GlueHostOperatingSystem&lt;br /&gt;objectClass: GlueHostProcessor&lt;br /&gt;objectClass: GlueInformationService&lt;br /&gt;objectClass: GlueKey&lt;br /&gt;objectClass: GlueSchemaVersion&lt;br /&gt;GlueChunkKey: GlueClusterUniqueID=heplnx206.pp.rl.ac.uk&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_1&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_2_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_1&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_4_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_5_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_6_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_7_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: GLITE-3_0_0&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: RALPP&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: SOUTHHGRID&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: GRIDPP&lt;br /&gt;GlueHostApplicationSoftwareRunTimeEnvironment: R-GMA&lt;br /&gt;GlueHostArchitectureSMPSize: 2&lt;br /&gt;GlueHostArchitecturePlatformType: i586&lt;br /&gt;GlueHostBenchmarkSF00: 0&lt;br /&gt;GlueHostBenchmarkSI00: 1000&lt;br /&gt;GlueHostMainMemoryRAMSize: 500&lt;br /&gt;GlueHostMainMemoryVirtualSize: 1000&lt;br /&gt;GlueHostNetworkAdapterInboundIP: FALSE&lt;br /&gt;GlueHostNetworkAdapterOutboundIP: TRUE&lt;br /&gt;GlueHostOperatingSystemName: ScientificSL&lt;br /&gt;GlueHostOperatingSystemRelease: 4.4&lt;br 
/&gt;GlueHostOperatingSystemVersion: Beryllium&lt;br /&gt;GlueHostProcessorClockSpeed: 2800&lt;br /&gt;GlueHostProcessorModel: PIV&lt;br /&gt;GlueHostProcessorVendor: intel&lt;br /&gt;GlueSubClusterName: 500.pp.rl.ac.uk&lt;br /&gt;GlueSubClusterUniqueID: 500.pp.rl.ac.uk&lt;br /&gt;GlueSubClusterPhysicalCPUs: 0&lt;br /&gt;GlueSubClusterLogicalCPUs: 0&lt;br /&gt;GlueSubClusterTmpDir: /tmp&lt;br /&gt;GlueSubClusterWNTmpDir: /tmp&lt;br /&gt;GlueInformationServiceURL: ldap://heplnx206.pp.rl.ac.uk:2170/mds-vo-name=resource,o=grid&lt;br /&gt;GlueSchemaVersionMajor: 1&lt;br /&gt;GlueSchemaVersionMinor: 3&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that there has to be a blank line in the file after the end of the subcluster definition, or else the GIP script that adds the VO tags doesn't add them.&lt;br /&gt;&lt;br /&gt;I also had to edit the CE/Queue entries in &lt;span style="font-family:courier new;"&gt;static-file-CE.ldif&lt;/span&gt; to change these two entries for each of the grid500, grid1000 and grid2000 queues:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;GlueCEHostingCluster: 500.pp.rl.ac.uk&lt;br /&gt;GlueForeignKey: GlueClusterUniqueID=500.pp.rl.ac.uk&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;These clusters seem to have appeared correctly on the gStat pages.&lt;br /&gt;&lt;br /&gt;So now when I run edg-job-list-match on a JDL with the following requirements, I get:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:courier new;"&gt;Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID)  &amp;amp;&amp;amp; other.GlueHostMainMemoryRAMSize &gt;=  500);&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;***************************************************************************&lt;br /&gt;COMPUTING ELEMENT IDs LIST&lt;br /&gt;The following CE(s) matching your job requirements have been found:&lt;br /&gt;&lt;br /&gt;*CEId*&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-dteam&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short&lt;br 
/&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid1000&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500&lt;br /&gt;***************************************************************************&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:courier new;"&gt;Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID)  &amp;amp;&amp;amp; other.GlueHostMainMemoryRAMSize &gt;= 1000);&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;***************************************************************************&lt;br /&gt;COMPUTING ELEMENT IDs LIST&lt;br /&gt;The following CE(s) matching your job requirements have been found:&lt;br /&gt;&lt;br /&gt;*CEId*&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-dteam&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid1000&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000&lt;br /&gt;***************************************************************************&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:courier new;"&gt;Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID)  &amp;amp;&amp;amp; other.GlueHostMainMemoryRAMSize &gt;= 1500);&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;***************************************************************************&lt;br /&gt;COMPUTING ELEMENT IDs LIST&lt;br /&gt;The following CE(s) matching your job requirements have been found:&lt;br /&gt;&lt;br /&gt;*CEId*&lt;br /&gt;heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000&lt;br /&gt;***************************************************************************&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:courier new;"&gt;Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID)  &amp;amp;&amp;amp; 
other.GlueHostMainMemoryRAMSize &gt;= 2001);&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;===================== edg-job-list-match failure ======================&lt;br /&gt;No Computing Element matching your job requirements has been found!&lt;br /&gt;======================================================================&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This looks very much like what I want.&lt;br /&gt;&lt;br /&gt;Then, to let Torque/Maui know about the memory requirements of each of these new queues, I set a default memory requirement for each with something like:&lt;br /&gt;&lt;pre&gt;qmgr -c "set queue grid1000 resources_default.mem = 1000mb"&lt;br /&gt;&lt;/pre&gt;(I think this is a non-enforcing requirement, so jobs will not be killed for going over it. To do that, I think you need to set "resources_max.mem".)&lt;br /&gt;&lt;br /&gt;Now I just need to do the same configuration on heplnx207, phase out the "per VO" queues and persuade users to put the memory requirements of their jobs in their JDL files.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6349475103571659725?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6349475103571659725/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6349475103571659725&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6349475103571659725'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6349475103571659725'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/07/adding-multiple-clusters-to-get.html' title='Adding multiple clusters to get different memory limit 
queues'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8963976765794567215</id><published>2008-07-11T14:22:00.002+01:00</published><updated>2008-07-11T14:25:55.381+01:00</updated><title type='text'>Oxford adds the ATLAS proddisk space token</title><content type='html'>&lt;span style="font-size:85%;"&gt;Using the same procedure as before, I added the new space token for ATLAS:&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;export DPNS_HOST=t2se01.physics.ox.ac.uk&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;/opt/lcg/bin/dpm-reservespace --gspace 2T --lifetime Inf --group atlas/Role=production --token_desc ATLASPRODDISK&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;/opt/lcg/bin/dpns-mkdir /dpm/physics.ox.ac.uk/home/atlas/atlasproddisk&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;/opt/lcg/bin/dpns-chgrp atlas/Role=production /dpm/physics.ox.ac.uk/home/atlas/atlasproddisk&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;dpns-chmod 775 /dpm/physics.ox.ac.uk/home/atlas/atlasproddisk&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;dpns-setacl -m d:g:atlas/Role=production:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlasproddisk&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlasproddisk&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8963976765794567215?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://southgrid.blogspot.com/feeds/8963976765794567215/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8963976765794567215&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8963976765794567215'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8963976765794567215'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/07/oxford-adds-atlas-proddisk-space-token.html' title='Oxford adds the ATLAS proddisk space token'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4040674201912679158</id><published>2008-07-02T11:37:00.003+01:00</published><updated>2008-07-02T12:34:26.397+01:00</updated><title type='text'>SouthGrid Update</title><content type='html'>A quick update: the rebuild of the Bristol&lt;br /&gt;DPM SE to SL4 on 19th June by Yves &amp;amp; Winnie was a smooth success.&lt;br /&gt;&lt;br /&gt;Oxford had a strange error on its site BDII (still running SL3): after a re-run of yaim, it stopped advertising the site web address and several other entries, so gstat showed a warning. 
The following were missing:&lt;br /&gt;dn: GlueSiteUniqueID=UKI-SOUTHGRID-OX-HEP,mds-vo-name=UKI-SOUTHGRID-OX-HEP,o=g&lt;br /&gt;objectClass: GlueSite&lt;br /&gt;GlueSiteUniqueID: UKI-SOUTHGRID-OX-HEP&lt;br /&gt;GlueSiteName: UKI-SOUTHGRID-OX-HEP&lt;br /&gt;GlueSiteDescription: LCG Site&lt;br /&gt;GlueSiteUserSupportContact: mailto:lcg_manager@physics.ox.ac.uk&lt;br /&gt;GlueSiteSysAdminContact: mailto:lcg_manager@physics.ox.ac.uk&lt;br /&gt;GlueSiteSecurityContact: mailto:lcg_manager@physics.ox.ac.uk&lt;br /&gt;GlueSiteLocation: Oxford, UK&lt;br /&gt;GlueSiteLatitude: 51.7595&lt;br /&gt;GlueSiteLongitude: -1.2595&lt;br /&gt;GlueSiteWeb: http://www.physics.ox.ac.uk&lt;br /&gt;GlueSiteSponsor: none&lt;br /&gt;GlueSiteOtherInfo: TIER-2&lt;br /&gt;GlueSiteOtherInfo: rl.ac.uk&lt;br /&gt;&lt;br /&gt;It transpired that the file /opt/bdii/etc/bdii-update.conf&lt;br /&gt;had its GIP entry pointing to&lt;br /&gt;&lt;br /&gt;&gt; GIP  file:///opt/glite/libexec/glite-info-wrapper&lt;br /&gt;&lt;br /&gt;which did not exist. Changing it to&lt;br /&gt;&lt;br /&gt;&lt; GIP  file:///opt/lcg/libexec/lcg-info-wrapper&lt;br /&gt;&lt;br /&gt;and restarting the BDII has fixed the missing web address (and other entries) problem.&lt;br /&gt;&lt;br /&gt;We were about to set up an SL4-based site BDII anyway, so Ewan did so and, curiously, got the same errors, even though the gLite 3.1 files do all now exist in the /opt/glite directories.&lt;br /&gt;Investigations continue, but for now we are sticking with the working gLite 3.0-based site BDII.&lt;br /&gt;&lt;br /&gt;PS A thread on LCG-ROLLOUT mentions the same errors:&lt;br /&gt;&lt;a 
href="http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0806&amp;amp;L=lcg-rollout&amp;amp;T=0&amp;amp;F=&amp;amp;S=&amp;amp;X=3AEC8F06E68018C3DE&amp;amp;Y=p.gronbech1%40physics.ox.ac.uk&amp;amp;P=4737"&gt;http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0806&amp;amp;L=lcg-rollout&amp;amp;T=0&amp;amp;F=&amp;amp;S=&amp;amp;X=3AEC8F06E68018C3DE&amp;amp;Y=p.gronbech1%40physics.ox.ac.uk&amp;amp;P=4737&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4040674201912679158?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4040674201912679158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4040674201912679158&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4040674201912679158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4040674201912679158'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/07/southgrid-update.html' title='SouthGrid Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4295392960256772050</id><published>2008-06-17T13:45:00.002+01:00</published><updated>2008-06-23T16:24:18.531+01:00</updated><title type='text'>SouthGrid update</title><content type='html'>Yves will be helping Bristol upgrade the SE to SL4 on Thursday.&lt;br /&gt;They had problems with their Transtec RAID array, specifically the battery 
backed up cache. New parts have now arrived.&lt;br /&gt;&lt;br /&gt;The MESC cluster at Birmingham is working well now, and the four LHC VOs are supported, but it is not full yet (both Atlas and LHCb are aware of it).&lt;br /&gt;We may have to advertise it to the VOs.&lt;br /&gt;&lt;br /&gt;Yves hopes to be able to start configuring Blue Bear (the new HPC cluster) after the grand opening next week. There will also be a meeting between NGS people and Birmingham to see how they can work better together, probably by NGS enabling Blue Bear.&lt;br /&gt;&lt;br /&gt;The 64-bit tarball was used at Bristol (and the 32-bit one on MESC). Some extra i386 RPMs are required.&lt;br /&gt;&lt;br /&gt;JET have had problems: they were failing SAM tests but running real jobs OK. They will reinstall the CE.&lt;br /&gt;&lt;br /&gt;Stop Press: Now working since the reinstall.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Oxford have not published accounting since the introduction of two SL4-based CEs. We will be working on fixing this so we have the data for the Quarterly report.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4295392960256772050?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4295392960256772050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4295392960256772050&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4295392960256772050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4295392960256772050'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/06/southgrid-update.html' title='SouthGrid update'/><author><name>Pete 
Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-2035006864930891721</id><published>2008-05-21T11:27:00.002+01:00</published><updated>2008-05-21T11:47:11.057+01:00</updated><title type='text'>Second phase of Oxford's move</title><content type='html'>Over the last couple of days Ewan and I have moved five 9TB disk servers, five twin worker nodes, a couple of head nodes and two UPSs. We had help moving the rack, and then we were able to reassemble it all. The site was back up and passing SAM tests in time for us to come out of scheduled maintenance at 1700. &lt;br /&gt;The current setup means we have three CEs: the original SL3-based CE is driving most of the newer SL4-based WNs. The two new SL4-based CEs send jobs to a new Torque server and on to two subclusters, one for the 32-bit hardware (Dell 2.8GHz Xeons) and the other for the Intel Clovertown quads. We will migrate the workers off the old CE onto the new ones over the next few days, before decommissioning the original SL3-based CE.&lt;br /&gt;&lt;br /&gt;The migration of data from our old SE head node is complete. There were three files that were listed in the database but did not exist on the physical storage. 
I used the scripts from:&lt;br /&gt;&lt;a href="https://twiki.cern.ch/twiki//bin/view/LCG/CheckDpmConsistency"&gt;https://twiki.cern.ch/twiki//bin/view/LCG/CheckDpmConsistency&lt;/a&gt;&lt;br /&gt;which matched the three file names that were left in my dpm-drain logs.&lt;br /&gt;&lt;br /&gt;Two of the files could be removed with rfrm, but one refused to appear in the normal dpns-ls listing and so was not removed.&lt;br /&gt;We decided to ignore this one and remove the file system from the pool with the command:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;dpm-rmfs --server t2se01.physics.ox.ac.uk --fs /storage&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The new DPM head node will be set up shortly, and the MySQL database will then be dumped and restored onto it.&lt;br /&gt;&lt;br /&gt;Meanwhile, we are waiting for the backplanes in our storage servers to be swapped out to avoid the burnout issue we have suffered on one of them.&lt;br /&gt;&lt;br /&gt;The SouthGrid technical meeting will be held tomorrow at Birmingham.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-2035006864930891721?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/2035006864930891721/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=2035006864930891721&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2035006864930891721'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2035006864930891721'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/05/second-phase-of-oxfords-move.html' title='Second phase of 
Oxford&apos;s move'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6081269329286823275</id><published>2008-04-28T14:47:00.003+01:00</published><updated>2008-04-28T14:51:29.675+01:00</updated><title type='text'>Oxford DPM progressing Slowly</title><content type='html'>Removing zero-length files from the DPM storage pool with the rfrm command has helped the dpm-drain command to start progressing again.&lt;br /&gt;The command still fails after 10-20 files and the transfer speed is very low, but at least we are making progress.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6081269329286823275?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6081269329286823275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6081269329286823275&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6081269329286823275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6081269329286823275'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/04/oxford-dpm-progressing-slowly.html' title='Oxford DPM progressing 
Slowly'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4024277028777508413</id><published>2008-04-14T16:33:00.003+01:00</published><updated>2008-04-15T08:02:15.707+01:00</updated><title type='text'>SouthGrid Update</title><content type='html'>We installed half of the Oxford cluster at Begbroke last Tuesday. During the night the air conditioning failed: a valve on the chillers failed, cutting off their water supply, and the chillers then switched themselves off. The room rapidly heated up to over 40 degrees. After investigation and repairs the AC went back on, and all has been well so far. More automated warning systems are required.&lt;br /&gt;&lt;br /&gt;Cambridge have set up space tokens for both ATLASDATADISK and ATLASMCDISK. They have also started upgrading to SL4 (64-bit) worker nodes.&lt;br /&gt;&lt;br /&gt;Bristol completed upgrading the worker nodes to SL4 on Monday. 
They had some problems caused by SELinux, which prevented normal logins, but all now appears well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4024277028777508413?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4024277028777508413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4024277028777508413&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4024277028777508413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4024277028777508413'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/04/southgrid-update.html' title='SouthGrid Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6605857084622345326</id><published>2008-04-08T11:11:00.004+01:00</published><updated>2008-04-08T11:16:20.513+01:00</updated><title type='text'>Oxford Update</title><content type='html'>Last week Ewan and I started the DIY move to Begbroke. We moved 40 1U servers over two mornings. One of the (now empty) Dell racks was moved on Wednesday afternoon.&lt;br /&gt;The worker nodes were reinstalled in that rack on Thursday; we had one PSU failure out of 27 nodes. 
These nodes will be installed with SL4 shortly.&lt;br /&gt;&lt;br /&gt;This week we emptied one of the Viglen racks and moved its servers yesterday.&lt;br /&gt;We hope to move the rack this afternoon and get the worker nodes back online as soon as possible, as we are currently at half capacity.&lt;br /&gt;&lt;br /&gt;On Friday 28th March one of our new 9TB file servers burned out its backplane. This is very similar to the problems RAL have been seeing. The backplane was swapped out and the server is now back online.&lt;br /&gt;We are in talks with the supplier.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6605857084622345326?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6605857084622345326/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6605857084622345326&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6605857084622345326'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6605857084622345326'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/04/oxford-update_08.html' title='Oxford Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5468452041517307703</id><published>2008-04-08T11:11:00.001+01:00</published><updated>2008-04-08T11:11:10.600+01:00</updated><title type='text'>Oxford Update</title><content 
type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5468452041517307703?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5468452041517307703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5468452041517307703&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5468452041517307703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5468452041517307703'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/04/oxford-update.html' title='Oxford Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1264723066795221882</id><published>2008-03-31T16:36:00.004+01:00</published><updated>2008-10-15T15:32:20.304+01:00</updated><title type='text'>Oxford adds another space token</title><content type='html'>With some advice from Graeme I have added another space token at Oxford.&lt;br /&gt;The commands are recorded here for posterity.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpm-reservespace --gspace 3T --lifetime Inf --group atlas/Role=production --token_desc ATLASMCDISK&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpns-mkdir /dpm/physics.ox.ac.uk/home/atlas/atlasmcdisk&lt;/span&gt;&lt;span 
style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpns-chgrp atlas/Role=production /dpm/physics.ox.ac.uk/home/atlas/atlasmcdisk&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpns-chmod 775 /dpm/physics.ox.ac.uk/home/atlas/atlasmcdisk&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpns-setacl -m d:g:atlas/Role=production:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlasmcdisk&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-size:85%;" &gt;dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/physics.ox.ac.uk/home/atlas/atlasmcdisk&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;We started our DIY move today: two trips in the lab van with eight 1U servers each time.&lt;br /&gt;Tomorrow we plan to move more old WNs, and then an empty rack later in the week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1264723066795221882?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1264723066795221882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1264723066795221882&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1264723066795221882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1264723066795221882'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/03/oxford-adds-another-space-token.html' title='Oxford adds another space token'/><author><name>Pete 
Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-2252788041069172234</id><published>2008-03-28T14:40:00.003+01:00</published><updated>2008-03-28T14:55:42.319+01:00</updated><title type='text'>SouthGrid Update</title><content type='html'>Bristol:&lt;br /&gt;The first stage of the HPC cluster is running LCG jobs and is being correctly accounted for.&lt;br /&gt;&lt;br /&gt;The HPC WNs have AMD 2218 cores at 2.6GHz; these are said to be rated at 1.745 kSI2K per core.&lt;br /&gt;Currently GridPP can run a maximum of 32 jobs on this small stage 1 HPC cluster.&lt;br /&gt;&lt;br /&gt;Cambridge:&lt;br /&gt;Santanu is continuing to work with LHCb to solve all the problems running their code at Cambridge.&lt;br /&gt;The WNs will be upgraded to SL4 within the next few weeks.&lt;br /&gt;&lt;br /&gt;Birmingham:&lt;br /&gt;When over 100 of the 120 nodes in the BaBar cluster died after a power shutdown at the end of January, it was deemed not worth restoring the cluster. Two twin 1U servers have been bought to replace it; they will provide 32 cores and 78.4 kSI2K.&lt;br /&gt;The old escience cluster is being set up as an SL4 CE and WN farm, as a template for the way they will drive the new University 'Blue Bear' HPC cluster. 
This cluster is made up of 31 dual 3GHz Xeons.&lt;br /&gt;The main grid cluster (aka the ATLAS cluster) has been expanded to 60 cores.&lt;br /&gt;&lt;br /&gt;The PPS is not being maintained at the moment.&lt;br /&gt;&lt;br /&gt;RALPPD:&lt;br /&gt;Chris has got space tokens working at RALPP (after updating to dCache 1.8-12p6 from p4, and rebooting everything once the srmSpaceManager-enabled config files were in place).&lt;br /&gt;The new hardware has been installed:&lt;br /&gt;8 boxes, 16 nodes, 32 CPUs, so 128 cores.&lt;br /&gt;&lt;br /&gt;The CPUs are "E5410 @ 2.33GHz"; we are not sure of the kSI2K rating yet.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;JET:&lt;br /&gt;Running stably. WNs were updated to SL4 earlier this year.&lt;br /&gt;&lt;br /&gt;Oxford:&lt;br /&gt;Quotes to move the kit to Begbroke seem too high, so we are going to adopt a DIY approach.&lt;br /&gt;Draining t2se01 is taking forever. The dpm-drain command sometimes terminates after only 20 minutes (~6GB of data transferred). We did, however, have a good run on the night of the 26th which lasted over 10 hours.&lt;br /&gt;Oddly, the error log files are often the same size, although not always.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 22 14:32 dpm-drain-errorlog-se01-1&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 22 15:56 dpm-drain-errorlog-se01-2&lt;br /&gt;-rw-r--r--    1 root     root        22564 Feb 22 17:05 dpm-drain-errorlog-se01-3&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 22 18:20 dpm-drain-errorlog-se01-4&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 25 12:35 dpm-drain-errorlog-se01-5&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 25 13:04 dpm-drain-errorlog-se01-6&lt;br /&gt;-rw-r--r--    1 root     root        22519 Feb 25 16:53 dpm-drain-errorlog-se01-7&lt;br /&gt;-rw-r--r--    1 root     root        22519 Mar 26 13:51 dpm-drain-errorlog-se01-8&lt;br /&gt;-rw-r--r--    1 root     root        22519 Mar 26 14:17 
dpm-drain-errorlog-se01-9&lt;br /&gt;-rw-r--r--    1 root     root      1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10&lt;br /&gt;-rw-r--r--    1 root     root       567836 Mar 27 13:59 dpm-drain-errorlog-se01-11&lt;br /&gt;-rw-r--r--    1 root     root        25241 Mar 27 14:57 dpm-drain-errorlog-se01-12&lt;br /&gt;-rw-r--r--    1 root     root        22598 Mar 27 15:37 dpm-drain-errorlog-se01-13&lt;br /&gt;-rw-r--r--    1 root     root        22598 Mar 27 16:22 dpm-drain-errorlog-se01-14&lt;br /&gt;-rw-r--r--    1 root     root        22598 Mar 27 17:11 dpm-drain-errorlog-se01-15&lt;br /&gt;-rw-r--r--    1 root     root        22598 Mar 27 21:55 dpm-drain-errorlog-se01-16&lt;br /&gt;-rw-r--r--    1 root     root        22598 Mar 27 23:15 dpm-drain-errorlog-se01-17&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-2252788041069172234?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/2252788041069172234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=2252788041069172234&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2252788041069172234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2252788041069172234'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/03/southgrid-update.html' title='SouthGrid Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-2834753569812812904</id><published>2008-03-25T12:02:00.002+01:00</published><updated>2008-03-25T12:07:46.897+01:00</updated><title type='text'>Oxford Update</title><content type='html'>The original 74-CPU SL3 Dell worker nodes have been taken down in preparation for reinstallation as SL4 worker nodes.&lt;br /&gt;We will maintain a separate CE to drive these, but intend to split out the Torque server.&lt;br /&gt;The new CE will be an SL4 CE.&lt;br /&gt;The CE and Torque server are likely to be virtual machines running under VMware.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The move of the 4 racks up to Begbroke is still uncertain.&lt;br /&gt;We are awaiting quotes from companies to move the equipment for us. The DIY option of just paying for a truck and driver has also been looked into, but insurance and warranty issues may rule it out.&lt;br /&gt;&lt;br /&gt;The steps up to the new computer room have been made smaller to allow installation of a scissor lift to raise racks to the false floor height. 
The date for installing the lift is still unclear.&lt;br /&gt;&lt;br /&gt;The grand opening on 15th April is all too close.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-2834753569812812904?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/2834753569812812904/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=2834753569812812904&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2834753569812812904'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2834753569812812904'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/03/oxford-update.html' title='Oxford Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4649679696949025523</id><published>2008-02-05T11:57:00.000+01:00</published><updated>2008-02-05T12:03:24.423+01:00</updated><title type='text'>SouthGrid Technical Meeting at JET</title><content type='html'>The SouthGrid Technical Board met at JET.&lt;br /&gt;All sites are moving towards SL4. The recent updates will be applied shortly.&lt;br /&gt;The SouthGrid VO has been set up, and a central LFC is being provided for it at the RAL Tier 1.&lt;br /&gt;The outstanding tickets were reviewed and all were found to be solved. 
One problem was found: tickets still open in Footprints had been closed in GGUS, so the link between the two may not be working properly.&lt;br /&gt;Birmingham is setting up some ex-escience nodes as the interface to the new HPC cluster. This will recover some of the SPECint capacity lost to hardware failures after the recent electrical work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4649679696949025523?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4649679696949025523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4649679696949025523&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4649679696949025523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4649679696949025523'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/02/southgrid-technical-meeting-at-jet.html' title='SouthGrid Technical Meeting at JET'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3593313826924062768</id><published>2008-01-22T13:11:00.000+01:00</published><updated>2008-01-22T13:30:56.510+01:00</updated><title type='text'>Oxford Update</title><content type='html'>Plans to move the Oxford GridPP cluster up to Begbroke are being formulated.&lt;br /&gt;The first part of the plan is to ensure that only these nodes are using the subnet in question. 
We did some tidying up over the last week or so, before having the subnet rerouted to both Physics and Begbroke. This change was made this morning at 8:50 and mostly went smoothly.&lt;br /&gt;Our UI needs to be moved back onto the Physics subnet so that NFS mounting of home directories works.&lt;br /&gt;A new rack, PDU and network switch have been ordered to allow us to move a few test nodes up to Begbroke in advance of the main move.&lt;br /&gt;We aim to complete the move in late January/early February.&lt;br /&gt;&lt;br /&gt;The disk on our installation server that holds the Ganglia and central syslog data failed today. We will restore from backups.&lt;br /&gt;t2wn05 has a failed hard disk and may have been acting as a black hole over the weekend.&lt;br /&gt;&lt;br /&gt;Working with the ZEUS and LHCb VOs to improve usage of our cluster uncovered some configuration problems.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Not all the nodes had the latest DESY VOMS server certs applied (which stopped ZEUS working).&lt;/li&gt;&lt;li&gt;sgm roles were not mapped correctly for LHCb.&lt;/li&gt;&lt;/ol&gt;Finally, the APEL problems seem to be behind us.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The configuration seemed to have changed at the last run of YAIM before Christmas, which stopped any records being published.&lt;/li&gt;&lt;li&gt;Installing the latest development APEL RPMs fixed the problem of the newer spec value for our new CE not being seen.&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3593313826924062768?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3593313826924062768/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3593313826924062768&amp;isPopup=true' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3593313826924062768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3593313826924062768'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/01/oxford-update.html' title='Oxford Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6547899410739697419</id><published>2008-01-04T15:47:00.000+01:00</published><updated>2008-01-04T16:57:03.259+01:00</updated><title type='text'>Scheduled Power outage at Birmingham causes problems</title><content type='html'>The scheduled power outage at Birmingham on Saturday 8th December caused 19 BaBar SL4 systems to fail. Four bad disks also appeared on the SL3 cluster. 
The age of this equipment is a cause for concern.&lt;br /&gt;&lt;br /&gt;Concern has also been expressed at small sites such as Bristol that the number of ATLAS jobs submitted by Steve Lloyd's tests can overwhelm their sites.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6547899410739697419?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6547899410739697419/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6547899410739697419&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6547899410739697419'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6547899410739697419'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2008/01/scheduled-power-outage-at-birmingham.html' title='Scheduled Power outage at Birmingham causes problems'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8858583145935354231</id><published>2007-12-11T10:48:00.000+01:00</published><updated>2007-12-11T11:35:53.283+01:00</updated><title type='text'>Oxford Gridpp Site becomes an NGS Affiliate</title><content type='html'>Not to be outdone by ScotGrid, I should also point out that Oxford became an NGS affiliate at the same meeting (Dec 6th). 
See &lt;a href="https://www.ngs.ac.uk/guide/affiliates/oxford-gridpp/"&gt;https://www.ngs.ac.uk/guide/affiliates/oxford-gridpp/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Oxford have added support for the vo.southgrid.ac.uk, gridpp and supernemo.vo.eu-egee.org VOs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8858583145935354231?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8858583145935354231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8858583145935354231&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8858583145935354231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8858583145935354231'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/12/oxford-gridpp-site-becomes-ngs.html' title='Oxford Gridpp Site becomes an NGS Affiliate'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1183326272206410654</id><published>2007-12-07T17:52:00.000+01:00</published><updated>2007-12-07T17:56:12.781+01:00</updated><title type='text'>Birmingham HV Network Upgrade</title><content type='html'>The high voltage network upgrade this weekend means several systems will be off.&lt;br /&gt;It is hoped to keep the core service nodes up and running, but the number of worker nodes will be limited.&lt;br /&gt;&lt;br /&gt;ALICE VO 
Box was not accessible to users for a day; Yves found no problems.&lt;br /&gt;Now reported as OK.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1183326272206410654?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1183326272206410654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1183326272206410654&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1183326272206410654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1183326272206410654'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/12/birmingham-hv-network-upgrade.html' title='Birmingham HV Network Upgrade'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7994341987829487824</id><published>2007-12-07T10:49:00.000+01:00</published><updated>2007-12-07T10:57:01.240+01:00</updated><title type='text'>SouthGrid Update</title><content type='html'>Bristol:&lt;br /&gt;Had some problems with LHCb users.&lt;br /&gt;EDFA-JET:&lt;br /&gt;Upgraded WNs to SL4.&lt;br /&gt;Birmingham:&lt;br /&gt;A disk failed in the SE RAID 5 array.&lt;br /&gt;Oxford:&lt;br /&gt;Upgraded the SL3 cluster to update 37. There were some problems with the SE. Firstly, the DPM pool nodes had not had the latest lcg-vomscerts RPM applied. 
Secondly, the site-info.def file on some of the nodes had an old entry for the ops VO, which meant the gridmap file was not being created correctly.&lt;br /&gt;This was changed to include:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;VO_OPS_VOMS_SERVERS="'vomss://lcg-voms.cern.ch:8443/voms/ops?/ops/'&lt;br /&gt;                     'vomss://voms.cern.ch:8443/voms/ops?/ops/'"&lt;br /&gt;VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops'&lt;br /&gt;               'ops voms.cern.ch     15009 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch     ops'"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The important part was the addition of voms.cern.ch; lcg-voms.cern.ch was the old entry.&lt;br /&gt;&lt;br /&gt;RALPPD:&lt;br /&gt;The BDII failed on Monday 3rd. A reboot fixed this.&lt;br /&gt;&lt;br /&gt;So now that Oxford is up to date, we can go ahead and add support for some new VOs:&lt;br /&gt;SouthGrid, GridPP and SuperNEMO.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7994341987829487824?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7994341987829487824/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7994341987829487824&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7994341987829487824'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7994341987829487824'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/12/southgrid-update.html' title='SouthGrid Update'/><author><name>Pete 
Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3823557069096636425</id><published>2007-12-05T23:41:00.001+01:00</published><updated>2007-12-05T23:44:04.404+01:00</updated><title type='text'>Random rm failures at Oxford</title><content type='html'>Random SAM test failures for rm, and later complaints from ATLAS, were traced to one of the DPM pool nodes not having had the latest VOMS certs applied.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3823557069096636425?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3823557069096636425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3823557069096636425&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3823557069096636425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3823557069096636425'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/12/random-rm-failures-at-oxford.html' title='Random rm failures at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4590570245596101026</id><published>2007-10-22T12:26:00.000+01:00</published><updated>2007-10-22T13:28:31.849+01:00</updated><title type='text'>dCache Tuning</title><content type='html'>Since the start of the CMS CSA07 data challenge I've been having SAM test failures against my dCache Storage Element, mostly what seem to be timeouts, so I've been looking at improving my setup.&lt;br /&gt;&lt;br /&gt;One suggestion was to set up separate queues in dCache for local access (dcap, gsidcap and xrootd) and remote access (GridFTP).&lt;br /&gt;&lt;br /&gt;In general this is supposed to help when local farm jobs are reading slowly from lots of files and blocking the queues, preventing the short GridFTP jobs from starting. That is not currently the case on my Storage Element, but the split might also help by limiting the number of concurrent GridFTP transfers, which are very resource hungry, without limiting local access, which is not.&lt;br /&gt;&lt;br /&gt;It was a very easy change, requiring edits only to the /opt/d-cache/config/dCacheSetup file and not to the individual batch files (on all the servers, of course). 
I uncommented and set the following variables:&lt;br /&gt;&lt;br /&gt;poolIoQueue=dcapQ,gftpQ&lt;br /&gt;gsidcapIoQueue=dcapQ&lt;br /&gt;dcapIoQueue=dcapQ&lt;br /&gt;gsiftpIoQueue=gftpQ&lt;br /&gt;remoteGsiftpIoQueue=gftpQ&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The first variable sets up the two queues (the first queue listed is also the default one if no queue is specified).&lt;br /&gt;&lt;br /&gt;The remaining settings specify which queue each of the different doors uses.&lt;br /&gt;&lt;br /&gt;Unfortunately, the queue lengths are set per pool in the pool setup file, so I had to edit a file for each pool on all the disk servers to change:&lt;br /&gt;&lt;br /&gt;mover set max active NNNN&lt;br /&gt;&lt;br /&gt;to:&lt;br /&gt;&lt;br /&gt;mover set max active -queue=dcapQ 1000&lt;br /&gt;mover set max active -queue=gftpQ 3&lt;br /&gt;&lt;br /&gt;After the changes to the config files I had to restart all the services to pick up the new configuration. I also took the opportunity to enable read-only xrootd access to the SE by adding:&lt;br /&gt;&lt;br /&gt;XROOTD=yes&lt;br /&gt;&lt;br /&gt;to /opt/d-cache/etc/node_config on all the nodes&lt;br /&gt;&lt;br /&gt;and setting:&lt;br /&gt;&lt;br /&gt;xrootdIsReadOnly=true&lt;br /&gt;&lt;br /&gt;in the dCacheSetup file.&lt;br /&gt;&lt;br /&gt;After the restart the new queues showed up in the queue info pages and the xrootd doors on all the nodes showed up on the Cell Services page.&lt;br /&gt;&lt;br /&gt;I was also able to read files from the xrootd door using the standard BaBar tools (and was correctly blocked from writing data).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4590570245596101026?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4590570245596101026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4590570245596101026&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4590570245596101026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4590570245596101026'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/10/dcache-tuning.html' title='dCache Tuning'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1879822475910533640</id><published>2007-10-10T12:41:00.000+01:00</published><updated>2007-10-10T22:21:22.260+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gLite'/><category scheme='http://www.blogger.com/atom/ns#' term='SL4'/><title type='text'>SL4 Worker Node Migration at RALPP</title><content type='html'>Since I've now finished migrating my worker nodes to SL4, I thought I should describe the method used.&lt;br /&gt;&lt;br /&gt;The basic decision was to try to keep running an SL3 service in parallel with the initial test SL4 service and then gradually migrate nodes to the new service once it was production quality. I had already split my Torque/Maui services off onto a separate node and wanted to keep that setup for the SL4 service, but did not want to (a) duplicate the Torque server or (b) create another 24 queues for all the VOs. 
To get round this I decided to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Install a new "SL4" CE pointing to the production PBS node; this needed a different site-info.def file, with the new CE named as the CE_HOST and, obviously, the GlueOperatingSystem settings set for SL4&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Create node properties on the SL3 and SL4 nodes to let the batch system route jobs based on OS&lt;/li&gt;&lt;li&gt;Hack the lcgpbs jobmanagers on the two CEs to apply requirements on the node properties as they submit jobs&lt;/li&gt;&lt;/ul&gt;Running multiple CEs all pointing to the same Torque server is fairly simple: there is a "BATCH_SERVER" setting in YAIM (3.1 and later; TORQUE_SERVER before that) that you just point at your Torque/Maui server, and that configures the CE to submit its jobs via that machine. Then there are a few other things you have to take care of:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The gridmapdir has to be shared between all the CEs. Otherwise there is a possibility that either the same DN will be mapped to multiple pool accounts or, worse, that different DNs will be mapped to the same pool account by the different CEs.&lt;/li&gt;&lt;li&gt;The worker nodes need to have the ssh host keys for all the CEs to be able to get the job data back, but YAIM will only set one up. The fix is to edit the NODES line in "/opt/edg/etc/edg-pbs-knownhosts.conf" to add all the CEs and your Torque server&lt;/li&gt;&lt;li&gt;If the CEs are submitting to the same worker nodes you might also want to mount the VO tag area across all the CEs so that VOs don't have to publish the same tags to all the CEs&lt;/li&gt;&lt;/ol&gt;Node properties are very easy: either just edit the Torque nodes file to add them or use qmgr -c "set node $node properties += SL4". 
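&lt;br /&gt;&lt;br /&gt;With more than a handful of worker nodes, the qmgr route for adding node properties can be scripted. This is a minimal Python sketch rather than part of the setup described here: it only builds the qmgr command strings (it does not run them), and the helper name node_property_commands and the node names wn01/wn02 are hypothetical placeholders.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical helper (not from the original setup): build the qmgr
# commands that tag Torque worker nodes with a property such as "SL4",
# following the 'qmgr -c "set node $node properties += SL4"' pattern.
# The node names used below are illustrative placeholders.
import shlex


def node_property_commands(nodes, prop):
    """Return one shell command per node that adds `prop` to its properties."""
    commands = []
    for node in nodes:
        qmgr_arg = "set node %s properties += %s" % (node, prop)
        # shlex.quote keeps the whole qmgr argument as a single shell word
        commands.append("qmgr -c " + shlex.quote(qmgr_arg))
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them
    for cmd in node_property_commands(["wn01", "wn02"], "SL4"):
        print(cmd)
```
&lt;/pre&gt;Printing rather than executing the commands keeps the sketch safe to try: the list can be checked before anything is run against the Torque server.&lt;br /&gt;&lt;br /&gt;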
I also added "test" and "prod" properties to all the nodes, but more of that below.&lt;br /&gt;&lt;br /&gt;Finally, I needed to change the jobmanager so that jobs arriving at each CE require the matching node property, which directs them to the corresponding class of worker node. The lcgpbs jobmanager already writes a node requirement into the job script it submits to Torque, so it is easy to extend this to add node properties as well. If you look in "/opt/globus/setup/globus/lcgpbs.in" you'll see three places where it writes "#PBS -l nodes=" to set the requirement on the number of CPUs; you need to add :SL4 (or :SL3) to the end of each of those writes.&lt;br /&gt;&lt;br /&gt;After doing that, installing some SL4 worker nodes was very simple; about the only necessary change to the site-info.def file was to make "GLOBUS_TCP_PORT_RANGE" space separated rather than comma separated.&lt;br /&gt;&lt;br /&gt;With the above hacks in place I was able to leave my old CE happily submitting jobs to the SL3 nodes while I tested the SL4 worker nodes, and then gradually move the worker nodes over to SL4. Before moving the final worker nodes over, I modified the batch system information provider to report the queues as "Draining", whatever their real status. 
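&lt;br /&gt;&lt;br /&gt;The jobmanager change can be illustrated by its effect on the generated job script. This Python sketch is an illustration under stated assumptions, not the actual edit to lcgpbs.in: it post-processes a job script, appending a node property to every "#PBS -l nodes=..." line, and it also shows the GLOBUS_TCP_PORT_RANGE conversion from comma separated to space separated. The function names and the sample port range are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
# Sketch only: mimics the effect of the lcgpbs jobmanager hack on a job
# script; it is NOT the real lcgpbs.in code. Function names are hypothetical.
import re

# Matches a whole "#PBS -l nodes=..." line (e.g. "#PBS -l nodes=1" or
# "#PBS -l nodes=2:ppn=2"); assumes the nodes= value contains no spaces.
NODES_LINE = re.compile(r"^(#PBS -l nodes=\S+)$")


def add_node_property(script, prop):
    """Append ':prop' to each nodes= request, keeping line endings intact."""
    out = []
    for line in script.splitlines(True):  # True keeps the newline characters
        match = NODES_LINE.match(line)
        if match:
            end = match.end(1)
            line = line[:end] + ":" + prop + line[end:]
        out.append(line)
    return "".join(out)


def space_separated_port_range(value):
    """Convert a comma-separated GLOBUS_TCP_PORT_RANGE value to space-separated."""
    return " ".join(part.strip() for part in value.split(","))


if __name__ == "__main__":
    job = "#!/bin/sh\n#PBS -l nodes=1\necho hello\n"
    print(add_node_property(job, "SL4"), end="")
    print(space_separated_port_range("20000,25000"))
```
&lt;/pre&gt;With Torque, a request such as nodes=2:ppn=2:SL4 asks for nodes that carry the SL4 property, which is what lets a single server route jobs from the two CEs to different sets of worker nodes.&lt;br /&gt;&lt;br /&gt;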
Once all the worker nodes were migrated to SL4, I could simply remove the lcgpbs jobmanager changes, and both CEs became equivalent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1879822475910533640?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1879822475910533640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1879822475910533640&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1879822475910533640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1879822475910533640'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/10/sl4-worker-node-migration-at-ralpp.html' title='SL4 Worker Node Migration at RALPP'/><author><name>ChrisB</name><uri>http://www.blogger.com/profile/15194428640424784638</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-2783421347946972689</id><published>2007-09-24T10:55:00.000+01:00</published><updated>2007-09-27T15:52:27.298+01:00</updated><title type='text'>Oxford's Tier 2 Upgrade is joining the grid.</title><content type='html'>The 22 new worker nodes are starting to come online now.&lt;br /&gt;They are running SL4 in 32-bit mode for now. 
They will provide an additional 431 kSI2K (thousand SpecInt2000).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/Rvp_KDCqlfI/AAAAAAAAACc/o6Fr09iAPHk/s1600-h/P9240292.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/Rvp_KDCqlfI/AAAAAAAAACc/o6Fr09iAPHk/s320/P9240292.JPG" alt="" id="BLOGGER_PHOTO_ID_5114540137424524786" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;A second CE, t2ce03.physics.ox.ac.uk, has been set up to serve the SL4 WNs. We had some trouble with the BDII being on the original CE, so we have split that function off onto a new node (well, actually a VM).&lt;br /&gt;&lt;br /&gt;The upgrade also includes 4 head nodes with dual PSUs and mirrored system disks, which can be used for service functions or as worker nodes. All the head nodes and disk servers are protected by UPS.&lt;br /&gt;&lt;br /&gt;The 11 storage servers (9TB usable each) will be brought online over the next week.&lt;br /&gt;&lt;br /&gt;The two new (Viglen-supplied) racks are on the right-hand side, with the older Dell kit on the left.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-2783421347946972689?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/2783421347946972689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=2783421347946972689&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2783421347946972689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/2783421347946972689'/><link rel='alternate' type='text/html' 
href='http://southgrid.blogspot.com/2007/09/oxfords-tier-2-upgrade-is-joining-grid.html' title='Oxford&apos;s Tier 2 Upgrade is joining the grid.'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pogTNV-B63A/Rvp_KDCqlfI/AAAAAAAAACc/o6Fr09iAPHk/s72-c/P9240292.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8539658533265230540</id><published>2007-09-14T17:26:00.000+01:00</published><updated>2007-09-14T18:14:59.222+01:00</updated><title type='text'>Oxford Local Computer Room Goes Live</title><content type='html'>The local computer room was completed last Friday. Power is in place under each of the 21 rack positions, and each rack position has 4 CAT6 cables connected to the networking rack, which can be seen in the photo. Other work completed includes the ceiling lights, painting, the smoke detection system and door fitting.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/Ruq-pweSnuI/AAAAAAAAACE/QBG1FJLEu8Q/s1600-h/IMAGE_057.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/Ruq-pweSnuI/AAAAAAAAACE/QBG1FJLEu8Q/s320/IMAGE_057.jpg" alt="" id="BLOGGER_PHOTO_ID_5110106351801114338" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;On Monday 10th, two existing compute racks were installed and two empty racks for the cluster upgrade arrived. 
A rack full of worker nodes for the existing grid cluster can be seen and is up and running.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pogTNV-B63A/Ruq_zQeSnvI/AAAAAAAAACM/3sQl7Q0aFSE/s1600-h/P9130246.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_pogTNV-B63A/Ruq_zQeSnvI/AAAAAAAAACM/3sQl7Q0aFSE/s320/P9130246.JPG" alt="" id="BLOGGER_PHOTO_ID_5110107614521499378" border="0" /&gt;&lt;/a&gt;&lt;p&gt;Today the servers arrived from Viglen and installation has commenced.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pogTNV-B63A/RurAqQeSnwI/AAAAAAAAACU/kYXsnF6W9CA/s1600-h/P9140266.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_pogTNV-B63A/RurAqQeSnwI/AAAAAAAAACU/kYXsnF6W9CA/s320/P9140266.JPG" alt="" id="BLOGGER_PHOTO_ID_5110108559414304514" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8539658533265230540?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8539658533265230540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8539658533265230540&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8539658533265230540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8539658533265230540'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/09/oxford-local-computer-room-goes-live.html' title='Oxford Local 
Computer Room Goes Live'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_pogTNV-B63A/Ruq-pweSnuI/AAAAAAAAACE/QBG1FJLEu8Q/s72-c/IMAGE_057.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7083048375128185648</id><published>2007-08-21T13:44:00.001+01:00</published><updated>2007-08-21T13:50:07.839+01:00</updated><title type='text'>Oxford Computer Room Update</title><content type='html'>Wiring for power and networking is scheduled to be completed this week.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/RsrfDItOJHI/AAAAAAAAAB8/qFHY65-kMBg/s1600-h/P8170240.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/RsrfDItOJHI/AAAAAAAAAB8/qFHY65-kMBg/s320/P8170240.JPG" alt="" id="BLOGGER_PHOTO_ID_5101134772920263794" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/Rsrez4tOJGI/AAAAAAAAAB0/SrQ3QHiFuJo/s1600-h/P8170239.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/Rsrez4tOJGI/AAAAAAAAAB0/SrQ3QHiFuJo/s320/P8170239.JPG" alt="" id="BLOGGER_PHOTO_ID_5101134510927258722" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The Oxford Grid Cluster upgrade has been ordered and should be delivered in early September, for installation here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/33333025-7083048375128185648?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7083048375128185648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7083048375128185648&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7083048375128185648'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7083048375128185648'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/08/oxford-computer-room-update.html' title='Oxford Computer Room Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pogTNV-B63A/RsrfDItOJHI/AAAAAAAAAB8/qFHY65-kMBg/s72-c/P8170240.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1594882349947767402</id><published>2007-07-18T10:44:00.001+01:00</published><updated>2007-07-18T11:18:17.241+01:00</updated><title type='text'>SL4 progress</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Cambridge&lt;/span&gt; has converted its DPM server to 64-bit SL4, and plans to start migrating WNs next week.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Birmingham&lt;br /&gt;&lt;/span&gt;32-bit SL4 was very easy to deploy using yum and YAIM. The existing second CE is being used to direct jobs to the WNs. OPS tests have been passed. 
BaBar farm switched off due to air-conditioning problems.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;RALPPD&lt;/span&gt;&lt;br /&gt;dCache servers are running SL4.&lt;br /&gt;30 WN CPUs are now running 32-bit SL4, and there is a new CE to direct jobs to these. This will be advertised from next Monday (23rd July).&lt;br /&gt;&lt;br /&gt;A SouthGrid shared calendar has been set up in Google to help coordinate holidays and meetings.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1594882349947767402?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1594882349947767402/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1594882349947767402&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1594882349947767402'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1594882349947767402'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/07/sl4-progress.html' title='SL4 progress'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8559689480454859182</id><published>2007-07-17T12:49:00.000+01:00</published><updated>2007-07-17T12:59:00.659+01:00</updated><title type='text'>Oxford site ce swamped by Biomed jobs</title><content 
type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pogTNV-B63A/Rpyu9GaKP_I/AAAAAAAAABs/bN4NH6RdsQg/s1600-h/pbsgrapht2-jul07.php"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://1.bp.blogspot.com/_pogTNV-B63A/Rpyu9GaKP_I/AAAAAAAAABs/bN4NH6RdsQg/s320/pbsgrapht2-jul07.php" alt="" id="BLOGGER_PHOTO_ID_5088134043737407474" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;At the end of last week the Oxford CE was swamped by hundreds of biomed jobs. The queue was disabled and the CE rebooted, but manual killing of jobs and tidying up were required before the CE stabilised.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8559689480454859182?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8559689480454859182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8559689480454859182&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8559689480454859182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8559689480454859182'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/07/oxford-site-ce-swamped-by-biomed-jobs.html' title='Oxford site ce swamped by Biomed jobs'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_pogTNV-B63A/Rpyu9GaKP_I/AAAAAAAAABs/bN4NH6RdsQg/s72-c/pbsgrapht2-jul07.php' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1043767449514659409</id><published>2007-07-17T12:40:00.001+01:00</published><updated>2007-07-17T12:48:44.700+01:00</updated><title type='text'>Oxford DWB Computer room update</title><content type='html'>The floor is complete.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/Rpyq3maKP8I/AAAAAAAAABU/T8Vp0V5Dq54/s1600-h/P7090277.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/Rpyq3maKP8I/AAAAAAAAABU/T8Vp0V5Dq54/s320/P7090277.JPG" alt="" id="BLOGGER_PHOTO_ID_5088129551201615810" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The external power boards are ready and live.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/RpyrHWaKP9I/AAAAAAAAABc/Xm_Uy5_qCpM/s1600-h/P7090278.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/RpyrHWaKP9I/AAAAAAAAABc/Xm_Uy5_qCpM/s320/P7090278.JPG" alt="" id="BLOGGER_PHOTO_ID_5088129821784555474" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/RpyrgmaKP-I/AAAAAAAAABk/4AnufjTRQ9s/s1600-h/P7160280.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/RpyrgmaKP-I/AAAAAAAAABk/4AnufjTRQ9s/s320/P7160280.JPG" alt="" id="BLOGGER_PHOTO_ID_5088130255576252386" border="0" /&gt;&lt;/a&gt;The walls have been painted and the smoke detection system installed (red pipes), and the ceiling is being installed this week.&lt;br /&gt;&lt;br /&gt;Under floor 
electrical wiring and network cabling should start tomorrow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1043767449514659409?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1043767449514659409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1043767449514659409&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1043767449514659409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1043767449514659409'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/07/oxford-dwb-computer-room-update.html' title='Oxford DWB Computer room update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pogTNV-B63A/Rpyq3maKP8I/AAAAAAAAABU/T8Vp0V5Dq54/s72-c/P7090277.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-6656472317995777334</id><published>2007-06-22T15:45:00.000+01:00</published><updated>2007-06-22T15:59:43.392+01:00</updated><title type='text'>Oxford local Computer room update</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pogTNV-B63A/RnvinkPoWkI/AAAAAAAAAA0/7b-MA_s-Jsc/s1600-h/P6200227.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" 
src="http://4.bp.blogspot.com/_pogTNV-B63A/RnvinkPoWkI/AAAAAAAAAA0/7b-MA_s-Jsc/s320/P6200227.JPG" alt="" id="BLOGGER_PHOTO_ID_5078902174162377282" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Work is progressing on the new local room, which is just as well, as there are delays with the Begbroke room, which will not be ready until late summer/early autumn.&lt;br /&gt;The floor has been sealed with vinyl.&lt;br /&gt;&lt;br /&gt;Electrical switching has been connected up.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/RnvjEEPoWlI/AAAAAAAAAA8/dxgPW372ApA/s1600-h/P6200229.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/RnvjEEPoWlI/AAAAAAAAAA8/dxgPW372ApA/s320/P6200229.JPG" alt="" id="BLOGGER_PHOTO_ID_5078902663788649042" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And the false floor is being installed.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pogTNV-B63A/RnvjRkPoWmI/AAAAAAAAABE/1YlJpuQy8L4/s1600-h/P6220231.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_pogTNV-B63A/RnvjRkPoWmI/AAAAAAAAABE/1YlJpuQy8L4/s320/P6220231.JPG" alt="" id="BLOGGER_PHOTO_ID_5078902895716883042" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-6656472317995777334?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/6656472317995777334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=6656472317995777334&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/33333025/posts/default/6656472317995777334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/6656472317995777334'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/06/oxford-local-computer-room-update.html' title='Oxford local Computer room update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pogTNV-B63A/RnvinkPoWkI/AAAAAAAAAA0/7b-MA_s-Jsc/s72-c/P6200227.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3027355844382462578</id><published>2007-06-22T15:41:00.001+01:00</published><updated>2007-06-22T15:45:40.312+01:00</updated><title type='text'>Southgrid Update</title><content type='html'>Bristol&lt;br /&gt;Plans are under way to make use of the new HPC cluster. Meetings have started to work out a strategy and solve technical problems.&lt;br /&gt;&lt;br /&gt;Cambridge&lt;br /&gt;The DPM upgrade was a nightmare; with help from Grieg and Yves, Santanu has now got the SE upgraded to DPM 1.6.4.&lt;br /&gt;&lt;br /&gt;Birmingham&lt;br /&gt;Problems publishing APEL data are under investigation.&lt;br /&gt;&lt;br /&gt;Oxford&lt;br /&gt;Support for ngs.ac.uk has been enabled, and tests by Steven Young from NGS at Oxford are starting. 
Pete attended the NGS User forum and training event held in the OERC building in Oxford.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3027355844382462578?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3027355844382462578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3027355844382462578&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3027355844382462578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3027355844382462578'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/06/southgrid-update.html' title='Southgrid Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7426854996592483235</id><published>2007-06-12T09:35:00.001+01:00</published><updated>2007-06-12T09:46:10.596+01:00</updated><title type='text'>Rapid progress on Oxford's local computer room</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/Rm5bDkPoWgI/AAAAAAAAAAU/y-ltGOrf8gw/s1600-h/P4110099.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/Rm5bDkPoWgI/AAAAAAAAAAU/y-ltGOrf8gw/s320/P4110099.JPG" alt="" id="BLOGGER_PHOTO_ID_5075093946920098306" border="0" /&gt;&lt;/a&gt;&lt;br 
/&gt;This was the space allocated on level 1 just after the old offices had been cleared out on April 11th.&lt;br /&gt;&lt;br /&gt;Since then the walls have been dry-lined and the AC units and pipework are in place.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/Rm5cRUPoWjI/AAAAAAAAAAs/g2CsWEPzVA0/s1600-h/P6060191.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/Rm5cRUPoWjI/AAAAAAAAAAs/g2CsWEPzVA0/s320/P6060191.JPG" alt="" id="BLOGGER_PHOTO_ID_5075095282654927410" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Heavy electrical work is ongoing and the floor is being prepared.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pogTNV-B63A/Rm5bzUPoWhI/AAAAAAAAAAc/z-lfsRdOSfw/s1600-h/P6080197.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_pogTNV-B63A/Rm5bzUPoWhI/AAAAAAAAAAc/z-lfsRdOSfw/s320/P6080197.JPG" alt="" id="BLOGGER_PHOTO_ID_5075094767258851858" border="0" /&gt;&lt;/a&gt;The fourth wall has also been built.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pogTNV-B63A/Rm5cFEPoWiI/AAAAAAAAAAk/1kftvWn70Bw/s1600-h/P6080199.JPG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://1.bp.blogspot.com/_pogTNV-B63A/Rm5cFEPoWiI/AAAAAAAAAAk/1kftvWn70Bw/s320/P6080199.JPG" alt="" id="BLOGGER_PHOTO_ID_5075095072201529890" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We are hopeful that the room will be complete by the end of July.&lt;br /&gt;The floor will be sealed this week prior to the false floor being installed.
Electrical cabling will then commence.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7426854996592483235?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7426854996592483235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7426854996592483235&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7426854996592483235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7426854996592483235'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/06/rapid-progress-on-oxfords-local.html' title='Rapid progress on Oxford&apos;s local computer room'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pogTNV-B63A/Rm5bDkPoWgI/AAAAAAAAAAU/y-ltGOrf8gw/s72-c/P4110099.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8517378858283597464</id><published>2007-05-25T15:35:00.000+01:00</published><updated>2007-05-25T15:38:14.551+01:00</updated><title type='text'>Nagios Monitoring</title><content type='html'>Nagios is being set up at Oxford.
So far, all nodes are tested over ssh to check that they are up and running.&lt;br /&gt;NRPE is being installed to allow checks on disk space to be carried out.&lt;br /&gt;Further instructions can be found in the talk by Chris Brew at HEPSYSMAN:&lt;br /&gt;&lt;a href="http://hepwww.rl.ac.uk/sysman/may2007/agenda.html"&gt;http://hepwww.rl.ac.uk/sysman/may2007/agenda.html&lt;/a&gt;&lt;br /&gt;or on the System Management wiki:&lt;br /&gt;&lt;a href="http://www.sysadmin.hep.ac.uk/wiki/Nagios"&gt;http://www.sysadmin.hep.ac.uk/wiki/Nagios&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8517378858283597464?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8517378858283597464/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8517378858283597464&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8517378858283597464'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8517378858283597464'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/05/nagios-monitoring.html' title='Nagios Monitoring'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-913300775386401966</id><published>2007-05-25T15:32:00.000+01:00</published><updated>2007-05-25T15:34:25.418+01:00</updated><title type='text'>SouthGrid Dashboard</title><content
type='html'>A SouthGrid dashboard has been set up a la ScotGrid and NorthGrid.&lt;br /&gt;See &lt;a href="http://www.gridpp.ac.uk/wiki/Southgrid-Dashboard"&gt;http://www.gridpp.ac.uk/wiki/Southgrid-Dashboard&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-913300775386401966?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/913300775386401966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=913300775386401966&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/913300775386401966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/913300775386401966'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/05/southgrid-dashboard.html' title='SouthGrid Dashboard'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-7033071250064464002</id><published>2007-04-30T15:06:00.000+01:00</published><updated>2007-04-30T15:24:10.813+01:00</updated><title type='text'>Multiple failures at Oxford explained</title><content type='html'>Oxford ran out of disk space on its DPM SE, which caused the rm SAM test to fail. This was because ATLAS had taken up all the available disk space on our SE. We managed to clear some space from dteam, and this allowed us to start passing the tests again.
The bigger problem remains: as there is currently no quota mechanism in DPM, we cannot prevent this from happening again. We only have two (1.6TB) pools and both are assigned to all VOs. It is not possible to allocate a pool exclusively to ops, or to keep ATLAS in a pool of their own, without completely removing all data and redesigning the pools, which is a non-starter.&lt;br /&gt;When more disk space is added, consideration will be given to allocating dedicated pools for some VOs.&lt;br /&gt;&lt;br /&gt;Oxford then started failing other tests; this was caused by multiple worker nodes having full /home or / partitions. This highlights the need to monitor disk usage with Nagios.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-7033071250064464002?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/7033071250064464002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=7033071250064464002&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7033071250064464002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/7033071250064464002'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/04/multiple-failures-at-oxford-explained.html' title='Multiple failures at Oxford explained'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3538165239445647404</id><published>2007-03-27T13:07:00.001+01:00</published><updated>2008-04-28T13:59:23.391+01:00</updated><title type='text'>Oxford tries out MonAMI</title><content type='html'>During the GridPP collaboration meeting I was persuaded to give MonAMI a go.&lt;br /&gt;Installing the RPM from the SourceForge website was easy enough:&lt;br /&gt;&lt;a href="http://monami.sourceforge.net/"&gt;http://monami.sourceforge.net/&lt;/a&gt;&lt;br /&gt;See also the GridPP wiki page: &lt;a href="http://www.gridpp.ac.uk/wiki/MonAMI"&gt;http://www.gridpp.ac.uk/wiki/MonAMI&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As I already use Ganglia, the idea was to run some checks on disk space and DPM and send the output to Ganglia. The first thing we noticed was that some of the features only work with Ganglia v3 or later. I was still running v2.5, so a quick upgrade of the gmond RPMs and a new gmond.conf were required.&lt;br /&gt;You also need MySQL.
(This is for the DPM plugin; more on that later.)&lt;br /&gt;&lt;br /&gt;The main configuration file is /etc/monami.conf, but MonAMI can also read further files in /etc/monami.d, so we set about writing a basic file to monitor the root file system:&lt;br /&gt;&lt;pre&gt;[filesystem]&lt;br /&gt;name=root-fs&lt;br /&gt;location=/&lt;br /&gt;&lt;br /&gt;[sample]&lt;br /&gt;interval=1m&lt;br /&gt;read = root-fs.blocks.free&lt;br /&gt;write = ganglia&lt;br /&gt;&lt;br /&gt;[snapshot]&lt;br /&gt;name=simple-snapshot&lt;br /&gt;filename=/tmp/monami-simple-snapshot&lt;br /&gt;&lt;br /&gt;[ganglia]&lt;br /&gt;multicast_ip_address = 239.2.11.95&lt;br /&gt;multicast_port = 8656&lt;/pre&gt;&lt;br /&gt;More coming soon...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3538165239445647404?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3538165239445647404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3538165239445647404'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/03/oxford-tries-out-monami.html' title='Oxford tries out MonAMI'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4863939725404889072</id><published>2007-03-08T13:07:00.000+01:00</published><updated>2007-03-08T13:21:42.228+01:00</updated><title type='text'>CAMONT jobs successfully running at Oxford</title><content type='html'>The CAMONT VO has now been working correctly at Oxford since Friday 2nd March.&lt;br /&gt;Karl Harrison of Cambridge has been running jobs
from Cambridge.&lt;br /&gt;&lt;br /&gt;In another Cambridge collaboration, LHCb software has been installed on a Windows Server 2003 test node at Oxford by Ying Ying Li from Cambridge. They are testing the use of Windows for LHCb analysis code and, having tested it at Cambridge, were looking to prove that it could work at other sites. Ideally they would like some more test nodes and 0.5 TB of disk space, which may be harder to find.&lt;br /&gt;&lt;br /&gt;Cambridge ran the ATLAS DPM ACL fix on Monday 5th when I (PDG) visited Santanu. All SouthGrid sites have now run the required fix.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pogTNV-B63A/Re__pDS5WFI/AAAAAAAAAAM/jAXU6FIvfaI/s1600-h/intel5150.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_pogTNV-B63A/Re__pDS5WFI/AAAAAAAAAAM/jAXU6FIvfaI/s320/intel5150.jpg" alt="" id="BLOGGER_PHOTO_ID_5039527588775155794" border="0" /&gt;&lt;/a&gt;I took the opportunity to measure the power consumption of the new Dell 1950s (Intel 5150 CPUs).
Idle power consumption is about 200W, rising to 285W under load (4 CPU-intensive jobs).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4863939725404889072?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4863939725404889072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4863939725404889072&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4863939725404889072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4863939725404889072'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/03/camont-jobs-successfuly-running-at.html' title='CAMONT jobs successfully running at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pogTNV-B63A/Re__pDS5WFI/AAAAAAAAAAM/jAXU6FIvfaI/s72-c/intel5150.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5706831627196635263</id><published>2007-03-01T10:57:00.000+01:00</published><updated>2007-03-01T11:15:48.829+01:00</updated><title type='text'>Oxford and the ATLAS DPM ACL fix.</title><content type='html'>Yesterday I tried to run the ATLAS patch program to fix the ACLs on the DPM server at Oxford.&lt;br /&gt;This update has been provided by ATLAS as a binary that has to be run as root on the SE.
This was potentially dangerous: many sites had delayed running it, objecting to the fact that we don't really know what it does. Still, the pragmatic view seemed to be that most other sites had now run it, so I would too.&lt;br /&gt;The configuration file has to be edited to match the local site's configuration.&lt;br /&gt;First I performed a normal file backup using the HFS software, Tivoli Storage Manager:&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;dsmc incr&lt;/span&gt;&lt;br /&gt;Then I dumped the MySQL database:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="color: rgb(0, 0, 153);"&gt;mysqldump --user=root --password=****** --opt --all-databases | gzip -c &gt; mysql-dump-280207.sql.gz&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;As our main DPM server was set read-only at the time (to cope with the DPM bug of not sharing across pools properly), we decided to set it back to read/write for the update:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="color: rgb(0, 0, 153);"&gt;dpm-modifyfs --server t2se01.physics.ox.ac.uk --fs /storage --st 0&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;Then I ran the update program (referred to as a script in some docs):&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;./UpdateACLForMySQL&lt;/span&gt;&lt;br /&gt;Unfortunately I had used the wrong password in the config file, so it failed,&lt;br /&gt;and this is where a strange feature of the update program was discovered.&lt;br /&gt;After it runs, it removes several entries from the config file: the password &lt;span style="font-weight: bold;"&gt;and &lt;/span&gt;the gid entry. So it took several attempts before the correct config file was used, after which the update appears to have been successful.&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="color: rgb(0, 0, 153);"&gt;dpns-getacl /dpm/physics.ox.ac.uk/home/atlas/dq2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:
italic;"&gt;&lt;/span&gt;&lt;/span&gt;Shows the acls.&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;# file: /dpm/physics.ox.ac.uk/home/atlas/dq2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;# owner: atlas002&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;# group: atlas&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;user::rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;group::rwx              #effective:rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;group:atlas/Role=production:rwx         #effective:rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;mask::rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;other::r-x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;default:user::rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;default:group::rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;default:group:atlas/Role=production:rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;default:mask::rwx&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;default:other::r-x&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I reset the main DPM server back to read only:&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;dpm-modifyfs --server t2se01.physics.ox.ac.uk --fs /storage --st RDONLY&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The process was not simple or clear and I hope not to have to do more for other VO's...&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5706831627196635263?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5706831627196635263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5706831627196635263&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5706831627196635263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5706831627196635263'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/03/oxford-and-atlas-dpm-acl-fix.html' title='Oxford and the ATLAS DPM ACL fix.'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-875395874563820187</id><published>2007-03-01T10:54:00.001+01:00</published><updated>2007-03-01T10:57:01.288+01:00</updated><title type='text'>Birmingham suffering from multiple hardware failures</title><content type='html'>The Babar cluster at Birmingham which is made up of older kit salvaged from QMUL and Bristol plus the original Birmingham cluster, is suffering from h/w problems.&lt;br /&gt;7 worker node disks have died, some systems have kernel panics, and the globus MDS service is playing up. 
Yves is working hard to fix things, but maybe we are just reaching the end of the useful life of much of this kit?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-875395874563820187?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/875395874563820187/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=875395874563820187&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/875395874563820187'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/875395874563820187'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/03/birmingham-suffering-from-multiple.html' title='Birmingham suffering from multiple hardware failures'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-5991819436399631712</id><published>2007-02-27T15:22:00.001+01:00</published><updated>2007-02-27T15:27:06.653+01:00</updated><title type='text'>glite UI update fix</title><content type='html'>To resolve the missing dependency on the UI for the last two gLite updates&lt;br /&gt;(in particular, the glite-ui-config RPM requires python-fpconst),&lt;br /&gt;you can get this RPM from CERN; see:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://linuxsoft.cern.ch/repository//python-fpconst.html"&gt;http://linuxsoft.cern.ch/repository//python-fpconst.html&lt;/a&gt;&lt;br /&gt;&lt;br
/&gt;Use&lt;br /&gt; wget http://linuxsoft.cern.ch/cern/SLC30X/i386/SL/RPMS/python-fpconst-0.6.0-3.noarch.rpm&lt;br /&gt;&lt;br /&gt;to add it to your local repository.&lt;br /&gt;Then yum -y update will work once again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-5991819436399631712?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/5991819436399631712/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=5991819436399631712&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5991819436399631712'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/5991819436399631712'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/glite-ui-update-fix.html' title='glite UI update fix'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4376005951206429044</id><published>2007-02-26T15:04:00.000+01:00</published><updated>2007-02-26T15:09:04.712+01:00</updated><title type='text'>Another Workernode Hard drive failure at Oxford</title><content type='html'>The hard drive in t2wn37 failed over the weekend; Dell will replace it.&lt;br /&gt;This follows on from t2wn04 last week, and the PSU in t2lfc01 a few weeks before. t2lfc01 was one of the GridPP-supplied nodes from Streamline.
Replacement of the PSU took several weeks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4376005951206429044?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4376005951206429044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4376005951206429044&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4376005951206429044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4376005951206429044'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/another-workernode-hard-drive-failure.html' title='Another Workernode Hard drive failure at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1187674860538972579</id><published>2007-02-20T17:52:00.000+01:00</published><updated>2007-02-20T17:53:46.524+01:00</updated><title type='text'>Fusion jobs successfully running at Oxford</title><content type='html'>FUSION jobs have now run successfully at Oxford.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1187674860538972579?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1187674860538972579/comments/default' title='Post Comments'/><link rel='replies'
type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1187674860538972579&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1187674860538972579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1187674860538972579'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/fusion-jobs-successfully-running-at.html' title='Fusion jobs successfully running at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-4027007262439382285</id><published>2007-02-20T13:37:00.000+01:00</published><updated>2007-02-20T13:49:32.358+01:00</updated><title type='text'>Birmingham Network reconfiguration</title><content type='html'>Yves reports:&lt;br /&gt;Our site was down from 10am yesterday until 10am today due to a network problem, which IS have traced to a faulty link with a campus switch.&lt;br /&gt;....&lt;br /&gt; IS have temporarily disabled the physics link to the library switch (one of our two links to the network), and this has fixed the connectivity problem from the outside world to our grid box.&lt;br /&gt;&lt;br /&gt;They will reinstate the link (for resilience) when they've got to the bottom of the problem (faulty fibre, or whatever).&lt;br /&gt;--&gt;&lt;br /&gt;&lt;br /&gt;So, it'll be interesting to see the GridMon results in the current configuration while waiting for IS to understand the problem.&lt;br /&gt;&lt;br /&gt;This may be
the cause of the 33% UDP packet loss we have been seeing to/from Birmingham.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-4027007262439382285?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/4027007262439382285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=4027007262439382285&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4027007262439382285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/4027007262439382285'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/birmingham-network-reconfiguration.html' title='Birmingham Network reconfiguration'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-1112936231966506109</id><published>2007-02-20T13:34:00.000+01:00</published><updated>2007-02-20T13:37:09.263+01:00</updated><title type='text'>FUSION VO Problems at Oxford</title><content type='html'>At Oxford we had reports of problems from Fusion:&lt;br /&gt;"We have checked that FUSION jobs fail at your site with the error "37 the provided RSL 'queue' parameter is invalid". This is because "fusion" is missing at the end of the file /opt/globus/share/globus_gram_job_manager/lcgpbs.rvf in your CE ("fusion" should be included in the list of Values of the attribute "queue").
We also noticed that the FUSION VOMS server certificate ([1]) is not installed at /etc/grid-security/vomsdir/ in your CE."&lt;br /&gt;&lt;br /&gt;I downloaded the certificate from:&lt;br /&gt; &lt;a href="http://swevo.ific.uv.es/vo/files/swevo.ific.uv.es-oct2006.pem"&gt;http://swevo.ific.uv.es/vo/files/swevo.ific.uv.es-oct2006.pem&lt;/a&gt;&lt;br /&gt;&lt;br /&gt; and ran&lt;br /&gt;/opt/glite/yaim/scripts/run_function /root/yaim-conf/site-info.def config_globus&lt;br /&gt;which made the 4 VOs I had recently added appear in the files lcgpbs.rvf and pbs.rvf in&lt;br /&gt;/opt/globus/share/globus_gram_job_manager/.&lt;br /&gt;I can only assume that we had errors the first time we ran yaim,&lt;br /&gt;as the 4 new VOs had not appeared then.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-1112936231966506109?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/1112936231966506109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=1112936231966506109&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1112936231966506109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/1112936231966506109'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/fusion-vo-problems-at-oxford.html' title='FUSION VO Problems at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-8254880525414040552</id><published>2007-02-19T14:22:00.000+01:00</published><updated>2007-02-19T14:30:50.376+01:00</updated><title type='text'>Problems with DNS style VO names</title><content type='html'>We have now discovered that adding the camont VO is not straightforward because of its new DNS-style VO name.&lt;br /&gt;The current yaim cannot handle the long format of VO names. The new yaim 3.1, which is not yet released, should help, but it has not been tested yet.&lt;br /&gt;Yves has had a look at it, and it is very different from the current version.&lt;br /&gt;"Hello all,&lt;br /&gt;&lt;br /&gt;I got hold of the new version of yaim and there are some non-trivial differences with the production version. I think it would be ill advised to try the new version in production. I think we could enable this new vo style by configuring gip by hand and then perform the correct queue to group mapping for pbs/condor. 
But instead of all sites doing this (plus potential RB complications?), couldn't we revert to the current vo style (if running jobs is urgent), I do not understand why the new vo style should be implemented on production sites when it is still awaiting certification and has not even been tried on the pre-production service?&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;&lt;br /&gt;Yves&lt;br /&gt;"&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-8254880525414040552?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/8254880525414040552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=8254880525414040552&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8254880525414040552'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/8254880525414040552'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/problems-with-dns-style-vo-names.html' title='Problems with DNS style VO names'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-3181739170077003016</id><published>2007-02-15T12:15:00.000+01:00</published><updated>2007-02-15T12:16:35.037+01:00</updated><title type='text'>New VO's added at Oxford</title><content type='html'>Support for Minos, Fusion, Geant4 and camont was added at Oxford yesterday.&lt;br /&gt;&lt;br /&gt;The new CA 
RPMs were also installed, so we should now be green again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-3181739170077003016?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/3181739170077003016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=3181739170077003016&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3181739170077003016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/3181739170077003016'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/new-vos-added-at-oxford.html' title='New VO&apos;s added at Oxford'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-117129922011872860</id><published>2007-02-12T17:50:00.000+01:00</published><updated>2007-02-12T17:53:40.130+01:00</updated><title type='text'>Latest glite update problem on UI</title><content type='html'>I got the error below on my UI when I tried to update to the latest RPMs.&lt;br /&gt;This has already been reported as GGUS ticket 18358:&lt;br /&gt;&lt;a href="https://gus.fzk.de/ws/ticket_info.php?ticket=18358"&gt;https://gus.fzk.de/ws/ticket_info.php?ticket=18358&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;gronbech@ppslgen:/var/local&gt; ssh root@t2ui02 'yum -y update;pakiti'&lt;br /&gt;Gathering header information file(s) from server(s)&lt;br /&gt;Server: Oxford LCG 
Extras&lt;br /&gt;Server: gLite packages&lt;br /&gt;Server: gLite updated packages&lt;br /&gt;Server: gLite updated packages&lt;br /&gt;Server: LCG CA packages&lt;br /&gt;Server: SL 3 errata&lt;br /&gt;Server: SL 3 main&lt;br /&gt;Finding updated packages&lt;br /&gt;Downloading needed headers&lt;br /&gt;Resolving dependencies&lt;br /&gt;.....Unable to satisfy dependencies&lt;br /&gt;Package SOAPpy needs python-fpconst &gt;= 0.6.0, this is not available.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-117129922011872860?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/117129922011872860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=117129922011872860&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117129922011872860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117129922011872860'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/latest-glite-update-problem-on-ui.html' title='Latest glite update problem on UI'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-117128846800988162</id><published>2007-02-12T14:46:00.000+01:00</published><updated>2007-02-12T14:54:28.023+01:00</updated><title type='text'>Steve Lloyds ATLAS Test jobs</title><content type='html'>Work was carried out at Oxford to find out why the Atlas 
test jobs were not working.&lt;br /&gt;It turned out there were some old references to pool accounts of the form atlas0100 upwards, which should have been atlas100 upwards. Once all references to these were removed, the jobs started working. The problem affected both the CE and the DPM server.&lt;br /&gt;&lt;br /&gt;PDG requested that release 12.0.5 be installed at Oxford via the web page:&lt;br /&gt;&lt;a href="https://atlas-install.roma1.infn.it/atlas_install/protected/rai.php"&gt;https://atlas-install.roma1.infn.it/atlas_install/protected/rai.php&lt;/a&gt;&lt;br /&gt;but wonders whether he should have been using&lt;br /&gt;&lt;a href="https://atlas-install.roma1.infn.it/atlas_install/"&gt;https://atlas-install.roma1.infn.it/atlas_install/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The installation was complete by Friday 9th Feb.&lt;br /&gt;Results for Oxford were all fine until the problems over the weekend.&lt;br /&gt;&lt;a href="http://hepwww.ph.qmul.ac.uk/%7Elloyd/atlas/atest.php"&gt;http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The problems at Bristol are caused by the worker nodes having very small home disk partitions. 
The ATLAS software cannot be loaded as there is insufficient space to expand the tar file.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-117128846800988162?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/117128846800988162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=117128846800988162&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128846800988162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128846800988162'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/steve-lloyds-atlas-test-jobs.html' title='Steve Lloyds ATLAS Test jobs'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-117128790083153266</id><published>2007-02-12T14:41:00.000+01:00</published><updated>2007-02-12T14:45:00.833+01:00</updated><title type='text'>Oxford instabilities over the weekend</title><content type='html'>The Oxford site had trouble over the weekend because the system disk on the CE filled up.&lt;br /&gt;This was mainly due to a large number of old log files. 
These have been moved to part of the software directory for now.&lt;br /&gt;The DPM server was also in a bad state and its services had to be restarted.&lt;br /&gt;&lt;br /&gt;Meanwhile PDG is in the process of adding support for some new VOs, namely:&lt;br /&gt;MINOS, FUSION, GEANT4 and CAMONT&lt;br /&gt;&lt;br /&gt;He is also on TPM duty this week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-117128790083153266?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/117128790083153266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=117128790083153266&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128790083153266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128790083153266'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/oxford-instabilities-over-weekend.html' title='Oxford instabilities over the weekend'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-117128763864060468</id><published>2007-02-12T14:37:00.000+01:00</published><updated>2007-02-12T14:40:38.640+01:00</updated><title type='text'>RALPPD to get another upgrade</title><content type='html'>Chris Brew announced on Friday 9th Feb:&lt;br /&gt;RALPPD have been awarded another chunk of money, to be spent by 31st March 2007.&lt;br /&gt;This will 
allow them to purchase one rack of CPUs and one rack of disks.&lt;br /&gt;The new CPUs will be equivalent to 275 KSI2K, bringing the total to about 600 KSI2K, and the new disks will add 78 TB, bringing the total to 158 TB.&lt;br /&gt;This total includes the 50 TB currently on loan to the T1, which will be returned shortly.&lt;br /&gt;The hardware will be identical to the recent T1 purchase.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-117128763864060468?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/117128763864060468/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=117128763864060468&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128763864060468'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128763864060468'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/ralppd-to-get-another-upgrade.html' title='RALPPD to get another upgrade'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-117128741148017667</id><published>2007-02-12T14:19:00.000+01:00</published><updated>2007-02-12T14:36:51.490+01:00</updated><title type='text'>Cambridge New Systems Arrive</title><content type='html'>Santanu announced on 19.1.07:&lt;br /&gt;&lt;br /&gt;Just to let you know that all the new machines have arrived; just waiting for the rack to be 
delivered and the Dell engineer (that's actually part of the contract) to come and switch it on.&lt;br /&gt;&lt;br /&gt;When done, it's gonna give LCG/gLite another 128 CPUs, and if our experiment with CamGrid and Condor succeeds, it will top up another ~500 CPUs. Now we can mount /experiment-software and the LCG middleware area onto any CamGrid machine with any root permissions, and WN outbound connectivity is also sorted out. Now need to think about the stupid "WN pool account".&lt;br /&gt;&lt;br /&gt;Intel calls it "Woodcrest". All the nodes are dual-core, dual-CPU, so 4 CPUs under the same roof.&lt;br /&gt;Dell model: PE1950&lt;br /&gt;Processor: Xeon 5150Ghz/4MB 1333FSB&lt;br /&gt;Memory: 8*1GB dual rank DIMMs&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-117128741148017667?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/117128741148017667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=117128741148017667&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128741148017667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/117128741148017667'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2007/02/cambridge-new-systems-arrive.html' title='Cambridge New Systems Arrive'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116490411071919053</id><published>2006-11-30T17:27:00.001+01:00</published><updated>2008-04-28T14:00:13.972+01:00</updated><title type='text'></title><content type='html'>The School of System Engineering at the University of Reading is now setting up its cluster. Emails giving advice on the setup have been passing between Jeremy Coles, Pete Gronbech and Yves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116490411071919053?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116490411071919053'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116490411071919053'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/11/school-of-system-engineering.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116490377492616673</id><published>2006-11-30T17:16:00.000+01:00</published><updated>2006-11-30T17:22:54.936+01:00</updated><title type='text'></title><content type='html'>The Oxford CE has been unstable over the last few days. First the BDII crashed and was restarted.&lt;br /&gt;Then the globus_mds service crashed and was restarted. 
In the end I decided to reboot the node today.&lt;br /&gt;New jobs seem to be arriving now and we are passing SAM tests again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116490377492616673?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/116490377492616673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=116490377492616673&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116490377492616673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116490377492616673'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/11/oxford-instability-of-ce-over-last-few.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116411024764419654</id><published>2006-11-21T12:54:00.000+01:00</published><updated>2006-11-21T12:58:00.246+01:00</updated><title type='text'></title><content type='html'>13.11.06&lt;br /&gt;RALPPD: upgraded to Torque 2 and dCache 1.7.&lt;br /&gt;&lt;br /&gt;Lawrie at Birmingham complained about the amount of spam sent to "lcg"-style mailing lists: 
At least half my spam comes from the following lists:&lt;br /&gt;&lt;br /&gt;proj-atlas-geant4-emb@cern.ch&lt;br /&gt;project-lcg-vo-sites@cern.ch&lt;br /&gt;project-lcg-vo-atlas-sites@cern.ch&lt;br /&gt;project-lcg-security-csirts@cern.ch&lt;br /&gt;&lt;br /&gt;and maybe one or two more. One answer would be to change these mailing list names and then not advertise them on a web page! Or make them closed lists.&lt;br /&gt;&lt;br /&gt;I have created a new page as a header for the shared private technical documents.&lt;br /&gt;See the link at the bottom of&lt;br /&gt;&lt;a href="https://www.gridpp.ac.uk/southgrid/TechnicalBoard.html"&gt;https://www.gridpp.ac.uk/southgrid/TechnicalBoard.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It now includes phone numbers (taken from the GOCDB) for each site.&lt;br /&gt;&lt;br /&gt;Some questions arise:&lt;br /&gt;Should we have a separate security email list for RAL site security, covering major problems and not just LCG VO-type problems?&lt;br /&gt;&lt;br /&gt;Should Cambridge have an lcg-security email list that includes the site one? 
My worry is that LCG security challenges may not be seen by Santanu if he is not directly on the cert@cam list.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116411024764419654?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/116411024764419654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=116411024764419654&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116411024764419654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116411024764419654'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/11/13.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116186712619788573</id><published>2006-10-26T13:51:00.000+01:00</published><updated>2006-10-26T13:52:06.196+01:00</updated><title type='text'></title><content type='html'>A Torque security flaw was made public last week; most SouthGrid sites shut down their queues on Friday night&lt;br /&gt;and patched them over the next 3 days.&lt;br /&gt;Cambridge was not affected as they use Condor.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116186712619788573?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://southgrid.blogspot.com/feeds/116186712619788573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=116186712619788573&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116186712619788573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116186712619788573'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/10/torque-security-flaw-was-made-public.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116186701557499384</id><published>2006-10-26T13:50:00.000+01:00</published><updated>2006-10-26T13:50:15.586+01:00</updated><title type='text'></title><content type='html'>A Sysman meeting at Cambridge was followed by the SouthGrid Technical meeting. Shared support was discussed and generally agreed, but it needs to be formalised and put in place at Birmingham. RALPPD were not there to discuss it.&lt;br /&gt;Pete had another meeting with the Oxford Campus Network guys, who have been doing some tests and managed to get better throughput by tweaking the kernel. 
It is now thought that the change made on August 15th may have increased the latency between sites, which may be the cause of the reduced bandwidth.&lt;br /&gt;Further tests will be done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116186701557499384?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/116186701557499384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=116186701557499384&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116186701557499384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116186701557499384'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/10/sysman-meeting-at-cambridge-followed.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-116075258564590659</id><published>2006-10-13T16:10:00.001+01:00</published><updated>2008-04-28T14:00:35.361+01:00</updated><title type='text'></title><content type='html'>SouthGrid has been busy taking part in the Service Challenge throughput tests.&lt;br /&gt;Oxford carried out tests on September 26th and 27th. The first, Oxford to RALPP, went OK although slowly; the second, RALPP to Oxford, was extremely disappointing. 
We saw practically zero bandwidth.&lt;br /&gt;&lt;br /&gt;Further iperf tests have been carried out between servers both within Oxford and across SouthGrid. At the moment the main cause of the problem seems to be the installation of a new Campus Firewall on August 15th. See the &lt;a href="http://gridmon3.dl.ac.uk/gridmon/graph.php?src%5B%5D=x.physics.ox.ac.uk&amp;amp;dst%5B%5D=x.ph.bham.ac.uk&amp;amp;allMetrics=on&amp;amp;metric%5B%5D=tcpthruput&amp;amp;plot=raw&amp;amp;oneGraph=off&amp;amp;order=met-src&amp;amp;plotsPerGraph=8&amp;amp;newGraph=max&amp;amp;day=1&amp;amp;mth=8&amp;amp;yr=2006&amp;amp;hr=0&amp;amp;min=0&amp;amp;endDay=13&amp;amp;endMth=10&amp;amp;endYr=2006&amp;amp;endHr=16&amp;amp;endMin=7"&gt;Gridmon web site&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Work with David Wallom of Oxford Grid has continued in order to enable the NGS VO on Oxford's cluster.&lt;br /&gt;&lt;br /&gt;Another worker node PSU failed two weeks ago and was replaced under warranty by Dell.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-116075258564590659?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116075258564590659'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/116075258564590659'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/10/southgrid-has-been-busy-taking-part-in.html' title=''/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115753792793827265</id><published>2006-09-06T09:45:00.000+01:00</published><updated>2006-09-09T21:49:29.836+01:00</updated><title type='text'>Security Update</title><content type='html'>The Oxford UI has been updated; the other nodes are in progress.&lt;br /&gt;Bristol, Cambridge and Birmingham have confirmed they too have updated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115753792793827265?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115753792793827265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115753792793827265&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115753792793827265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115753792793827265'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/09/security-update.html' title='Security Update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115745982236273649</id><published>2006-09-05T13:33:00.000+01:00</published><updated>2006-09-05T13:37:02.426+01:00</updated><title type='text'>Oxford updated</title><content type='html'>Over the weekend I updated the RPMs on Oxford's grid nodes. On Monday I re-ran yaim to make the changes take effect. 
Testing was hampered by the SFT page being very slow and the submission page not working well. On Tuesday I discovered Oxford had been failing the SFTs because PBS was not working properly: there was no longer a nodes file. This was traced to a typo in the site-info.def file. Re-running yaim seemed to cure things.&lt;br /&gt;&lt;br /&gt;Yves has had trouble getting more than 250Mb/s from Birmingham; he thinks this is due to DPM problems rather than raw bandwidth issues, as iperf tests give much better results.&lt;br /&gt;&lt;br /&gt;We are now testing Oxford instead of Birmingham.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115745982236273649?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115745982236273649/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115745982236273649&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115745982236273649'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115745982236273649'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/09/oxford-updated.html' title='Oxford updated'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115712584430325055</id><published>2006-09-01T16:46:00.000+01:00</published><updated>2006-09-01T16:50:44.303+01:00</updated><title type='text'>Friday 
pm</title><content type='html'>Bristol to Birmingham tests are due to start today.&lt;br /&gt;Winnie had some problems diagnosing the error messages from the transfer tests; Yves thinks the documentation is OK if you are an expert, but it could be improved.&lt;br /&gt;&lt;br /&gt;Yves has been benchmarking new systems (Intel Woodcrest vs AMD Opteron, Ethernet vs Infiniband) to provide data for Birmingham's future e-Science cluster purchase. Some results will be available later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115712584430325055?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115712584430325055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115712584430325055&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115712584430325055'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115712584430325055'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/09/friday-pm.html' title='Friday pm'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115712531333104943</id><published>2006-09-01T16:32:00.000+01:00</published><updated>2006-09-01T16:41:53.466+01:00</updated><title type='text'>update</title><content type='html'>Four sites out of five are now running gLite 3.0.2.&lt;br /&gt;Just Oxford to go, which is being upgraded 
today.&lt;br /&gt;The rm failures at Oxford were cured by re-running yaim on the SEs. Maybe GridFTP had gone mad?&lt;br /&gt;&lt;br /&gt;EDFA-JET is now fully operational and running SFTs successfully.&lt;br /&gt;&lt;br /&gt;Yves is continuing to test DPM-to-DPM throughput and has been tuning the kernel and TCP/IP parameters to optimise performance.&lt;br /&gt;&lt;br /&gt;Yves also carried out CASTOR to DPM tests last Sunday: &lt;a href="http://www.gridpp.ac.uk/wiki/RAL_Tier1_CASTOR_SRM_tests_T1toT2"&gt;http://www.gridpp.ac.uk/wiki/RAL_Tier1_CASTOR_SRM_tests_T1toT2&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115712531333104943?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115712531333104943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115712531333104943&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115712531333104943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115712531333104943'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/09/update.html' title='update'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115651956505531104</id><published>2006-08-25T16:15:00.002+01:00</published><updated>2006-08-25T16:32:09.090+01:00</updated><title type='text'>Condor Problems at Cambridge</title><content 
type='html'>The gLite CE requires a version of Condor that is a development fork, not the production release.&lt;br /&gt;Santanu expects very few production sites ever to consider using a development release, and yet LCG has a dependency on it.&lt;br /&gt;The second number in a Condor release version is even for production releases and odd for development releases.&lt;br /&gt;The numbers in question are: LCG requires 6.7.10-1, but Cambridge says 6.6.x-x is more likely at a production site, or perhaps the next production release, which will be 6.8.x-x.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115651956505531104?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115651956505531104/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115651956505531104&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115651956505531104'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115651956505531104'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/08/condor-problems-at-cambrid_115651956505531104.html' title='Condor Problems at Cambridge'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-33333025.post-115650721875619384</id><published>2006-08-25T12:54:00.000+01:00</published><updated>2006-08-25T13:00:18.766+01:00</updated><title type='text'>First Post</title><content 
type='html'>Visited Yves at Birmingham on Monday 21st. We discussed the throughput tests he has been carrying out between Bristol, RAL and Birmingham, and carried out some tests between Oxford and Birmingham.&lt;br /&gt;&lt;br /&gt;There was whole-building power testing at Oxford on Wednesday 23rd. I set the queues to disabled on Monday to force them to drain. I had previously marked all the nodes offline, which meant Oxford failed some SFTs; just disabling the VO queues is a better way to do it. All systems came back OK on Thursday morning.&lt;br /&gt;&lt;br /&gt;Yves has been helping Culham get the EDFA-JET site up and running via email. They are very nearly there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/33333025-115650721875619384?l=southgrid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://southgrid.blogspot.com/feeds/115650721875619384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=33333025&amp;postID=115650721875619384&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115650721875619384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/33333025/posts/default/115650721875619384'/><link rel='alternate' type='text/html' href='http://southgrid.blogspot.com/2006/08/first-post.html' title='First Post'/><author><name>Pete Gronbech</name><uri>http://www.blogger.com/profile/10530255848916315252</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
