Friday, July 25, 2008

Adding multiple clusters to get different memory limit queues

I've been thinking about doing this for ages. The aim is to have different queues with different memory limits to better direct jobs with higher memory requirements to nodes with more memory.

The current method of doing this is to set up a separate queue for each memory level, each with a default memory requirement set, and then publish a separate cluster and subcluster for each of those queues in the information system.

So I first created grid500, grid1000 and grid2000 queues with no memory limits, configured them as normal using yaim so they would accept jobs from all my supported VOs, and checked that job submission to them worked as expected.
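
For reference, the Torque side of that configuration ends up looking roughly like the following qmgr commands for each queue (just a sketch; the actual acl_groups depend on which VOs the queue supports, and "dteam" here is only an illustrative group name):

qmgr -c "create queue grid500 queue_type = Execution"
qmgr -c "set queue grid500 enabled = True"
qmgr -c "set queue grid500 started = True"
qmgr -c "set queue grid500 acl_group_enable = True"
qmgr -c "set queue grid500 acl_groups += dteam"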

I then edited the static-file-Cluster.ldif file on the CE to add an extra cluster and subcluster for each of the queues, setting the published memory of each subcluster to the memory limit of its queue. So, for example, for the grid500 queue I created a 500.pp.rl.ac.uk cluster like so:

dn: GlueClusterUniqueID=500.pp.rl.ac.uk,mds-vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueClusterName: 500.pp.rl.ac.uk
GlueClusterService: heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500
GlueClusterUniqueID: 500.pp.rl.ac.uk
GlueForeignKey: GlueSiteUniqueID=UKI-SOUTHGRID-RALPP
GlueForeignKey: GlueCEUniqueID=heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500
GlueInformationServiceURL: ldap://heplnx206.pp.rl.ac.uk:2170/mds-vo-name=resource,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

dn: GlueSubClusterUniqueID=500.pp.rl.ac.uk, GlueClusterUniqueID=500.pp.rl.ac.uk,mds-vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueSubCluster
objectClass: GlueHostApplicationSoftware
objectClass: GlueHostArchitecture
objectClass: GlueHostBenchmark
objectClass: GlueHostMainMemory
objectClass: GlueHostNetworkAdapter
objectClass: GlueHostOperatingSystem
objectClass: GlueHostProcessor
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueChunkKey: GlueClusterUniqueID=500.pp.rl.ac.uk
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_1
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_2_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_1
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_4_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_5_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_6_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_7_0
GlueHostApplicationSoftwareRunTimeEnvironment: GLITE-3_0_0
GlueHostApplicationSoftwareRunTimeEnvironment: RALPP
GlueHostApplicationSoftwareRunTimeEnvironment: SOUTHGRID
GlueHostApplicationSoftwareRunTimeEnvironment: GRIDPP
GlueHostApplicationSoftwareRunTimeEnvironment: R-GMA
GlueHostArchitectureSMPSize: 2
GlueHostArchitecturePlatformType: i586
GlueHostBenchmarkSF00: 0
GlueHostBenchmarkSI00: 1000
GlueHostMainMemoryRAMSize: 500
GlueHostMainMemoryVirtualSize: 1000
GlueHostNetworkAdapterInboundIP: FALSE
GlueHostNetworkAdapterOutboundIP: TRUE
GlueHostOperatingSystemName: ScientificSL
GlueHostOperatingSystemRelease: 4.4
GlueHostOperatingSystemVersion: Beryllium
GlueHostProcessorClockSpeed: 2800
GlueHostProcessorModel: PIV
GlueHostProcessorVendor: intel
GlueSubClusterName: 500.pp.rl.ac.uk
GlueSubClusterUniqueID: 500.pp.rl.ac.uk
GlueSubClusterPhysicalCPUs: 0
GlueSubClusterLogicalCPUs: 0
GlueSubClusterTmpDir: /tmp
GlueSubClusterWNTmpDir: /tmp
GlueInformationServiceURL: ldap://heplnx206.pp.rl.ac.uk:2170/mds-vo-name=resource,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

Note that there has to be a blank line in the file after the end of the subcluster definition, or else the GIP script that adds the VO software tags doesn't work.

I also had to edit the CE/queue entries in static-file-CE.ldif to change these two entries for each of the grid500, grid1000 and grid2000 queues:

GlueCEHostingCluster: 500.pp.rl.ac.uk
GlueForeignKey: GlueClusterUniqueID=500.pp.rl.ac.uk

These clusters seem to have appeared correctly on the gStat pages.
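
You can also query the resource BDII on the CE directly to check; something along these lines (assuming the usual port 2170 and the mds-vo-name=resource base used above) should return the new cluster and subcluster entries:

ldapsearch -x -h heplnx206.pp.rl.ac.uk -p 2170 -b "mds-vo-name=resource,o=grid" "(GlueClusterUniqueID=500.pp.rl.ac.uk)"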

So now when I run edg-job-list-match on a JDL with the following requirements I get:

Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 500);

***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:

*CEId*
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-dteam
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid1000
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid500
***************************************************************************

Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 1000);

***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:

*CEId*
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-dteam
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid1000
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000
***************************************************************************

Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 1500);

***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:

*CEId*
heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-grid2000
***************************************************************************

Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 2001);

===================== edg-job-list-match failure ======================
No Computing Element matching your job requirements has been found!
======================================================================

This looks very much like what I want.

Then to let Torque/Maui know about the memory requirements for each of these new queues I set a default memory requirement for each with something like:
qmgr -c "set queue grid1000 resources_default.mem = 1000mb"
(I think this is a non-enforcing requirement so jobs will not be killed for going over it. To do that I think you need to set "resources_max.mem".)
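
For completeness, a hard limit would look something like the following (a sketch only, not something I've enabled here; the 1100mb figure is just illustrative headroom over the 1000mb default):

qmgr -c "set queue grid1000 resources_max.mem = 1100mb"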

Now I just need to do the same configuration on heplnx207, phase out the "per VO" queues and persuade users to put the memory requirements of their jobs in their JDL files, along the lines of the sketch below.
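
For users, that just means adding a memory clause to the Requirements expression in the JDL. A minimal sketch (myjob.sh and the 1500 figure are purely illustrative):

Executable = "myjob.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"myjob.sh"};
OutputSandbox = {"std.out", "std.err"};
Requirements = other.GlueHostMainMemoryRAMSize >= 1500;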
