Friday, March 30, 2012

Should we Hyperthread

Following the recent discussion on hyperthreading on the TB-Support mailing list, and having several sets of nodes that support hyperthreading and have three or four gigabytes of memory per core, we decided to run some tests to see whether opening some job slots on the virtual cores would increase our throughput.

The first test was to benchmark the nodes with hyperthreading enabled out to the full number of cores. We have three sets of nodes with hyperthreading capability, with E5520, X5650 and E5645 CPUs. For each type we ran one to n concurrent instances of the HEPSPEC benchmark, where n is the number of real cores plus the number of virtual cores.
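
For anyone wanting to repeat the measurement, a minimal sketch of the loop is given below. It assumes a hypothetical wrapper script ./run_hepspec.sh (not part of the standard benchmark distribution) that runs a single HEPSPEC06 instance and prints its score; the point is simply to launch 1 to n copies concurrently and add up the per-instance scores.

#!/usr/bin/env python
# Sketch: measure total HEPSPEC06 throughput for 1..n concurrent instances.
# Assumes a hypothetical wrapper ./run_hepspec.sh that runs one benchmark
# instance and prints a single HS06 score on stdout.
import subprocess

N_CORES = 16  # real plus virtual cores on the node under test

for n in range(1, N_CORES + 1):
    procs = [subprocess.Popen(["./run_hepspec.sh"], stdout=subprocess.PIPE)
             for _ in range(n)]
    scores = [float(p.communicate()[0].decode()) for p in procs]
    print("%2d concurrent instances: total HS06 = %.1f" % (n, sum(scores)))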

Figure (1): Total Node HEPSPEC by number of concurrent tests

These results are shown in Figure (1) and clearly show a nearly linear rise while the concurrent tests fit on the real CPUs, then a flattening off as more of the virtual cores come into use, becoming almost completely flat or even dropping again once all the virtual cores are used. However, there is a clear increase in the total HEPSPEC rating of the node when using half of the virtual cores. That should mean a real gain in output from enabling jobs on these virtual cores, as long as real work scales like the HEPSPEC tests and we don't run into network, memory or disk I/O bottlenecks.

Armed with this information we decided to check the real world performance of the nodes with E5520 and X5650 CPUs with jobs running on half the virtual CPUs.

To do this we took half of each set of nodes, increased the number of job slots by 50% (8 to 12 for the E5520s and 12 to 18 for the X5650s; the E5645 nodes are still in test after delivery and not yet ready for production), reduced the pbs_mom cpumult and wallmult parameters to reflect the lower per-core HEPSPEC rating once the virtual cores are in use, and returned them to running production jobs.
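
The rescaling itself is straightforward; the sketch below shows the sort of calculation involved, assuming the usual convention that cpumult and wallmult express the node's per-slot HS06 relative to a reference machine. The HS06 numbers and the reference value are placeholders, not our measured figures.

# Sketch: recompute the pbs_mom cpumult/wallmult scaling when extra job slots
# are opened on the virtual cores. All numbers are placeholders, not our
# measured values; the real inputs come from the HEPSPEC06 runs above.
REFERENCE_HS06_PER_SLOT = 8.0  # HS06 per slot of the reference machine (assumed)

def scaling_factor(node_total_hs06, job_slots):
    # Per-slot HS06 falls as more slots are opened on the same node, so the
    # CPU and wall time multipliers have to be reduced in proportion.
    return (node_total_hs06 / job_slots) / REFERENCE_HS06_PER_SLOT

old = scaling_factor(node_total_hs06=90.0, job_slots=8)    # real cores only
new = scaling_factor(node_total_hs06=110.0, job_slots=12)  # plus half the virtual cores
print("cpumult/wallmult: %.2f -> %.2f" % (old, new))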

They have now been running real jobs for seven days and we have enough statistics to start comparing the nodes running jobs on virtual cores with those not doing so.

Figure (2): Average Job Efficiency for nodes using and not using virtual cores
Figure (2) shows the average job efficiency (total CPU time divided by total wall time) for jobs run on nodes with and without the virtual cores in use. There is no sign of a drop in efficiency when running on the virtual cores so it would appear that at 12 or 18 jobs per node we are not yet hitting network, memory or disk I/O bottlenecks.
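
For reference, the efficiency figure is the simple ratio described above; a minimal sketch of the calculation, with made-up job records, is:

# Sketch: average job efficiency = total CPU time / total wall time.
# The (cpu_seconds, wall_seconds) pairs below are illustrative only.
jobs = [(34000, 36000), (70000, 71500), (3500, 9000)]

total_cpu = sum(cpu for cpu, wall in jobs)
total_wall = sum(wall for cpu, wall in jobs)
print("average efficiency = %.2f" % (total_cpu / float(total_wall)))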

Figure (3): Average Job Efficiency for different VOs and Roles on the different classes of nodes
Figure (3) shows the average job efficiency for different VOs and roles on the different classes of nodes, and shows no sign of a systematic drop in efficiency when running jobs on the virtual cores (only the prdlhcb group shows signs of such an effect, and the statistics for that group are somewhat limited).
Figure (4): HEPSPEC06 Scaled CPU Hours per Core per Day
Figure (5): HEPSPEC06 Scaled CPU Hours per Node per Day
Figures (4) and (5) show the HEPSPEC06-scaled CPU hours accumulated per day per core or per node (the total unscaled CPU time of all jobs on that class of node, multiplied by the HS06 rating, divided by the number of days the test ran and by the number of cores or nodes in that class). As hoped, although the individual cores accumulate HEPSPEC06-scaled CPU hours faster when not running jobs on the virtual cores, that is more than offset by the increase in the number of slots per node.
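
The per-core and per-node figures are simple ratios of the accounting totals; a sketch of the calculation with placeholder inputs is below.

# Sketch: HEPSPEC06-scaled CPU hours per core and per node per day.
# All inputs are placeholders, not our accounting figures.
total_cpu_hours = 50000.0  # unscaled CPU hours of all jobs on this class of node
hs06_rating     = 9.0      # HS06 rating applied per core (assumed convention)
days            = 7.0      # length of the test
nodes           = 20       # nodes in this class
slots_per_node  = 12       # job slots in use per node

scaled = total_cpu_hours * hs06_rating
print("per node per day: %.1f" % (scaled / (days * nodes)))
print("per core per day: %.1f" % (scaled / (days * nodes * slots_per_node)))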

Figure (6): Average number of Jobs per Core per Day
Figure (7): Average number of Jobs per Node per Day
Finally, Figures (6) and (7) show the number of jobs per day per core or per node (the total number of jobs run on each class of node divided by the length of the test and by the number of cores or nodes in that class). They show a similar effect: the individual cores manage to do more work when no jobs are running on the virtual cores, but the increase in the number of slots more than makes up for it.

In conclusion, it appears that running jobs on half of the virtual cores of hyperthreading-capable nodes gives a 30-40% increase in the total "installed capacity" provided by those nodes, without any apparent decrease in the efficiency of the jobs running on them.
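
As a rough illustration of where a figure in that range comes from (the HS06 totals below are placeholders, not our benchmark results), the capacity gain is just the ratio of the node's total HS06 with half the virtual cores in use to its total with only the real cores in use:

# Sketch: "installed capacity" gain from opening slots on half the virtual cores.
# The HS06 totals are placeholders; the real values come from Figure (1).
hs06_real_cores_only   = 90.0   # total node HS06 with one test per real core
hs06_half_virtual_used = 120.0  # total node HS06 with 50% more concurrent tests
gain = hs06_half_virtual_used / hs06_real_cores_only - 1.0
print("installed capacity increase: %.0f%%" % (gain * 100))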

We will continue the test for another week but unless the numbers change drastically we will be changing our policy and running jobs on half of the virtual cores on hyperthreading capable nodes.

Chris and Rob.

Friday, February 10, 2012

HEPSPEC06 on AMD Interlagos 6276

I have been running HEPSPEC06 on a recent Dell 815 with the new 16-core AMD Interlagos processors.

The only HEPSPEC06 result that is valid for current GridPP use is the one for SL5 (64-bit OS) with the 32-bit compiler, but for interest we also ran with the 64-bit compiler switches.

Then we installed SL6 and re-ran with both the 32-bit and 64-bit compiler options.

The results are on the GridPP wiki, but the most notable thing is the performance boost you can get going from 32-bit on SL5 to 64-bit on SL6.

The boost is nearly 25%, which could mean a lot to the experiments and to the sites' productivity if they can be persuaded to migrate sooner rather than later.

https://www.gridpp.ac.uk/wiki/HEPSPEC06#UKI-SOUTHGRID-OXLink