Tuesday, December 16, 2008

EFDA-JET Service nodes upgraded to glite 3.1

We upgraded our service nodes to Scientific Linux 4.7 and glite-3.1. The worker nodes had been upgraded earlier. The problems/issues we had while upgrading to Scientific Linux 4.7 are listed below:

Storage Element

While installing the SE glite middleware (glite-SE_dpm_mysql), there was
a missing dependency issue for the perl-SOAP-Lite package.

Error: Missing Dependency: perl-SOAP-Lite >= 0.67 is needed by package
gridview-wsclient-common

Doing a

# yum install perl-SOAP-Lite

only installs perl-SOAP-Lite-0.65, which is lower than the version needed.

The perl-SOAP-Lite rpm was therefore downloaded from a different repository. We initially tried perl-SOAP-Lite-0.67.el4, but it failed to install because it required MQSeries and other packages. We finally downloaded perl-SOAP-Lite-0.67-1.1.fc1.rf.noarch.rpm, which installed without any problems.
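
The version mismatch can be confirmed before looking for another rpm. The following is a minimal sketch, assuming purely numeric dotted versions; `version_ge` is a hypothetical helper written for this post, not part of yum or rpm.

```shell
# Return success if dotted version $1 is >= dotted version $2.
# Assumption: components are plain numbers (e.g. 0.67); release
# suffixes such as "-1.1.fc1.rf" are not handled.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Possible use on the node (not executed here):
#   installed=$(rpm -q --qf '%{VERSION}\n' perl-SOAP-Lite)
#   version_ge "$installed" 0.67 || echo "perl-SOAP-Lite $installed is too old"
#   rpm -Uvh perl-SOAP-Lite-0.67-1.1.fc1.rf.noarch.rpm
```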

When the node was configured by yaim, the following error was obtained:

sed: can't read /opt/bdii/etc/schemas: No such file or directory

The file /opt/bdii/etc/schemas was missing. The fix is to copy the schemas.example file to schemas:

# cp -i /opt/bdii/doc/schemas.example /opt/bdii/etc/schemas
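
Since the same fix was needed on several nodes, it could be wrapped in a small guard that only copies the file when it is absent. This is a sketch, not something we ran at the time; `ensure_bdii_schemas` is a hypothetical name, and the default paths are the ones given above.

```shell
# Copy the example schemas file into place only if the real one is absent.
# Both paths default to the locations named in the post; pass other
# paths as $1 and $2 to try it out safely.
ensure_bdii_schemas() {
    example="${1:-/opt/bdii/doc/schemas.example}"
    target="${2:-/opt/bdii/etc/schemas}"
    if [ -e "$target" ]; then
        echo "present: $target"
        return 0
    fi
    if [ ! -r "$example" ]; then
        echo "cannot read: $example" >&2
        return 1
    fi
    cp "$example" "$target" && echo "created: $target"
}
```

Calling `ensure_bdii_schemas` with no arguments before running yaim would then be safe to repeat on every service node.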

The first SAM test failed because lcg-lr was missing, so we needed to install lcg_util. This installed a newer version of lcg_util than was on the other nodes, so lcg_util was then updated on all the nodes.

Compute Element (& site BDII)

We run the compute element service and the site BDII service on the same node.

While installing the glite-BDII packages, we obtained the following dependency errors:

Error: Missing Dependency: glite-info-provider-ldap = 1.1.0-1 is needed by package glite-BDII
Error: Missing Dependency: glue-schema = 1.3.0-3 is needed by package glite-BDII
Error: Missing Dependency: bdii = 3.9.1-5 is needed by package glite-BDII

Installing the missing packages with yum pulls in newer versions, so the glite-BDII installation still fails because it requires exactly the versions listed above. These packages were instead installed by hand. A GGUS ticket (Ticket-ID: 42456) suggested that this problem is fixed in the latest release (update 34).
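
The exact package builds that glite-BDII wants can be read straight off the error lines. The sketch below turns them into name-version-release strings; `pinned_specs` is a hypothetical helper, and it only handles the "NAME = VERSION" form of the message shown above.

```shell
# Read yum error lines on stdin and print NAME-VERSION for every
# "Missing Dependency: NAME = VERSION is needed by package ..." line.
# Lines using other comparisons (e.g. ">=") are ignored.
pinned_specs() {
    sed -n 's/^Error: Missing Dependency: \([^ ]*\) = \([^ ]*\) is needed by package .*/\1-\2/p'
}
```

For the errors above this prints glite-info-provider-ldap-1.1.0-1, glue-schema-1.3.0-3 and bdii-3.9.1-5. These are the rpm builds to fetch and install by hand; passing the same strings to yum install would only work if the repository still carries those exact builds, which we did not check.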

As with the SE install above, the schemas file was missing. The same fix was applied here.

When running yaim, we had the following errors:

grep: a: No such file or directory
grep: VO: No such file or directory
grep: or: No such file or directory
grep: a: No such file or directory
grep: VOMS: No such file or directory
grep: FQAN: No such file or directory
grep: as: No such file or directory
grep: an: No such file or directory
grep: argument: No such file or directory
qmgr: Syntax error - cannot locate attribute
set queue lhcb acl_groups += /opt/glite/yaim/bin/yaim: supply a VO or a VOMS FQAN as an argument

To fix it, we edited the file /opt/glite/yaim/functions/utils/users_getvogroup and commented out the following line:

#echo "$0: supply a VO or a VOMS FQAN as an argument"
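
The grep errors are the individual words of that usage message, which suggests the message was printed on standard output, captured by the calling code, and then treated as data. The sketch below illustrates the effect with two hypothetical stand-in functions (they are not the yaim code): sending the message to standard error keeps it out of the captured value, which would be an alternative to commenting the line out.

```shell
# Hypothetical helper that prints its usage message on stdout.
getgroup_stdout() {
    [ -n "$1" ] || { echo "supply a VO or a VOMS FQAN as an argument"; return 1; }
    echo "group-for-$1"
}

# Same helper, but the usage message goes to stderr instead.
getgroup_stderr() {
    [ -n "$1" ] || { echo "supply a VO or a VOMS FQAN as an argument" >&2; return 1; }
    echo "group-for-$1"
}

# Command substitution captures stdout only, so the first call ends up
# holding the usage text while the second ends up empty.
captured_bad=$(getgroup_stdout "") || true
captured_ok=$(getgroup_stderr "" 2>/dev/null) || true
```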

On the Gstat web monitoring page, the SE service was reported as missing ('SE missing in Gstat service'). To fix this problem, we edited the file /opt/bdii/etc/bdii-update.conf and added the following line for our SE:

SE ldap://grid001.jet.efda.org:2170/mds-vo-name=resource,o=grid
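
If the configuration is likely to be regenerated or edited again, the line can be added in a way that is safe to repeat. A minimal sketch; `add_conf_line` is a hypothetical helper, and we made the edit by hand at the time.

```shell
# Append a line to a config file only if that exact line is not
# already present, so the edit can be repeated without duplicates.
add_conf_line() {
    conf="$1"
    line="$2"
    grep -Fxq -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Possible use on the site BDII node (not executed here):
#   add_conf_line /opt/bdii/etc/bdii-update.conf \
#       "SE ldap://grid001.jet.efda.org:2170/mds-vo-name=resource,o=grid"
```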

Mon Box

When running yaim, we had the following errors:


Problem starting rgma-servicetool

Starting rgma-servicetool: [FAILED]
For more details check /var/log/glite/rgma-servicetool.log
Stopping rgma-gin: [ OK ]
Starting rgma-gin: [FAILED]

This was fixed by pointing the Mon Box at a newer Java, by adding the following to site-info.def:

HOSTNAME=`hostname`
if [ "$HOSTNAME" == "$MON_HOST" ] ; then
    JAVA_LOCATION="/usr/lib/jvm/jre-1.5.0-sun"
else
    JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
fi

We had the same 'schemas' file missing problem here as well.

Networking

EFDA-JET has a slightly unusual setup, as we are restricted to a small number of external IP addresses. All nodes are on the same LAN with private IP addresses, whilst the service nodes also have external addresses. In the hosts files on the service nodes, all service nodes are referenced by their external addresses, whilst on the worker nodes, the service nodes are referenced by their private addresses.

This worked well for glite 3.0, but not for glite 3.1, where we saw clients on the worker nodes trying to contact the service nodes via their external addresses. It looks like glite 3.1 services are passing IP addresses for clients to call back on at a later time. The complete solution was to run iptables on the worker nodes and use NAT to redirect outgoing connections for the external addresses of the service nodes to their corresponding internal addresses. This was done by adding the following to /etc/rc.local on the worker nodes:

/sbin/service iptables start
/sbin/iptables -A OUTPUT -t nat -d <CE-ext-addr> -j DNAT \
--to-destination <CE-int-addr>
/sbin/iptables -A OUTPUT -t nat -d <SE-ext-addr> -j DNAT \
--to-destination <SE-int-addr>
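
With more than a couple of service nodes, the rules above could be generated from a list of address pairs rather than written out one by one. The sketch below only prints the commands so they can be reviewed first; `dnat_rules` is a hypothetical helper, and the addresses in the comment are documentation placeholders, not our real ones.

```shell
# Read "external-address internal-address" pairs on stdin and print
# (not run) one DNAT rule per pair, in the same form as the rules above.
dnat_rules() {
    while read -r ext int; do
        # Skip blank or incomplete lines.
        if [ -z "$ext" ] || [ -z "$int" ]; then
            continue
        fi
        echo "/sbin/iptables -A OUTPUT -t nat -d $ext -j DNAT --to-destination $int"
    done
}

# Possible use (placeholder addresses):
#   printf '%s\n' "192.0.2.10 10.0.0.10" "192.0.2.11 10.0.0.11" | dnat_rules
```

Once the rules are loaded, `/sbin/iptables -t nat -L OUTPUT -n` lists them for checking.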
