RE: [LCG-ROLLOUT] Publishing local accounted data

Hello Alastair,

The issue with rgma20.pp.rl.ac.uk came from a configuration problem on the CE (grid13), where I run the publishing code.
I was using the user rgma, which still had this old configuration.

Our MON box is no longer involved; currently I create the LcgRecords on the CE and publish them directly to https://lcgic01.gridpp.rl.ac.uk:8443

Unfortunately things are sometimes very slow, and sometimes the full output does not come back.

After fixing a small thing in my Python code I am able to publish data, but it still doesn't work as it should.
Of about 12000 entries (I guess this was a bit too many) I can see only 1 entry, which is the last one published.
Repeating the test with another 10 entries also shows only the last entry published (plus 2 entries from earlier publishing tests).

I don't know whether it is only a timing problem (so maybe the entries will show up later).
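
A minimal sketch of publishing in smaller batches rather than all ~12000 entries at once. This is only an illustration: `publish` is a hypothetical callback standing in for whatever inserts one record, and the batch size and pause are arbitrary assumptions, not values known to help with R-GMA.

```python
import time


def chunks(records, size):
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def publish_in_batches(records, publish, size=100, pause=0.0):
    """Call publish(record) for every record, pausing between batches.

    `publish` is a hypothetical callback (for example a wrapper around
    the producer's insert call). Returns the number of records passed on.
    """
    sent = 0
    for batch in chunks(records, size):
        for record in batch:
            publish(record)
            sent += 1
        if pause:
            # Give the remote side time to catch up before the next batch.
            time.sleep(pause)
    return sent
```

Comparing the returned count with the number of rows a later query shows would at least tell whether records are lost at publishing time or afterwards.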

If this doesn't work I will try what you proposed; maybe that will do the trick.

Thanks and regards,
Carsten.

-----------------------------------------
Carsten Preuss
Gesellschaft fuer Schwerionenforschung mbH
IT
Planckstr. 1, D-64291 Darmstadt, Germany
phone: +49-6159-71-1339

-----------------------------------------



-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Alastair Duncan
Sent: Mon 21.04.2008 14:22
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Publishing local accounted data

Hi Carsten,

What you can attempt is to set up a continuous consumer on the command line:

set query continuous
select * from LcgRecords WHERE ExecutingSite = 'GSI-LCG2'

By default this will run for 60 seconds; change this to something longer if necessary, e.g.
set timeout 120

Then from another terminal run your publishing code to see if the data is
published continuously. If it isn't, have a look in your MON log files to
see if there is any indication of any problems:
/var/log/glite/rgma-server/rgma-server.log
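
A minimal sketch for scanning that log file, assuming problem lines carry a level keyword such as ERROR, WARN or FATAL; the real layout of rgma-server.log may differ, and the "LcgRecords" filter is only an example.

```python
import re

# Assumption: problem lines contain one of these level keywords.
LEVELS = re.compile(r"\b(ERROR|WARN(ING)?|FATAL)\b")


def find_problem_lines(lines, needle=None):
    """Return (line_number, text) pairs for lines that look like problems.

    If `needle` is given, only lines that also contain it are returned.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if LEVELS.search(line) and (needle is None or needle in line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    path = "/var/log/glite/rgma-server/rgma-server.log"
    try:
        with open(path, errors="replace") as handle:
            for number, text in find_problem_lines(handle, needle="LcgRecords"):
                print("%d: %s" % (number, text))
    except OSError as exc:
        print("cannot read %s: %s" % (path, exc))
```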

You can also look on the inspector to see if the producer has been created.

https://rgma19.pp.rl.ac.uk:8443/Inspector/Main.do/getSiteStatus?serviceType=primary&siteName=lcg01.gsi.de&portNumber=8443&lookupType=statusDetails

I noticed that a producer had been started while I was checking:

ID                           688151444
Client hostname              grid13.gsi.de
Table                        LcgRecords
Table count                  1
Query type                   [C-L--]
Last contact (mins)          16
Last registry update (mins)  6
Termination interval (mins)  20
Time created                 Mon Apr 21 10:13:54 UTC 2008
Clicking the ID shows that one tuple was inserted and that there was 1
consumer attached, at goc01.grid-support.ac.uk. Is there any
particular reason why you

I've run the query

select * from LcgRecords WHERE ExecutingSite = 'GSI-LCG2'

and at present this returns ~250 records (13:18 BST). This was not working
earlier today. I'm currently trying to figure out what could be the
cause of the problems.

I'm curious why you think the remote DB is running on rgma20.pp.rl.ac.uk; this
machine is archiving some of the Glue tables but not LcgRecords. The archiver
for LcgRecords is goc01.grid-support.ac.uk, but you don't need to query these
machines directly; this should be done via your own MON box.

The code you have running on your MON box is pretty old now:
5.0.31. The latest release is 5.0.49.

regards

Alastair

On Monday 21 April 2008 10:01:49 Preuss, Carsten wrote:
> Hello Jeff,
>
> thanks for your reply.
>
>
> rgma> select * from GlueSite where Name='GSI-LCG2'
>
> +--------------+----------+--------------------------+-------------------+--------------------+-------------------+--------------------+----------+-----------+-------------------+-----------------+-----------------+
> | UniqueId     | Name     | Description              | SysAdminContact   | UserSupportContact | SecurityContact   | Location           | Latitude | Longitude | Web               | MeasurementDate | MeasurementTime |
> +--------------+----------+--------------------------+-------------------+--------------------+-------------------+--------------------+----------+-----------+-------------------+-----------------+-----------------+
> | lcg01.gsi.de | GSI-LCG2 | No description available | [log in to unmask] | [log in to unmask]  | [log in to unmask] | Darmstadt, Germany | 49.51    | 8.39      | http://www.gsi.de | 2008-04-21      | 08:01:05        |
> +--------------+----------+--------------------------+-------------------+--------------------+-------------------+--------------------+----------+-----------+-------------------+-----------------+-----------------+
> 1 rows
>
> This works fine.
>
> Publishing data via the RGMA command line works. The data are published
> correctly (we do this via the RGMA on the CE).
>
> So it seems that I have a problem with the Python code itself.
>
> One point in the code that is not clear to me is :
>
>          predicate = "WHERE (ExecutingSite='GSI-LCG2')"
>
> I'm not sure whether this is the right value for this variable.
>
>
> Cheers,
> Carsten.
>
> -----------------------------------------
> Carsten Preuss
> Gesellschaft fuer Schwerionenforschung mbH
> IT
> Planckstr. 1, D-64291 Darmstadt, Germany
> phone: +49-6159-71-1339
>
> -----------------------------------------
>
>
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout on behalf of Jeff Templon
> Sent: Mon 21.04.2008 09:53
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Publishing local accounted data
>
> Hi,
>
> Do you have basic RGMA connectivity?  If you use the rgma command line
> tool, on the same node from which you are trying to publish, and you say
>
>     show tables
>
> and e.g.
>
>     select * from GlueSite
>
> what do you see?
>
> It could be either that your MON box is not working right and hence data
> never makes it out of your site, OR it could be that the archiver (or
> whatever they call it these days) is not eating the RGMA tuples and putting
> them into the database.
>
> If the commands above work, you could try publishing a tuple by hand, using
> the command line, and verifying that it actually got published.
>
>                                       JT
>
> --On 21 April 2008 08:03:29 +0200 "Preuss, Carsten" <[log in to unmask]> wrote:
> > Hello all,
> >
> > currently we are trying to publish locally accounted jobs via RGMA.
> >
> > The first step was to do this by building LcgRecords and writing them to
> > the DB on the MON node.
> > The entries were built correctly but were ignored/not published by APEL.
> >
> > The next step, publishing the data directly via the RGMA Python API, was
> > also unsuccessful.
> > I followed the example from the
> >
> > "R-GMA User Guide for Python Programmers"
> >
> > as described in the chapter
> >
> > "4. PRIMARY PRODUCER EXAMPLES"
> >
> > ...
> >     try:
> >         storage = rgma.Storage(rgma.StorageType.MEMORY)
> >         properties = rgma.ProducerProperties(storage)
> >         terminationInterval = rgma.TimeInterval(60, rgma.Units.MINUTES)
> >         producer = rgma.PrimaryProducer(terminationInterval, properties)
> >         predicate = "WHERE (ExecutingSite='GSI-LCG2')"
> >         latestRetentionPeriod = rgma.TimeInterval(60, rgma.Units.MINUTES)
> >         historyRetentionPeriod = rgma.TimeInterval(60, rgma.Units.MINUTES)
> >         producer.declareTable("LcgRecords", predicate,
> >                               latestRetentionPeriod, historyRetentionPeriod)
> >         producer.insert(insert)
> >         producer.close()
> >     except (rgma.RGMAException, rgma.RemoteException,
> >             rgma.UnknownResourceException), e:
> >         sys.stderr.write("RGMA Error: %s\n" % e)
> >         sys.exit(1)
> >
> >     except:
> >         sys.stderr.write("Unexpected error: %s\n" % sys.exc_info()[0])
> >         sys.exit(1)
> > ....
> >
> >
> > No records appeared in the remote DB under
> > https://rgma20.pp.rl.ac.uk:8443/R-GMA/
> >
> > Unfortunately I also got no error messages from SQL, RGMA or Python.
> >
> > Does anybody already have experience with this?
> >
> >
> > Thanks in advance,
> >
> > Carsten.
> >
> > -----------------------------------------
> > Carsten Preuss
> > Gesellschaft fuer Schwerionenforschung mbH
> > IT
> > Planckstr. 1, D-64291 Darmstadt, Germany
> > phone: +49-6159-71-1339
> >
> > -----------------------------------------