Cheers Andrew,
We’ll register with the mailing list and have a look. Thanks for the feedback!!
Gareth
On 7 Oct 2014, at 15:17, Andrew Lahiff <[log in to unmask]> wrote:
> Have you tried stopping the services, checking for any stray bdii-update or nordugrid-* processes, then restarting? I've sometimes noticed that after stopping all NorduGrid services some bdii-update processes are still running, and if these aren't killed before the services are started again they can cause problems. (I'll try to remember to submit a ticket about this once the NorduGrid bugzilla is back up...)
>
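A minimal sketch of that check in Python, for anyone who wants to script it. It assumes a `ps` that accepts `-e -o pid= -o args=` (as on Linux), and the regular expression is only a guess at what counts as a "stray" process based on the names mentioned above; adjust it to whatever actually runs on your CE.

```python
import re
import subprocess

# Names taken from the advice above: bdii-update and anything called nordugrid-*.
# The exact pattern is an assumption, not something ARC defines.
STRAY_PATTERN = re.compile(r"(bdii-update|nordugrid-[\w-]+)")


def find_stray(ps_output):
    """Return (pid, command line) pairs whose command line matches STRAY_PATTERN.

    ps_output is text with one process per line: the PID first, then the
    full command line.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, args = int(parts[0]), parts[1]
        if STRAY_PATTERN.search(args):
            matches.append((pid, args))
    return matches


def list_stray_processes():
    """Run ps and return leftover processes; intended for use after stopping the services."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_stray(result.stdout)


if __name__ == "__main__":
    for pid, args in list_stray_processes():
        print(pid, args)
```

It only lists candidates and leaves the decision to kill anything to the admin.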
> When I was first testing an ARC CE I had problems similar to what you report (I think); you might find something useful in this thread:
>
> http://mail.nordugrid.org/mailman/private/nordugrid-discuss/2013q2/051908.html
>
> Since then we haven't had any Infosys-related problems, and the logs are just filled with this over and over again:
>
> [2014-10-06 08:47:57] InfosysHelper: VERBOSE: New fifo created: /var/run/arc/infosys/ldif-provider.fifo
> [2014-10-06 08:47:57] InfosysHelper: INFO: Start waiting for notification from A-REX's infoprovider
> [2014-10-06 08:49:14] InfosysHelper: INFO: Notification received from A-REX's infoprovider
> [2014-10-06 08:49:14] InfosysHelper: VERBOSE: Using ldif generator script: /var/run/arc/infosys/ldif-provider.sh
>
> The best thing to do probably is to ask on the NorduGrid mailing list.
>
> Regards,
> Andrew.
>
> ________________________________
> From: Gareth Roy [[log in to unmask]]
> Sent: Tuesday, October 07, 2014 2:43 PM
> To: [log in to unmask]
> Subject: Re: ARC Infosys issues.
>
> Hi Andrew,
>
> The rendering seems to be reasonably fast here too. The problem seems to be that once the information is rendered, it has trouble contacting the LDAP (BDII) system (see below at 04:35:48). I’ve attached a slide I found, which I’ve been using to puzzle through what’s happening with the components of the information system. I don’t know if it’s up to date, but I think the issue we’re having is with the communication between A-REX and the LDAP subsystem.
>
> In the example below it looks like something happened at 04:25 which stopped the BDII from talking to A-REX. The info provider logs don’t show anything, and it merrily carries on generating the LDIF and XML (no other warnings generated). At 04:33 the BDII gives up and has issues with the generated LDIF, which then means that at 04:35 A-REX gets a warning that it hasn’t notified the LDAP system. The BDII log then reports that the LDIF generator script is too old for the next two minutes, until a new FIFO is created.
>
> When this was particularly problematic it was happening continually for 30 minutes before the two components finally got back into sync. Since tweaking some of the parameters mentioned below this has got much better, but it still happens occasionally (which gives us timeouts in the ARC availability, as the tests can’t see the BDII and so fail).
>
> Not sure if you’ve seen this or have these timeouts in your logs.
>
> Thanks,
>
> Gareth
>
>
> Infoprovider.log
> [2014-10-07 04:35:43] CEInfo: INFO: ############## A-REX infoprovider started ##############
> [2014-10-07 04:35:43] CEInfo: INFO: AdminDomain config option is missing in XML. Defaulting to arc.conf values
> [2014-10-07 04:35:43] CEInfo: INFO: ClusterName config option is missing in XML. Trying cluster_alias...
> [2014-10-07 04:35:43] CEInfo: WARNING: arex_mount_point not configured. WS interfaces org.nordugrid.xbes and EMI-ES will not be published.
> [2014-10-07 04:35:43] CEInfo: INFO: Reading grid-mapfiles
> [2014-10-07 04:35:43] CEInfo: INFO: Fetching job information from control directory (GMJobsInfo.pm)
> [2014-10-07 04:35:44] CEInfo: INFO: Updating job status information
> [2014-10-07 04:35:44] CEInfo: INFO: Updating frontend information (HostInfo.pm)
> [2014-10-07 04:35:46] CEInfo: INFO: Updating RTE information (RTEInfo.pm)
> [2014-10-07 04:35:46] CEInfo: INFO: Updating LRMS information (LRMSInfo.pm)
> [2014-10-07 04:35:47] CEInfo: INFO: Discovering adotf values
> [2014-10-07 04:35:47] CEInfo: INFO: Generating GLUE2 XML rendering
> [2014-10-07 04:35:48] CEInfo: INFO: Generating LDIF renderings
> [2014-10-07 04:35:48] CEInfo: INFO: Generating GLUE2 LDIF rendering
> [2014-10-07 04:35:48] CEInfo: INFO: Generating NorduGrid LDIF rendering
> [2014-10-07 04:35:48] CEInfo: WARNING: Failed to notify LDAP information system
> [2014-10-07 04:35:48] CEInfo: INFO: ############## A-REX infoprovider finished ##############
>
> bdii.log
> [2014-10-07 04:25:24] InfosysHelper: INFO: Start waiting for notification from A-REX's infoprovider
> [2014-10-07 04:33:24] InfosysHelper: WARNING: SIGTERM caught while waiting for notification from A-REX's infoprovider
> [2014-10-07 04:33:24] InfosysHelper: WARNING: Failed to receive notification from A-REX's infoprovider
> [2014-10-07 04:33:24] InfosysHelper: VERBOSE: Using ldif generator script: /var/run/arc/infosys/ldif-provider.sh
> cat: write error: Broken pipe
> 2014-10-07 04:33:24,599: [ERROR] Timed out while reading /var/tmp/arc/bdii/provider/arc-glue-bdii-ldif
> [2014-10-07 04:33:41] InfosysHelper: INFO: The ldif generator script is too old (/var/run/arc/infosys/ldif-provider.sh)
> [2014-10-07 04:33:41] InfosysHelper: INFO: This file should have been refreshed by A-REX's infoprovider. Check that A-REX is running.
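To put a number on how long the BDII side sat waiting, a small sketch like the following can be run over bdii.log. It relies only on the message texts that appear in the excerpts in this thread ("Start waiting for notification", "Notification received", "Failed to receive notification"); the function name and the output format are made up for illustration.

```python
import re
from datetime import datetime

# Bracketed timestamp, then the InfosysHelper level and message, as in the excerpts above.
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] InfosysHelper: (\w+): (.*)"
)


def notification_waits(log_text):
    """Return a list of (start, seconds_waited, outcome) for each wait on the infoprovider.

    outcome is "received" or "failed". Lines that do not look like
    InfosysHelper entries (including mail quote prefixes) are tolerated.
    """
    waits = []
    started = None
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        message = m.group(3)
        if message.startswith("Start waiting for notification"):
            started = when
        elif started is not None and message.startswith("Notification received"):
            waits.append((started, (when - started).total_seconds(), "received"))
            started = None
        elif started is not None and message.startswith("Failed to receive notification"):
            waits.append((started, (when - started).total_seconds(), "failed"))
            started = None
    return waits


if __name__ == "__main__":
    sample = """\
[2014-10-07 04:25:24] InfosysHelper: INFO: Start waiting for notification from A-REX's infoprovider
[2014-10-07 04:33:24] InfosysHelper: WARNING: SIGTERM caught while waiting for notification from A-REX's infoprovider
[2014-10-07 04:33:24] InfosysHelper: WARNING: Failed to receive notification from A-REX's infoprovider
"""
    for start, seconds, outcome in notification_waits(sample):
        print(start, seconds, outcome)  # 2014-10-07 04:25:24 480.0 failed
```

On the excerpt above this gives an eight-minute wait ending in failure, against 77 seconds for the healthy cycle in Andrew's log.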
>
> On 7 Oct 2014, at 13:46, Andrew Lahiff <[log in to unmask]> wrote:
>
>> Hi Gareth,
>>
>> For us the LDIF renderings are all very fast on each CE, e.g.:
>>
>> ...
>> [2014-10-07 13:40:07] CEInfo: INFO: Generating GLUE2 XML rendering
>> [2014-10-07 13:40:08] CEInfo: INFO: Generating LDIF renderings
>> [2014-10-07 13:40:08] CEInfo: INFO: Generating GLUE2 LDIF rendering
>> [2014-10-07 13:40:08] CEInfo: INFO: Generating NorduGrid LDIF rendering
>> [2014-10-07 13:40:09] CEInfo: INFO: ############## A-REX infoprovider finished ##############
>>
>> Is it the "GLUE2 XML rendering" or "GLUE2 LDIF rendering" that is taking a long time? (or both?)
>>
>> Regards,
>> Andrew.
>>
>> ________________________________________
>> From: Gareth Roy [[log in to unmask]]
>> Sent: Tuesday, October 07, 2014 12:55 PM
>> To: [log in to unmask]
>> Subject: ARC Infosys issues.
>>
>> Hi All,
>>
>> Question for the ARC Technical experts.
>>
>> At Glasgow we’re currently having some issues with our ARC-CE publishing its availability via the BDII, which is causing some jobs to fail and the ARC to be unavailable. Usually the BDII is reachable but doesn’t contain any site or job information for 30-60 minutes and then slowly recovers.
>>
>> What appears to be happening is that the infosys is generating a bunch of LDIF via /var/run/arc/infosys/ldif-provider.sh, which is then read into the BDII via a FIFO and some other magic, and this process seems to time out and fail.
>>
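For anyone unfamiliar with how a FIFO handoff can fail, here is a generic Python sketch. It is not ARC's code and makes no claim about how the infoprovider is implemented; it only illustrates one general way a writer can fail to notify: a non-blocking open of a FIFO for writing fails with ENXIO when no reader currently has it open. The `notify` function and the demo path are hypothetical.

```python
import errno
import os
import tempfile


def notify(fifo_path):
    """Try to wake a reader on fifo_path without blocking.

    Returns True if a reader had the FIFO open and one byte was written,
    False if nobody was listening.
    """
    try:
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # FIFO exists but has no reader
            return False
        raise
    try:
        os.write(fd, b"\n")
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "demo.fifo")  # hypothetical path, not ARC's
        os.mkfifo(fifo)
        print(notify(fifo))  # nothing is reading yet
        reader = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
        print(notify(fifo))  # a reader now has the FIFO open
        os.close(reader)
```

If something similar is going on here, a reader that has gone away or is stuck would show up on the writing side as a failed notification, not as an error in rendering.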
>> We added these variables (taken from the puppet module and described in the manual as likely to help) and have played with the values (current ones shown):
>>
>> timelimit="3600"
>> infoproviders_timeout="1200"
>>
>> But we’re still seeing some issues, particularly with timeouts and with getting the BDII to come back up at all. Interestingly, if we disable GLUE2 LDAP generation via the infosys_glue2_ldap flag, things seem much more responsive.
>>
>> Has anyone else seen anything like this? Am I missing something fundamentally obvious here?
>>
>> Any help would be great as I’m now at the point of going around in circles :)
>>
>> Thanks all,
>>
>> Gareth
>> --
>> Scanned by iCritical.