[Issues] Some AW_UTIL functions cause lost RPC connection errors

D.R. Boxhoorn danny at astro.rug.nl
Wed Mar 5 14:14:30 CET 2008


Hi John,

These two particular errors are completely unrelated.

The ORA-28576 is caused by the RPC calls to the HTM library in the database,
but thanks to Hugo we are closing in on that issue.
You can also get the ORA-28576 if you supply invalid parameters to the HTM
routines.
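To tell the two errors apart in a script, it can help to pull the ORA number
out of the exception text. The following is a minimal sketch, assuming only
that the message contains the usual "ORA-NNNNN" prefix, as in the tracebacks
quoted below. The helper names (ora_code, describe) and the constant names are
hypothetical and not part of awe or cx_Oracle.

```python
import re

# Oracle error numbers mentioned in this thread (hypothetical constant names).
EXTPROC_RPC_LOST = 28576  # lost RPC connection to external procedure agent (HTM calls)
EOF_ON_CHANNEL = 3113     # end-of-file on communication channel (connection lost)
NOT_CONNECTED = 3114      # not connected to ORACLE (session already gone)

# Matches the five-digit number after "ORA-", e.g. "ORA-03113" -> "03113".
_ORA_PATTERN = re.compile(r"ORA-(\d{5})")


def ora_code(error):
    """Return the first ORA error number in an exception or message, or None.

    Only the text of the error is inspected, so this works for any exception
    whose str() contains the Oracle message (e.g. cx_Oracle.DatabaseError).
    """
    match = _ORA_PATTERN.search(str(error))
    return int(match.group(1)) if match else None


def describe(error):
    """Give a short explanation for the ORA codes discussed in this thread."""
    code = ora_code(error)
    if code == EXTPROC_RPC_LOST:
        return "HTM external procedure call failed (RPC lost or invalid parameters)"
    if code in (EOF_ON_CHANNEL, NOT_CONNECTED):
        return "database connection lost; reconnect or start a new session"
    return "other error"
```

For example, describe("ORA-03114: not connected to ORACLE") returns the
"connection lost" explanation, while an ORA-28576 message points at the HTM
routines instead.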

The ORA-3113 means that the connection to the database was lost.
This can happen because the database was restarted, but also because the
network connection itself was lost, for example when a network switch
is restarted or a firewall is reconfigured.
The longer you keep your awe session open, the more likely you are to see
this error.
If this error occurs frequently and does not appear to be caused by database
restarts, try connecting from a different machine, or even from a different
network, to find out whether you have a network problem.
Once the connection is gone, every further database access in the same
session fails with ORA-03114 ("not connected to ORACLE"); that error is a
consequence of the ORA-3113, not a separate problem.
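A long-running script can recover from a lost connection only by opening a new
one. The following is a minimal sketch of a reconnect-and-retry wrapper; connect
and run are hypothetical callables you would supply (for example thin wrappers
around database.connect() and a cursor execute in awe), and nothing here is an
existing awe API. It retries only once by default, so a persistent network
problem still surfaces as an error.

```python
import re

# Oracle errors meaning the session is gone and a new connection is needed:
# ORA-03113 (end-of-file on communication channel) and
# ORA-03114 (not connected to ORACLE).
# ORA-28576 is deliberately left out: whether a new connection cures it is
# still being tested in this thread.
CONNECTION_LOST = (3113, 3114)


def _ora_code(error):
    """Extract the ORA error number from an exception's text, or None."""
    match = re.search(r"ORA-(\d{5})", str(error))
    return int(match.group(1)) if match else None


def run_with_reconnect(connect, run, retries=1):
    """Call run(conn); if the connection was lost, reconnect and try again.

    connect -- hypothetical callable returning a fresh connection
    run     -- hypothetical callable that takes a connection and returns a result
    retries -- how many times to reconnect before giving up
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return run(conn)
        except Exception as exc:  # in awe this would be cx_Oracle.DatabaseError
            # Re-raise anything that is not a lost connection, and give up
            # once the last allowed attempt has failed.
            if _ora_code(exc) not in CONNECTION_LOST or attempt == retries:
                raise
            conn = connect()
```

Re-raising every other error matters here: genuine query failures, such as
invalid parameters passed to the HTM routines, should not be hidden by a
silent retry.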


                                                   Danny


On Wed, Mar 05, 2008 at 01:09:03PM +0100, John P. McFarland wrote:
> Hi Danny,
> 
> As you know, I too have been experiencing the ORA-28576 error during the 
> association stage of GAstrometric.  This is an ongoing issue you said was 
> related to the HTM shared library and was under investigation in some form.
> 
> I am still getting the ORA-28576 errors during the association of a 
> GAstrometric task preceded by another long GAstrometric task, but now I 
> am more often getting "ORA-03113: end-of-file on communication channel" 
> errors, during the association and at other times as well.  If I retry in 
> the same session, I get "ORA-03114: not connected to ORACLE" errors at the 
> first database access, indicating a database disconnection.
> 
> Are all these error messages related?
> 
> Cheers,
> 
> 
> -=John
> 
> 
> On Wed, 5 Mar 2008, D.R. Boxhoorn wrote:
> 
> >
> >Hi Hugo,
> >
> >Could you please try again, with a 35-minute delay and a new connection,
> >and report whether the problem still occurs?
> >
> >Thanks,
> >
> >                                                  Danny
> >
> >On Tue, Mar 04, 2008 at 11:41:32AM +0100, Hugo Buddelmeijer wrote:
> >>Hi all,
> >>
> >>It appears that some of the functions in AW_UTIL cause the ORA-28576
> >>error: lost RPC connection to external procedure agent. This means that
> >>scripts (or sessions) using these functions cannot last longer than 30
> >>minutes.
> >>
> >>For example, running this query twice with a 35-minute delay raises the
> >>above error: "SELECT * FROM TABLE(AWOPER.AW_UTIL.RADIUSTEST(15, 243.0, 27.0,
> >>5.0/3600.0))". Other queries, such as simple SourceList queries, do not
> >>raise this error.
> >>
> >>The AssociateList class uses the above function to find associations.
> >>As a consequence, it is not possible to create several AssociateLists
> >>in one script/session if more than 30 minutes pass between their
> >>creation.
> >>
> >>Is there a way around this problem? Am I doing something wrong?
> >>
> >>The attached scripts show the timeout in question. rpcTimeOutTest.py
> >>compares several queries that are specifically crafted for this test
> >>and would not be used in a regular session. testAL8.py tries two
> >>associations with an (artificial) delay; such a situation can be quite
> >>common.
> >>
> >>Greetings,
> >>Hugo
> >>
> >>
> >>
> >
> >>#!/usr/bin/env awe
> >>from astro.main.AssociateList import AssociateList
> >>from astro.main.SourceList import *
> >>import time
> >>
> >>slid1 = 135751 # 2df_R_13
> >>slid2 = 136161 # 2df_V_13
> >>slid3 = 136121 # 2df_I_13
> >>
> >>sl1 = (SourceList.SLID == slid1)[0]
> >>sl2 = (SourceList.SLID == slid2)[0]
> >>sl3 = (SourceList.SLID == slid3)[0]
> >>
> >># commenting out either the first association
> >># or the sleep results in no error
> >>al1 = AssociateList()
> >>al1.input_lists = [sl1, sl2]
> >>al1.make()
> >>al1.commit()
> >>
> >>time.sleep(35*60)
> >>
> >>al2 = AssociateList()
> >>al2.input_lists = [sl1, sl3]
> >>al2.make()
> >>al2.commit()
> >>
> >>
> >>
> >># output:
> >>"""
> >>virgo15:~/phd/awe>awe testAL8.py
> >>[virgo15] 13:38:31 - Preparing for the matching
> >>[virgo15] 13:40:01 - Found 2893 sources in SourceList with SLID = 135751
> >>[virgo15] 13:44:54 - Found 2901 sources in SourceList with SLID = 136161
> >>[virgo15] 13:44:54 - Looking for pairs
> >>[virgo15] 13:45:34 - Looking for closest pairs
> >>[virgo15] 13:45:35 - Filtered out 79 pairs
> >>[virgo15] 13:45:35 - Found 2456 pairs
> >>[virgo15] 13:45:35 - Inserting first half of pairs
> >>[virgo15] 13:45:35 - Inserting second half of pairs
> >>[virgo15] 13:45:36 - Inserting null associations from last input list
> >>[virgo15] 13:45:36 - Inserting null associations from first input list
> >>[virgo15] 13:45:36 - Created Chain AssociateList with ALID = 62441, name =  and 3528 associates!
> >>[virgo15] 14:20:37 - Preparing for the matching
> >>[virgo15] 14:21:44 - Found 2893 sources in SourceList with SLID = 135751
> >>[virgo15] 14:22:46 - Found 6590 sources in SourceList with SLID = 136121
> >>[virgo15] 14:22:46 - Looking for pairs
> >>Traceback (most recent call last):
> >>  File "testAL8.py", line 23, in ?
> >>    al2.make()
> >>  File "/Users/users/buddel/phd/awe/cvs/opipe/astro/main/AssociateList.py", line 146, in make
> >>    self.associate_sourcelists()
> >>  File "/Users/users/buddel/phd/awe/cvs/opipe/astro/main/AssociateList.py", line 207, in associate_sourcelists
> >>    self.associate_lists(list1=self.input_lists[0], list2=self.input_lists[1])
> >>  File "/Users/users/buddel/phd/awe/cvs/opipe/astro/main/AssociateList.py", line 1131, in associate_lists
> >>    c.execute(Tquery)
> >>cx_Oracle.DatabaseError: ORA-28576: lost RPC connection to external procedure agent
> >>"""
> >>
> >>
> >
> >>
> >>from common.database.Database import database
> >>import sys,time
> >>
> >># Simple function to do queries
> >>def do_query(q):
> >>    database.connect()
> >>    c = database.cursor()
> >>    c.execute(q)
> >>    results = c.fetchall()
> >>    c.close()
> >>    return results
> >>
> >># Determine what query we want to test
> >>if len(sys.argv) != 2:
> >>    print """Usage: "awe %s <number>" where number is
> >>  1 for simple query of a SourceList
> >>  2 for NeighBoursTest query
> >>  3 for RadiusTest query""" % (sys.argv[0])
> >>    sys.exit()
> >>
> >>if sys.argv[1] == '2':
> >>    # This query will fail the second time
> >>    query = 'SELECT * FROM TABLE(AWOPER.AW_UTIL.NEIGHBOURSTEST(4067390, 9, 9))'
> >>
> >>elif sys.argv[1] == '3':
> >>    # This query will also fail
> >>    query = 'SELECT * FROM TABLE(AWOPER.AW_UTIL.RADIUSTEST(15, 243.000000, 27.000000, 5.000000/3600.0))'
> >>
> >>else:
> >>    # This query will succeed
> >>    query = 'SELECT "SLID","SID","HTM" FROM AWOPER."SOURCELIST*SOURCES" T WHERE T.SLID = 136111 AND T.SID = 10'
> >>
> >>
> >># Tell the user about the query
> >>print "query:",query
> >>
> >># Run the query for the first time; this always works
> >>data1 = do_query(query)
> >>print "data1: %i rows" % (len(data1))
> >>
> >># The RPC timeout occurs after about half an hour
> >>minutes = 35
> >>print "sleeping for %i minutes" % minutes
> >>for i in range(minutes):
> >>    print "sleeping minute %i/%i" % (i + 1, minutes)
> >>    time.sleep(60)
> >>
> >>
> >># Try again; this fails for queries 2 and 3
> >>data2 = do_query(query)
> >>print "data2: %i rows" % (len(data2))
> >>
> >># cx_Oracle.DatabaseError: ORA-28576: lost RPC connection to external procedure agent
> >>
> >>
> >>
> >>
> >
> >>_______________________________________________
> >>Issues mailing list
> >>Issues at astro-wise.org
> >>http://listman.astro-wise.org/mailman/listinfo/issues

