
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Agent fails when database turns unavailable during failover

SA-14519

Summary



When PRPC is pointed at a database cluster in which one node fails over to another, any agent that executes during the failover window (usually a few seconds) enters an inconsistent state and must be shut down manually. The symptom is visible on the Agents page of the SMA: the "next run time" column for the affected agent is blank, and the agent never runs again unless it is manually stopped and restarted or the application server is restarted.
 

Error Messages



2015-09-12 16:46:46,916 [j2ee14_ws,maxpri=10]] [  STANDARD] [B727BE5F93BB4BDA00D0FC1A42BE498D1] [   Your_App:02.01.01] (  internal.access.DatabaseImpl) ERROR   - There was a problem with the database when getting a list:
com.pega.pegarules.pub.database.DatabaseException: Database-General          Problem encountered when getting connection for database pegarules          12514    66000    Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
DSRA0010E: SQL State = 66000, Error Code = 12,514
From: (B727BE5F93BB4BDA00D0FC1A42BE498D1)
Caused by SQL Problems.
Problem #1, SQLState 66000, Error code 12514: java.sql.SQLException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
DSRA0010E: SQL State = 66000, Error Code = 12,514


 

Steps to Reproduce

  1. Set up an agent to run every 5 seconds.
  2. Take the database down for 10 seconds.
  3. Notice that the next run time for the agent is blank and that the agent does not run again until it is restarted.


Root Cause



A defect or configuration issue in the operating environment.

PRPC expects data source connections to be usable by the time the application server's connection manager assigns them. If a connection is not usable and throws an exception, agents can enter an inconsistent state, as in this scenario. From the application's perspective, this means that the database failover is not working correctly. The duration of the outage (3 seconds or otherwise) is immaterial.
 
The WebSphere application server has robust mechanisms for testing data source connections before they are assigned to an application resource (and before they are returned to the pool). Robust database connection management, security, and performance are among the "value adds" of an application server. Adding connection testing configuration is appropriate for this failover scenario and takes care of the failover issues observed. The key point is that the logic that ensures a successful failover belongs in the application server, the JDBC driver, and the database engine itself. Adding this kind of connection management and testing logic to the Pega application would be a product enhancement that duplicates functionality already present at the other levels of the stack.

 

Resolution


Add connection testing and purging to the data source configuration.
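
To illustrate what "connection testing and purging" means in practice, the following is a minimal Python sketch of a pool that tests each connection before handing it out and purges the remaining idle connections when one fails the test. It is not WebSphere's implementation or configuration syntax, which this article does not show. Every name here (`FakeDatabase`, `FakeConnection`, `PretestingPool`, `borrow`, `give_back`) is hypothetical and exists only to show the idea.

```python
import time


class FakeDatabase:
    """Hypothetical stand-in for a clustered database (illustration only)."""

    def __init__(self):
        self.up = True
        self.generation = 0  # incremented each time the service fails over

    def connect(self):
        if not self.up:
            raise ConnectionError("listener refused the connection")
        return FakeConnection(self)

    def fail_over(self):
        # The service comes back on another node; older connections are dead.
        self.generation += 1


class FakeConnection:
    """Hypothetical connection that remembers which node it was opened on."""

    def __init__(self, database):
        self.database = database
        self.generation = database.generation

    def is_valid(self):
        # Stale if the database is down or has failed over since we connected.
        return self.database.up and self.generation == self.database.generation


class PretestingPool:
    """Tests connections on borrow and purges the idle set on a failed test."""

    def __init__(self, database, size=3, retries=3, retry_interval=0.0):
        self.database = database
        self.retries = retries
        self.retry_interval = retry_interval
        self.idle = [database.connect() for _ in range(size)]

    def borrow(self):
        for _ in range(self.retries + 1):
            while self.idle:
                conn = self.idle.pop()
                if conn.is_valid():
                    return conn
                # One stale connection suggests the rest are stale too,
                # so discard every idle connection instead of testing each.
                self.idle.clear()
            try:
                return self.database.connect()
            except ConnectionError:
                time.sleep(self.retry_interval)
        raise ConnectionError("no usable connection after retries")

    def give_back(self, conn):
        # Test again on return so a dead connection never re-enters the pool.
        if conn.is_valid():
            self.idle.append(conn)


if __name__ == "__main__":
    db = FakeDatabase()
    pool = PretestingPool(db)
    db.fail_over()
    conn = pool.borrow()  # stale connections are purged, a fresh one is opened
    print("usable after failover:", conn.is_valid())
```

With this behavior in the pool, the caller (here, the agent) only ever receives a connection that passed a test, which is the expectation described under Root Cause. The sketch purges all idle connections on the first failure; purging only the failing connection is the other common choice and trades fewer reconnects for more failed tests after a failover.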

Published October 1, 2015 - Updated October 8, 2020
