Support Article
Upgrade to Pega 7.1.7 causes DB Deadlock exceptions
SA-10036
Summary
An upgrade to Jboss EAP 6.3.2 together with Pega 7.1.7 is performed. Deadlocks in MS SQL database are observed when testing with 1 node and 5 slaves.
"com.pega.pegarules.pub.database.DatabaseException: Transaction (Process ID 77) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction".
A custom table TABLE_LOG_AUDIT contains log key actions in flows such as, User logged in, Time report created, Time report submitted. This table is filled with approximately 12 million rows.
An activity is creates an instance of the class, fetches a new sequence NBR from the DB through a stored procedure, puts the properties, does a OBJ-SAVE with the WriteNow checkbox checked and finally does a Page-Remove on the in memory page.
The error occurs because of a combination of the 12 million rows and Pega 7.1.7. The behavior disappears after truncating the log table. The error occurs frequently when logs are created for login, and sporadically when logs are created for log out of the user. It was also observed that when more than 1 user is logged in every 2 seconds on one JVM, this issue occurs.
The environment involved is:
MS SQL Server 2012
MS Windows server 2012 R2
Jboss Enterprise Application Platform 6.3.2 (also 6.1 used for comparison checking)
PRPC 7.1.7 (also 7.1.6 used for comparison checking)
JAVA SE 1.7.0_60-b19
sqljdbc4.jar (ms sql drivers)
Error Messages
2015-04-27 09:10:50,994 [xecutor-threads - 16] [ STANDARD] [ ] [ PegaRULES:07.10] ( internal.access.DatabaseImpl) ERROR your_server|IPADDRESS.10 - Encountered problem when performing a Commit
com.pega.pegarules.pub.database.DatabaseException: Transaction (Process ID 95) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
From: (HD63F05282C93AA21457337683180F5E6:IPADDRESS.10)
SQL: MERGE INTO data.so_log_audit t USING (VALUES (? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ?)) AS inputs ("pzInsKey" , "EventResult" , "EventType" , "FundId" , "Language" , "LoginType" , "MemberId" , "SessionId" , "WorkType" , "pxCreateDateTime" , "pxCreateOpName" , "pxCreateOperator" , "pxCreateSystemID" , "pxInsName" , "pxObjClass" , "pxUpdateDateTime" , "pxUpdateOpName" , "pxUpdateOperator" , "pxUpdateSystemID" , "pyDescription" , "pyID" , "pyLabel" , "pyUserAgent" , "pzPVStream") ON (t.pzInsKey = inputs."pzInsKey") WHEN MATCHED THEN UPDATE SET EventResult = inputs."EventResult" , EventType = inputs."EventType" , FundId = inputs."FundId" , Language = inputs."Language" , LoginType = inputs."LoginType" , MemberId = inputs."MemberId" , SessionId = inputs."SessionId" , WorkType = inputs."WorkType" , pxCreateDateTime = inputs."pxCreateDateTime" , pxCreateOpName = inputs."pxCreateOpName" , pxCreateOperator = inputs."pxCreateOperator" , pxCreateSystemID = inputs."pxCreateSystemID" , pxInsName = inputs."pxInsName" , pxObjClass = inputs."pxObjClass" , pxUpdateDateTime = inputs."pxUpdateDateTime" , pxUpdateOpName = inputs."pxUpdateOpName" , pxUpdateOperator = inputs."pxUpdateOperator" , pxUpdateSystemID = inputs."pxUpdateSystemID" , pyDescription = inputs."pyDescription" , pyID = inputs."pyID" , pyLabel = inputs."pyLabel" , pyUserAgent = inputs."pyUserAgent" , pzPVStream = inputs."pzPVStream" WHEN NOT MATCHED THEN INSERT (pzInsKey , EventResult , EventType , FundId , Language , LoginType , MemberId , SessionId , WorkType , pxCreateDateTime , pxCreateOpName , pxCreateOperator , pxCreateSystemID , pxInsName , pxObjClass , pxUpdateDateTime , pxUpdateOpName , pxUpdateOperator , pxUpdateSystemID , pyDescription , pyID , pyLabel , pyUserAgent , pzPVStream) VALUES (inputs."pzInsKey" , inputs."EventResult" , inputs."EventType" , inputs."FundId" , inputs."Language" , inputs."LoginType" , inputs."MemberId" , inputs."SessionId" , inputs."WorkType" , inputs."pxCreateDateTime" , inputs."pxCreateOpName" , inputs."pxCreateOperator" , inputs."pxCreateSystemID" , inputs."pxInsName" , inputs."pxObjClass" , inputs."pxUpdateDateTime" , inputs."pxUpdateOpName" , inputs."pxUpdateOperator" , inputs."pxUpdateSystemID" , inputs."pyDescription" , inputs."pyID" , inputs."pyLabel" , inputs."pyUserAgent" , inputs."pzPVStream");
Caused by SQL Problems.
Problem #1, SQLState 40001, Error code 1205: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 95) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Steps to Reproduce
Perform load testing
---------------------------------------
Functional test, systest environment
The result is ok
Testrun 1 Testrun 2 Testrun 3
Number of users 5 10 15
Iterations 2 2 2
Ramp up time 10 min 10 min 10 min
http errors 0% 0% 0%
Avg response time 195ms 165ms 169ms
Sent timereports (nbr of weeks)/target 20/20 40/40 56/60
Standard deviation total 694ms 460ms 583ms
High standard deviations Log in, fetching pdfs fetching pdfs Log in, fetching pdfs
Notes Some time reports alread existed in the backend system (2 x 2 weeks)
Loadtests preprod
A serious error occured quite fast in preproduction when the number of users went over 30 users / 60 seconds for log in and log out, the database got deadlock exceptions on the table TABLE_LOG_AUDIT (audit logging for events in the application ”APPLICATION”). The error has been reproduced and occurs when the number of logins exceeds 1 user / 2 seconds.
Root Cause
The root cause of this problem is a defect in Pegasystems’ code/rules. In Pega 7.1.7, merge or batch is introdued as default mode for saving (deferred/immediate) instances.
In the same transaction, the instances is grouped into multiple batches by prepared statements. The process does not necessarily maintain deterministic orders for instances. This could potentially violate execution orders wherever a mandatory order is needed. For example, foreign key constraints.
Resolution
This issue is resolved through the following local-change:
Disable merge or batch by change the following prconfig settings. This requires restart PRPC.
<env name="classmap/usemergestatement" value ="false" />
<env name="database/batchupdates" value="-1" />
Published September 18, 2015 - Updated October 8, 2020
Have a question? Get answers now.
Visit the Collaboration Center to ask questions, engage in discussions, share ideas, and help others.