
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Pega 7.1.8 - Hazelcast preventing startup

SA-17011

Summary



A Hazelcast problem occurs in an environment that consists of a four-node cluster, in which only one server managed to start. All servers run on WebSphere 8 against Oracle 11.2. Server 1 is the Elasticsearch node. Each server runs on a separate machine. The system had been running without problems for months when servers 3 and 4 suddenly failed to start with Hazelcast errors.
 

Error Messages


PegaRULES.log
2015-11-02 15:06:04,182 [          a_name.xx] [  STANDARD] [                    ] (      internal.mgmt.PRNodeImpl) INFO    - Starts joining cluster
2015-11-02 15:11:22,952 [          a_name.xx] [  STANDARD] [                    ] (  internal.mgmt.PREnvironment) ERROR  - java.lang.IllegalStateException: Node failed to start!
2015-11-02 15:11:22,956 [          a_name.xx] [  STANDARD] [                    ] (      etier.impl.EngineStartup) ERROR  - PegaRULES initialization failed. Server: a_name.xx
com.pega.pegarules.pub.context.InitializationFailedError: PRNodeImpl init failed
  at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:388)
  at com.pega.pegarules.session.internal.PRSessionProviderImpl.getThreadAndInitialize(PRSessionProviderImpl.java:1998)
  at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineStartup.initEngine(EngineStartup.java:664)
  at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl._initEngine_privact(EngineImpl.java:165)
  at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl.doStartup(EngineImpl.java:138)
.......................
Caused by:
java.lang.IllegalStateException: Node failed to start!
  at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:125)
  at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:153)
  at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:136)
  at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:112)
  at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
  at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.initialize(PRClusterHazelcastImpl.java:400)
 
SystemOut.log
[10/30/15 16:20:16:315 CET] 000000f4 InternalParti W com.hazelcast.partition.InternalPartitionService  [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
  Address[kk.ll.mmm.nnn]:a_portNumber
}
[10/30/15 16:20:16:371 CET] 000000f5 InternalParti W com.hazelcast.partition.InternalPartitionService  [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
  Address[kk.ll.mmm.nnn]:a_portNumber
}
[10/30/15 16:20:16:390 CET] 000000f4 InternalParti W com.hazelcast.partition.InternalPartitionService  [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
  Address[kk.ll.mmm.nnn]:a_portNumber
}


Steps to Reproduce



The system had been running without problems for months when servers 3 and 4 suddenly failed to start with Hazelcast errors.


Root Cause



The server machines have more than one network interface card (NIC) installed. During startup, some server instances chose an IP address associated with a NIC that does not map to the machine's hostname, which caused the reported Hazelcast startup failure.
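To check whether a machine is exposed to this condition, it can help to compare the address that the hostname resolves to with the addresses bound to the local NICs. The following is a minimal diagnostic sketch that uses only the standard Java networking classes. The class name NicCheck and its helper methods are hypothetical names chosen for illustration; they are not part of Pega or Hazelcast, and the sketch does not reproduce how Hazelcast itself selects an address.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper; not part of Pega or Hazelcast.
final class NicCheck {

    // Returns the NIC addresses that differ from the address the hostname resolves to.
    static List<String> addressesNotMatching(String hostnameAddress, List<String> nicAddresses) {
        List<String> others = new ArrayList<>();
        for (String candidate : nicAddresses) {
            if (!candidate.equals(hostnameAddress)) {
                others.add(candidate);
            }
        }
        return others;
    }

    // Collects the textual addresses of every interface that is currently up
    // (this includes the loopback interface and any IPv6 addresses).
    static List<String> localAddresses() throws Exception {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported by the platform
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp()) {
                continue;
            }
            for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                result.add(address.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String hostnameAddress = InetAddress.getLocalHost().getHostAddress();
        System.out.println("Hostname resolves to: " + hostnameAddress);
        for (String other : addressesNotMatching(hostnameAddress, localAddresses())) {
            System.out.println("Other local address: " + other);
        }
    }
}
```

If the output lists non-loopback addresses other than the one the hostname resolves to, the machine has more than one candidate address, which is the situation described above.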

Resolution



The following is documented in the Hazelcast online manual:

 

14.2.2. Specifying Network Interfaces

 

You can also specify which network interfaces Hazelcast should use. Servers mostly have more than one network interface, so you may want to list the valid IPs. Range characters ('*' and '-') can be used for simplicity. So 10.3.10.*, for instance, refers to IPs between 10.3.10.0 and 10.3.10.255. Interface 10.3.10.4-18 refers to IPs between 10.3.10.4 and 10.3.10.18 (4 and 18 included). If network interface configuration is enabled (disabled by default) and Hazelcast cannot find a matching interface, it prints a message on the console and does not start on that node.
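The range semantics described in the manual excerpt can be illustrated with a small matcher. This is a sketch of the documented behavior for dotted IPv4 addresses only, written to help validate a pattern before it is put into configuration. It is not Hazelcast's own implementation, and the class name InterfacePattern and its methods are hypothetical.

```java
// Hypothetical helper that mirrors the documented '*' and '-' semantics; not Hazelcast code.
final class InterfacePattern {

    // True if the dotted IPv4 address matches a pattern such as "10.3.10.*" or "10.3.10.4-18".
    static boolean matches(String pattern, String address) {
        String[] patternOctets = pattern.split("\\.");
        String[] addressOctets = address.split("\\.");
        if (patternOctets.length != 4 || addressOctets.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!octetMatches(patternOctets[i], addressOctets[i])) {
                return false;
            }
        }
        return true;
    }

    // Compares one octet of the address against one octet of the pattern.
    static boolean octetMatches(String patternOctet, String addressOctet) {
        try {
            int value = Integer.parseInt(addressOctet);
            if (value < 0 || value > 255) {
                return false;
            }
            if (patternOctet.equals("*")) {
                return true; // wildcard covers 0 through 255
            }
            int dash = patternOctet.indexOf('-');
            if (dash > 0) {
                // Inclusive range, for example "4-18"
                int low = Integer.parseInt(patternOctet.substring(0, dash));
                int high = Integer.parseInt(patternOctet.substring(dash + 1));
                return low <= value && value <= high;
            }
            return Integer.parseInt(patternOctet) == value;
        } catch (NumberFormatException e) {
            return false; // malformed octet in the pattern or the address
        }
    }
}
```

For example, InterfacePattern.matches("10.3.10.4-18", "10.3.10.18") is true because both ends of the range are included, while the same pattern does not match 10.3.10.19.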

An administrator can force Hazelcast to use one NIC over another by specifying the "cluster/hazelcast/interface" setting in the prconfig.xml file. For example:

<env name="cluster/hazelcast/interface" value="123.123.123.*" />

The setting can also be applied as a Dynamic System Setting.
 

Published January 31, 2016 - Updated October 8, 2020
