Support Article
Pega 7.1.8 - Hazelcast preventing startup
Summary
A Hazelcast problem occurred in an environment consisting of a four-node cluster, in which only one server managed to start up. All servers run on WebSphere 8 against Oracle 11.2, and each server runs on a separate machine. Server 1 is the Elasticsearch node. The system had been running without problems for months when servers 3 and 4 suddenly failed to start up with Hazelcast errors.
Error Messages
PegaRULES.log
2015-11-02 15:06:04,182 [ a_name.xx] [ STANDARD] [ ] ( internal.mgmt.PRNodeImpl) INFO - Starts joining cluster
2015-11-02 15:11:22,952 [ a_name.xx] [ STANDARD] [ ] ( internal.mgmt.PREnvironment) ERROR - java.lang.IllegalStateException: Node failed to start!
2015-11-02 15:11:22,956 [ a_name.xx] [ STANDARD] [ ] ( etier.impl.EngineStartup) ERROR - PegaRULES initialization failed. Server: a_name.xx
com.pega.pegarules.pub.context.InitializationFailedError: PRNodeImpl init failed
at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:388)
at com.pega.pegarules.session.internal.PRSessionProviderImpl.getThreadAndInitialize(PRSessionProviderImpl.java:1998)
at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineStartup.initEngine(EngineStartup.java:664)
at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl._initEngine_privact(EngineImpl.java:165)
at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl.doStartup(EngineImpl.java:138)
.......................
Caused by:
java.lang.IllegalStateException: Node failed to start!
at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:125)
at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:153)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:136)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:112)
at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.initialize(PRClusterHazelcastImpl.java:400)
SystemOut.log
[10/30/15 16:20:16:315 CET] 000000f4 InternalParti W com.hazelcast.partition.InternalPartitionService [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
Address[kk.ll.mmm.nnn]:a_portNumber
}
[10/30/15 16:20:16:371 CET] 000000f5 InternalParti W com.hazelcast.partition.InternalPartitionService [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
Address[kk.ll.mmm.nnn]:a_portNumber
}
[10/30/15 16:20:16:390 CET] 000000f4 InternalParti W com.hazelcast.partition.InternalPartitionService [aa.bb.ccc.dd]:a_portNumber [365d514d10dbdd9348860eea944f9b88] [3.4.1] Following unknown addresses are found in partition table sent from master[Address[cc.dd.eee.ff]:a_portNumber]. (Probably they have recently joined or left the cluster.) {
Address[kk.ll.mmm.nnn]:a_portNumber
}
Steps to Reproduce
The system had been running without problems for months when servers 3 and 4 suddenly failed to start up with Hazelcast errors.
Root Cause
The server machines have more than one network interface card (NIC) installed. During server startup, some server instances chose an IP address associated with a NIC that does not map to the machine's hostname, which caused the reported Hazelcast startup failure.
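One way to check for this condition on a given machine is to compare the address a node bound to (it appears in square brackets in the Hazelcast log lines) with the addresses the machine's hostname resolves to. The following is a minimal Python sketch of that comparison. The function names are hypothetical, and it is a diagnostic aid only, not part of Pega or Hazelcast.

```python
import socket


def hostname_ipv4_addresses(hostname=None):
    """Return the sorted IPv4 addresses that a hostname resolves to.

    With no argument, the local machine's own hostname is used.
    """
    name = hostname or socket.gethostname()
    infos = socket.getaddrinfo(name, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (ip, port)); keep unique IPs.
    return sorted({info[4][0] for info in infos})


def maps_to_hostname(bound_ip, hostname=None):
    """True if the IP a node bound to is one the hostname resolves to."""
    return bound_ip in hostname_ipv4_addresses(hostname)
```

Calling maps_to_hostname with the bound address taken from the log, on the affected server, returns False when the node picked an address on a NIC that the hostname does not resolve to.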
Resolution
The following has been documented in the Hazelcast online manual:
14.2.2. Specifying Network Interfaces
You can also specify which network interfaces that Hazelcast should use. Servers mostly have more than one network interface so you may want to list the valid IPs. Range characters ('*' and '-') can be used for simplicity. So 10.3.10.*, for instance, refers to IPs between 10.3.10.0 and 10.3.10.255. Interface 10.3.10.4-18 refers to IPs between 10.3.10.4 and 10.3.10.18 (4 and 18 included). If network interface configuration is enabled (disabled by default) and if Hazelcast cannot find an matching interface, then it will print a message on console and won't start on that node.
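The range rules quoted above can be illustrated with a short Python sketch. It only mirrors the matching behaviour as the manual describes it ('*' for a whole octet, 'a-b' for an inclusive range). It is not Hazelcast's own implementation, and the function names are made up for this illustration.

```python
def octet_matches(pattern_part, octet):
    """Check one dotted-quad component against '*', 'low-high', or a literal."""
    if pattern_part == "*":
        return 0 <= octet <= 255
    if "-" in pattern_part:
        low, high = (int(x) for x in pattern_part.split("-", 1))
        return low <= octet <= high  # both ends included
    return int(pattern_part) == octet


def matches_interface(pattern, ip):
    """True if an IPv4 address falls inside an interface pattern."""
    pattern_parts = pattern.split(".")
    ip_parts = ip.split(".")
    if len(pattern_parts) != 4 or len(ip_parts) != 4:
        return False
    return all(
        octet_matches(part, int(octet))
        for part, octet in zip(pattern_parts, ip_parts)
    )
```

For example, matches_interface("10.3.10.4-18", "10.3.10.18") is True, while matches_interface("10.3.10.4-18", "10.3.10.19") is False.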
An administrator can force Hazelcast to use one NIC over another by specifying the "cluster/hazelcast/interface" setting in the prconfig.xml file. For example:
<env name="cluster/hazelcast/interface" value="123.123.123.*" />
The setting can also be applied as a Dynamic System Setting.
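After editing prconfig.xml on each node, it can be useful to confirm that the setting is actually present and carries the intended value. The Python sketch below reads the value from the XML text. The function name is hypothetical, and the pegarules root element used in the sample is an assumption about the file layout; the sketch looks for env elements anywhere in the document, so it does not depend on the root name.

```python
import xml.etree.ElementTree as ET

SETTING_NAME = "cluster/hazelcast/interface"


def read_hazelcast_interface(prconfig_xml):
    """Return the value of the Hazelcast interface setting, or None if absent."""
    root = ET.fromstring(prconfig_xml)
    for env in root.iter("env"):
        if env.get("name") == SETTING_NAME:
            return env.get("value")
    return None


# Sample content; the surrounding <pegarules> element is assumed for illustration.
SAMPLE = """
<pegarules>
  <env name="cluster/hazelcast/interface" value="123.123.123.*" />
</pegarules>
"""
```

With the sample above, read_hazelcast_interface(SAMPLE) returns the configured pattern; a result of None means the node is still free to choose any NIC.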
Published January 31, 2016 - Updated October 8, 2020