In an A/P configuration, the PCS uses a Virtual IP (VIP) that floats between the nodes and is owned by only one node at a time, the master. The other node in the cluster is the slave node during that time. The nodes in a cluster are in near-constant communication with each other to confirm that all are active and to synchronize data, so every node knows the others' status and has the latest configuration, including user sessions (DSID cookie values), bookmarks, system settings, etc. Status is reported on the clustering page. If a node has not sent an update within the timeout (the gateway's ARPing timeout + 5 seconds), the standby node in an Active/Passive cluster assumes the VIP (failover).
Active-Active Mode
In an A/A configuration, there is no VIP, and the PCS relies on an external load balancer if the load is to be distributed among PCS nodes. Synchronization still takes place in A/A. Active/Active clustering allows the nodes to be on different subnets on the same LAN (WAN clustering is not supported). Node-specific options, such as IP addresses and VLANs (see the admin guide for the full list), do NOT synchronize among nodes and are used only locally. Session IDs and user data are synchronized, if enabled.
Configuration only
This option synchronizes only the configuration elements. It does not synchronize session or user data.
What are the ports used for PCS clustering?
UDP 4803, 4804
TCP 4808, 4809, 4900 - 4910
Do the active nodes monitor the state of their own interface?
Each node monitors both of its interfaces by sending an ARP who-has (ARPing) to the default gateway. This ARP message is sent every 5 seconds. The PCS waits up to 5 seconds for a response. If no response is received after 5 seconds, the PCS begins a wait period of 45 seconds. If there is still no response, the PCS marks the interface as down.
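The timing above can be sketched as a small classifier. This is a minimal illustration only, assuming the default values described here (a 5-second reply timeout followed by a 45-second wait period); the names `ArpMonitorTimings` and `interface_state` are hypothetical and are not part of any PCS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpMonitorTimings:
    # Default values taken from the description above; both are assumptions
    # about a default configuration and may differ on a tuned system.
    reply_timeout_s: float = 5.0   # wait up to 5 seconds for an ARP reply
    wait_period_s: float = 45.0    # further wait before marking the interface down


def interface_state(silence_s: float,
                    timings: ArpMonitorTimings = ArpMonitorTimings()) -> str:
    """Classify an interface by the seconds elapsed since an unanswered ARP was sent."""
    if silence_s < timings.reply_timeout_s:
        return "waiting-for-reply"
    if silence_s < timings.reply_timeout_s + timings.wait_period_s:
        return "wait-period"
    return "down"


if __name__ == "__main__":
    for seconds in (0, 5, 30, 50):
        print(seconds, interface_state(seconds))
```

Under these assumed defaults, an interface is treated as down once 50 seconds have passed without a reply from the gateway.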
Note: The ARP timeout value is configurable from the network settings page for each interface. Additionally, you can configure how many ARP ping timeouts must occur before the interface is marked as down. This applies to both interfaces and all nodes in the cluster. On the cluster properties page, there is an option to have each PCS disable its external interface in the event that its internal interface goes down. This is a cluster-wide setting.
How big is the Synchronization Packet?
This depends on how much data is synchronized. In testing, roughly 1 MB of data is generally transferred per 1000 users when a node is added to the cluster and synchronized. After the nodes are synchronized, data is sent only when something changes, whether it is a user session status, user properties (bookmarks), or the system configuration. These updates are generally very small, only a few KB.
How does the PCS inform the local nodes if the passive becomes the Master?
When one PCS fails, the other PCS detects the outage and assumes the VIP. It then issues a gratuitous ARP so that all local nodes (switches and routers included) will know the new MAC address for the VIP.
What happens when the link goes down between two nodes in an Active/Active cluster?
When the two nodes can no longer communicate, each member thinks it is the only node in the cluster and stops synchronizing data to the other PCS node, which it now marks as unreachable. At this point, each node still has all user sessions active, because PCS stores user sessions in the synchronized DSCache and the caches were synchronized up to the moment of the split. If 250 users were logged in at that moment, for example, each node still thinks it has 250 users logged in. Some sessions may then time out on each node, which allows more users to log in to each node. Each node could potentially reach 250 users, for a grand total of 500 concurrent users spread over two now-separate nodes.
Now presume that each node has 250 users. When the two nodes rejoin, they synchronize again. The master node (chosen randomly in 3.X, and based on a split-cluster weight in 4.0) overrides the slave, so the slave's cache (250 user sessions) is wiped out and overwritten by the master's 250 user sessions. After the join and synchronization process completes, the active sessions on the master are the active sessions in the newly re-formed cluster.
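The split and rejoin behavior described above can be illustrated with a toy model. This is only a sketch of the described outcome (the master's cache replaces the slave's), not the PCS implementation; the functions `split_cluster` and `rejoin` and the session names are hypothetical.

```python
def split_cluster(shared_cache: dict) -> tuple:
    """Each node keeps its own copy of the cache as it was at the moment of the split."""
    return dict(shared_cache), dict(shared_cache)


def rejoin(master_cache: dict, slave_cache: dict) -> tuple:
    """Model the rejoin: the master's cache overwrites the slave's.

    Returns the cache of the re-formed cluster and the sorted list of
    sessions that existed only on the slave and are therefore dropped.
    """
    dropped = sorted(set(slave_cache) - set(master_cache))
    return dict(master_cache), dropped


if __name__ == "__main__":
    # Two sessions exist before the link goes down (illustrative names).
    node_a, node_b = split_cluster({"alice": "sess-1", "bob": "sess-2"})

    # While separated, each node accepts a new login the other never sees.
    node_a["carol"] = "sess-3"
    node_b["dave"] = "sess-4"

    # Assume node A is chosen as master when the nodes rejoin.
    cluster_cache, dropped = rejoin(node_a, node_b)
    print(sorted(cluster_cache))
    print(dropped)
```

In this model, any session created on the slave while the nodes were separated does not survive the rejoin, which matches the description of the slave's cache being overwritten.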