Verify whether there are any general network outages or problems (physical or data-link layer). If there are none, perform the following checks:
- High CPU/memory utilization, out-of-memory symptoms, and critical event reports (if any) in the PCS/PPS node dashboard graphs and event logs.
  - Recommended data to review: This information can be verified from the admin UI under System > Status > Overview.
- Mismatch in speed/duplex/MTU settings between the PCS/PPS network interfaces and the respective connected switch ports.
  - Recommended data to review: This information can be verified from the admin UI under System > Network > Internal Port and/or External Port.
- High CPU on gateway/firewall devices, resulting in latency, delayed responses, and packet drops.
- ARP/Proxy ARP security settings or filters on the firewall/gateway devices that either drop ARP traffic between the two nodes or do not respond to it.
  - Recommended data to review: This information can only be verified from a system snapshot taken while the problem is occurring.
- An incorrect ARP broadcast (sent from another device) received by one or both nodes, causing an incorrect MAC address entry on the PCS device.
  - Recommended data to review: This information can only be verified from a system snapshot taken while the problem is occurring.
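As an illustration of the last check, the following minimal Python sketch flags any IP address that has been seen with more than one MAC address. It assumes you have already transcribed the ARP observations by hand (for example, from a packet capture taken during the problem) into a list of (IP, MAC) pairs. The function name, input format, and sample addresses are hypothetical and are not a PCS/PPS export format.

```python
from collections import defaultdict


def find_arp_conflicts(observations):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip, mac) string pairs. This input shape
    is an assumption made for illustration only.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case and separator so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: 10.0.0.20 is claimed by two different MACs.
sample = [
    ("10.0.0.10", "00:11:22:33:44:55"),
    ("10.0.0.10", "00-11-22-33-44-55"),
    ("10.0.0.20", "00:11:22:33:44:66"),
    ("10.0.0.20", "AA:BB:CC:DD:EE:FF"),
]

conflicts = find_arp_conflicts(sample)
for ip, macs in conflicts.items():
    print(f"{ip} is claimed by multiple MAC addresses: {', '.join(macs)}")
```

An IP address reported here is only a candidate for further review. Confirm which MAC address actually belongs to the node before drawing conclusions.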
If the observed symptom is due to high CPU/memory utilization during peak usage hours, it may be triggered by a sudden ramp-up or burst in user activity.
- If the ramp-up occurs at a consistent time every day, please engage Pulse Secure Support to analyze the load on the device.
  - Recommended data to gather: A system snapshot (taken while the problem is occurring) and screenshots of all system status graphs with the range set to 1 day and to 2 days. Note: The graphs should cover the time when the problem is occurring.
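To judge whether the ramp-up time is consistent from day to day, a rough sketch such as the following can help. It assumes CPU readings have been transcribed manually from the status graphs into (timestamp, CPU percent) pairs. The function names, the one-hour tolerance, and the sample values are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def peak_hour_by_day(samples):
    """Return {date: hour} giving the hour with the highest mean CPU per day.

    `samples` is an iterable of (datetime, cpu_percent) pairs; this input
    shape is an assumption, not a format produced by the device.
    """
    buckets = defaultdict(list)  # (date, hour) -> list of CPU readings
    for ts, cpu in samples:
        buckets[(ts.date(), ts.hour)].append(cpu)

    best = {}  # date -> (mean CPU, hour)
    for (day, hour), values in buckets.items():
        mean = sum(values) / len(values)
        if day not in best or mean > best[day][0]:
            best[day] = (mean, hour)
    return {day: hour for day, (_, hour) in best.items()}


def is_consistent(peaks, tolerance_hours=1):
    """True if all daily peak hours fall within `tolerance_hours` of each other.

    Peaks that straddle midnight are not handled by this simple comparison.
    """
    hours = list(peaks.values())
    return bool(hours) and max(hours) - min(hours) <= tolerance_hours


# Hypothetical readings for two days, both peaking in the 09:00 hour.
samples = [
    (datetime(2024, 1, 8, 8, 30), 35.0),
    (datetime(2024, 1, 8, 9, 0), 88.0),
    (datetime(2024, 1, 8, 9, 30), 92.0),
    (datetime(2024, 1, 8, 14, 0), 40.0),
    (datetime(2024, 1, 9, 9, 15), 90.0),
    (datetime(2024, 1, 9, 10, 0), 55.0),
]

peaks = peak_hour_by_day(samples)
print("Peak hour per day:", {str(day): hour for day, hour in peaks.items()})
print("Consistent daily ramp-up:", is_consistent(peaks))
```

This is only a quick triage aid. The system snapshot and graph screenshots remain the data that Support needs for load analysis.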
Note: To further optimize performance, disable synchronization (if not required) for "log messages / user sessions / last access time" at System > Cluster > Properties > Synchronization.
Recommended logs to gather:
If the problem persists, perform the following steps on all nodes:
1. Enable node monitoring at Maintenance > Troubleshooting > Monitoring > Node Monitor.
  - Enter 30 as the maximum log size and 30 seconds as the monitoring interval.
  - Select the checkboxes for Node monitoring enabled, ifconfig enabled, top enabled, cachesize enabled, and dsstatdump enabled, and click Save Changes.
2. Enable debug logging at Maintenance > Troubleshooting > Monitoring, and specify the following:
  - Detail Level: 10 / Size: 250 MB
  - Enter the following event codes: dsnetd::garpsweep,dsnetd::health,dsnetd::ipat,DSCluster:dsipatd, -DSUtil,-DSLog,-DSConfig, and click Save Changes.
After 5-10 minutes (during the failover symptoms), collect the following on all nodes:
- Under Maintenance > Troubleshooting > System Snapshot, select the checkboxes for Include debug log and Include system config, then click Take Snapshot. Save the file locally.
- Navigate to System > Log/Monitoring > Event Logs and click Save All Logs. Save the file locally.
- Save screenshots of all system dashboard and system capacity graphs on the System > Status page.
- After the logs are captured, disable the node monitoring and debug logging that were enabled in steps 1 and 2.
Please open a Tech SR at https://my.pulsesecure.net and attach all logs.