KB25377 - How to troubleshoot and collect logs when nodes in an active/passive cluster are enabled but unreachable

Last Modified Date: 11/23/2015 7:51 PM
Synopsis
This article describes the troubleshooting procedure and the logs to collect when an active/passive cluster is not formed properly and the nodes are shown as enabled, but unreachable.
 
Problem or Goal
The following image illustrates the issue of a node being shown as enabled, but unreachable:

[Image: cluster status page showing a node as Enabled, but Unreachable]


There are two possible scenarios:
 
  • Both nodes are accessible via the web. The VIP is on one of the nodes and user access is not impacted, but cluster failover does not occur while the cluster is in this state.
  • One of the nodes goes down, is inaccessible via the web, and ping to the node is lost. In this case, if the active node goes down, failover to the passive node does not occur (a quick triage sketch follows this list).
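
As a quick first triage, pinging both nodes and the cluster VIP from an administrator workstation distinguishes the two scenarios. The following is a minimal Python sketch (Linux ping syntax; all host addresses are placeholders to replace with your own):

    import subprocess

    # Placeholder addresses -- replace with your nodes' IPs and the cluster VIP.
    HOSTS = {"node-a": "10.0.0.1", "node-b": "10.0.0.2", "cluster-vip": "10.0.0.3"}

    def reachable(ip, count=3, timeout=2):
        """Return True if the host answers ICMP echo (Linux ping syntax)."""
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout), ip],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    status = {name: reachable(ip) for name, ip in HOSTS.items()}
    print(status)
    if status["node-a"] and status["node-b"]:
        print("Both nodes answer ping: this matches scenario 1.")
    else:
        print("A node is not answering ping: this matches scenario 2.")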
Solution
Try to resolve the issue by performing the following procedure:
 
  1. Make sure that the Network Connect Server IP Address value under Network settings > Network Connect is not the same as the internal port's physical IP or the cluster VIP on either node.

    If they are the same, change the Network Connect Server IP address back to its default value (10.200.200.200) and reboot the cluster. Verify that the cluster comes up by monitoring the cluster status.
  2. Check the link speed between the PCS node and the switch port connected to the node. Under Network settings > Internal port settings, verify for both nodes that the link speed setting is the same and matches the link speed configured on the switch port connected to each node's internal port.
  3. Verify that there are no duplicate ARP entries for the cluster's internal VIP in the network; a scripted check is sketched below.
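
For step 3, the duplicate ARP check can be scripted from an administrator workstation on the same subnet as the internal VIP. Below is a minimal sketch using the third-party scapy library (not a Pulse Secure tool; it requires root privileges, and the VIP address is a placeholder):

    from scapy.all import ARP, Ether, srp  # third-party: pip install scapy

    VIP = "10.0.0.3"  # placeholder -- the cluster's internal VIP

    # Broadcast an ARP who-has for the VIP and keep listening (multi=True) so
    # that every responder is recorded, not just the first one to answer.
    answered, _ = srp(
        Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=VIP),
        timeout=2, multi=True, verbose=False,
    )
    macs = {reply[ARP].hwsrc for _, reply in answered}
    if len(macs) > 1:
        print(f"Possible duplicate ARP: {VIP} answered from MACs {sorted(macs)}")
    elif macs:
        print(f"{VIP} resolves to a single MAC: {macs.pop()}")
    else:
        print(f"No ARP reply for {VIP}")

If more than one MAC address answers for the VIP, a stale or duplicate entry exists somewhere on the segment and should be cleared before retesting the cluster.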
If the above procedure does not resolve the issue, collect the following logs and open a Pulse Secure TAC case:

On both of the nodes:
  • Under Troubleshooting, enable Monitoring > Node monitoring.
  • Under Troubleshooting, enable Monitoring > Debug logging; set the log level to 15 and the log size to 50, and enter the event codes as DSUtil,-DSLog,-DSConfig,dsnetd::ipat,dsnetd::garpsweep (without any spaces).

Note: The event codes are case-sensitive and should be entered exactly as shown above.
     
    • Once the debugging options are enabled, start a TCP dump on the internal port of both nodes, leave it running for 3-5 minutes, and then capture the TCP dump from both nodes (a sketch for sanity-checking the capture offline follows this list).
    • Obtain an admin-generated snapshot on both nodes with the Include system config and Include debug log check boxes enabled, and turn the two monitoring options above off once the system snapshot has been taken.
    • Obtain the user access, event, and admin access logs from both of the nodes. These logs contain the time stamp of when the snapshot and TCP dump were captured.
    • On both of the nodes, via the admin UI, take a complete screenshot of the Status overview graphs filtered for a week's data. You can filter the graphs by going to Status overview > Page settings.
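
Once the TCP dumps have been captured, a quick offline sanity check is to confirm that traffic between the two internal interfaces is actually present in the capture. Below is a minimal sketch using scapy's pcap reader (the node IPs and the capture filename are placeholders):

    from scapy.all import rdpcap, IP  # third-party: pip install scapy

    NODE_A, NODE_B = "10.0.0.1", "10.0.0.2"  # placeholders -- internal port IPs
    packets = rdpcap("internal-port-node-a.pcap")  # placeholder capture file

    # Count packets exchanged directly between the two nodes' internal IPs.
    between_nodes = [
        p for p in packets
        if IP in p and {p[IP].src, p[IP].dst} == {NODE_A, NODE_B}
    ]
    print(f"{len(between_nodes)} of {len(packets)} packets are between the two nodes")
    if not between_nodes:
        print("No node-to-node traffic captured: check cabling/VLAN on the internal ports")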

    For the second scenario, a physical reboot of the node that went down restores the cluster in most cases. Open a case with Pulse Secure TAC to diagnose the issue.

    The following logs should be collected for this scenario:

    Before rebooting the node that went down, connect to it via the serial console and try to obtain the serial console output by performing the following procedure:

    Note: If the console is not responding at all, skip step 1 and go to step 2. If the console displays errors, take a screenshot of the console screen.
    1. Kernel trace/dump (a scripted alternative to the HyperTerminal capture is sketched after this list):
      • Go to HyperTerminal menu > Transfer > Capture Text.
      • Select the file to write to and click Start.
      • To set the kernel logging to level 9, use the key combination Ctrl + 9.
      • Leave it on for 15 minutes.
      • The Ctrl+Break+T key sequence outputs the kernel trace/dump to the console.
    2. After the problematic node has been rebooted, obtain an admin-generated snapshot with the Include system config and Include debug log check boxes enabled, immediately after the node is brought back up.
    • If one of the nodes is accessible via the web when the issue occurs, obtain an admin-generated snapshot on that node with the Include system config and Include debug log check boxes enabled.
    • Obtain the user access, event, and admin access logs from both of the nodes.
    • On both of the nodes, via the admin UI, under Status overview > Page settings > Advanced status, select the Check storage check box so that the storage percentage is displayed in the admin UI overview.

      Obtain a complete screenshot of the Status overview graphs from both nodes, one filtered for a day's data and another filtered for a week's data. You can filter the graphs by going to Status overview > Page settings.
    • Obtain the recently generated process snapshots, if any, from both of the nodes under Troubleshooting > System snapshot.
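
If HyperTerminal is not available for the capture in step 1, the console output can be written to a file with a short script using the third-party pyserial library. This is a passive capture sketch only (the key sequences from step 1 still have to be issued interactively); the port name, baud rate, and output file are assumptions to adapt to your setup:

    import serial  # third-party: pip install pyserial
    import time

    PORT, BAUD = "/dev/ttyUSB0", 9600  # assumed console port and speed

    with serial.Serial(PORT, BAUD, timeout=1) as console, \
            open("console-capture.log", "wb") as log:
        deadline = time.time() + 15 * 60  # mirror the 15-minute capture window
        while time.time() < deadline:
            data = console.read(4096)  # returns b"" after the 1 s timeout
            if data:
                log.write(data)
                log.flush()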