CASMTRIAGE-6742 : troubleshooting (#5358)
* This is a combination of 2 commits.

CASMTRIAGE-6742 : cfs troubleshooting

* PR comments

* Update operations/iuf/IUF.md

Co-authored-by: Lindsay Eliasen <[email protected]>
Signed-off-by: Pankhuri-Rajesh <[email protected]>

---------

Signed-off-by: Pankhuri-Rajesh <[email protected]>
Co-authored-by: Lindsay Eliasen <[email protected]>
Pankhuri-Rajesh and leliasen-hpe authored Sep 10, 2024
1 parent e330de3 commit 89ad967
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions operations/iuf/IUF.md
@@ -838,6 +838,22 @@ The following actions may be useful if errors are encountered when executing `iu
of this document includes links to descriptions of each stage. Each of those descriptions includes an **Execution Details** section describing how to find the appropriate code in the IUF `workflows` directory to understand the
workflow and debug the issue.

### 4. Specific scenarios

1. IUF workflow may loop while rebuilding a management node.

- IUF loops while waiting for CFS to complete configuration of a management node. This step might not complete because the CFS error count for the node has exceeded the maximum number of retries for applying the configuration.
- Review the Ansible logs for the CFS configuration operation on that node and correct the underlying problem.
- After resolving the problem, reset the CFS error count for the node with the following command. Run the command from a master or worker node, with the environment variable `XNAME` set to the xname of the node where the CFS configuration failed.

```bash
cray cfs components update --enabled true --state '[]' --error-count 0 --format json $XNAME
```

- Once the error count is reset, CFS restarts configuration of the node. If configuration does not start within a few minutes,
check whether another issue is preventing CFS from starting it. To resolve the problem, see the
[CFS troubleshooting guide](../../operations/configuration_management/Troubleshoot_CFS_Sessions_Failing_to_Start.md).
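
The reset step above depends on `XNAME` being set before the command runs. A minimal sketch of a guard around that command might look like the following. The function name `reset_cfs_error_count` is hypothetical and is not part of the `cray` CLI; the `cray cfs components update` invocation itself is unchanged from the procedure above.

```bash
# Hypothetical helper (not part of the cray CLI): resets the CFS error count
# for one node, using the first argument or, if none is given, $XNAME.
reset_cfs_error_count() {
    _xname="${1:-${XNAME:-}}"
    if [ -z "${_xname}" ]; then
        # Refuse to run without a target instead of passing an empty xname to CFS.
        echo "ERROR: set XNAME or pass the xname as the first argument" >&2
        return 1
    fi
    # Same command as in the procedure above, with the xname quoted.
    cray cfs components update --enabled true --state '[]' --error-count 0 --format json "${_xname}"
}
```

For example, with `XNAME` exported to the xname of the affected node, calling `reset_cfs_error_count` with no arguments runs the reset for that node; calling it with `XNAME` unset and no argument prints an error and returns non-zero without contacting CFS.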

## Install and Upgrade Observability Framework

The Install and Upgrade Observability Framework includes assertions for Goss health checks, as well as metrics and dashboards for health checks.