Utility storage is designed to support Kubernetes and the System Management Services (SMS) it orchestrates. Utility storage is a cost-effective solution for storing the large amounts of telemetry and log data collected.
Ceph is the utility storage platform that lets pods store persistent data. It is deployed to provide block, object, and file storage to the management services running on Kubernetes, as well as storage for telemetry data coming from the compute nodes.
IMPORTANT NOTES:
- Commands for Ceph health must be run from a master NCN or from ncn-s001, ncn-s002, or ncn-s003, unless they are otherwise specified to run on the host in question. Those nodes are the only ones with the necessary credentials. Individual procedures will specify when to run a command from a node other than those.
- Shrink: This only pertains to removing nodes from a cluster. Since the Octopus release and the move to the Ceph orchestrator, the Ceph cluster probes nodes and adds unused drives. As a result, removing a drive only works if the physical drive is removed from the server.
- Add: This will most commonly pertain to adding a node with its full allotment of drives.
- Replace: This will most commonly pertain to replacing a drive or a node after hardware repairs.
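
The first note above restricts where Ceph health commands can be run. As a minimal sketch of that rule, the helper below checks a hostname before calling the standard `ceph -s` and `ceph health detail` commands. The function names are hypothetical, and the `ncn-m` followed by three digits pattern for master NCNs is an assumption about site naming that is not stated in this document; adjust it as needed.

```shell
#!/usr/bin/env bash
# Sketch only: guard Ceph health commands so they run on a node that holds
# the Ceph credentials.
# ASSUMPTION: master NCN hostnames look like ncn-m001, ncn-m002, and so on.

# Returns 0 if the given hostname is a master NCN (assumed pattern) or one of
# ncn-s001, ncn-s002, ncn-s003; returns 1 otherwise.
is_ceph_admin_host() {
    case "$1" in
        ncn-m[0-9][0-9][0-9]|ncn-s001|ncn-s002|ncn-s003) return 0 ;;
        *) return 1 ;;
    esac
}

# Runs the Ceph status commands when the host qualifies and the ceph CLI is
# installed. Uses the first argument as the hostname, or the local short
# hostname when no argument is given.
check_ceph_health() {
    local host
    host="${1:-$(hostname -s)}"
    if ! is_ceph_admin_host "$host"; then
        echo "Run Ceph health commands from a master NCN or ncn-s001, ncn-s002, ncn-s003 (not ${host})" >&2
        return 1
    fi
    if ! command -v ceph >/dev/null 2>&1; then
        echo "ceph CLI not found on ${host}" >&2
        return 2
    fi
    ceph -s              # overall cluster status
    ceph health detail   # expanded detail for any warnings or errors
}
```

Calling `check_ceph_health` with no argument checks the local node. Procedures that explicitly say to run a command on another host are outside the scope of this guard.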
Adjust Ceph cluster
- Adding a Ceph Node to the Ceph Cluster
- Add Ceph OSDs
- Shrink the Ceph Cluster: remove a Ceph node
- Shrink Ceph OSDs: remove OSDs from a Ceph cluster
- Adjust Ceph Pool Quotas
- Alternate Storage Pools
Ceph information
Ceph related operations
- Ceph Daemon Memory Profiling
- Ceph Deep Scrubs
- Dump Ceph Crash Data
- Identify Ceph Latency Issues
- Manage Ceph Services
- Restore Nexus Data After Data Corruption
- Collect Information about the Ceph Cluster
Ceph tools' usage documentation
MDS
- Troubleshoot Ceph MDS Client Connectivity Issues
- Troubleshooting Ceph MDS Reporting Slow Requests and Failure on Client
- Troubleshoot Insufficient Standby MDS Daemons Available
RGW
- Troubleshoot if RGW Health Check Fails
- Troubleshoot an Unresponsive Rados-Gateway (radosgw) S3 Endpoint
OSD
- Troubleshoot Ceph OSDs Reporting Full
- Troubleshoot a Down OSD
- Troubleshoot Ceph OSDs Not Being Created on Disks
Ceph Health
- Troubleshoot Large Object Map Objects in Ceph Health
- Troubleshoot Failure to Get Ceph Health
- Troubleshoot HEALTH ERR Module devicehealth
Other