
Ceph Storage Backend for OpenStack

The Ceph cluster is used as the storage backend for OpenStack, providing root disks and volumes for virtual machines via block devices. It is also used to store VM images.

In general, the following OpenStack components interact with Ceph:

  • Cinder: allocates Ceph block devices (volumes) for VMs
  • Glance: OpenStack image service, stores images in Ceph
  • Nova: consumes Ceph-backed Cinder volumes and Glance images to run VMs

Used software

To deploy Ceph we use Rook. Rook is the Ceph storage operator for Kubernetes. In Cotb, the following versions are currently integrated:

info
  • Rook rook/ceph:v1.9.2
    • with Ceph Version quay.io/ceph/ceph:v17.2.3

Detailed documentation for Rook and Ceph can be found in the upstream Rook and Ceph documentation.

Ceph in a nutshell

  • OSDs - store data and handle replication, recovery and rebalancing; mapped to physical devices
  • Monitors - maintain maps of the cluster state and are responsible for managing authentication between daemons and clients
  • Managers - keep track of runtime metrics and the current state of the Ceph cluster

Ceph Pools

  • Pools are logical partitions that are used to store objects.
  • Pools provide (see the inspection example after this list):
    • Resilience: the number of OSDs that are allowed to fail without data being lost
    • Placement Groups: aggregate objects within a pool; ensure data durability and even distribution among OSDs
    • CRUSH Rules: handle the placement of objects and their replicas
    • Snapshots: the ability to create snapshots of a pool
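
These pool attributes can be inspected at runtime with the Ceph CLI, for example from the toolbox described further below. A minimal sketch; the pool name volumes is only an assumption:

# list all pools with their replication size, CRUSH rule and placement group count
ceph osd pool ls detail

# query a single attribute, e.g. the replica count of an assumed 'volumes' pool
ceph osd pool get volumes size

# show how placement groups are distributed across the OSDs
ceph pg stat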

Rook Ceph

overlays
├── cluster # cluster deployment
├── openstack # openstack resources, create ceph pools and accounts
└── operator # kubernetes CRDs and operator
  • Ceph pools for OpenStack (an illustrative pool manifest sketch follows below)
    • cinder-backup: pool to store Cinder backups
    • glance: stores raw images
    • vms: root disks, created via direct copy from the images pool
    • cinder: volumes pool
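
The actual pool and client manifests live in the openstack overlay (see the deployment section below). For illustration only, a Rook CephBlockPool manifest for one of these pools could look roughly like the following sketch; the name, failure domain and replica count are assumptions, not the values used in the overlay:

# illustrative sketch only; the real manifests are in ceph/overlays/openstack
cat > cephblockpool-example.yaml <<EOF
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: vms              # assumed pool name
  namespace: rook-ceph
spec:
  failureDomain: host    # assumption: replicas spread across hosts
  replicated:
    size: 2              # assumed replica count
EOF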

Prerequisites

Disks that are used for the Ceph storage cluster always need to be the same size.

Inventory configuration for storage nodes:

  • For each storage node, configure the filesystems /var/lib/rook and /var/lib/cephvolumes for Ceph to store its state.
    • These partitions are used to persist the cluster state and are mandatory for each storage node.
  • Each disk in a storage node that is to be used in the Ceph cluster must have exactly one partition, with a partition label that begins with osd, typically followed by an increasing number.
  • In the default configuration, all disks with a partition label matching osd* are later consumed by Ceph.
  • The setting size_mib: 0 indicates that the whole disk is used for this partition.
    • To ensure optimal operation, Ceph recommends using only disks and OSDs of the same size.
  • Because Ceph creates its own filesystem, create_filesystem: false needs to be set for every OSD partition (a command to verify the partition labels on a node is sketched after the example below).
state:
  - device: /dev/disk/by-id/XYZ
    wipe_disks: false
    partitions:
      ...
      - label: rookceph
        path: /var/lib/rook
        size_mib: 5000
      - label: cephvolumes
        path: /var/lib/cephvolumes
        size_mib: 5000
      ...
  - device: /dev/disk/by-id/nvme-OBFUSCATED
    wipe_disks: false
    partitions:
      - label: osd01
        create_filesystem: false
        size_mib: 0
  - device: /dev/disk/by-id/XYZ
    wipe_disks: false
    partitions:
      - label: osd02
        create_filesystem: false
        size_mib: 0
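
To verify that the partitions were created with the expected labels on a storage node, something like the following can be used (a sketch; device names will differ):

# list block devices with their partition labels; OSD partitions should show osd01, osd02, ...
lsblk -o NAME,SIZE,TYPE,PARTLABEL

# partition labels are also exposed under /dev/disk/by-partlabel/
ls -l /dev/disk/by-partlabel/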

Ceph Deployment for use in OpenStack

The manifests for the ceph deployment are located in: deployments/apps/ceph.

The deployment is currently applied in 3 steps:

Deploy operator

The Rook operator automates the configuration of the storage cluster and the deployed components, performs health checks, and monitors the cluster state. If configuration changes or OSDs are not immediately detected during the initial deployment step, you can restart the operator pod (by deleting the pod, not the deployment). The operator bootstraps the cluster and monitors the following components:

  • Ceph Monitor pods
  • Ceph OSD daemons
  • RADOS storage
  • Ceph daemons
  • Common Resources
    • CRDs for pools, object stores, file system
    • Kubernetes ClusterRoles and ClusterRoleBindings
# example
# ceph/overlays/operator
$ kustomize build . | kubectl apply -f -
namespace/rook-ceph created
deployment.apps/rook-ceph-operator created
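
Before continuing, it can be useful to confirm that the operator pod is up and to follow its log while it reconciles; a short sketch:

# the operator pod should reach the Running state
kubectl -n rook-ceph get pods -l app=rook-ceph-operator

# follow the operator log output
kubectl -n rook-ceph logs -f deploy/rook-ceph-operator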

Deploy cluster

With the Rook operator running, the cluster deployment can be applied to create the Ceph cluster. Within the rook-ceph namespace the following pods should be up and running, plus one rook-ceph-osd-* pod for every OSD configured in the inventory.

# ceph/overlays/cluster
$ kustomize build . | kubectl apply -f -
cephcluster.ceph.rook.io/rook-ceph created
deployment.apps/rook-ceph-tools created

$ kubectl -n rook-ceph get pods
NAME READY STATUS RESTARTS AGE
rook-ceph-mon-a-565c45b7d4-hkwjx 1/1 Running 0 10m
rook-ceph-tools-764df978f9-tjfgm 1/1 Running 0 11m
rook-ceph-mon-b-6799ccfc99-fhpxf 1/1 Running 0 8m56s
rook-ceph-mon-c-77f8bf5574-8cjhg 1/1 Running 0 7m52s
rook-ceph-mgr-a-845464657f-fzhzq 2/2 Running 0 7m41s
rook-ceph-mgr-b-74898ddc55-c48j8 2/2 Running 0 7m40s
rook-ceph-osd-0-77b856b79b-hm4k9 1/1 Running 0 5m18s
rook-ceph-osd-2-8478cf74b4-4jxsk 1/1 Running 0 5m18s
rook-ceph-osd-4-6b965c445d-7zcq8 1/1 Running 0 5m18s
rook-ceph-osd-7-6f96557b5d-472pm 1/1 Running 0 4m41s
rook-ceph-osd-3-8497754798-kqrnm 1/1 Running 0 4m42s
rook-ceph-osd-6-78fd6c4d46-gq26c 1/1 Running 0 4m41s
rook-ceph-osd-1-c5c7fdfd4-qlldq 1/1 Running 0 4m41s
rook-ceph-osd-5-5c9669848-7p2kc 1/1 Running 0 4m41s
rook-ceph-operator-78bf578574-fp4vj 1/1 Running 0 114s
rook-ceph-osd-prepare-storage02-956r4 0/1 Completed 0 92s
rook-ceph-osd-8-7fb5bfc45f-c9hn8 0/1 Running 0 71s
rook-ceph-osd-prepare-storage01-2q7t6 0/1 Completed 0 88s
rook-ceph-osd-10-6cb959bb9-6gzbg 0/1 Running 0 31s
rook-ceph-osd-11-cd7cf5bf4-52qp2 0/1 Running 0 31s
rook-ceph-osd-9-79c8db8cf8-bg67j 0/1 Running 0 31s

If all pods are up and running and the osd-prepare jobs have completed, you can verify the Ceph cluster state as described in the Toolbox section. If not all OSDs are detected, restart the Rook operator with the following command:

kubectl -n rook-ceph rollout restart deployment rook-ceph-operator

To examine the OSD preparation, inspect the logs of the respective osd-prepare job for the desired storage node:

kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare

kubectl -n rook-ceph logs rook-ceph-osd-prepare-storagenode01-jd0dh

Deploy OpenStack resources

The Ceph resources (e.g. Ceph clients, block pools, etc.) are managed via the Rook Ceph operator and deployed as Kubernetes resources. The manifests for OpenStack are managed in the ceph/overlays/openstack directory within the ceph app deployment.

  • Deploys Ceph pools and clients for the OpenStack services
    • cinder, cinder-backup, glance, nova
  • Handles access to the Ceph pools
# ceph/overlays/openstack
$ kustomize build . | kubectl apply -f -
info

Client keyrings generated here are needed for OpenStack deployment.
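
A quick way to check that the pools and clients were created is to list the corresponding Rook custom resources (a sketch; the exact resource names depend on the overlay):

# Ceph block pools created for OpenStack
kubectl -n rook-ceph get cephblockpools

# Ceph clients whose keyrings are consumed by the OpenStack services
kubectl -n rook-ceph get cephclients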

The following section describes the necessary steps to configure OpenStack to use Ceph and the above created resources.

Configure OpenStack to use Ceph

In order to use Ceph as the storage backend for OpenStack, access to the Ceph cluster must be configured within the OpenStack deployment. To retrieve the secrets for the generated Ceph clients, you need access to the Kubernetes cluster where Ceph is deployed. Each OpenStack service has its own Ceph pool and therefore its own access credentials.

extract ceph client secrets
for i in cinder cinder-backup glance nova; do \
echo "$i:"; \
kubectl --namespace rook-ceph get secret rook-ceph-client-$i -o jsonpath="{.data.$i}" | base64 -d; \
echo -e "\n"; \
done

The secrets can also be retrieved via the Ceph CLI in the Ceph toolbox.
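
For example, the key of a single client could be read from the toolbox deployment roughly like this (client.cinder used as an example):

# print only the key of the cinder client
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-key client.cinder

# or print the full keyring including the capabilities
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder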

Import the secret keys into the corresponding configuration in your OpenStack deployment under openstack/overlays/workload/config/ceph/. The following configurations are used by OpenStack to connect to the various Ceph pools with their respective capabilities.

client.cinder-backup.keyring
[client.cinder-backup]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=backups"

client.cinder.keyring
[client.cinder]
key = redacted
caps mon = "allow profile rbd"
caps osd = "profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images"

client.glance.keyring
[client.glance]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=volumes, profile rbd pool=images"

client.nova.keyring
[client.nova]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=images, profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=backups"

warning

Ensure there is a newline at the end of each keyring file!

In the next step, the Ceph configuration for the connection between the OpenStack services and the Ceph cluster has to be set up. This is done via the ceph*.conf files in the OpenStack deployment in the openstack/overlays/workload/config/ directory.

The Ceph fsid used here is already configured in the Ceph deployment in deployments/apps/ceph/overlays/cluster/cluster.yaml, or it can be retrieved from the live system via the Kubernetes resources in the rook-ceph namespace: kubectl -n rook-ceph get configmap rook-config-override -o yaml.

$ kubectl -n rook-ceph get configmap rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    fsid = 9cba584a-259a-410c-b3c4-f214ba73b477
    osd_pool_default_size = 2
    bdev_flock_retry = 20
    bluefs_buffered_io = false
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernete ...
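
If only the fsid is needed, it can be extracted from the ConfigMap directly, for example:

# print the [global] config block and filter for the fsid
kubectl -n rook-ceph get configmap rook-config-override -o jsonpath='{.data.config}' | grep fsid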

  • ceph.conf
  • ceph-backup.conf
  • ceph-glance.conf
  • ceph-nova.conf

The following example shows the configuration for openstack/overlays/workload/config/ceph.conf:

[global]
fsid = 9cba584a-259a-410c-b3c4-f214ba73b477
mon initial members = a b c
mon host = [rook-ceph-mon-a.rook-ceph],[rook-ceph-mon-b.rook-ceph],[rook-ceph-mon-c.rook-ceph]
osd_pool_default_size = 2
bdev_flock_retry = 20
bluefs_buffered_io = false

[client.cinder]
keyring = /etc/ceph/client.cinder.keyring

[client.cinder-backup]
keyring = /etc/ceph/client.cinder-backup.keyring

[client.glance]
keyring = /etc/ceph/client.glance.keyring

[client.nova]
keyring = /etc/ceph/client.nova.keyring

For OpenStack itself, an RBD_SECRET_UUID must be defined so that the accompanying services can access the Ceph secrets. The value for the secret can be generated with the uuidgen program (see the sketch after the list below). The following services have to be configured with the same generated UUID as an environment variable within the OpenStack deployment:

  • cinder_volume.yaml
  • libvirt_compute.yaml
  • nova_compute.yaml

Additional services like the nova-android-emulator also use this secret:

  • nova_compute_android_emulator.yaml
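
A minimal sketch of generating the UUID once and reusing it; how the value is wired into the listed manifests depends on your OpenStack deployment:

# generate one UUID and use the same value for cinder_volume, libvirt_compute,
# nova_compute and nova_compute_android_emulator
RBD_SECRET_UUID=$(uuidgen)
echo "$RBD_SECRET_UUID"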

  • Use Glance to store your cloud images.

warning

The raw image format is recommended for use with Ceph/OpenStack to allow direct copy.

  • Use Cinder volumes to boot VMs, or attach volumes to running VMs.

Rook Ceph toolbox

The Rook toolbox is a container with the Ceph CLI and common tools, used for Rook debugging and testing.

Ceph toolbox

$ kubectl -n rook-ceph exec -it rook-ceph-tools-6c9d58dbf-7k6z4 -- ceph status
  cluster:
    id:     7694ab0c-b215-4f07-a6c7-5cb7abfbcf05
    health: HEALTH_OK
  services:
    mon: 1 daemons, quorum a (age 2w)
    mgr: a(active, since 2w)
    osd: 2 osds: 2 up (since 2w), 2 in (since 2w)
  data:
    pools:   5 pools, 129 pgs
    objects: 1.32k objects, 1.7 GiB
    usage:   569 MiB used, 1.3 TiB / 1.3 TiB avail
    pgs:     129 active+clean
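
Besides ceph status, a few other standard Ceph CLI commands are commonly useful from the toolbox:

# per-OSD utilisation and state
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd status

# cluster and per-pool capacity usage
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df

# detailed health information if the cluster is not HEALTH_OK
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail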

Upgrade Ceph Cluster

caution

WARNING: Upgrading a Rook cluster is not without risk. There can always be unexpected issues that may damage the integrity of the cluster. This can also include data loss.

Clusters under upgrade can be unavailable for a short time during the upgrade process.

Please read the official upgrade docs from the Rook website before performing an upgrade.

Check the upgrade guides in https://rook.io/docs/rook/v1.10/Upgrade/rook-upgrade/ and https://rook.io/docs/rook/v1.10/Upgrade/ceph-upgrade/ with the respective versions.

  1. Upgrade the manifests in overlays/operator
    • Upgrade the operator and cluster resources to the corresponding tag, and compare them with the previous version for custom adjustments; those adjustments should be kept.
    • Update all common resources and CRD manifests from the respective version.
  2. Adjust and update the manifests included in the overlays/cluster directory, e.g. toolbox.yaml and the Ceph version in cluster.yaml.
  3. Update the Rook operator deployment. The largest portion of the upgrade is triggered when the operator's image is updated to v1.10.x. When the operator is updated, it proceeds to update all of the Ceph daemons.
  4. Wait for the upgrade to complete and check the component versions (a toolbox check of the Ceph daemon versions is sketched after this list):
    export ROOK_CLUSTER_NAMESPACE=rook-ceph
    watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

    kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
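
The Ceph daemon versions can additionally be verified from the toolbox; the upgrade is finished when all daemons report the target version:

# all mon, mgr and osd daemons should report the same, upgraded Ceph version
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph versions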

Tear down the Ceph cluster

caution

This completely wipes all data in your ceph cluster.

  1. Delete the OpenStack overlay deployment (overlays/openstack).
  2. Delete any remaining CephBlockPools, if they still exist:
    kubectl -n rook-ceph get cephblockpool

    kubectl delete -n rook-ceph cephblockpool poolname
  3. Set the cleanupPolicy on the CephCluster CR to confirm data destruction:
    kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
  4. Delete ceph cluster
    kubectl -n rook-ceph delete cephcluster rook-ceph
    kubectl -n rook-ceph get cephcluster
  5. Delete the state directory on all storage nodes
    rm -rf /var/lib/rook/*
  6. Delete the operator deployment from overlays/operator
  7. Wipe data on disks
    DISK="/dev/sdX"
    sgdisk --zap-all $DISK
  8. Alternative to 7): use wipefs -a /dev/diskpartition on a specific partition (use a privileged Alpine container via podman to execute this on the affected physical storage node; a hedged sketch follows below). Then use gdisk -l /dev/diskpartition to verify that the partition table scan shows "not present" for every entry.
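
A hedged sketch of the container-based variant from step 8; the image and package names are assumptions and may differ depending on the Alpine release:

# run a privileged Alpine container with access to the host devices
podman run --rm -it --privileged -v /dev:/dev docker.io/library/alpine:3 sh

# inside the container: install the tools (package names are an assumption and may vary),
# then wipe the partition and verify the result
apk add --no-cache util-linux-misc gptfdisk
wipefs -a /dev/diskpartition
gdisk -l /dev/diskpartition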