
Ceph Storage Backend for OpenStack

The Ceph cluster is used as the storage backend for OpenStack, providing root disks and volumes for virtual machines via block devices. It is also used to store VM images.

In general, the following OpenStack components interact with Ceph:

  • Cinder: allocates Ceph block devices (volumes) for VMs
  • Glance: OpenStack image service, stores images in Ceph
  • Nova: consumes Ceph-backed Cinder volumes and Glance images to run VMs

Used software

To deploy Ceph we use Rook. Rook is the Ceph storage operator for Kubernetes. In Cotb, the following versions are currently integrated:

info
  • Rook rook/ceph:v1.9.2
    • with Ceph Version quay.io/ceph/ceph:v17.2.3

Detailed documentation for Rook and Ceph can be found in the upstream Rook and Ceph documentation.

Ceph in a nutshell

  • OSDs - store data and handle replication, recovery and rebalancing; mapped to physical devices
  • Monitors - maintain maps of the cluster state and are responsible for managing authentication between daemons and clients
  • Managers - keep track of runtime metrics and the current state of the Ceph cluster

Ceph Pools

  • Pools are logical partitions that are used to store objects.
  • Pools provide (see the inspection example after this list):
    • Resilience: the number of OSDs that are allowed to fail without data being lost
    • Placement Groups: aggregate objects within a pool; ensure data durability and even distribution among OSDs
    • CRUSH Rules: handle the placement of objects and their replicas
    • Snapshots: the ability to create snapshots of a pool
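
These pool attributes can be inspected at runtime with the Ceph CLI, for example from the toolbox described further below. A minimal sketch; the pool name volumes is only an assumption:

# list all pools with their replication size, CRUSH rule and placement group count
ceph osd pool ls detail

# query a single attribute, e.g. the replica count of an assumed 'volumes' pool
ceph osd pool get volumes size

# show how placement groups are distributed across the OSDs
ceph pg stat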

Rook Ceph

overlays
├── cluster # cluster deployment
├── openstack # openstack resources, create ceph pools and accounts
└── operator # kubernetes CRDs and operator
  • Ceph pools for OpenStack (an illustrative pool manifest sketch follows below)
    • cinder-backup: pool to store Cinder backups
    • glance: stores raw images
    • vms: root disks, created via direct copy from the images pool
    • cinder: volumes pool
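
The actual pool and client manifests live in the openstack overlay (see the deployment section below). For illustration only, a Rook CephBlockPool manifest for one of these pools could look roughly like the following sketch; the name, failure domain and replica count are assumptions, not the values used in the overlay:

# illustrative sketch only; the real manifests are in ceph/overlays/openstack
cat > cephblockpool-example.yaml <<EOF
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: vms              # assumed pool name
  namespace: rook-ceph
spec:
  failureDomain: host    # assumption: replicas spread across hosts
  replicated:
    size: 2              # assumed replica count
EOF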

Prerequisites

Disks that are used for the Ceph storage cluster always need to be the same size.

Inventory configuration for storage nodes:

  • For each storage node, configure the filesystems /var/lib/rook and /var/lib/cephvolumes for Ceph to store its state.
    • These partitions are used to persist the cluster state and are mandatory for each storage node.
  • Each disk in a storage node that is to be used in the Ceph cluster must have exactly one partition, with a partition label that begins with osd, typically followed by an increasing number.
  • In the default configuration, all disks with a partition label matching osd* are later consumed by Ceph.
  • The setting size_mib: 0 indicates that the whole disk is used for this partition.
    • To ensure optimal operation, Ceph recommends using only disks and OSDs of the same size.
  • Because Ceph creates its own filesystem, create_filesystem: false needs to be set for every OSD partition (a command to verify the partition labels on a node is sketched after the example below).
state:
  - device: /dev/disk/by-id/XYZ
    wipe_disks: false
    partitions:
      ...
      - label: rookceph
        path: /var/lib/rook
        size_mib: 5000
      - label: cephvolumes
        path: /var/lib/cephvolumes
        size_mib: 5000
      ...
  - device: /dev/disk/by-id/nvme-OBFUSCATED
    wipe_disks: false
    partitions:
      - label: osd01
        create_filesystem: false
        size_mib: 0
  - device: /dev/disk/by-id/XYZ
    wipe_disks: false
    partitions:
      - label: osd02
        create_filesystem: false
        size_mib: 0
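
To verify that the partitions were created with the expected labels on a storage node, something like the following can be used (a sketch; device names will differ):

# list block devices with their partition labels; OSD partitions should show osd01, osd02, ...
lsblk -o NAME,SIZE,TYPE,PARTLABEL

# partition labels are also exposed under /dev/disk/by-partlabel/
ls -l /dev/disk/by-partlabel/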

Ceph Deployment for use in OpenStack

The manifests for the ceph deployment are located in: deployments/apps/ceph.

The deployment is currently applied in 3 steps:

Deploy operator

The Rook operator automates the configuration of the storage cluster and the deployed components, performs health checks, and monitors the cluster state. If configuration changes or OSDs are not immediately detected during the initial deployment step, you can restart the operator pod (by deleting the pod, not the deployment). The operator bootstraps the cluster and monitors the following components:

  • Ceph Monitor pods
  • Ceph OSD daemons
  • RADOS storage
  • Ceph daemons
  • Common Resources
    • CRDs for pools, object stores, file system
    • Kubernetes ClusterRoles and ClusterRoleBindings
# example
# ceph/overlays/operator
$ kustomize build . | kubectl apply -f -
namespace/rook-ceph created
deployment.apps/rook-ceph-operator created
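
Before continuing, it can be useful to confirm that the operator pod is up and to follow its log while it reconciles; a short sketch:

# the operator pod should reach the Running state
kubectl -n rook-ceph get pods -l app=rook-ceph-operator

# follow the operator log output
kubectl -n rook-ceph logs -f deploy/rook-ceph-operator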

Deploy cluster

With the Rook operator running, the cluster deployment can be applied to create the Ceph cluster. Within the rook-ceph namespace the following pods should be up and running, plus one rook-ceph-osd-* pod for every OSD configured in the inventory.

# ceph/overlays/cluster
$ kustomize build . | kubectl apply -f -
cephcluster.ceph.rook.io/rook-ceph created
deployment.apps/rook-ceph-tools created

$ kubectl -n rook-ceph get pods
NAME READY STATUS RESTARTS AGE
rook-ceph-mon-a-565c45b7d4-hkwjx 1/1 Running 0 10m
rook-ceph-tools-764df978f9-tjfgm 1/1 Running 0 11m
rook-ceph-mon-b-6799ccfc99-fhpxf 1/1 Running 0 8m56s
rook-ceph-mon-c-77f8bf5574-8cjhg 1/1 Running 0 7m52s
rook-ceph-mgr-a-845464657f-fzhzq 2/2 Running 0 7m41s
rook-ceph-mgr-b-74898ddc55-c48j8 2/2 Running 0 7m40s
rook-ceph-osd-0-77b856b79b-hm4k9 1/1 Running 0 5m18s
rook-ceph-osd-2-8478cf74b4-4jxsk 1/1 Running 0 5m18s
rook-ceph-osd-4-6b965c445d-7zcq8 1/1 Running 0 5m18s
rook-ceph-osd-7-6f96557b5d-472pm 1/1 Running 0 4m41s
rook-ceph-osd-3-8497754798-kqrnm 1/1 Running 0 4m42s
rook-ceph-osd-6-78fd6c4d46-gq26c 1/1 Running 0 4m41s
rook-ceph-osd-1-c5c7fdfd4-qlldq 1/1 Running 0 4m41s
rook-ceph-osd-5-5c9669848-7p2kc 1/1 Running 0 4m41s
rook-ceph-operator-78bf578574-fp4vj 1/1 Running 0 114s
rook-ceph-osd-prepare-storage02-956r4 0/1 Completed 0 92s
rook-ceph-osd-8-7fb5bfc45f-c9hn8 0/1 Running 0 71s
rook-ceph-osd-prepare-storage01-2q7t6 0/1 Completed 0 88s
rook-ceph-osd-10-6cb959bb9-6gzbg 0/1 Running 0 31s
rook-ceph-osd-11-cd7cf5bf4-52qp2 0/1 Running 0 31s
rook-ceph-osd-9-79c8db8cf8-bg67j 0/1 Running 0 31s

If all pods are up and running and the osd-prepare jobs have completed, you can verify the Ceph cluster state as described in the Toolbox section. If not all OSDs are detected, restart the Rook operator with the following command:

kubectl -n rook-ceph rollout restart deployment rook-ceph-operator

To examine the OSD preparation, inspect the logs of the respective osd-prepare job for the desired storage node:

kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare

kubectl -n rook-ceph logs rook-ceph-osd-prepare-storagenode01-jd0dh

Deploy OpenStack resources

The Ceph resources (e.g. Ceph clients, block pools, etc.) are managed via the Rook Ceph operator and deployed as Kubernetes resources. The manifests for OpenStack are managed in the ceph/overlays/openstack directory within the ceph app deployment.

  • Deploys Ceph pools and clients for the OpenStack services
    • cinder, cinder-backup, glance, nova
  • Handles access to the Ceph pools
# ceph/overlays/openstack
$ kustomize build . | kubectl apply -f -
info

Client keyrings generated here are needed for OpenStack deployment.
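
A quick way to check that the pools and clients were created is to list the corresponding Rook custom resources (a sketch; the exact resource names depend on the overlay):

# Ceph block pools created for OpenStack
kubectl -n rook-ceph get cephblockpools

# Ceph clients whose keyrings are consumed by the OpenStack services
kubectl -n rook-ceph get cephclients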

The following section describes the necessary steps to configure OpenStack to use Ceph and the above created resources.

Configure OpenStack to use Ceph

In order to use Ceph as the storage backend for OpenStack, access to the Ceph cluster must be configured within the OpenStack deployment. To retrieve the secrets for the generated Ceph clients, you need access to the Kubernetes cluster where Ceph is deployed. Each OpenStack service has its own Ceph pool and therefore its own access credentials.

extract ceph client secrets
for i in cinder cinder-backup glance nova; do \
echo "$i:"; \
kubectl --namespace rook-ceph get secret rook-ceph-client-$i -o jsonpath="{.data.$i}" | base64 -d; \
echo -e "\n"; \
done

The secrets can also be retrieved via the Ceph CLI in the Ceph toolbox.
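
For example, the key of a single client could be read from the toolbox deployment roughly like this (client.cinder used as an example):

# print only the key of the cinder client
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-key client.cinder

# or print the full keyring including the capabilities
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder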

Import the secret keys into the corresponding configuration in your OpenStack deployment under openstack/overlays/workload/config/ceph/. The following configurations are used by OpenStack to connect to the various Ceph pools with their respective capabilities.

client.cinder-backup.keyring
[client.cinder-backup]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=backups"

client.cinder.keyring
[client.cinder]
key = redacted
caps mon = "allow profile rbd"
caps osd = "profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images"

client.glance.keyring
[client.glance]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=volumes, profile rbd pool=images"

client.nova.keyring
[client.nova]
key = redacted
caps mon = "profile rbd"
caps osd = "profile rbd pool=images, profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=backups"

warning

Ensure there is a newline at the end of each keyring file!

In the next step, the Ceph configuration for the connection between the OpenStack services and the Ceph cluster has to be set up. This is done via the ceph*.conf files in the OpenStack deployment in the openstack/overlays/workload/config/ directory.

The Ceph fsid used here is already configured in the Ceph deployment in deployments/apps/ceph/overlays/cluster/cluster.yaml, or it can be retrieved from the live system via the Kubernetes resources in the rook-ceph namespace: kubectl -n rook-ceph get configmap rook-config-override -o yaml.

$ kubectl -n rook-ceph get configmap rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    fsid = 9cba584a-259a-410c-b3c4-f214ba73b477
    osd_pool_default_size = 2
    bdev_flock_retry = 20
    bluefs_buffered_io = false
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernete ...
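
If only the fsid is needed, it can be extracted from the ConfigMap directly, for example:

# print the [global] config block and filter for the fsid
kubectl -n rook-ceph get configmap rook-config-override -o jsonpath='{.data.config}' | grep fsid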

  • ceph.conf
  • ceph-backup.conf
  • ceph-glance.conf
  • ceph-nova.conf

The following example shows the configuration for openstack/overlays/workload/config/ceph.conf:

[global]
fsid = 9cba584a-259a-410c-b3c4-f214ba73b477
mon initial members = a b c
mon host = [rook-ceph-mon-a.rook-ceph],[rook-ceph-mon-b.rook-ceph],[rook-ceph-mon-c.rook-ceph]
osd_pool_default_size = 2
bdev_flock_retry = 20
bluefs_buffered_io = false

[client.cinder]
keyring = /etc/ceph/client.cinder.keyring

[client.cinder-backup]
keyring = /etc/ceph/client.cinder-backup.keyring

[client.glance]
keyring = /etc/ceph/client.glance.keyring

[client.nova]
keyring = /etc/ceph/client.nova.keyring

For OpenStack itself, an RBD_SECRET_UUID must be defined so that the accompanying services can access the Ceph secrets. The value for the secret can be generated with the uuidgen program (see the sketch after the list below). The following services have to be configured with the same generated UUID as an environment variable within the OpenStack deployment:

  • cinder_volume.yaml
  • libvirt_compute.yaml
  • nova_compute.yaml

Additional services like the nova-android-emulator also use this secret:

  • nova_compute_android_emulator.yaml
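
A minimal sketch of generating the UUID once and reusing it; how the value is wired into the listed manifests depends on your OpenStack deployment:

# generate one UUID and use the same value for cinder_volume, libvirt_compute,
# nova_compute and nova_compute_android_emulator
RBD_SECRET_UUID=$(uuidgen)
echo "$RBD_SECRET_UUID"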

  • Use Glance to store your cloud images.

warning

The raw image format is recommended for use with Ceph/OpenStack to allow direct copy.

  • Use Cinder volumes to boot VMs, or attach volumes to running VMs.

Rook Ceph toolbox

The Rook toolbox is a container with the Ceph CLI and common tools, used for Rook debugging and testing.

Ceph toolbox

$ kubectl -n rook-ceph exec -it rook-ceph-tools-6c9d58dbf-7k6z4 -- ceph status
  cluster:
    id:     7694ab0c-b215-4f07-a6c7-5cb7abfbcf05
    health: HEALTH_OK
  services:
    mon: 1 daemons, quorum a (age 2w)
    mgr: a(active, since 2w)
    osd: 2 osds: 2 up (since 2w), 2 in (since 2w)
  data:
    pools:   5 pools, 129 pgs
    objects: 1.32k objects, 1.7 GiB
    usage:   569 MiB used, 1.3 TiB / 1.3 TiB avail
    pgs:     129 active+clean
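
Besides ceph status, a few other standard Ceph CLI commands are commonly useful from the toolbox:

# per-OSD utilisation and state
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd status

# cluster and per-pool capacity usage
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df

# detailed health information if the cluster is not HEALTH_OK
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail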

Upgrade Ceph Cluster

caution

WARNING: Upgrading a Rook cluster is not without risk. There can always be unexpected issues that may damage the integrity of the cluster. This can also include data loss.

Clusters under upgrade can be unavailable for a short time during the upgrade process.

Please read the official upgrade docs from the Rook website before performing an upgrade.

Check the upgrade guides in https://rook.io/docs/rook/v1.10/Upgrade/rook-upgrade/ and https://rook.io/docs/rook/v1.10/Upgrade/ceph-upgrade/ with the respective versions.

  1. Upgrade the manifests in overlays/operator
    • Upgrade the operator and cluster resources to the corresponding tag, and compare them with the previous version for custom adjustments; those adjustments should be kept.
    • Update all common resources and CRD manifests from the respective version.
  2. Adjust and update the manifests included in the overlays/cluster directory, e.g. toolbox.yaml and the Ceph version in cluster.yaml.
  3. Update the Rook operator deployment. The largest portion of the upgrade is triggered when the operator's image is updated to v1.10.x. When the operator is updated, it proceeds to update all of the Ceph daemons.
  4. Wait for the upgrade to complete and check the component versions (a toolbox check of the Ceph daemon versions is sketched after this list):
    export ROOK_CLUSTER_NAMESPACE=rook-ceph
    watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

    kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
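
The Ceph daemon versions can additionally be verified from the toolbox; the upgrade is finished when all daemons report the target version:

# all mon, mgr and osd daemons should report the same, upgraded Ceph version
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph versions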

Tear down the Ceph cluster

caution

This completely wipes all data in your ceph cluster.

  1. Delete the OpenStack overlay deployment (overlays/openstack).
  2. Delete any remaining CephBlockPools, if they still exist:
    kubectl -n rook-ceph get cephblockpool

    kubectl delete -n rook-ceph cephblockpool poolname
  3. Set the cleanupPolicy on the CephCluster CR to confirm data destruction:
    kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
  4. Delete ceph cluster
    kubectl -n rook-ceph delete cephcluster rook-ceph
    kubectl -n rook-ceph get cephcluster
  5. Delete the state directory on all storage nodes
    rm -rf /var/lib/rook/*
  6. Delete the operator deployment from overlays/operator
  7. Wipe data on disks
    DISK="/dev/sdX"
    sgdisk --zap-all $DISK
  8. Alternative to 7): use wipefs -a /dev/diskpartition on a specific partition (use a privileged Alpine container via podman to execute this on the affected physical storage node; a hedged sketch follows below). Then use gdisk -l /dev/diskpartition to verify that the partition table scan shows "not present" for every entry.
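
A hedged sketch of the container-based variant from step 8; the image and package names are assumptions and may differ depending on the Alpine release:

# run a privileged Alpine container with access to the host devices
podman run --rm -it --privileged -v /dev:/dev docker.io/library/alpine:3 sh

# inside the container: install the tools (package names are an assumption and may vary),
# then wipe the partition and verify the result
apk add --no-cache util-linux-misc gptfdisk
wipefs -a /dev/diskpartition
gdisk -l /dev/diskpartition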