Openstack telemetry: Ceilometer, Gnocchi and Aodh

In Openstack Ceilometer is the component that gathers data from the cloud and pre-processes it. It distinguishes between samples (CPU time) and events (creation of an instance). Resources, Meters and Samples are fundamental concepts in Ceilometer.

Samples are retrieved at regular intervals and if Ceilometar fails to get the sample it can be estimated by interpolation. Events are retrieved as they happen and cannot be estimated.

Ceilometer sends events to the a storage service, while samples are sent to a service named Gnocchi, which is optimized to handle large amount of time-series data.

Aodh gets measures from Gnocchi, checks whether certain conditions are met and triggers actions. This is the foundation for application auto-scaling.

Other uses for Gnocchi data are monitoring the health of the cloud and billing.

Ceilometer has three ways retrieving samples and events:

  • Services may voluntarily provide them by sending Ceilometer notification via Openstack’s messaging system. This is preferred way since it is based on internal knowledge that the service has about it’s resources and it is fast without much overhead and stress on the systems.
  • Ceilometer actively retrieves data via APIs which is a costly method for billing and alarming.
  • Ceilometer can get data by accessing sub-components of services such as the hypervisor that run the instances.

Second and third method are referred also as methods where Ceilometer “polls” the samples.

More details on Openstack telemetry can be found on this link:
https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html

While Ceilometer has resources, meters and samples Gnocchi has resources, metrics and measures. Gnocchi resource corresponds to Ceilometer resource. Metric is roughly equivalent to a meter in Ceilometer. Gnocchi does not store every metric value it receives from Ceilomter, but rather it combines values and stores the results at regular intervals according to Archive policy.

Listing gnocchi resources, metrics, measures:
Resources:
gnocchi resource list
gnocchi resource show UUID
Metrics:
gnocchi metric list*
gnocchi metric show cpu --resource UUID
Measures:
gnocchi measures show cpu --resource UUID --start YYYY-MM-DDTHH:MM:SS+00:00
* Output will be empty for non-admin users.
Listing resources, metrics, measures with openstack client:
Resources:
openstack metric resource list
openstack metric resource show UUID
Metrics:
openstack metric metric list*
openstack metric metric show cpu --resource UUID
Measures:
openstack metric measures show cpu --resource UUID --start YYYY-MM-DDTHH:MM:SS+00:00
* Output will be empty for non-admin users.
Aggregation:
Server grouping:
openstack server create --property metering.server_group=Mail*
Metrics aggregation:
gnocchi measures aggregation --query server_group=Mail --resource-type=instance --aggregation mean -m cpu_util
* For gnocchi all servers with ‘–property metering.server_group=Mail’ can be considered tagged.
Listing Ceilometer Events:
Event types are defined in a YAML type:
/etc/ceilometer/event_definitions.yaml
List event types, events and event details:
ceilometer event-type-list
ceilometer event-list
ceilometer event-show EVENT_ID
* For gnocchi all servers with ‘–property metering.server_group=Mail’ can be considered tagged.
** There is no option in horizon GUI to view events or statictics, but gnocchi visualisation can be provided by Grafana.
Example generating CPU and disk load and showing gnocchi measures:
Create two instances:
openstack server create --image cirros-image --flavor 1 --nic net-id=... --user-data cpu.sh cpu-user*
openstack server create --image cirros-image --flavor 1 --nic net-id=... --user-data disk.sh disk-user*
Let the instances finish creating and leave them running for a while:
openstack server list
Show cpu usage measures from cpu-user server:
gnocchi measures show cpu.delta --resource-id SERVER_UUID
795 gnocchi measures show cpu_util --resource-id SERVER_UUID
Show disk usage measures from disk-user server:
gnocchi measures show disk.read.requests --resource-id SERVER_UUID
gnocchi measures show disk.read.requests.rate --resource-id SERVER_UUID
* Files cpu.sh and disk.sh are your bash scripts to generate CPU load and disk load.
** Don’t forget to stop cpu-server and disk-server after you have finished, since they will continue to generate CPU and disk load.
Alarms:
An alarm has:
Type
Condition: depends on type
Evaluation window
State: OK/Alarm/Insufficient Data
Actions for state transitions
Condition example:
mean cpu_util > 60
all resources tagged server_group=Mail
Single Resource Threshold Alarm:
openstack alarm create --name cpuhigh \
--type gnocchi_resources_threshold \
--aggregation-method mean --metric cpu_util \
--comparison-operator gt --threshold 30 \
--resource-type instance \
--resource-id INSTANCE_UUID \
--granularity 60 --evaluation-periods 2 \
--alarm-action http://127.0.0.1:1234 \
--ok-action http://127.0.0.1:1234
Alarm based on resource aggregates:
openstack alarm create --name cpuhigh \
--type gnocchi_aggregation_by_resources_threshold \
--aggregation-method mean --metric cpu_util \
--comparison-operator gt --threshold 30 \
--resource-type instance \
--query '{ "=": { "server_group" : "Mail" }}' \
--granularity 60 --evaluation-periods 2 \
--alarm-action http://127.0.0.1:1234 \
--ok-action http://127.0.0.1:1234
Alarm commands:
openstack alarm list
openstack alarm show ALARM_ID
openstack alarm-history show ALARM_ID
openstack alarm state get ALARM_ID
openstack alarm update ALARM_ID ...