In the first part of this series we learned how to properly deploy OpenStack Ironic with Red Hat OpenStack Platform 13 (OSP13). Today we will take that deployment to the next level and enable Inspection and Autodiscovery for baremetal nodes in the overcloud.
Please note that OpenStack Inspector is not officially supported in OSP13, mostly because it cannot run in an HA fashion. That limitation has been addressed and the fix is planned to be part of the product in a future release. In the meantime, Red Hat requires a support exception for the feature, and the user must accept the limitation of only being able to run this component on a single node or composable role.
With that out of the way, let’s talk about benefits of OpenStack Inspector:
- ability to automatically add new nodes to Ironic by simply connecting them to provisioning network and powering them on
- inspecting nodes for their capabilities and later using those capabilities to target specific nodes (for example, a node that has a GPU, runs a specific CPU model, or has a specific BIOS version)
- zero touch provisioning (additional components such as CMP – Cloudforms/ManageIQ would be helpful to orchestrate the process)
- streamlined baremetal consumption
Before we jump into the configuration, let’s look at this short 11-minute video demoing a working solution. In this demo, with a single “power on” command we autodiscover 7 heterogeneous baremetal nodes and automatically assign various capabilities based on the type of hardware discovered. Finally, we deploy RHEL7, Ubuntu and Windows 2016 in UEFI mode and verify that they work.
OK, so how do we get there? First, make sure to follow the instructions from Part 1 of this blog. If you have that up and running, let’s delete your deployment and make a few changes to the configuration.
I. Ironic Inspector Overcloud Configuration
As mentioned earlier, Ironic Inspector in the current release can only be deployed on a single node or composable role. If you remember my lab architecture, I only use a single monolithic controller, so I don’t need to worry about splitting controllers into 2 kinds (controller with Inspector and controller without). I will only modify my role to include Inspector.
1. Edit roles_data.yaml file and ensure Ironic inspector becomes part of the deployment
(undercloud) [stack@undercloud ~]$ cat templates/roles_data.yaml
###############################################################################
# File generated by TripleO
###############################################################################
###############################################################################
# Role: Controller #
###############################################################################
- name: Controller
  description: |
    Controller role that has all the controler services loaded and handles
    Database, Messaging and Network functions.
  CountDefault: 1
  tags:
    - primary
    - controller
  networks:
    - External
    - InternalApi
    - Storage
    - StorageMgmt
    - Tenant
    - CustomBM
  HostnameFormatDefault: '%stackname%-controller-%index%'
  # Deprecated & backward-compatible values (FIXME: Make parameters consistent)
  # Set uses_deprecated_params to True if any deprecated params are used.
  uses_deprecated_params: True
  deprecated_param_extraconfig: 'controllerExtraConfig'
  deprecated_param_flavor: 'OvercloudControlFlavor'
  deprecated_param_image: 'controllerImage'
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AodhApi
    - OS::TripleO::Services::AodhEvaluator
    …
    - OS::TripleO::Services::IronicApi
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::IronicPxe
    - OS::TripleO::Services::IronicInspector
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Keepalived
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::LoginDefs
    …
Please note that for a typical deployment with 3 controller nodes, we would have to create a new role as a copy of the standard Controller role. We would leave the original role unchanged and deploy 2 standard controllers with it. The third controller node would use the modified copy, which adds OS::TripleO::Services::IronicInspector to its services.
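To make the difference between the two roles concrete, here is a minimal Python sketch that works on plain dictionaries rather than on the real roles_data.yaml. The role name "ControllerInspector" and the helper make_inspector_role are made up for this illustration, and the service list is trimmed. A real role copy would need further review beyond the service list.

```python
import copy

INSPECTOR_SERVICE = "OS::TripleO::Services::IronicInspector"


def make_inspector_role(base_role, new_name):
    """Return a renamed deep copy of base_role that also runs Inspector."""
    role = copy.deepcopy(base_role)  # leave the standard role untouched
    role["name"] = new_name
    if INSPECTOR_SERVICE not in role["ServicesDefault"]:
        role["ServicesDefault"].append(INSPECTOR_SERVICE)
    return role


# Trimmed stand-in for the standard Controller role (illustrative only).
controller = {
    "name": "Controller",
    "ServicesDefault": [
        "OS::TripleO::Services::IronicApi",
        "OS::TripleO::Services::IronicConductor",
        "OS::TripleO::Services::IronicPxe",
    ],
}

# "ControllerInspector" is a hypothetical name for the third controller's role.
inspector_controller = make_inspector_role(controller, "ControllerInspector")
print(inspector_controller["name"], len(inspector_controller["ServicesDefault"]))
```

The two standard controllers keep the unchanged role, and only the copy carries the Inspector service.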
2. Find out the latest version of Inspector container and include it in your docker registry.
OSP13 (Queens) is capable of containerizing all of its services, and this includes Ironic Inspector. Since Ironic Inspector is not officially supported in OSP13, we need to manually find the version of the Ironic Inspector kolla container and include it in our docker registry.
At the time of writing this blog I found out the latest version of Inspector container by searching for it here -> https://access.redhat.com/containers/ . For your ease of searching I am also including direct link here -> https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-ironic-inspector
In my lab the undercloud acts as my local docker registry, and I simply added the following entry to my local_registry_images.yaml:
(undercloud) [stack@undercloud ~]$ cat local_registry_images.yaml
container_images:
- imagename: registry.access.redhat.com/rhosp13/openstack-aodh-api:13.0-47
  push_destination: 172.31.0.10:8787
- imagename: registry.access.redhat.com/rhosp13/openstack-aodh-evaluator:13.0-46
  push_destination: 172.31.0.10:8787
…
- imagename: registry.access.redhat.com/rhosp13/openstack-ironic-inspector:13.0-46
  push_destination: 172.31.0.10:8787
…
I then re-synced the local registry.
My overcloud_images.yaml, which points the overcloud at my undercloud as a local docker registry, has also been updated to include Ironic Inspector:
(undercloud) [stack@undercloud ~]$ cat templates/overcloud_images.yaml
# Generated with the following on 2018-08-07T22:09:36.643279
#
# openstack overcloud container image prepare --namespace=registry.access.redhat.com/rhosp13 --push-destination=172.31.0.10:8787 --prefix=openstack- --tag-from-label {version}-{release} -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic-inspector.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/collectd.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/fluentd-client.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/octavia.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/manila.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/sensu-client.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml --set ceph_namespace=registry.access.redhat.com/rhceph --set ceph_image=rhceph-3-rhel7 --output-env-file=/home/stack/templates/overcloud_images.yaml --output-images-file /home/stack/local_registry_images.yaml
#
parameter_defaults:
  DockerAodhApiImage: 172.31.0.10:8787/rhosp13/openstack-aodh-api:13.0-47
  DockerAodhConfigImage: 172.31.0.10:8787/rhosp13/openstack-aodh-api:13.0-47
  …
  DockerInsecureRegistryAddress:
  - 172.31.0.10:8787
  DockerIronicApiConfigImage: 172.31.0.10:8787/rhosp13/openstack-ironic-api:13.0-46
  DockerIronicInspectorImage: 172.31.0.10:8787/rhosp13/openstack-ironic-inspector:13.0-46
  DockerIronicApiImage: 172.31.0.10:8787/rhosp13/openstack-ironic-api:13.0-46
  DockerIronicConductorImage: 172.31.0.10:8787/rhosp13/openstack-ironic-conductor:13.0-44
  DockerIronicConfigImage: 172.31.0.10:8787/rhosp13/openstack-ironic-pxe:13.0-40
  DockerIronicPxeImage: 172.31.0.10:8787/rhosp13/openstack-ironic-pxe:13.0-40
  DockerIscsidConfigImage: 172.31.0.10:8787/rhosp13/openstack-iscsid:13.0-45
  …
3. Define Introspection IP pool and networks, enable autodiscovery. Also ensure Inspector collects all the extra data about your hardware. We will need that later when defining rules for Inspection.
(undercloud) [stack@undercloud ~]$ cat templates/ExtraConfig.yaml
parameter_defaults:
  CustomBMVirtualFixedIPs: [{'ip_address':'172.31.10.14'}]
  NeutronEnableIsolatedMetadata: 'True'
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AggregateInstanceExtraSpecsFilter
    - AggregateMultiTenancyIsolation
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
  IronicCleaningDiskErase: metadata
  IronicIPXEEnabled: true
  IronicCleaningNetwork: baremetal
  IronicInspectorIpRange: '172.31.10.50,172.31.10.69'
  IronicInspectorInterface: vlan320
  IronicInspectorEnableNodeDiscovery: true
  IronicInspectorCollectors: default,extra-hardware,numa-topology,logs
  ServiceNetMap:
    IronicApiNetwork: custombm
    IronicNetwork: custombm
    IronicInspectorNetwork: custombm
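Before deploying, it can be worth sanity-checking the introspection pool. The following is a minimal Python sketch (not part of TripleO) that parses the 'start,end' string used for IronicInspectorIpRange above, counts the addresses in the pool, and checks that the controller's fixed IP from CustomBMVirtualFixedIPs does not fall inside it. The helper names parse_ip_range, pool_size and pool_contains are made up for this illustration.

```python
import ipaddress


def parse_ip_range(value):
    """Split a 'start,end' string into two IP address objects."""
    start_text, end_text = value.split(",")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if end < start:
        raise ValueError("range end is below range start")
    return start, end


def pool_size(start, end):
    """Number of addresses in the inclusive range."""
    return int(end) - int(start) + 1


def pool_contains(start, end, address):
    """True if address lies inside the inclusive range."""
    return start <= ipaddress.ip_address(address) <= end


start, end = parse_ip_range("172.31.10.50,172.31.10.69")
print("addresses in pool:", pool_size(start, end))
print("fixed IP inside pool:", pool_contains(start, end, "172.31.10.14"))
```

For this lab the pool holds 20 addresses, which leaves headroom for the 7 nodes that are discovered at the same time later in this post.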
4. Ensure default Ironic Inspector yaml file is present in your heat template directory
(undercloud) [stack@undercloud ~]$ ls /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic-inspector.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic-inspector.yaml
5. Edit your deployment script and add everything you’ve defined in it.
time openstack overcloud deploy --templates --stack chrisj \
…
-r /home/stack/templates/roles_data.yaml \
-n /home/stack/templates/network_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic-inspector.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/ExtraConfig.yaml \
…
6. Deploy with new configuration and ensure Ironic Inspector container is getting created on desired controller node:
[root@chrisj-controller-0 ~]# docker ps | grep inspector
cdc828fe73a5 172.31.0.10:8787/rhosp13/openstack-ironic-inspector:13.0-46 "kolla_start" 13 days ago Up 8 days ironic_inspector_dnsmasq
aca8dd04f53f 172.31.0.10:8787/rhosp13/openstack-ironic-inspector:13.0-46 "kolla_start" 13 days ago Up 8 days ironic_inspector
7. After the deployment is done, agent.kernel and agent.ramdisk need to be copied to the Inspector controller node.
(undercloud) [stack@undercloud ~]$ scp /httpboot/agent.* heat-admin@172.31.0.27:~/
(undercloud) [stack@undercloud ~]$ ssh heat-admin@172.31.0.27
[heat-admin@chrisj-controller-0 ~]$ sudo -i
[root@chrisj-controller-0 ~]# cp /home/heat-admin/agent.* /var/lib/ironic/httpboot/
[root@chrisj-controller-0 ~]# ls -al /var/lib/ironic/httpboot/
total 421844
drwxr-xr-x. 2 42422 42422 86 Sep 26 00:55 .
drwxr-xr-x. 4 42422 42422 38 Sep 25 22:13 ..
-rwxr-xr-x. 1 root root 6398256 Sep 26 00:55 agent.kernel
-rw-r--r--. 1 root root 425555837 Sep 26 00:55 agent.ramdisk
-rw-r--r--. 1 42422 42422 758 Sep 25 22:56 boot.ipxe
-rw-r--r--. 1 42422 42422 470 Sep 25 22:15 inspector.ipxe
[root@chrisj-controller-0 ~]# chown 42422:42422 /var/lib/ironic/httpboot/agent.*
II. Defining Inspector and Autodiscovery rules
Even though it is always easier to work with a single hardware vendor across your deployments, it is not unusual for my clients to use different types of hardware in the same baremetal pool. For example, different CPU brands might work better for different workloads, or some systems might be pre-built with a powerful GPU for tasks such as AI or ML while others are just general-purpose nodes. Maybe we have a pool of nodes with ultra-fast NVMe drives dedicated to databases.
Whatever the use case is, Ironic Inspector is capable of identifying these different specs and assigning them to your nodes (so you have more control over which node gets deployed for what type of workload). The same process is also used to set different IPMI power control credentials for different hardware vendors. The upstream documentation describes this process at a basic level.
And this is what I used in the demo:
1. Create Inspection rule for ipmi user, password and provisioning mac address
In my lab I am using 4 different types of nodes, but at a higher level I was able to classify them by vendor. A node is either Supermicro, where my default IPMI user and password are ADMIN/ADMIN, or it is not Supermicro (it is ASRock), in which case the IPMI user and password are admin/admin.
I have created the following rules to capture the different hardware:
(undercloud) [stack@undercloud ~]$ cat supermicro_asrock_rules.json
[
    {
        "description": "Set IPMI credentials for supermicro",
        "conditions": [
            {"op": "eq", "field": "data://auto_discovered", "value": true},
            {"op": "eq", "field": "data://inventory.system_vendor.manufacturer",
             "value": "Supermicro"}
        ],
        "actions": [
            {"action": "set-attribute", "path": "driver_info/ipmi_username",
             "value": "ADMIN"},
            {"action": "set-attribute", "path": "driver_info/ipmi_password",
             "value": "ADMIN"},
            {"action": "set-attribute", "path": "driver_info/ipmi_address",
             "value": "{data[inventory][bmc_address]}"}
        ]
    },
    {
        "description": "Set IPMI credentials for Everything else",
        "conditions": [
            {"op": "eq", "field": "data://auto_discovered", "value": true},
            {"op": "ne", "field": "data://inventory.system_vendor.manufacturer",
             "value": "Supermicro"}
        ],
        "actions": [
            {"action": "set-attribute", "path": "driver_info/ipmi_username",
             "value": "admin"},
            {"action": "set-attribute", "path": "driver_info/ipmi_password",
             "value": "admin"},
            {"action": "set-attribute", "path": "driver_info/ipmi_address",
             "value": "{data[inventory][bmc_address]}"}
        ]
    }
]
Note: the first rule runs whenever a node is autodiscovered and its manufacturer is “Supermicro”. The second rule is almost identical, except that it runs when the manufacturer is “ne” (not equal to) “Supermicro”.
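To illustrate how the eq and ne conditions split the nodes into two groups, here is a simplified Python model of rule matching. It is a sketch for reasoning about the rules, not Inspector's actual implementation: it only handles the data:// scheme and the two operators used above, and the introspection data is a hypothetical, heavily trimmed sample.

```python
def lookup(data, field):
    """Resolve a 'data://a.b.c' field against nested introspection data."""
    scheme, _, path = field.partition("://")
    if scheme != "data":
        raise ValueError("this sketch only handles data:// fields")
    value = data
    for key in path.split("."):
        value = value[key]
    return value


# Only the two operators used by the rules in this post.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
}


def rule_matches(rule, data):
    """A rule applies only if every one of its conditions holds."""
    return all(
        OPERATORS[cond["op"]](lookup(data, cond["field"]), cond["value"])
        for cond in rule["conditions"]
    )


VENDOR_FIELD = "data://inventory.system_vendor.manufacturer"
RULES = [
    {"description": "Set IPMI credentials for supermicro",
     "conditions": [
         {"op": "eq", "field": "data://auto_discovered", "value": True},
         {"op": "eq", "field": VENDOR_FIELD, "value": "Supermicro"}]},
    {"description": "Set IPMI credentials for Everything else",
     "conditions": [
         {"op": "eq", "field": "data://auto_discovered", "value": True},
         {"op": "ne", "field": VENDOR_FIELD, "value": "Supermicro"}]},
]


def matching_rules(data):
    return [rule["description"] for rule in RULES if rule_matches(rule, data)]


def sample_node(manufacturer):
    # Hypothetical introspection data, reduced to the fields the rules read.
    return {"auto_discovered": True,
            "inventory": {"system_vendor": {"manufacturer": manufacturer}}}


print(matching_rules(sample_node("Supermicro")))
print(matching_rules(sample_node("ASRock")))
```

Because the two vendor conditions are exact opposites, every autodiscovered node matches exactly one of the two rules.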
2. If you haven’t done so already, upload the ramdisk and kernel files that will be used for deployment and cleaning of the nodes. Record the image IDs for the next step.
(chrisj) [stack@undercloud ~]$ source ~/chrisjrc (my overcloud rc file)
(chrisj) [stack@undercloud ~]$ openstack image create --public --container-format aki --disk-format aki --file ~/images/ironic-python-agent.kernel deploy-kernel
+------------------+--------------------------------------+
| Field            | Value                                |
+------------------+--------------------------------------+
| checksum         | 3fa68970990a6ce72fbc6ebef4363f68     |
| container_format | aki                                  |
| created_at       | 2017-01-18T14:47:58.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | 93d2341b-7e28-4cf4-bd61-c280aa3cc909 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | deploy-kernel                        |
| owner            | c600f2a2bea84459b6640267701f2268     |
| properties       |                                      |
| protected        | False                                |
| size             | 5390944                              |
| status           | active                               |
| updated_at       | 2017-01-18T14:48:02.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
(chrisj) [stack@undercloud ~]$ openstack image create --public --container-format ari --disk-format ari --file ~/images/ironic-python-agent.initramfs deploy-ramdisk
+------------------+--------------------------------------+
| Field            | Value                                |
+------------------+--------------------------------------+
| checksum         | b4321200478252588cb6e9095f363a54     |
| container_format | ari                                  |
| created_at       | 2017-01-18T14:48:18.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | b6163b1a-308d-47fb-9600-89be2c44f8df |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | deploy-ramdisk                       |
| owner            | c600f2a2bea84459b6640267701f2268     |
| properties       |                                      |
| protected        | False                                |
| size             | 363726791                            |
| status           | active                               |
| updated_at       | 2017-01-18T14:48:23.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
3. Define Inspector rules to set all the extra capabilities you need for better hardware selection. You can set unique and informative baremetal names, assign the deployment and cleaning ramdisk/kernel, or set PXE boot to use UEFI rather than legacy boot. These are just a few examples that I used in this next rule:
(undercloud) [stack@undercloud ~]$ cat overcloud_set_capabilities.json
[
    {
        "description": "set basic capabilities, names, ramdisks and bootmode",
        "conditions": [
            {"op": "eq", "field": "data://auto_discovered", "value": true}
        ],
        "actions": [
            {"action": "set-capability", "name": "cpu_model", "value": "{data[inventory][cpu][model_name]}"},
            {"action": "set-capability", "name": "manufacturer", "value": "{data[inventory][system_vendor][manufacturer]}"},
            {"action": "set-capability", "name": "bios_version", "value": "{data[extra][firmware][bios][version]}"},
            {"action": "set-capability", "name": "boot_mode", "value": "uefi"},
            {"action": "set-capability", "name": "boot_option", "value": "local"},
            {"action": "set-attribute", "path": "name", "value": "{data[extra][system][motherboard][vendor]}-{data[extra][system][motherboard][name]:.9}-{data[inventory][bmc_address]}"},
            {"action": "set-attribute", "path": "driver_info/deploy_kernel", "value": "93d2341b-7e28-4cf4-bd61-c280aa3cc909"},
            {"action": "set-attribute", "path": "driver_info/deploy_ramdisk", "value": "b6163b1a-308d-47fb-9600-89be2c44f8df"}
        ]
    }
]
Note: refer to the video above to see how this rule affects the naming convention of the nodes and their parameters.
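The value strings in these actions use Python-style format fields, so the ":.9" in the name template truncates the motherboard name to 9 characters. The short sketch below applies Python's own str.format to the same template to show the effect. The sample data, including the motherboard name "X10SDV-4C-TLN2F", is made up for illustration and is not a dump from my nodes.

```python
# Same template as the "name" action in overcloud_set_capabilities.json.
NAME_TEMPLATE = (
    "{data[extra][system][motherboard][vendor]}-"
    "{data[extra][system][motherboard][name]:.9}-"
    "{data[inventory][bmc_address]}"
)

# Hypothetical introspection data, trimmed to the fields the template reads.
sample = {
    "extra": {
        "system": {
            "motherboard": {"vendor": "Supermicro", "name": "X10SDV-4C-TLN2F"}
        }
    },
    "inventory": {"bmc_address": "172.31.9.21"},
}

# ':.9' is a precision spec: for strings it keeps at most 9 characters.
node_name = NAME_TEMPLATE.format(data=sample)
print(node_name)
```

Shorter motherboard names are left as they are, because the precision only sets an upper limit on the length.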
4. Import the newly created rules into Inspector.
(undercloud) [stack@undercloud ~]$ openstack baremetal introspection rule import supermicro_asrock_rules.json
(undercloud) [stack@undercloud ~]$ openstack baremetal introspection rule import overcloud_set_capabilities.json
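A malformed rules file is easier to catch before the import than after. The following Python sketch runs a few basic checks on a rules document: it must parse as JSON, be a list, and every rule needs at least one action with an "action" key. These checks are my own minimal assumptions and not Inspector's full validation, and the helper name check_rules is made up for this illustration.

```python
import json


def check_rules(text):
    """Return a list of problems found in a rules document (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: %s" % exc.msg]
    if not isinstance(rules, list):
        return ["top level must be a list of rules"]
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append("rule %d is not an object" % index)
            continue
        if not rule.get("actions"):
            problems.append("rule %d has no actions" % index)
        for condition in rule.get("conditions", []):
            missing = sorted({"op", "field"} - set(condition))
            if missing:
                problems.append(
                    "rule %d condition missing: %s" % (index, ", ".join(missing))
                )
        for action in rule.get("actions", []):
            if "action" not in action:
                problems.append("rule %d has an action without a type" % index)
    return problems


GOOD = """[{"description": "demo",
  "conditions": [{"op": "eq", "field": "data://auto_discovered", "value": true}],
  "actions": [{"action": "set-capability", "name": "boot_mode", "value": "uefi"}]}]"""
BAD = '[{"conditions": [{"op": "eq"}], "actions": []}]'

print(check_rules(GOOD))
print(check_rules(BAD))
```

To check a real file, pass its contents to the function, for example check_rules(open("overcloud_set_capabilities.json").read()).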
III. Autodiscovery and deployment
We should now have an environment that looks just like the one presented in the demo above. Let’s initiate the discovery.
1. First ensure Ironic doesn’t know anything about the nodes that we are planning to add to the environment:
(undercloud) [stack@undercloud ~]$ source chrisjrc (this is my overcloud rc file)
(chrisj) [stack@undercloud ~]$ openstack baremetal node list
Note: this should return an empty list.
2. Check the power status of the nodes.
ipmitool -I lanplus -U <user> -P <password> -H <host_name_or_ip> power status
I currently use 7 baremetal nodes with 2 different sets of usernames and passwords. Looping the command above gets the status of all the nodes:
(chrisj) [stack@undercloud ~]$ for i in 21 24 25 34 ; do ipmitool -I lanplus -U ADMIN -P ADMIN -H 172.31.9.$i power status; done
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
(chrisj) [stack@undercloud ~]$ for i in {31..33} ; do ipmitool -I lanplus -U admin -P admin -H 172.31.9.$i power status; done
Chassis Power is off
Chassis Power is off
Chassis Power is off
3. Power all the nodes on and wait for the discovery to complete
ipmitool -I lanplus -U <user> -P <password> -H <host_name_or_ip> power on
(chrisj) [stack@undercloud ~]$ for i in 21 24 25 34 ; do ipmitool -I lanplus -U ADMIN -P ADMIN -H 172.31.9.$i power on; done
(chrisj) [stack@undercloud ~]$ for i in {31..33} ; do ipmitool -I lanplus -U admin -P admin -H 172.31.9.$i power on; done
4. Monitor the discovery from the KVM console as well as from Ironic. After a few minutes all the nodes should show up as follows:
(chrisj) [stack@undercloud ~]$ openstack baremetal node list
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                             | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
| 476ca23a-46c8-4a93-8901-1844a7c64851 | Supermicro-X10SDV-4C-172.31.9.21 | None          | power off   | enroll             | False       |
| de92ec0d-34a5-4ccd-97bc-7d2398887e88 | ASRock-J1900D2Y-172.31.9.33      | None          | power off   | enroll             | False       |
| c92ff0b6-7759-4490-ac9d-274825d2b245 | ASRock-J1900D2Y-172.31.9.32      | None          | power off   | enroll             | False       |
| 02c9d83f-8ba7-40da-8bdc-cde1a93db84f | ASRock-J1900D2Y-172.31.9.31      | None          | power off   | enroll             | False       |
| 4002d68d-6472-49fb-92d6-bd35b96f642e | Supermicro-A1SAi-172.31.9.34     | None          | power off   | enroll             | False       |
| 755085f2-1168-4dfe-b3c2-c553dd468110 | Supermicro-X10SDV-6C-172.31.9.25 | None          | power off   | enroll             | False       |
| 8b99044d-f99b-400b-bf92-b25561e98b06 | Supermicro-X10SDV-6C-172.31.9.24 | None          | power off   | enroll             | False       |
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
Note: I have used unique names that combine the manufacturer, model name and IPMI IP address.
5. Set the resource class to ‘baremetal’ (this is new starting in Pike), then take ownership of the power control of the nodes and clean them up
(chrisj) [stack@undercloud ~]$ for NODE in `openstack baremetal node list -c UUID -f value` ; do openstack baremetal node set $NODE --resource-class baremetal ; done
(chrisj) [stack@undercloud ~]$ for NODE in `openstack baremetal node list -c UUID -f value` ; do openstack baremetal node manage $NODE ; done
(chrisj) [stack@undercloud ~]$ for NODE in `openstack baremetal node list -c UUID -f value` ; do openstack baremetal node provide $NODE ; done
It will take another couple of minutes for them to clean up. After that, all nodes should show the available state:
(chrisj) [stack@undercloud ~]$ openstack baremetal node list
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                             | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
| 476ca23a-46c8-4a93-8901-1844a7c64851 | Supermicro-X10SDV-4C-172.31.9.21 | None          | power off   | available          | False       |
| de92ec0d-34a5-4ccd-97bc-7d2398887e88 | ASRock-J1900D2Y-172.31.9.33      | None          | power off   | available          | False       |
| c92ff0b6-7759-4490-ac9d-274825d2b245 | ASRock-J1900D2Y-172.31.9.32      | None          | power off   | available          | False       |
| 02c9d83f-8ba7-40da-8bdc-cde1a93db84f | ASRock-J1900D2Y-172.31.9.31      | None          | power off   | available          | False       |
| 4002d68d-6472-49fb-92d6-bd35b96f642e | Supermicro-A1SAi-172.31.9.34     | None          | power off   | available          | False       |
| 755085f2-1168-4dfe-b3c2-c553dd468110 | Supermicro-X10SDV-6C-172.31.9.25 | None          | power off   | available          | False       |
| 8b99044d-f99b-400b-bf92-b25561e98b06 | Supermicro-X10SDV-6C-172.31.9.24 | None          | power off   | available          | False       |
+--------------------------------------+----------------------------------+---------------+-------------+--------------------+-------------+
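The manage and provide commands from step 5 move each node through Ironic's provisioning states, which is why the nodes go from enroll to available. The Python sketch below models just that slice of the state machine, including the transient states a node passes through on the way. It is a simplified illustration for the happy path only (no failure states), and the names TRANSITIONS and apply_verb are made up for this example.

```python
# (current state, verb) -> (transient state, resulting stable state)
TRANSITIONS = {
    ("enroll", "manage"): ("verifying", "manageable"),
    ("manageable", "provide"): ("cleaning", "available"),
}


def apply_verb(state, verb):
    """Return the states a node passes through for a verb, or raise if invalid."""
    try:
        return TRANSITIONS[(state, verb)]
    except KeyError:
        raise ValueError("cannot %r a node in state %r" % (verb, state))


state = "enroll"  # where autodiscovered nodes start
for verb in ("manage", "provide"):
    transient, state = apply_verb(state, verb)
    print("%s: via %s to %s" % (verb, transient, state))
```

The order matters: calling provide on a node that is still in the enroll state is rejected, which is why the loops in step 5 run manage first.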
6. Follow the rest of the steps from previous blog post in section IV. Deploying Operating System to Baremetal
You should now be able to deploy multiple images on your baremetal nodes:
(chrisj) [stack@undercloud ~]$ openstack server list
+--------------------------------------+------------+--------+------------------------+------------------------+-----------+
| ID                                   | Name       | Status | Networks               | Image                  | Flavor    |
+--------------------------------------+------------+--------+------------------------+------------------------+-----------+
| f19d2529-40da-4323-b092-18ccc10ee09f | windows-bm | BUILD  | baremetal=172.31.10.75 | windows2016-uefi       | baremetal |
| e539af0c-cfa4-44ed-bd5f-4c8bddf2263f | ubuntu-bm  | BUILD  | baremetal=172.31.10.77 | ubuntu-1804-uefi-ready | baremetal |
| a1937083-35df-40c5-84a4-4008428010dd | rhel-bm    | BUILD  | baremetal=172.31.10.78 | rhel7-uefi             | baremetal |
+--------------------------------------+------------+--------+------------------------+------------------------+-----------+
This concludes part 2 of the blog. Even though the process described here focuses on baremetal autodiscovery in the overcloud, please be aware that the undercloud comes with this functionality out of the box starting in OSP12.
Stay tuned for the next posts, which will describe baremetal multi-tenancy and the process for building UEFI and legacy images.