1. Keep it simple
Don’t add unnecessary complexity to your deployment. Try to keep it as close to the reference architecture as possible. Unnecessary complexity includes:
- adding functionality you might use in the future (this is like buying a truck for towing because one day you might get a boat). Examples: Sahara, Manila, etc. If you don’t need that extra functionality right now, do not add it just because you can
- removing default features that come with a vanilla deployment
- extensively modifying config files without validated justification. Example: collapsing networks just because you don’t want to create separate VLANs. Create the damn VLANs!
Extra features add unnecessary load to your controllers, computes and storage nodes. You can always add them later when you need them. The fewer variations you introduce, the easier it is to find someone else out there running and testing the same configuration. Don’t be a snowflake, because snowflakes are difficult to maintain.
2. Minimize the number of custom configs - keep originals in their original location
Customization is one of the biggest advantages that comes to mind when deploying OpenStack with OSP Director/TripleO. Even so, don’t copy every yaml file that you plan to pass to the openstack overcloud deploy command out of the templates directory. Example: I almost never change the content of the default network-isolation.yaml file, but I deploy it almost every time. Instead of copying the file to a local ~/templates directory, keep it in its original location under /usr/share/openstack-tripleo-heat-templates/ and include it from there.
With each major upgrade of OSP Director/TripleO, and in some cases even with minor releases, a lot of changes are made to the default templates. The fewer templates you have to manage, the better. You will thank me after the next upgrade.
3. Use Ansible
Starting with OSP10 (Newton), the undercloud ships with Ansible pre-installed. It is there, so take advantage of it. Simply update your inventory file with the overcloud nodes and you will be able to make ad-hoc queries or changes to all of the nodes on the fly, in seconds. It is so easy I’ll do it for you. On director, as the stack user, execute the following:
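One way to do this, as a minimal sketch rather than the exact commands: a small helper that turns a node list into an inventory grouped by role. It assumes `openstack server list -f value -c Name -c Networks` prints one `name ctlplane=IP` pair per line, which can differ between client versions. The function name `make_inventory`, the group names and the `~/inventory` path are hypothetical.

```shell
# Read "name ctlplane=IP" lines on stdin and print an Ansible INI inventory.
# Group names and the heat-admin login are assumptions for illustration.
make_inventory() {
    awk '
        {
            ip = $2
            sub(/^ctlplane=/, "", ip)   # keep only the address
            entry = $1 " ansible_host=" ip " ansible_user=heat-admin\n"
            if ($1 ~ /controller/)   controller = controller entry
            else if ($1 ~ /compute/) compute = compute entry
            else                     other = other entry
        }
        END {
            printf "[controller]\n%s\n[compute]\n%s\n[other]\n%s", controller, compute, other
        }
    '
}
```

On the undercloud this might be used as `source ~/stackrc`, then `openstack server list -f value -c Name -c Networks | make_inventory > ~/inventory`, followed by an ad-hoc call such as `ansible -i ~/inventory all -m ping` or `ansible -i ~/inventory controller -m shell -a 'uptime'`.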
Definitely get yourself familiar with modules like copy, lineinfile or just the simple shell module. In a few minutes you will become an Ansible ninja. Here’s a quick link to the Ansible module documentation:
http://docs.ansible.com/ansible/latest/modules_by_category.html
4. Version Control your config files
Making any change to your OpenStack deployment ultimately comes down to changing text files. User errors are unavoidable and it’s really easy to “fat finger” an extra character into your config files. Additionally, YAML files are hypersensitive to spacing. Use the version control software of your choice to track changes to your custom templates inside the local /home/stack/templates directory. It’s also good practice to push them to an external repository in the event of a disaster!
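As a minimal sketch, assuming git is the version control tool you picked: a helper that commits the current state of the templates directory and creates the repository on first use. The function name `snapshot_templates` and the default commit message are hypothetical.

```shell
# Commit the current state of a templates directory, creating the repo on first use.
# The default path comes from the text; the function name is made up for this sketch.
# The body runs in a subshell so the caller's working directory is left alone.
snapshot_templates() (
    cd "${1:-/home/stack/templates}" || exit 1
    [ -d .git ] || git init -q .
    git add -A
    # Only commit when something actually changed since the last snapshot.
    if [ -n "$(git status --porcelain)" ]; then
        git commit -q -m "${2:-Snapshot before deploy}"
    else
        echo "no changes"
    fi
)
```

Running `snapshot_templates /home/stack/templates "collapse storage VLANs"` before every deploy gives you a diff to look at when the next update goes wrong. Pushing to an external repository is then a matter of `git remote add origin <your-remote>` and `git push`, with whatever remote you use.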
5. Make your overcloud deployments and updates consistent
The easiest way to “nuke” your overcloud deployment is to update the heat stack using an inconsistent deploy command. Do yourself a favor and create a deploy.sh file in the /home/stack directory to be used to execute your overcloud deploy command.
Don’t be lazy though, make it nice and neat. Here is my example:
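A minimal sketch of what such a file might look like follows; it is an illustration, not a recommended set of options. The node counts, the NTP server and the custom network-environment.yaml location are placeholders, and the function name `deploy_overcloud` is hypothetical.

```shell
#!/bin/sh
# deploy.sh (sketch): one fixed command line for every deploy and update.
# Node counts, NTP server and the custom template are placeholders.
THT=/usr/share/openstack-tripleo-heat-templates

deploy_overcloud() {
    openstack overcloud deploy \
        --templates "$THT" \
        -e "$THT/environments/network-isolation.yaml" \
        -e "$HOME/templates/environments/network-environment.yaml" \
        --control-scale 3 \
        --compute-scale 2 \
        --ntp-server pool.ntp.org
}

# Deploy only when run as ./deploy.sh; sourcing the file just defines the function.
case "$0" in
    */deploy.sh|deploy.sh) deploy_overcloud ;;
esac
```

Keeping one option per line makes the diff obvious when you add or remove an environment file. Source ~/stackrc before running it, and always change the overcloud through this file rather than by retyping the command.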
You could also mirror the location of customized yaml files from /usr/share/openstack-tripleo-heat-templates under ~stack/templates. For instance, put network-environment.yaml in ~stack/templates/environments so it’s easier to see where the original came from.
6. Inject a custom password into the overcloud image
Deploying OpenStack with Director can sometimes feel like trial and error. It is unusual to get a perfectly working deployment on the first try, especially if you add multiple customizations. The easiest way I have found to troubleshoot a failed deployment is to get on the failing overcloud node and crawl through the os-collect-config log:
Example: $ journalctl -u os-collect-config
Unfortunately, if the deployment failed before the networking configuration completed, you might not be able to ssh to the node from the undercloud as the heat-admin user.
That leaves you with the age old method of getting access via ipmi/kvm console. In order to do that, the root password needs to be specified. Adding the password to the root account is easy with virt-customize:
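A minimal sketch of the virt-customize call, wrapped in a small helper so a mistyped image path fails early. The function name `set_root_password` is hypothetical, and the image name in the usage note assumes the standard overcloud-full.qcow2.

```shell
# Set the root password inside an overcloud image with virt-customize.
# Usage: set_root_password IMAGE PASSWORD
set_root_password() {
    image=$1
    password=$2
    # Fail early on a wrong path instead of letting virt-customize guess.
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
    virt-customize -a "$image" --root-password "password:$password"
}
```

For example, `set_root_password overcloud-full.qcow2 'ChangeMe'` in the directory that holds your images. The password ends up in your shell history this way, so pick one you are willing to treat as a break-glass credential.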
There are 2 prerequisites for virt-customize to work:
- Install libguestfs-tools
- Run it before uploading the images to glance on the undercloud
7. Configure bash-completion with OpenStack
Remembering all of the OpenStack commands is a ‘mission impossible’ task, so why not make your life easier and configure command auto-completion? It’s really simple.
On your undercloud, install bash-completion:
$ sudo yum -y install bash-completion
That alone gives you completion for nova, glance, cinder and some of the other old-school OpenStack commands. Next, generate the bash-completion script for the new OpenStack CLI (the stack user cannot redirect into /etc directly, hence sudo tee):
$ openstack complete | sudo tee /etc/bash_completion.d/os_complete > /dev/null
Log back in as the stack user and you are all set.
Note: Unfortunately the openstack auto-complete got messed up with a recent python-openstackclient update. I have opened a bug to get it fixed which can be tracked here:
https://bugzilla.redhat.com/show_bug.cgi?id=1458824
In the meantime feel free to use this template instead:
8. Validate before and after deployments
Again, TripleO uses a lot of manually configured yaml files for a deployment. Not only do you have to be careful to not “fat finger” a character, but you also need to make sure everything is properly indented, you are not missing a dash for a new section and most importantly you have configured things correctly.
There are built-in tools that can help with the validation.
$ openstack baremetal instackenv validate --file ./instackenv.json
This not only checks that you have not butchered the JSON file, it also reaches out to the physical nodes via the IPMI interface and makes sure you can control the power status of the nodes.
The networking files are the ones you will make the most changes to. Network environment validation checks the syntax of not just the network-environment.yaml file, where you keep all of the VLANs and IP pools; it also drills down into the individual nic-config files for compute, controller, storage and other composable roles.
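As a sketch, both checks can be chained so that a broken instackenv.json stops you before the network files are even looked at. The network check here uses `openstack overcloud netenv validate`, which is assumed to be the command meant for the networking files; the function name `predeploy_check` and the file paths are hypothetical.

```shell
# Run both validations and stop at the first failure.
# Usage: predeploy_check INSTACKENV_JSON NETWORK_ENVIRONMENT_YAML
predeploy_check() {
    openstack baremetal instackenv validate --file "$1" &&
    openstack overcloud netenv validate -f "$2"
}
```

For example: `predeploy_check ./instackenv.json ~/templates/environments/network-environment.yaml`. The function returns non-zero as soon as either validation fails, so it can sit in front of your deploy command.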
There is a whole slew of Ansible validations that come with your undercloud. Explore them all by crawling through the /usr/share/openstack-tripleo-validations/validations/ directory.
You can also execute them directly with:
$ ansible-playbook (validation-file).yaml
9. Something Something .. Networking
Things will fail, especially if you are just getting into the OSP Director/TripleO groove. It’s a combination of the learning curve and the complexity of the solution. You have to deal with hardware, storage, compute, networking and more. Sometimes the error is not straightforward and won’t give you a good indication of where to dig for troubleshooting.
I have done probably hundreds of OpenStack deployments over the course of my career and the one thing I learned is .. “if you are not sure what failed .. it is probably networking”
Use simple tools to troubleshoot. Start with a simple ping:
If you are using jumbo frames, make sure you ping with ‘-s 8000’ and optionally ‘-M do’, which prohibits fragmentation so that an MTU mismatch along the path shows up as a failed ping.
Ssh to your overcloud nodes
$ cat /etc/hosts
Ping all of the nodes on all of the vlans
Ping your gateway
Ping google (if you have external connectivity to the internet)
Look up your routing table:
$ ip route
Check your physical interface assignment to ovs bridges with:
$ ovs-vsctl show
Create a simple tagged interface on the bridge and assign an IP address to it to make sure you are getting out without involving too many variables.
On problematic nodes, check out the networking configuration file pushed by Director/TripleO - /etc/os-net-config/config.json
These are the simplest tools, and they will help you resolve 80% of the networking issues you might encounter. For the rest, get yourself familiar with tools such as tcpdump, nmap and netstat.
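The ping steps above can be sketched as a small loop over the addresses in /etc/hosts. This is a minimal sketch, assuming the Linux iputils ping found on overcloud nodes; the function names `hosts_ips` and `ping_all` are hypothetical.

```shell
# List the unique IPv4 addresses in a hosts file, skipping comments and loopback.
hosts_ips() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 !~ /^127\./ { print $1 }' \
        "${1:-/etc/hosts}" | sort -u
}

# Ping every address once with the given payload size and fragmentation prohibited.
# Usage: ping_all [SIZE] [HOSTS_FILE]
ping_all() {
    size=${1:-8000}
    hosts_ips "${2:-/etc/hosts}" | while read -r ip; do
        if ping -M do -s "$size" -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "ok   $ip"
        else
            echo "FAIL $ip"
        fi
    done
}
```

Run `ping_all 8000` on a node with jumbo frames enabled, or `ping_all 56` for a plain reachability check. A network where the small ping works and the large one fails usually points at an MTU mismatch on a switch port or bond.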
Credits:
Thanks Darin Sorrentino and Ruchika Kharwar for extra ideas and edits!