Thursday, December 8, 2016

Hiera hierarchies and the custom facts everyone needs

It's been a while since I've updated this blog, but my tech journey has never ended. For the last two years I've been working as a Senior Technical Solutions Engineer for Puppet, covering Silicon Valley and advocating for DevOps and IT automation. I haven't been absent from blogging in that time; I've just been fortunate enough to have my posts published on the official Puppet Blog. I have decided, however, to begin cross-posting here so that I have a single collection of my digital work.

Originally published at https://puppet.com/blog/hiera-hierarchies-and-custom-facts-everyone-needs on 8 July 2016

---

NOTE: This article is targeted at versions of Puppet that use Hiera 3 or higher and Facter 3 or higher, generally Puppet 4 or Puppet Enterprise 2015.2 and later. Some of this information also applies to older versions.

For the last year and a half, I've been representing Puppet as the technical solutions engineer covering all the accounts headquartered in Silicon Valley. This has been a fantastic opportunity to evangelize configuration management to clients both new and old. One of the areas I've noticed every new Puppet user runs into quite quickly is how to use Hiera effectively to separate data from code. On a fresh install of Puppet Enterprise 2015.2 or later, the hierarchy is pretty simple:
:hierarchy:
  - nodes/%{::trusted.certname}
  - common
Basically a scalpel or a shotgun. That doesn't take full advantage of the tool's power, but for good reason: any additional useful layer requires custom facts.

On the plus side, Hiera, like nearly all of Puppet, is very customizable and can be tweaked to each individual organization's needs. For new adopters, all of that power can be confusing. I've noticed I nearly always recommend the same ideas, so it seemed only fair to share them in a blog post.

Hiera is powerful, but it has some limitations. Notably, there is a practical limit to the number of layers that can be added before performance begins to suffer, and in Puppet Enterprise versions before 2016.2, only one hierarchy could be used at a time for the entire compile master.

Because of this, it makes sense to focus the hierarchy on generic concepts instead of specifically referencing unique items of the business unit or workflow. The hierarchy which I've recommended the most is along the lines of this:
:hierarchy:
  - nodes/%{::trusted.certname}
  - team/%{::team}
  - application/%{::application}
  - datacenter/%{::datacenter}
  - common
The actual names of each layer can change, but these represent what I find to be generally the most important pieces of metadata about a system, that is:
  • The name of the node
  • Who owns the node
  • What the node does
  • Where the node is
  • Metadata common to all nodes
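To make the lookup order concrete, here is a simplified, hypothetical model in plain Ruby of how a priority lookup walks the hierarchy above: each level is interpolated with the node's facts, and the first data source that defines the key wins. The `interpolate` and `resolve` methods, the flattened `trusted.certname` key and the in-memory `data` hash are illustrative assumptions, not Hiera's actual implementation; real Hiera reads YAML files from disk and also supports merge lookups.

```ruby
# Simplified model of a Hiera-style priority lookup (illustration only,
# not Hiera's real code). Levels are searched top to bottom and the
# first data source containing the key wins.
HIERARCHY = [
  'nodes/%{::trusted.certname}',
  'team/%{::team}',
  'application/%{::application}',
  'datacenter/%{::datacenter}',
  'common'
].freeze

# Replace each %{::name} token with the matching fact value
# (missing facts become an empty string in this sketch).
def interpolate(level, facts)
  level.gsub(/%\{::([\w.]+)\}/) { facts.fetch(Regexp.last_match(1), '') }
end

# Return the value from the most specific level that defines the key.
def resolve(key, facts, data)
  HIERARCHY.each do |level|
    source = data[interpolate(level, facts)]
    return source[key] if source && source.key?(key)
  end
  nil
end

# Hypothetical node facts and data sources.
facts = {
  'trusted.certname' => 'web01.example.com',
  'team' => 'tse',
  'application' => 'doc',
  'datacenter' => 'portland'
}
data = {
  'application/doc' => { 'ntp_server' => 'ntp.doc.example.com' },
  'common' => { 'ntp_server' => 'pool.ntp.org', 'motd' => 'Managed by Puppet' }
}

puts resolve('ntp_server', facts, data) # ntp.doc.example.com
puts resolve('motd', facts, data)       # Managed by Puppet
```

The application layer overrides `common` for `ntp_server`, while `motd` falls through every more specific layer until it reaches `common`.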

The first objection I usually hear is something along the lines of, "But we need a layer for X." To that, I challenge clients to consider whether they really need the additional layer, or if it fits into one of the existing layers. The vast majority of the differentiation they want tends to fit into the application layer. Having a ton of different applications with some data overlap is preferable to having too many layers with too little differentiation.

The conversation moves quickly from there to how to create the facts for %{::team}, %{::application} and %{::datacenter}. While there isn't a universal answer for how to create these facts, often one of a few possible methods will solve this problem for the vast majority of organizations.

Ultimately, the goal is to find a way to programmatically determine the answer to, "What is the X for this node?" To do this, we look to the pieces of information that are already a part of or attached to the node. I'll outline these approaches below.

Parse existing fact

Sysadmins have been attaching metadata to servers for a very long time in the form of hostnames. Many organizations still place information such as data center, application and team in the system's name. Facter by default already creates a fact for hostname, so we can parse that existing fact to generate new facts.

These examples are custom Ruby facts; they can be added into any module in the <module>/lib/facter directory as .rb files, and will be copied to all of the nodes via pluginsync and executed. The first example here simply takes the first four characters and turns them into a new fact.
Facter.add(:datacenter) do
  setcode do
    Facter.value(:hostname)[0..3]
  end
end
If we wanted to get everything from the fifth character to the eighth, we could modify the third line as follows:
Facter.value(:hostname)[4..7]
Or the fifth character to the end of the line:
Facter.value(:hostname)[4..-1]
Below is a more complicated example that takes everything from the sixth character up to the first -, or to the end of the hostname if there is no -:
Facter.value(:hostname)[5..-1][/(.*?)(\-|\z)/,1]
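To see what each expression above returns, the slices can be tried in plain Ruby (for example in irb) against a sample hostname. The hostname `pdx1pdoc-web01`, its naming scheme (four-character data center, one environment character, then the application name) and the `application_from` helper are hypothetical.

```ruby
# Hypothetical hostname: 4-char data center, 1-char environment,
# then the application name up to the first '-'.
hostname = 'pdx1pdoc-web01'

datacenter = hostname[0..3]  # first four characters
slice_4_7  = hostname[4..7]  # fifth through eighth characters
rest       = hostname[4..-1] # fifth character to the end

# Sixth character up to the first '-' (or the end of the string).
# Returns nil instead of raising when the hostname is too short.
def application_from(hostname)
  tail = hostname[5..-1]
  tail && tail[/(.*?)(\-|\z)/, 1]
end

puts datacenter                   # pdx1
puts slice_4_7                    # pdoc
puts rest                         # pdoc-web01
puts application_from(hostname)   # doc
puts application_from('sjc1qapi') # api
p application_from('pdx')         # nil
```

The nil guard matters because `'pdx'[5..-1]` is nil, and calling `[]` on it inside a fact would raise an error instead of simply leaving the fact unresolved.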

Match value to table

This is ugly, but sometimes it's the best option, particularly when the only way to determine the data center is by IP address. The shortcoming of this approach is that the fact must be updated whenever there is a new potential value. Using a known fact such as the IP network together with a case statement, we can map it to a value such as the data center. Where it can be avoided, don't use this just to replace shorthand metadata with full names (such as matching pdx in the hostname and replacing it with portland), as that adds unnecessary complication. Here the network fact is mapped to a data center:
Facter.add(:datacenter) do
  setcode do
    network=Facter.value(:network)
    case network
    when '10.0.2.0'
      'portland'
    when '10.0.3.0'
      'sydney'
    when '192.168.0.0'
      'home'
    else
      'unknown'
    end
  end
end
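One variation worth considering, sketched here under assumptions: keep the table as a list of CIDR ranges and match the node's IP address with Ruby's standard `ipaddr` library, so one entry covers a whole range rather than one exact `network` value. The ranges (assumed to be /24 networks) and the `datacenter_for` helper are hypothetical; inside a `setcode` block the call would be `datacenter_for(Facter.value(:ipaddress))`.

```ruby
require 'ipaddr'

# Hypothetical lookup table: CIDR range => data center name.
DATACENTERS = {
  '10.0.2.0/24'    => 'portland',
  '10.0.3.0/24'    => 'sydney',
  '192.168.0.0/24' => 'home'
}.map { |cidr, name| [IPAddr.new(cidr), name] }.freeze

# Return the data center whose range contains ip, or 'unknown'.
def datacenter_for(ip)
  return 'unknown' if ip.nil?
  addr = IPAddr.new(ip)
  match = DATACENTERS.find { |range, _name| range.include?(addr) }
  match ? match[1] : 'unknown'
rescue IPAddr::Error # malformed address string
  'unknown'
end

puts datacenter_for('10.0.3.17')    # sydney
puts datacenter_for('192.168.0.20') # home
puts datacenter_for('172.16.0.5')   # unknown
```

The table still has to be maintained by hand, so the shortcoming described above remains; the range matching only reduces how often it needs to change.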

Read metadata

AWS allows VMs to be tagged with metadata that can be read elsewhere. Often, these tags already exist for the purpose of charging back the cost of the node to the group responsible for it. This metadata can be turned into custom facts and used with Puppet as well.

The following code creates Facter facts for the EC2 region and the EC2 tags (credit to Chris Barker and Adrien Thebo for this code snippet):
Facter.add(:ec2_region) do
  confine do
    Facter.value(:ec2_metadata)
  end
  setcode do
    region = Facter.value(:ec2_metadata)['placement']['availability-zone'][0..-2]
    region
  end
end

Facter.add(:ec2_tags) do
  confine do
    begin
      require 'aws-sdk-core'
      true
    rescue LoadError
      false
    end
  end

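  confine do
    # Only resolve on EC2 instances with an IAM role attached; the nil
    # checks keep this from raising on nodes without ec2_metadata.
    metadata = Facter.value(:ec2_metadata)
    metadata && metadata['iam'] && metadata['iam']['info']
  end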

  setcode do
    instance_id = Facter.value('ec2_metadata')['instance-id']
    region = Facter.value(:ec2_metadata)['placement']['availability-zone'][0..-2]
    ec2 = Aws::EC2::Client.new(region: region)
    instance = ec2.describe_instances(instance_ids: [instance_id])
    tags = instance.reservations[0].instances[0].tags
    taghash = { }
    tags.each do |tag|
      taghash[tag['key'].downcase] = tag['value'].downcase
    end
    taghash
  end
end
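The two transformations in these facts can be checked without AWS credentials. This sketch stubs the metadata hash and the tag list with plain Ruby objects (the sample values and the `Tag` struct are stand-ins, not `aws-sdk-core` objects) to show that `[0..-2]` drops the zone letter and that the tags become a downcased hash.

```ruby
# Stand-in for the SDK's tag objects (assumption: anything that
# responds to ['key'] and ['value'] behaves the same in this loop).
Tag = Struct.new(:key, :value)

# Hypothetical slice of the ec2_metadata fact.
metadata = { 'placement' => { 'availability-zone' => 'us-west-2a' } }

# The availability zone minus its trailing letter is the region.
region = metadata['placement']['availability-zone'][0..-2]

tags = [Tag.new('Team', 'TSE'), Tag.new('Application', 'Doc')]
taghash = {}
tags.each do |tag|
  # Struct#[] accepts string member names, so tag['key'] == tag.key.
  taghash[tag['key'].downcase] = tag['value'].downcase
end

puts region          # us-west-2
puts taghash['team'] # tse
```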

Write a custom fact script

Any script that returns a value can be turned into a custom fact. With a Ruby fact, it's simply a matter of wrapping the command in a setcode block and a Facter::Core::Execution.exec call:
Facter.add('hardware_platform') do
  setcode do
    Facter::Core::Execution.exec('/bin/uname --hardware-platform')
  end
end
However, Facter also supports running any script in its native format, such as Bash or PowerShell. Simply ensure the script returns key value pairs, and place it in the facts.d directory of any module. Pluginsync will copy the fact, the same as for facts in the lib/facter directory of a module. A simple example:
#!/bin/bash
echo testfact=fluffy
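Executable external facts aren't limited to shell; any interpreter present on the node will do. As a hedged sketch, here is a Ruby script that prints two facts at once. The fact names and the hostname convention are assumptions borrowed from the hostname examples above, and on Linux the file needs its executable bit set for Facter to run it.

```ruby
#!/usr/bin/env ruby
# Hypothetical executable external fact: each line printed as
# key=value becomes a separate fact.
require 'socket'

# Assumed naming convention: characters 1-4 are the data center, and
# the sixth character up to the first '-' is the application.
def facts_from(hostname)
  short = hostname.split('.').first.to_s
  {
    'datacenter'  => short[0..3],
    'application' => (short[5..-1] || '')[/(.*?)(\-|\z)/, 1]
  }
end

# Print facts only when run directly, so facts_from can be reused.
if $PROGRAM_NAME == __FILE__
  facts_from(Socket.gethostname).each do |key, value|
    puts "#{key}=#{value}" unless value.nil? || value.empty?
  end
end
```

Emitting several pairs from one script avoids running the same parsing logic once per fact.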

Drop facts file

When there is no programmatic way to determine the appropriate value, Facter supports the creation of this type of metadata via fact files in /etc/puppetlabs/facter/facts.d. (Note: This folder isn't pre-created on a default Puppet Enterprise install.) Many organizations actually prefer this method over the deterministic methods above, because it avoids potential collisions. These files can be in YAML (.yaml), JSON (.json) or text (.txt) format. There can even be executable scripts in this directory, as long as they return key value pairs. Generally I consider text files the easiest, as each line is simply:
key=value
There can be any number of files in this directory, each containing any number of key value pairs. Files in the same directory can have different formats, as well. Choosing how to break up facts between multiple files, or whether to consolidate them, tends to relate more to how they are created.
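For simple flat facts like these, the three file formats carry the same data, so the choice is mostly about which one the tool creating the file can emit most easily. As a hedged illustration using only Ruby's standard library, here are the same three hypothetical facts rendered as text, YAML and JSON; each string could be saved with the matching .txt, .yaml or .json extension.

```ruby
require 'json'
require 'yaml'

# The same hypothetical facts rendered in each supported format.
facts = { 'datacenter' => 'portland', 'application' => 'doc', 'team' => 'TSE' }

# facts.txt: one key=value pair per line.
txt = facts.map { |key, value| "#{key}=#{value}\n" }.join

# facts.yaml: a single top-level mapping.
yaml = facts.to_yaml

# facts.json: a single top-level object.
json = JSON.pretty_generate(facts)

puts txt
puts yaml
puts json
```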

Generally, facts files work best when they are created by the provisioning system at initial provisioning. Most workflows with VMware vRealize Automation or Cisco UCS Director use this method to pass information from the provisioning system to Puppet. It's relatively easy in either of these systems to create a file with the appropriate values on the provisioned system, then install Puppet and let it handle the rest. An example of a file that might be created is below:
facts.txt
datacenter=portland
application=doc
team=TSE
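As a sketch of the provisioning side, the Ruby below writes those values to a facts file, creating the directory first because it isn't present on a default install. The `write_facts` helper and the idea of running Ruby at provisioning time are assumptions; vRealize Automation or UCS Director would typically do the equivalent with their own scripting. The directory is a parameter so the sketch can be tried against a scratch path.

```ruby
require 'fileutils'
require 'tmpdir'

# Default external facts directory (not pre-created by a default PE install).
FACTS_DIR = '/etc/puppetlabs/facter/facts.d'.freeze

# Hypothetical helper a provisioning step might run: writes one
# key=value line per fact to <dir>/<name> and returns the path.
def write_facts(facts, dir: FACTS_DIR, name: 'facts.txt')
  FileUtils.mkdir_p(dir) # create facts.d if it is missing
  path = File.join(dir, name)
  File.write(path, facts.map { |key, value| "#{key}=#{value}\n" }.join)
  path
end

# Try it against a scratch directory rather than the real path.
Dir.mktmpdir do |scratch|
  path = write_facts(
    { 'datacenter' => 'portland', 'application' => 'doc', 'team' => 'TSE' },
    dir: File.join(scratch, 'facts.d')
  )
  puts File.read(path)
end
```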

Conclusion

Hiera is one of the most powerful parts of Puppet for enabling code reuse, but custom facts are a critical component you need in order to take advantage of that capability. Starting with a sane, simple hierarchy and building a few simple custom facts can greatly accelerate Puppet adoption across your organization. You'll quite likely need to mix, match and tweak the examples here to fit your organization's needs, but my hope is that this post provides enough guidance to get you started with Hiera and custom facts.

Chris Matteson is a senior technical solutions engineer at Puppet.
