I feel your pain. SSL is tough and is probably the number one stumbling block for new users getting Puppet working in their environment. Hopefully this answer helps reduce frustration and get you up and running. The good news is, once it's set up right, you won't have to fiddle with it any more.
First, make sure the problem you're having is actually an SSL problem. Almost all of the SSL-related error messages on the client start with the string `SSL_connect`, followed by the error raised by the underlying crypto libraries. General networking errors will not have this string, so normal network troubleshooting methodology applies. Specifically, `Connection refused - connect(2)` means a TCP connection attempt got a RST packet, which indicates a firewall or a puppet master that is not running, and `getaddrinfo: nodename nor servname provided, or not known` means the server's hostname (the value of `puppet agent --configprint server`) was not resolvable in DNS/hosts.
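As a rough illustration of that triage, here is a minimal sketch that sorts an agent error message into the three buckets described above. The function name `classify_agent_error` and its output labels are made up for this example; only the matched strings come from the error messages quoted above.

```shell
# Classify a Puppet agent error message (passed as $1).
# classify_agent_error is a hypothetical helper, not part of Puppet.
classify_agent_error() {
  case "$1" in
    *SSL_connect*)          echo "ssl" ;;      # keep reading this answer
    *"Connection refused"*) echo "refused" ;;  # firewall, or master not running
    *getaddrinfo*)          echo "dns" ;;      # server hostname did not resolve
    *)                      echo "unknown" ;;
  esac
}
```

For example, `classify_agent_error "Connection refused - connect(2)"` prints `refused`.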
Next, assuming you do have an `SSL_connect`-style error, it's time to dive under the hood a bit. This diagram shows the time sequence of SSL operations as an agent starts up. I'll step through the diagram and describe common error messages at each step and how to remedy them.
In this explanation, strings like `$server` indicate Puppet settings, not manifest, shell, or Ruby variables. Since the values can be changed by command line flags and puppet.conf settings, I don't want to include absolute values. To display the correct value for your site, run `puppet [subcommand] --configprint [setting]`, where `[subcommand]` is `agent` or `master` and `[setting]` is the setting whose value you want to see. Specific troubleshooting steps are written in boldface.
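If you find yourself looking up several settings, a small wrapper can save typing. This is only a sketch: `show_setting` is a hypothetical name, and it does nothing more than pass its two arguments to the `--configprint` form described above.

```shell
# Print the resolved value of one Puppet setting.
# show_setting is a hypothetical wrapper, not a Puppet command.
show_setting() {
  subcommand="$1"   # agent or master
  setting="$2"      # e.g. server, ssldir, certname
  puppet "$subcommand" --configprint "$setting"
}
```

Running `show_setting agent ssldir` on a client then prints the directory that `$ssldir` refers to in the rest of this answer.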
On startup, Puppet connects to its `$ca_server` and downloads the CA certificate, step [1] in the diagram. All subsequent connections are validated against this certificate. (Verify that the CA certificate downloaded is the same as the one presented by the master.) The agent then fetches the Certificate Revocation List for the CA, shown in step [2]. The CRL contains serial numbers of revoked certificates and is signed by the revoking CA's certificate. If you get an SSL_connect error on the agent which says "Revoked certificate", it's because the serial number of the server's certificate is in the CRL; issue a new certificate for the server with `puppet cert generate`.
Next, the agent looks for its own certificate, as a file in `$ssldir/certs/$certname.pem`; if this doesn't exist, the agent will attempt to fetch its own certificate from `https://$ca_server/production/certificate/$certname` (diagram transaction [3]). If you have deleted the client's `$ssldir`, but `$ca_server` has a previously-issued certificate for the node, this transaction will download it to the client. But since the private key which generated the certificate request was deleted along with the client's `$ssldir`, the certificate is now useless. If you get "retrieved certificate does not match private key" errors, you need to run `puppet cert clean [nodename]` on the master, then `rm $ssldir/certs/$certname.pem` on the client. Then re-run the agent; the master will respond with a 404, telling the client it needs to generate and send a new certificate signing request (step [4] in the diagram). Once the certificate request is on the master, it's up to your signing policy (as implemented by the `$autosign` file) to determine whether the certificate is automatically issued or whether an administrator needs to run `puppet cert sign` on the pending request before it's valid. Once it's issued, the agent's request for its own cert (shown in step [5]) will succeed, and pluginsync and the catalog request proceed.
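To keep the order of those recovery steps straight, here is a sketch that prints them for a given node without running anything. `print_cert_reset_steps` is a hypothetical helper; it assumes you pass in the node's `$certname` and the client's `$ssldir` yourself, and it uses `puppet agent --test` as one way to re-run the agent.

```shell
# Print (do not run) the recovery steps for a
# "retrieved certificate does not match private key" error.
# print_cert_reset_steps is a hypothetical helper; review its output before acting on it.
print_cert_reset_steps() {
  certname="$1"   # the node's certname
  ssldir="$2"     # the client's ssldir
  echo "on master: puppet cert clean $certname"
  echo "on client: rm $ssldir/certs/$certname.pem"
  echo "on client: puppet agent --test"
  echo "on master: puppet cert sign $certname (only if the request is not autosigned)"
}
```

Printing rather than executing is deliberate here, since two of the steps run on the master and two on the client.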
It's important to understand that Puppet uses mutual SSL authentication: the client first validates the server's certificate (same as any https web site), and then the server validates the client's certificate (which public websites never do) in order to establish the client's identity. Here is a list of things each side checks in the other side's certificate:
- Is the system date within the `ValidityPeriod` in the certificate? If the system clocks are inaccurate, this can fail.
- Is the certificate signed by the CA certificate specified by `$localcacert`? If you have recently 'started over' by removing your `$ssldir` on the server, the client `$ssldir` must also be deleted.
- Is the certificate's serial number listed in the `$hostcrl`? Similarly, if you have 'started over' with SSL directories, the serial numbers will also start from 1, so a stale CRL can match a newly issued certificate.
- (client only) Does the certificate of the host I'm connecting to present a `Subject` or `subjectAltName` attribute that matches the name I'm using to connect? If you do no special configuration on the master, the certificate auto-generated the first time you run `puppet master` will contain subjectAltName attributes (aliases) of `puppet` and `puppet.yourdomain.com`. If you do no special configuration on the client, it will attempt to connect to a host named `puppet`. So the default setup just requires that you configure DNS/hosts name resolution to point the unqualified hostname `puppet` to your server, and things will work. Alternately, setting `$server` on the agent to the actual hostname of the master will also work. Anything else (IP addresses, aliases or CNAMEs other than `puppet` on the master, subdomains, etc.) requires that you run `puppet cert generate --dns_alt_names=cname1,cname1.domain.com $certname` on the server to replace the default certificate with your own.
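The name check in the last item is the one that most often needs inspecting by hand. As a sketch, the function below reads the text form of a certificate on stdin and succeeds if a given name appears as a DNS subjectAltName. `san_lists_name` is a hypothetical helper; it assumes the `DNS:name, DNS:name` layout that `openssl x509 -noout -text` prints, and it does not look at the Subject CN.

```shell
# Succeed if the certificate text on stdin lists $1 as a DNS subjectAltName.
# Expects the output of: openssl x509 -noout -text -in <certificate file>
# san_lists_name is a hypothetical helper; it only inspects DNS: entries.
san_lists_name() {
  name="$1"
  # Put each comma-separated entry on its own line, trim indentation,
  # then require an exact whole-line match.
  tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -Fx "DNS:$name" >/dev/null
}
```

On the master, something like `openssl x509 -noout -text -in $ssldir/certs/$certname.pem | san_lists_name puppet` would then tell you whether agents connecting to `puppet` can match the certificate.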
Things that the Puppet SSL layer explicitly DOES NOT care about:
- Whether there are reverse DNS (PTR records) for the clients and servers
- Whether each server in a load-balanced configuration has the same certificate
- Whether the master has a cached copy of the clients' certificates in its `$ssldir/certs` or `$ssldir/ca/signed` directories
Helpful openssl troubleshooting commands (remember, all `$variables` here are shorthand for `puppet [agent|master] --configprint variable` settings):
- `openssl x509 -noout -text -in $ssldir/certs/$certname.pem` -- displays the human-readable form of the certificate for this host.
- `openssl rsa -noout -modulus -in $ssldir/private_keys/$certname.pem`, `openssl x509 -noout -modulus -in $ssldir/certs/$certname.pem` -- display the modulus of the private key and of the public key in the certificate; if they do not match, the certificate was generated with a different private key, and you need to clean it from the client and the server and re-issue.
- `openssl s_client -connect $server:8140 -showcerts -CAfile $ssldir/certs/ca.pem -key $ssldir/private_keys/$certname.pem -cert $ssldir/certs/$certname.pem` -- creates an encrypted connection to the master and shows you the SSL messages and verification messages along the way. If the final line says "Verify return code: 0 (ok)" and leaves you at a prompt, you're successfully connected and can send fake HTTP requests to the puppetmaster application directly (`GET /production/node/$certname HTTP/1.1\nAccept: pson\n\n` is a good thing to try).
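The modulus comparison above is easy to get wrong by eye, so here is a sketch that does the comparison for you. `modulus_match` is a hypothetical helper that wraps the two `openssl ... -modulus` commands listed above; it assumes an RSA key and returns 2 if either file cannot be read.

```shell
# Succeed if an RSA private key and a certificate share the same modulus.
# modulus_match is a hypothetical helper around the two openssl commands above.
modulus_match() {
  key_file="$1"
  cert_file="$2"
  key_mod=$(openssl rsa -noout -modulus -in "$key_file" 2>/dev/null) || return 2
  cert_mod=$(openssl x509 -noout -modulus -in "$cert_file" 2>/dev/null) || return 2
  # Both values look like "Modulus=ABCD..."; an empty value never counts as a match.
  [ -n "$key_mod" ] && [ "$key_mod" = "$cert_mod" ]
}
```

On a client you might call it as `modulus_match $ssldir/private_keys/$certname.pem $ssldir/certs/$certname.pem` and clean and re-issue the certificate if it fails.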
Other resources
- Brice Figureau has a great blogpost with a very detailed description of Puppet's SSL layer.
- Jeff Moser's byte-by-byte analysis of SSL establishment is essential reading for protocol nerds
- If you end up doing a lot of modulus comparison, here is a nice openssl shortcut
Hi @ahpook, are you able to share the diagram? It doesn't appear to exist anymore.