@pburkholder
Last active November 2, 2016 18:27
Chef and admin privilege

Original Question:

I had a call this afternoon ... and the question posed was how are we getting around the requirements for Admin level permissions on the Dev Environment to install and run Chef. It was a great question and I am sure we will run into that problem here very shortly. As you may or may not know, the Security requirements don’t allow [our organization's] Developers to have Admin level priv on the computers. Chef requires that to run.

TL;DR:

  1. We need to reframe the question: it is about the platform's permissions, not the developer's access.
  2. The chef-client scheduled task on the Windows nodes won't be useful unless it runs as an admin user.
  3. Your developers will continue to need VMs to effectively do their work.

LONG VERSION:

I've seen this question before. When I was working as a Customer Success Engineer for Chef Software I had two customers who came up with similar issues running Chef in their environments. One was a privately-held investment fund, the other a contractor working with potentially top-secret data in the federal space.

In both cases I tried to work within the strictures that the customer believed they were working under, and tried to come up with workflows where chef-client ran as a limited-privilege user. In both cases we largely failed to provide any process improvement, for reasons I'll outline below. Before making that mistake again, we need to reframe the concerns around the question.

To wit: unless your organization is ready for fully immutable infrastructure, some agent on the system needs to run with administrative privileges, whether that's LCM w/ DSC, or System Center 2012 Configuration Manager Client, or chef-client, or puppet-agent, or a privileged human user making changes directly to the system through the desktop UI (don't do this).

And your organization is not ready for fully immutable infrastructure. In that model you build a system image in a development environment, complete with all the services and applications that will be running, and then nothing changes on that disk image in dev, stage or prod. All changes are mediated by real-time service discovery within the local environment: a service informs the node what load balancer to use, what DB to connect to, what URL to respond to for requests, and so on. The image is never patched or updated; it is destroyed once a new image has come up through the same process of build and multi-environment testing. Using Chef can be a step toward immutable infrastructure, but in most cases it's used precisely to build and keep "mutable infrastructure" in a known state.
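As a rough illustration of that model, here is a minimal Python sketch. Everything in it is hypothetical (the `Image` class, the `DISCOVERY` table, the `boot` function, the host names); it is not how any particular tool implements immutable infrastructure. It only shows the shape of the idea: the image is identical in every environment and cannot be patched, while the environment supplies the answers at run time.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical stand-in for a built system image. frozen=True means it
# cannot be modified after the build step.
@dataclass(frozen=True)
class Image:
    build_id: str
    packages: tuple

# Hypothetical per-environment service discovery data. In practice this
# would come from a discovery service, not a dict in the code.
DISCOVERY = {
    "dev":   {"db": "db.dev.example.internal",   "lb": "lb.dev.example.internal"},
    "stage": {"db": "db.stage.example.internal", "lb": "lb.stage.example.internal"},
    "prod":  {"db": "db.prod.example.internal",  "lb": "lb.prod.example.internal"},
}

def boot(image: Image, environment: str) -> dict:
    """Return a node's runtime view: the unchanged image plus whatever
    the environment's discovery data tells it to use."""
    return {"build_id": image.build_id, **DISCOVERY[environment]}

image = Image(build_id="build-42", packages=("web-server", "app-runtime"))
nodes = {env: boot(image, env) for env in DISCOVERY}

# Attempting to patch the image in place fails; a change means a new build.
try:
    image.build_id = "patched-in-place"
except FrozenInstanceError:
    patched = False
else:
    patched = True
```

The same `build_id` shows up in dev, stage and prod; only the discovered endpoints differ. Anything short of this still needs a privileged agent to change systems after they boot.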

Absent immutable infrastructure, your organization, usually via an Ops team, is running configuration management or infrastructure-as-code on behalf of the organization's mission. Whether that code is SCCM or DSC or Puppet or Chef is immaterial. In the end some privileged agent is executing code to configure a system to serve an agency's mission, a business's shareholders, or a non-profit's goals.

When we introduce DevOps processes, we are not granting developers administrative access to target systems in dev/test/stage/prod. We are providing a platform for running application code that follows standard build, test, package, release development practices. That is, your developers currently cannot, on their own, change the .NET or Java code that's running on a live system; those changes only happen via a release process with tested software artifacts. Likewise, with DevOps practices they cannot, on their own, change system code in those environments. That also happens via a release process for tested, bundled artifacts such as cookbooks or modules.

It is "possible" to run chef-client as a non-admin user. But here's what can happen:

  • The admin user (LCM, or SCCM) tramples the non-admin chef-client changes
  • The non-admin chef-client can't do the configuration that is actually needed
  • It turns out that a non-admin chef-client user doesn't have the right scope for a set of tasks, so then you need a 'middleware chef-client user' and a 'front-end chef-client user'
  • Those two users start trampling each other's work
  • and so on, until you have multiple code paths for multiple clients that don't do the work they're supposed to, and everyone wonders why none of the promised DevOps gains have been realized.
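The trampling problem can be sketched with a toy Python model. This is not how chef-client, LCM or SCCM work internally; the `converge` function, the agent names and the settings are all hypothetical, and the model assumes both agents are able to write the one setting they share. Each agent simply rewrites any managed setting that differs from its own desired state on every scheduled run.

```python
def converge(system: dict, desired: dict) -> int:
    """Apply desired settings to system in place; return how many changed."""
    changes = 0
    for key, value in desired.items():
        if system.get(key) != value:
            system[key] = value
            changes += 1
    return changes

# Two hypothetical agents that disagree about one overlapping setting.
admin_agent = {"app_pool_identity": "svc_ops", "log_dir": "D:/logs"}
limited_agent = {"app_pool_identity": "svc_app", "max_threads": 16}

system = {}
history = []
for _ in range(3):  # three scheduled runs of each agent, interleaved
    history.append(("admin", converge(system, admin_agent)))
    history.append(("limited", converge(system, limited_agent)))

# A single agent with one merged desired state settles after its first run.
single_system = {}
merged = {**admin_agent, **limited_agent}
single_history = [converge(single_system, merged) for _ in range(3)]
```

In this model every run of either agent in `history` reports at least one change, because each keeps undoing the other's value for `app_pool_identity`, so the system never reaches a steady state. The single agent in `single_history` makes its changes once and then reports zero changes on later runs, which is the behaviour you want from configuration management.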

The questions to ask here are not just "Is chef-client secure?" or "Can we let developer teams write code that runs as admin?" or "Can we give them VMs for their work?" but:

Are we more secure and effective as an agency, or a business, or a nation when:

  • we can deliver on our mission in 1/100th of our current lead time (ref: the work of Nicole Forsgren[1]),
  • any change to a system can be code-reviewed by multiple people, since all changes are made via code (and not via an irreproducible and error-prone GUI),
  • those changes are tested for validity AND compliance in non-production environments that we know are functionally identical to production,
  • patches and security updates can be delivered to production, reliably, within hours of release (ref: a major bank with divisions using Chef patched Heartbleed in 24h and about 6 man-hours, while non-automated divisions took 30 days and untold hundreds of man-hours),
  • we are working through a pipeline that provides proactive security and compliance, instead of a reactive, out-of-cycle set of practices?

If you trust your developer and ops teams less than you fear the state actors trying to infiltrate and exploit your systems, then by all means don't let chef-client run as admin on your systems, and don't let developers run any hardware-abstraction VMs.

And if you don't trust your developer teams to write code for the chef-client, then you shouldn't trust them to write any code, since exploits can be slipped in at any layer if there isn't proper code review and rigorous testing.

That said, just as your teams need runtimes for the .NET or Java code, with DevOps practices your teams will need short-lived hardware-abstraction VMs to test and develop that code, whether those are provided via a local hypervisor like VMware Workstation or VirtualBox, or a cloud-based VM from OpenStack/Azure/AWS/etc.

[1] The lead-time factor is actually 2555x rather than 100x, but I thought 1/2555th would be too unbelievable.

Updates to include above

http://stevenmurawski.com/powershell/2016/03/dsc-partial-configurations-are-the-devil's-workshop/index.html - Same idea of splitting responsibility across teams

From Robb Kidd and Sean Walberg:

You're not giving developers admin access, you're moving the actions that require admin access into peer-reviewable, testable, repeatable code for both devs and ops.

@pburkholder
Author

The whole question of "admin access for devs" also points to the essential requirement for cross-functional teams. Each application should have at least one team member who’s responsible for production (or at least, production-like) operation. If there’s no one on the app team who has both authority and responsibility for that, then release cadence will be dominated by "Ops Team" capacity, and that bottleneck will likely stymie any process improvement. "The Phoenix Project" describes this scenario pretty starkly.

So to realize a short lead time we’ll need some variation of Devs w/ Admin Access, either by
a) implementing cross-functional teams, giving folks currently in Ops roles the opportunity to work more directly with specific teams and their apps, or
b) allowing some folks currently in dev roles to have some ops authority (namely, the ability to create and manage sandbox systems for application deployment testing).
