@cheeseplus
Last active March 22, 2017 17:23

tl;dr

Enabling the shared folder cache based purely on platform name is overly aggressive when the source of the box is unknown. The experience is statistically better if we at least know that it's a bento box, but even then that isn't a guarantee given the versions of the hypervisor and vmtools in play. The proposed enhancements should not increase the number of downloads and are targeted at minimizing out-of-the-box breakage for platforms which have an exceedingly low probability of working (i.e. anything not bento or boxcutter).

Caching mechanism

Shared folders are inherently brittle for a variety of reasons I've outlined on the issue and know far too well from building the bento boxes.

That said, the only alternative proposed is to use kitchen's transport layer to shuttle the installer to the instance, which is time-consuming. We could also test the rsync shared folder method that Vagrant provides, which may perform better.

The following presumes we are sticking with the shared folder implementation.

Fixing the caching structure

This is only a problem if you're testing platforms whose installers have the same name, e.g. Debian and Ubuntu. It manifests if you cache the installer for Ubuntu 16.04 and then run a Debian 7 suite afterwards: the Debian instance picks up the cached Ubuntu package and you get a GLIBC error. This also shows that we don't seem to do any checksum verification when something already exists in the cache, which is Probably Bad and is what allows the above case to occur. Namespacing everything into folders or embedding the checksum in the filename would be reasonable fixes.

Currently

/Users/cheeseplus/.kitchen/cache
├── chef-12.19.36-1.el6.x86_64.rpm
├── chef_12.18.31-1_amd64.deb
└── chef_12.19.36-1_amd64.deb

Better

├── debian-7
│   └── chef_12.19.36-1_amd64.deb
├── el-6
│   └── chef-12.19.36-1.el6.x86_64.rpm
└── ubuntu-16.04
    ├── chef_12.18.31-1_amd64.deb
    └── chef_12.19.36-1_amd64.deb

Alternatively


├── debian
│   └── 7
│       └── chef_12.19.36-1_amd64.deb
├── el
│   └── 6
│       └── chef-12.19.36-1.el6.x86_64.rpm
└── ubuntu
    └── 16.04
        ├── chef_12.18.31-1_amd64.deb
        └── chef_12.19.36-1_amd64.deb

Impact on downloads: technically this means we download one more binary, but since it's the binary we should have been grabbing all along, there is effectively no increase.
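A minimal sketch of how the namespaced layout and a checksum check on cached files could fit together. The helper names `namespaced_cache_path` and `cache_hit?` are hypothetical and are not part of Test Kitchen; the sketch also assumes the expected SHA-256 of the package is available from wherever the download metadata comes from.

```ruby
require "digest"

# Hypothetical helper: builds a per-platform cache path such as
# <cache_root>/ubuntu-16.04/chef_12.19.36-1_amd64.deb
def namespaced_cache_path(cache_root, platform, platform_version, filename)
  File.join(cache_root, "#{platform}-#{platform_version}", filename)
end

# Hypothetical helper: a cached file is only reused when it exists and its
# SHA-256 matches the checksum expected for this platform's package.
def cache_hit?(path, expected_sha256)
  File.file?(path) && Digest::SHA256.file(path).hexdigest == expected_sha256
end
```

With this shape, an Ubuntu package and a Debian package that share a filename land in different directories, and even if a wrong file did end up in place, the checksum mismatch would force a fresh download instead of a GLIBC error at install time.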

Enabling cache only for bento boxes

How we enable caching: currently the cache is only disabled in the case of Windows + non-VirtualBox hypervisor, OR the platform name matches the regex /(freebsd|macos|osx)/, OR it has been disabled manually.
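The current rule can be sketched roughly as follows. This is an illustration of the logic as described here, not kitchen-vagrant's actual code: the method name, the keyword arguments, and the `"virtualbox"` provider string are assumptions, and `windows:` simply stands for the Windows condition mentioned above.

```ruby
# Platforms the cache is currently switched off for, by name.
UNSUPPORTED_PLATFORMS = /(freebsd|macos|osx)/

# Hypothetical sketch of the blacklist-style check: caching is on unless one
# of the three known-bad conditions applies.
def cache_enabled_today?(platform_name:, provider:, windows:, manually_disabled: false)
  return false if manually_disabled
  return false if windows && provider != "virtualbox"
  return false if platform_name.match?(UNSUPPORTED_PLATFORMS)
  true
end
```

Note that anything not explicitly excluded, for example a platform named `omnios-r151018`, falls through to `true`.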

Two components to the issue: move to a whitelist and only enable if we know the box came from bento.

The first component is simply moving from enumerating the platforms we expect it NOT to work on to enumerating the platforms we expect it to WORK on. This means we don't enable caching for platforms like OmniOS, or when the platform name is something goofy.

The second component involves kitchen-vagrant ONLY enabling caching when we know the box is a bento box. We currently only test the platform name for a string segment like ubuntu, which means the box_name/url could be coming from any source and there is no guarantee that the instance has the required vmtools installed. As mentioned in the issue, we don't have any data for these downloads anyway, but I'd be willing to bet good money that the bulk of kitchen-vagrant related chef-client downloads are for the bento boxes.

Impact on downloads: negligible. The first component throttles back how aggressively we try to enable the cache and is targeted mostly at fixing the experience for platforms we have little guarantee would work with the shared folder mechanism. The second component is along the same lines, narrowing the funnel to just bento boxes, for which we have a much higher probability of working out of the box. This is about improving the experience when there is a high probability that automatically enabling the cache will break things.
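The two components together could look something like the sketch below. Everything in it is hypothetical: the whitelist contents are only an example, and it assumes bento boxes can be recognised by a `bento/` prefix on the box name, which would need to be confirmed against how boxes are actually referenced.

```ruby
# Example whitelist of platform name prefixes we expect shared folders to
# work on (illustrative list, not a vetted one).
CACHE_SAFE_PLATFORMS = /\A(centos|debian|ubuntu)-/

# Assumption: bento boxes are referenced with a "bento/" prefix.
def bento_box?(box)
  box.to_s.start_with?("bento/")
end

# Hypothetical whitelist-style check: caching is off unless the platform is
# on the whitelist AND the box is known to come from bento.
def cache_enabled_proposed?(platform_name:, box:, manually_disabled: false)
  return false if manually_disabled
  return false unless platform_name.match?(CACHE_SAFE_PLATFORMS)
  bento_box?(box)
end
```

The default flips from "on unless known bad" to "off unless known good", so an unknown platform or a box from an unknown source simply skips the cache rather than breaking.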

Catching on Error

Lastly, even if the stars are aligned such that we only enable the cache for a known bento box, there is no guarantee that the host hypervisor and the version of the bento box are compatible for shared folders. This case presents itself fairly regularly, even across patch versions of hypervisors; a recent example is Fusion 8.5.5 not working with bento boxes built with the tooling from 8.5.3.

Impact on downloads: none. This only helps us recover or provide a useful error message telling the user to disable caching.
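One way the recovery path might be shaped, as a sketch only. `SharedFolderError` is a made-up error class standing in for however a failed shared folder mount actually surfaces, and `with_cache_fallback` is a hypothetical wrapper, not an existing Test Kitchen method.

```ruby
# Hypothetical error class; not a real Test Kitchen or Vagrant exception.
class SharedFolderError < StandardError; end

# Runs the block with caching enabled (yields true). If the shared folder
# fails, tells the user what happened and retries once with caching
# disabled (yields false).
def with_cache_fallback(warn_io: $stderr)
  yield(true)
rescue SharedFolderError => e
  warn_io.puts("Shared folder cache failed (#{e.message}); retrying without " \
               "the cache. Consider disabling caching for this platform.")
  yield(false)
end
```

A caller would wrap instance creation in `with_cache_fallback { |cache_enabled| ... }` and branch on the flag; if the retry without the cache also fails, the error propagates as usual.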
