@mottosso
Last active August 25, 2020 21:00
7th June 2019 - Software Packaging Detour

Today I ran into an issue that occurred on a user's machine but not on mine, due to our environments being different. This is exactly the kind of issue Rez is good at solving, however this one found its way around Rez by leveraging the fact that our Python distributions weren't actually Rez packages. I had made them such that they referenced a local install, and it just so happened we had installed different versions.

So I had a look at remedying this, not just for this instance, but to seal this hole permanently. Two things are missing at the moment: (1) some way of installing system software and (2) installing it such that both Windows and Linux are supported.


Developer REZ_PACKAGES_PATH

One of the requirements for this project was to enable a developer to have an additional set of packages available to him such that he is able to test a complete context ahead of pushing it to the floor.

Rez already provides a mechanism for this, and rather than embedding this into the GUI I thought it best to leave this for the console.

The workflow is as follows.

  1. Per default, REZ_PACKAGES_PATH points to globally available packages
  2. Optionally, a developer may append additional directories to this path, prior to opening launchapp2

Perhaps the most straightforward way of keeping track of which paths you have, and how to edit them, is to wrap them into a shell script.

launchapp2.bat

@echo off
setlocal

set REZ_PACKAGES_PATH=%USERPROFILE%\packages;%REZ_PACKAGES_PATH%

rez env python-3 PyQt5 rez -- python -m launchapp2 --root /path/to/projects
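
The same wrapper could be sketched in Python so that it also runs on Linux. This is a minimal sketch and not part of launchapp2; the helper names `prepend_packages_path` and `launch` are made up for illustration, and the command line is copied from the batch file above.

```python
import os
import subprocess


def prepend_packages_path(env, extra_dirs):
    """Return a copy of `env` with `extra_dirs` ahead of the global packages"""
    env = dict(env)
    current = env.get("REZ_PACKAGES_PATH", "")
    existing = [p for p in current.split(os.pathsep) if p]
    env["REZ_PACKAGES_PATH"] = os.pathsep.join(list(extra_dirs) + existing)
    return env


def launch(extra_dirs):
    """Open launchapp2 in a Rez context that also sees `extra_dirs`"""
    env = prepend_packages_path(os.environ, extra_dirs)
    return subprocess.call(
        ["rez", "env", "python-3", "PyQt5", "rez", "--",
         "python", "-m", "launchapp2", "--root", "/path/to/projects"],
        env=env
    )
```

Calling `launch([os.path.expanduser("~/packages")])` would mirror the batch file. Using `os.pathsep` keeps the separator correct on both Windows (`;`) and Linux (`:`).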

Python as a Package

Today we ran into an issue with launchapp2 that happened because our environments were not perfectly aligned. So I wanted to take the opportunity to address this properly.

So far, we've been relying on Python being accessible from the system, like we are with Maya and Nuke etc. The Rez package merely appends the c:\python27 directory to the PATH, which carries a few benefits.

  1. We won't have to worry about keeping a Python per platform as a package on the server
  2. There is no performance overhead of using Python

However, it comes at a cost.

  1. We sometimes can't know for sure that the version of Python I have installed is the same as yours
  2. The version can be 64-bit on one machine, 32-bit on another
  3. The system Python may have packages in its site-packages that differ from the ones in mine
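
Each of these three differences can be inspected from within Python itself. The following is a minimal sketch for comparing two machines by eye; the function name `describe_interpreter` is hypothetical and nothing here is specific to Rez.

```python
import struct
import sys
import sysconfig


def describe_interpreter():
    """Collect the facts that tend to differ between two machines"""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "bits": struct.calcsize("P") * 8,  # pointer size, 32 or 64
        "executable": sys.executable,
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in sorted(describe_interpreter().items()):
    print("%s: %s" % (key, value))
```

Running this on both machines and comparing the output would reveal a mismatched version, bitness or site-packages location, although not the contents of site-packages.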

So how can we address this, without losing the aforementioned advantages?

This led me down a deep rabbit hole.

Summary

1 Python as a payload
  Pros
    • Control. We are in full control over what goes into a package and how.
    • Consistent. The same applies to not only Python, but just about any other package too.
  Cons
    • Size. 30+ mb * version * variant quickly adds up. Not to mention other packages potentially being much larger.

2 Python from the web
  Pros
    • VCS. Store just the definition of Python in GitLab, and leave the payload for someone else to host.
  Cons
    • Connectivity. More moving parts means more potential for error, not to mention being required to have a working internet connection to install packages.

2.1 Embedded from python.org
  Pros
    • Isolated. These are small (6 mb unzipped) and able to fully exclude PYTHONPATH, with full control over sys.path on start-up, which is great for our purposes.
    • Portable. No install required.
  Cons
    • No Python 2. Unfortunately.

2.2 Exe from python.org
  Cons
    • Windows-only.
    • Admin privileges required.

2.3 MSI from python.org
  Cons
    • No Python 3.
    • Windows-only.

2.4 Miniconda
  Pros
    • Portable.
    • Linux. Yes, there is one for Linux and even OSX!
  Cons
    • Entangled. Conda is a cesspool of bad practices and wasted diskspace; extracting only Python from a conda "environment" is non-trivial.

2.5 Conan
  Pros
    • Portable.
  Cons
    • Limited to C++ libraries.

2.6 Chocolatey
  Pros
    • Command-line based.
  Cons
    • Entangled. Packages merely wrap official installers, which means we have no control over where packages end up, nor any consistent interface for passing command-line arguments to them.

2.7 Conda Forge
  Pros
    • Independent of Conda. If we just download the package ourselves, and not worry about the package manager.
  Cons
    • Restricted. They've made an effort to prevent downloads from Python, presumably to prevent abuse.

2.8 NuGet
  Pros
    • Portable. Like conda and Embedded.
  Cons
    • Windows-only.
    • Limited selection. It does have Python and a number of other apps, but ultimately this is a collection of packages specifically for C#.

2.9 Scoop
  Pros
    • Portable. Like conda and Embedded.
    • Shims. Scoop separates between installation directory and executables, which is very useful!
    • Small implementation. The community is active on GitHub, and the project isn't so out of control or complicated that we couldn't have an effect on it. It's entirely written in PowerShell.
    • Big repository. Where NuGet and Conda only provide a handful of system packages, Scoop only has system packages, and lots of them.
    • Scalable. The same idea extends to Linux and yum or apt-get.
  Cons
    • Windows-only.
    • No versioning. Only the latest, or "current", version of each package exists.

Python as a Payload

Initially I figured we could store a version of Python in the project itself, that is later included in the release.

3.7/
  windows/
    Python37
  linux/
    Python37
  package.py

But that's problematic as Linux doesn't necessarily provide portable versions of Python, but instead favours use of e.g. yum and apt-get.

Python from the Web

If Python for Linux can't be bundled like this, could we level the playing field and fetch Python for Windows off the internet as well?

Exe from python.org

I found that there are options for installing Python from the command-line, however the problem was it required Admin privileges, despite writing to a user-writable directory and not affecting system environment variables. That's a bummer.

Embedded from python.org

There is another Python distribution for Windows called "embedded" which is meant for embedding into a software project, similar to mayapy for Maya.

It doesn't come with options and is shipped as a single .zip file, and excludes a number of default packages like pip and tkinter. It also doesn't take PYTHONPATH into account per default, which makes it highly self-contained and portable. Almost exactly what we need.

We still do need it to be PYTHONPATH-aware, as that's how we're able to append Python modules from other packages. Ideally we would be communicating "privately", as in having our own REZ_PYTHONPATH that it picks up, such that it cannot be mistaken for what a user or system may have chosen to put there. But for the time being I'll delete the python._pth to revert to its original behavior of reading from PYTHONPATH.
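
The "private" channel could be sketched as below. REZ_PYTHONPATH is the hypothetical variable named above, `extend_sys_path` is a made-up helper, and whether the embedded distribution would run such code at start-up (for example via a `sitecustomize.py`) depends on its `._pth` configuration, which is not verified here.

```python
import os
import sys


def extend_sys_path(environ, sys_path, variable="REZ_PYTHONPATH"):
    """Append entries from `variable` to `sys_path`, skipping duplicates"""
    added = []
    for entry in environ.get(variable, "").split(os.pathsep):
        if entry and entry not in sys_path:
            sys_path.append(entry)
            added.append(entry)
    return added


# Picks up REZ_PYTHONPATH if set, and is a no-op otherwise
extend_sys_path(os.environ, sys.path)
```

Because only this one variable is read, whatever a user or the system has put into PYTHONPATH would be left out of the picture.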

The next challenge was figuring out how to devise a Rez package that didn't just copy or compile files from its local directory, but actually went online to fetch a Python distribution. The added benefit is that we're now able to host this package in our own internal GitLab instance, without having to host the actual binaries (of the many versions of Python we're interested in having). This pattern then also applies to just about anything available online.

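import errno
import glob
import os
import zipfile

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

# NOTE: `path` is not defined in this snippet; it is expected
# to be provided by the surrounding build script
url = "https://www.python.org/ftp/python/{0}/python-{0}-embed-amd64.zip"
url = url.format("3.7.3")
dst = os.path.join(path, "python")
fname = os.path.join(dst, os.path.basename(url))

# Create the destination, tolerating that it may already exist
try:
    os.makedirs(dst)
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

print("Downloading %s.." % url)
urlretrieve(url, fname)

print("Unzipping.. %s" % fname)
with zipfile.ZipFile(fname) as f:
    f.extractall(dst)

print("Cleaning up..")
os.remove(fname)

# These normally restrict Python from reading PYTHONPATH
for pth in glob.glob(os.path.join(dst, "*._pth")):
    os.remove(pth)

print("Done")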

Having done that however, I quickly realised there was no equivalent for Python 2..

Conda Forge

Conda provides binaries for almost every version of Python, and these must surely be portable as they are installed into what amounts to an individual virtual environment.

These are great, and available for each platform. Except we aren't able to insert a version number into the final URL as easily, since the URL also contains what looks like a commit hash.

1h later

As it turns out, downloading from anywhere but a browser is limited, yielding a CloudFlare warning about permissions. My guess is that they discourage use of their packages outside of Conda itself.

MSI from python.org

That leaves having to write two separate install procedures, one for embed.zip and another for the .msi package Python 2 ships as.

$ msiexec /i python-2.7.15.amd64.msi TARGETDIR="%cd%" /qn /norestart

1h later

As it happens, the MSI doesn't enjoy being installed by an unprivileged user. I did somehow manage to get files populated in a target folder, but could never reproduce it and the online community seemed to advise against it. The next issue was the installer not actually pausing until finished, but rather taking off independently in the background with no indication of when it actually finished..
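
One way the "not pausing" part might be approached is to launch the installer from Python rather than from cmd.exe, since `subprocess.check_call` blocks until the process it started has exited. This is an untested sketch; the helper names are hypothetical, the flags are the ones from the command above, and it does nothing about the privilege problem.

```python
import os
import subprocess


def msiexec_command(msi, target_dir):
    """Build the same silent install command as above, as an argument list"""
    return [
        "msiexec", "/i", msi,
        "TARGETDIR=%s" % os.path.abspath(target_dir),
        "/qn", "/norestart",
    ]


def install_msi(msi, target_dir):
    # Returns once the msiexec process itself has exited, and
    # raises CalledProcessError on a non-zero exit code
    subprocess.check_call(msiexec_command(msi, target_dir))
```

Passing the command as a list avoids having to quote a target directory that contains spaces.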

At this point, it's end of day and the rabbit hole had proven much deeper than originally anticipated. I'm going to have to up my game, and see about collaborating with Miniconda for this to work.

Miniconda

20 mins later

Miniconda did have a silent option for installing to a custom path, but took an excessive 8 minutes to finish, at 20-50% CPU consumption. It's a 60 mb download and 600 mb installed.

Furthermore, what I wanted to do was use Conda to fetch Python so that it could be relocated to a Rez package. But it doesn't appear as though it's able to do that.

  1. Packages may only be installed into a conda "environment"
  2. An environment may not be created without also including a number of packages I didn't ask for
  3. An environment may not be created without also including Python

Which means the Python distribution does get installed, but is entangled into this "environment". It didn't appear worth trying to pry this out. One thought was to not only include a conda install as a Rez package, but an environment too. We need some way of distinguishing between what is installed and what already was, so we can properly extract a Rez package from it. But at 600 mb, I am left speechless.
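
Distinguishing between what is installed and what already was could be done by comparing a listing of the directory before and after an install. A minimal sketch, with hypothetical helper names and no knowledge of conda's own bookkeeping:

```python
import os


def snapshot(root):
    """Return the set of file paths under `root`, relative to `root`"""
    found = set()
    for base, dirs, files in os.walk(root):
        for fname in files:
            abspath = os.path.join(base, fname)
            found.add(os.path.relpath(abspath, root))
    return found


def newly_installed(before, after):
    """Files present in `after` but not in `before`, sorted for stable output"""
    return sorted(after - before)
```

Taking one snapshot before `conda create` and one after would yield the files belonging to the new environment. It would not catch files that were modified in place.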

Consider this simple request.

$ conda create --name tempenv six

And just look at what it came up with.

create --name tempenv six
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: C:\Users\manima\Dropbox\dev\anima\github\mottosso\rez-for-projects\dev\zconda\build\temp\envs\tempenv

  added / updated specs:
    - six


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.5.15  |                0         166 KB
    openssl-1.1.1c             |       he774522_1         5.7 MB
    pip-19.1.1                 |           py37_0         1.8 MB
    python-3.7.3               |       h8c8aaf0_1        17.8 MB
    setuptools-41.0.1          |           py37_0         680 KB
    sqlite-3.28.0              |       he774522_0         945 KB
    vs2015_runtime-14.15.26706 |       h3a45250_4         2.4 MB
    wheel-0.33.4               |           py37_0          57 KB
    ------------------------------------------------------------
                                           Total:        29.5 MB

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/win-64::ca-certificates-2019.5.15-0
  certifi            pkgs/main/win-64::certifi-2019.3.9-py37_0
  openssl            pkgs/main/win-64::openssl-1.1.1c-he774522_1
  pip                pkgs/main/win-64::pip-19.1.1-py37_0
  python             pkgs/main/win-64::python-3.7.3-h8c8aaf0_1
  setuptools         pkgs/main/win-64::setuptools-41.0.1-py37_0
  six                pkgs/main/win-64::six-1.12.0-py37_0
  sqlite             pkgs/main/win-64::sqlite-3.28.0-he774522_0
  vc                 pkgs/main/win-64::vc-14.1-h0510ff6_4
  vs2015_runtime     pkgs/main/win-64::vs2015_runtime-14.15.26706-h3a45250_4
  wheel              pkgs/main/win-64::wheel-0.33.4-py37_0
  wincertstore       pkgs/main/win-64::wincertstore-0.2-py37_0


Proceed ([y]/n)?


Downloading and Extracting Packages
setuptools-41.0.1    | 680 KB    | ############################################################################### | 100%
vs2015_runtime-14.15 | 2.4 MB    | ############################################################################### | 100%
python-3.7.3         | 17.8 MB   | ############################################################################### | 100%
pip-19.1.1           | 1.8 MB    | ############################################################################### | 100%
openssl-1.1.1c       | 5.7 MB    | ############################################################################### | 100%
wheel-0.33.4         | 57 KB     | ############################################################################### | 100%
ca-certificates-2019 | 166 KB    | ############################################################################### | 100%
sqlite-3.28.0        | 945 KB    | ############################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > activate tempenv
#
# To deactivate an active environment, use:
# > deactivate
#
# * for power-users using bash, you must source
#

This is madness.

Chocolatey

So next I had a look at Chocolatey. I found that it does indeed have support for non-administrative installs.

And even mentions a dedicated set of packages suitable for this kind of install.

Let's go.

1h later

Sigh. Chocolatey isn't a real package manager; it merely wraps native installers, like the .exe from python.org, in a command-line interface. It's not shy about it either, and there is very little effort at changing that. As such it is not suitable for our purposes, as we can't provide a consistent set of options for the packages being installed, such as where to install them.

NuGet

I had only heard about NuGet, and found that it is the package manager for C#, like pip is to Python. As it happens however, it also provides packages for portable apps, like 7-Zip, Node.js and Python.

More importantly:

  1. Available for both Python 2 and 3
  2. Well maintained and up to date, latest Python being 3.8 beta
  3. No restrictions on downloads from Python
  4. Consistent download URLs which is great for passing e.g. 3.7.0 to the installer, unlike Conda which included a commit hash in each download.

It does not, however, provide a Linux build, which is where things fall short. Solving for Windows is only half the story.

Scoop

At this point, I was prepared to throw in the towel and re-invent the wheel, so to speak, and make a repository of packages from scratch. But before I did, I ventured online to search one last time for "windows package manager" and this was how I found Scoop.

$ scoop install git

Here's what a package, referred to as a "manifest" looks like.

It has at least two strengths for our use case, compared with Chocolatey which scratches a similar itch:

  1. No admin: Doesn't require admin privileges
  2. Isolated installs: Isolates each install into one directory

Right off the top however, it does have a few warts.

  1. No versions: It doesn't natively support requests for a specific version of a package, like python-3.6; instead, a select few packages exist as alternative repositories, like python2, and even then only in one version.
  2. No prefix: It doesn't support specifying what directory an install ends up in, which is a problem for us as we need to redirect installs into a Rez package.
  3. Bad exe's: It appears the mechanism it uses to bring executables into a common directory, referred to as "shims", is flawed, in particular with Python.

However, upon researching an unrelated issue I stumbled upon an alternative implementation for it.

And the way it works is very interesting! I've long been looking for a way of gathering executables into a single directory that didn't rely on symlinks.

  • Soft symlinks technically work, but require admin privileges on Windows
  • Hard links do not require admin privileges, but break an executable, as they also change that executable's path relative to itself. Many executables, including Python, use their own location as a fixed point around which dependencies reside, like DLLs and configuration files, e.g. python._pth.

A "shim" solves this, by providing all of the benefits of a symlink, without admin privileges.

So how does it work?

# 1. Create an executable to any program
$ cp shim.exe python.exe
# 2. Point this executable to an absolute path
$ echo path = c:\python27\python.exe > python.shim
$ echo args = -u >> python.shim
# 3. Profit
$ python -c "print('hello world!')"
hello world!

And presto, you've got an executable for python.exe that can be placed anywhere on your system and always refers to this absolute path.
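
To make the mechanism concrete, here is a sketch of what reading such a `.shim` file amounts to. The real shim is a compiled executable; this Python version only illustrates the `path` and `args` keys shown above, and both function names are hypothetical.

```python
def parse_shim(text):
    """Parse `key = value` lines from a .shim file into a dict"""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def shim_command(settings, argv):
    """The command a shim would forward to: target, fixed args, caller args"""
    fixed = settings.get("args", "").split()
    return [settings["path"]] + fixed + list(argv)
```

Given the two lines written above, `shim_command` would produce the target executable followed by `-u` and whatever the caller passed, which could then be handed to `subprocess.call`.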

But why?

For completeness, the advantage to doing this - both for Scoop and in the general case - is that we can effectively put together a single directory of executables, and expose this one directory to PATH.

c:/
  my_bins/
    python.exe
    pip.exe
    maya.exe
    ls.exe
    tree.exe

You could get fancy, and include version numbers too.

c:/
  my_bins/
    python27.exe
    python36.exe
    pip27.exe
    pip36.exe
    maya2018.exe
    maya2019.exe
    ls.exe
    tree.exe

Genius.
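
Populating such a directory could be scripted along these lines. This is a sketch under assumptions: `write_shims` is a made-up helper, and the location of a compiled `shim.exe` to copy is left to the caller.

```python
import os
import shutil


def write_shims(bin_dir, targets, shim_exe=None):
    """Write one `<name>.shim` per target, plus a copy of `shim_exe` if given"""
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)

    for name, path in sorted(targets.items()):
        with open(os.path.join(bin_dir, name + ".shim"), "w") as f:
            f.write("path = %s\n" % path)

        if shim_exe is not None:
            # Each shim is the same executable under a different name
            shutil.copyfile(shim_exe, os.path.join(bin_dir, name + ".exe"))
```

For example, `write_shims(r"c:\my_bins", {"python27": r"c:\python27\python.exe"}, shim_exe=r"c:\path\to\shim.exe")` would produce `python27.exe` and `python27.shim` side by side.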

For Rez, it means we could get rid of the "shims" generated by pip. That would add transparency to what a shim is actually doing and where the actual executable resides, since the target is editable in plain text, and it would also solve the issues those shims have with cmd.exe history.

Scoop and --prefix

After some investigation, I found that there was in fact a method of overriding where packages are installed, although it's a little hacky and apparently unsupported.

$ $env:SCOOP_HOME="c:\custom\dir"

I came across this in their appveyor.yml.

It would appear that this is the root directory for each of Scoop's additional variables involving an installation path.

The only issue with this is that Scoop uses this itself; it relies on Scoop being an app/ and its executable being available in shims/ which means we can't simply install things into an empty folder and pick apart what we want. We also can't delete this folder en masse on finish.

Nonetheless, we do know what will be in there, and can look the other way.

1h later

So far so good. I found that Scoop automatically creates a junction from the latest installed version of an app to a directory called "current/" which is quite clever. It means you're able to reliably say apps/python/current/python.exe knowing that you'll end up with the most recently installed version.

However, it also meant complicating the automatic removal of these apps, as Python's shutil.rmtree doesn't take junctions into account and proceeds to delete everything inside of them. If the version directory is then deleted first, current becomes invalid, causing Python to throw its hands up. I couldn't find much about this online, except for one small mention of os.rmdir being able to account for this.

What I eventually ended up with was this monstrosity.

import logging
import os
import shutil
import subprocess

log = logging.getLogger(__name__)


def _rm_directory_junctions(root):
    for base, dirs, files in os.walk(root):
        for dirname in dirs:
            abspath = os.path.join(base, dirname)

            # Python cannot detect whether a directory is a soft
            # directory symlink, but must be removed using `os.unlink`
            try:
                os.unlink(abspath)
                log.debug("Unlinked directory symlink '%s'.." % abspath)
            except OSError:
                pass

            # What a mess. Python cannot delete a directory created with
            # `mklink /J` which is what Scoop creates for its `current/`
            # version.
            try:
                for cmd in ('fsutil reparsepoint delete "%s"',
                            'attrib -R "%s"',
                            'rmdir "%s"'):
                    # `rmdir` is a cmd.exe builtin, hence shell=True
                    subprocess.check_output(cmd % abspath, shell=True)
                log.debug("Unlinked junction '%s'.." % abspath)
            except subprocess.CalledProcessError:
                pass

    # Finally, we can delete the rest, non-junctioned files and folders
    shutil.rmtree(root)

However, considering Scoop exists only on Windows, I found it safe to rely on Windows utilities and resorted to the much shorter and more reliable:

subprocess.check_call(r'rmdir /S /Q "c:\path\to\scoop_home"', shell=True)

2h later

Presto, here's what we've got.

zscoop

I'm calling it ZScoop, in that it's Scoop, but for Rez. I've designed it to work akin to rez wheel, which I'll extract into zpip for consistency. Then, for Linux, I'll implement zyum to leverage its repository of binary installs for the CentOS operating system, and we should be home free.


Virtual Command-line

As a complete side-note.

Scoop generates something it calls "shims" for every installed package, like vim and python. A shim is an executable much like Python's "scripts", in that they act like an executable but really just forward the call to another executable. In Python, they're forwarded to a Python script, like python -m mymodule. In Scoop, they're forwarded to their corresponding executable, like ../apps/python/3.7/python.exe.

This shim was written in C# and compiled which made me think, rather outlandishly, about wrapping command-lines in Python (?).

$ python command_line.py
> $ ls
command_line.py file1.py directoryA directoryB
> $ touch hello.txt
> $ echo World! >> hello.txt
> $ cat hello.txt
World!
> $ start "" explorer
...

What's going on here is that every command is simply forwarded to subprocess.check_call.

import os  # Made available to commands run via `exec`
import signal
import subprocess

# CTRL+C equals death
signal.signal(signal.SIGINT, signal.SIG_DFL)

while True:
    command = input("> $ ")

    try:
        exec(command)
    except (SyntaxError, NameError):
        try:
            subprocess.check_call(command,
                                  shell=True,
                                  universal_newlines=True)
        except subprocess.CalledProcessError:
            print("Failed")

    except Exception:
        print("Unhandled exception")
        raise

    else:
        # The command ran as Python, and that's OK
        continue

So then what's the point? :/

Primarily, isolation. Each command is called in its own instance of cmd.exe in this case. Which means that calls to set VAR=True won't actually affect your environment. That's bad for a typical terminal, but good for Rez, as it gives Rez final say on what does and doesn't affect the environment.
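
The isolation is easy to demonstrate on its own. In this small sketch, DEMO_VAR is an arbitrary variable name chosen for illustration, and the command is picked per platform so it runs outside of cmd.exe too.

```python
import os
import subprocess

# `set` on Windows (cmd.exe), `export` on POSIX shells
command = "set DEMO_VAR=True" if os.name == "nt" else "export DEMO_VAR=True"

os.environ.pop("DEMO_VAR", None)
subprocess.check_call(command, shell=True)

# The variable lived and died with the child shell
print("DEMO_VAR" in os.environ)
```

The parent process, and thereby every subsequent command, never sees the variable unless Python itself puts it into `os.environ`.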

The environment is entirely provided for by the running Python instance. You'll notice exec is tried first, and that a command is only forwarded to the shell on a SyntaxError or NameError. That means we're able to use Python in conjunction with the terminal.

> $ os.environ["PATH"] += r";c:\path\to\git"
> $ git clone https://github.com/mottosso/bleeding-rez
...

Now why is this useful? Because Python is identical across any platform. It means the user can interact with his system via e.g. bash, cmd, powershell, fish or what have you, but still have a common vocabulary for interacting with Rez and what is effectively a "meta shell", managing its configuration and higher-order functionality like the environment.

Syntax maintains a clear separation between what is shell and what is Python.

> $ command arg1 arg2
> $ command("arg1", "arg2")

That is, Python is functions and object-orientation; shell is a command followed by space-separated arguments.
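
That separation could also be decided up front, by parsing rather than by executing. The sketch below is an alternative to the try-exec-first loop above, not what that loop does; `classify` is a hypothetical helper.

```python
import ast


def classify(command, namespace):
    """Decide whether `command` reads as Python or as a shell command"""
    try:
        tree = ast.parse(command)
    except SyntaxError:
        return "shell"

    # A lone word such as `ls` parses as Python, but is only
    # meaningful as Python if the name is actually defined
    if len(tree.body) == 1 and isinstance(tree.body[0], ast.Expr):
        value = tree.body[0].value
        if isinstance(value, ast.Name) and value.id not in namespace:
            return "shell"

    return "python"
```

With this, `command arg1 arg2` is classified as shell and `command("arg1", "arg2")` as Python, without either having been run.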


Tomorrow

It's Friday, so tomorrow is Monday. I'll wrap up this installer method, take it for a spin and return to ticking boxes.
