@sveetch
Last active September 16, 2021 23:45
A script to extract installed packages versions from a Buildout project
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
A script to extract installed packages versions from a Buildout project.

Install
*******

This is a "one man army" script without specific requirements. Just drop the file in
your project and use it.

However, some environments don't have "setuptools" installed globally (either in the
system or in pyenv), in which case you will need to install it. We recommend the last
compatible version: ::

    pip install 'setuptools==44.0.0'

Require
*******

* Python2.7 (not Python3 compatible but it should not affect working on a Python3
  buildout project);
* setuptools>=7.0,<45 (more recent setuptools may change Distribute behaviors);

Usage
*****

For help, just execute this script with a Python2 interpreter like this: ::

    python dr_eggs.py -h

Or export into a JSON file: ::

    python dr_eggs.py --format=json > eggs.json

Or with specific filepath to the django script: ::

    python dr_eggs.py --format=json --script ../../foo/bin/django-instance > eggs.json

The long explanation you want to scroll out
*******************************************

Why
---

Because we have a lot of Buildout projects to maintain and since they rely on Python2
and old libraries, many package installs fail on updates because of incompatibilities.
This leads to long searches among libraries to find the compatible ones.

Also, we need a "drop & use" solution to avoid adding new configuration or dependencies
to these old projects. So other solutions like Buildout plugins or third party libraries
are not desired.

What
----

A simple script which relies on some assumptions about our Buildout projects
(configuration, structure, etc.) to take some shortcuts and go straight to our goal.

It parses the Python "binary" script built by Buildout to launch Django. This script
always includes the whole list of installed eggs and can be trusted almost blindly.

The script has been developed for Python2 to be used in our old environments without
any problem.

The parser is pretty naive and won't like any syntax other than the ones we know. It
also assumes the first matching variable (at top level) is the right one, no smart
selection here.

Todo
----

* Be safer when the given script filepath is invalid (not Python or whatever syntax)
  with a relevant output message;
* Be safer when the given script filepath does not exist with a relevant output
  message;
* Add a new filepath argument for the develop-dir path where to search for developed
  packages and store them with a special flag. I don't know if we may instead detect
  them from collected eggs;
"""
__version__ = "0.2.0"

import datetime
import json
import os
import ast

from pkg_resources import Distribution


class PythonScriptParser(object):
    """
    A naive Python script parser built on the ``ast`` module.

    Naive because only a very small set of the abstract syntax is implemented, only
    for the thing we are looking for.

    This builds string representations of some syntax nodes, since ``ast`` does not
    provide them. We use the representations to get name and value for assignments
    and find the one we are looking for.

    In the representation methods almost everything that is not implemented
    will raise a ``NotImplementedError`` exception so your code can catch it to ignore
    code that does not match the relevant subject.

    The subject is to find installed packages from the Python "binary" script of a
    buildout project. Usually, this script contains something like a
    ``sys.path[0:0]`` assignment which contains a list of package paths (to eggs).
    """
    def represent_attribute(self, node):
        """
        Return string content of an ``ast.Attribute``.

        Arguments:
            node (ast.Attribute): The node to inspect.

        Raises:
            NotImplementedError: For every syntax part we don't support.

        Returns:
            string: attribute name
        """
        # Only the simple "name.attr" form is supported (like "sys.path"), anything
        # else is reported as not implemented so the caller can ignore it
        if (
            not isinstance(node, ast.Attribute) or
            not isinstance(node.value, ast.Name)
        ):
            raise NotImplementedError(
                "Only ast.Attribute on an ast.Name is implemented"
            )

        return node.value.id + "." + node.attr

    def represent_slice(self, node):
        """
        Return content of an ``ast.Slice``.

        Arguments:
            node (ast.Slice): The node to inspect.

        Raises:
            NotImplementedError: For every syntax part we don't support.

        Returns:
            string: slice content surrounded by brackets
        """
        content = ""

        if getattr(node, "lower") is not None:
            content = str(node.lower.n)

        content = content + ":"

        if getattr(node, "upper") is not None:
            content = content + str(node.upper.n)

        if getattr(node, "step") is not None:
            raise NotImplementedError(
                "Only 'lower' and 'upper' are implemented for ast.Slice, not 'step'"
            )

        return "[" + content + "]"

    def represent_substring(self, node):
        """
        Return content of an ``ast.Subscript``.

        Arguments:
            node (ast.Subscript): The node to inspect.

        Raises:
            NotImplementedError: For every syntax part we don't support.

        Returns:
            string: subscript content
        """
        # A plain variable target (like "foo = 1") is not a subscript, report it as
        # not implemented instead of failing on a missing attribute
        if not isinstance(node, ast.Subscript):
            raise NotImplementedError(
                "Only ast.Subscript is implemented for assignment targets"
            )

        content = self.represent_attribute(node.value)

        if isinstance(node.slice, ast.Slice):
            content = content + self.represent_slice(node.slice)
        else:
            raise NotImplementedError(
                "Only ast.Slice is implemented for subscript, not ast.Index or other"
            )

        return content

    def represent_str(self, node):
        """
        Return string content of an ``ast.Str``.

        Arguments:
            node (ast.Str): The node to inspect.

        Returns:
            string: string content
        """
        return node.s

    def represent_list(self, node):
        """
        Return items of an ``ast.List``.

        Arguments:
            node (ast.List): The node to inspect.

        Raises:
            NotImplementedError: For every syntax part we don't support.

        Returns:
            list: items
        """
        content = []

        for child in node.elts:
            if isinstance(child, ast.Str):
                content.append(
                    self.represent_str(child)
                )
            else:
                raise NotImplementedError(
                    "Only ast.Str is implemented for ast.List.elts items"
                )

        return content

    def represent_assign(self, node):
        """
        Return name and value of an ``ast.Assign`` (variable assignment).

        Arguments:
            node (ast.Assign): The node to inspect.

        Raises:
            NotImplementedError: For every syntax part we don't support.

        Returns:
            tuple: The variable name and its value. The name is a string, the value
            may be anything, but actually only a list or a string is implemented.
        """
        variable_name = ""
        content = None

        if getattr(node, "targets") is not None:
            # Chained assignments (a = b = ...) have many targets, only the first
            # one is inspected
            variable_name = variable_name + self.represent_substring(node.targets[0])
        else:
            raise NotImplementedError(
                "Only ast.Assign with targets attribute is implemented"
            )

        if getattr(node, "value") is not None:
            if isinstance(node.value, ast.Str):
                content = self.represent_str(node.value)
            elif isinstance(node.value, ast.List):
                content = self.represent_list(node.value)
            else:
                raise NotImplementedError(
                    "Only ast.Str and ast.List are implemented for ast.Assign.value"
                )
        else:
            raise NotImplementedError(
                "Only ast.Assign with value attribute is implemented"
            )

        return variable_name, content

    def seek_for_packages(self, tree, pattern):
        """
        Search only top level assignments for the variable whose name matches the
        given ``pattern`` argument.

        Arguments:
            tree (ast.Module): The node tree to inspect.
            pattern (string): The assignment variable to search.

        Returns:
            tuple: Variable name and value for the searched variable name pattern.
            Will return ``None`` if no variable has been matched for the given name
            pattern.
        """
        for topnode in tree.body:
            if isinstance(topnode, ast.Assign):
                try:
                    name, content = self.represent_assign(topnode)
                except NotImplementedError:
                    # Unsupported syntax is not what we are looking for
                    pass
                else:
                    if name == pattern:
                        return name, content

        return None


class BuildoutPackagesCollector(object):
    """
    Scan a Buildout project and collect every installed package.

    Keyword Arguments:
        sort (boolean): Select if found packages are sorted by their path or not.
            Default is True.

    Attributes:
        SCRIPT_FILEPATH (string): Default script filepath to scan.
        SCRIPT_PACKAGESET_PATTERN (string): Default variable name pattern to search
            to get the package list.
    """
    SCRIPT_FILEPATH = "bin/django-instance"
    SCRIPT_PACKAGESET_PATTERN = "sys.path[0:0]"

    def __init__(self, sort=True):
        self.sort = sort
        self.registry = []

    def store_package(self, distrib):
        """
        Store collected package information from given distribution into the
        registry (name, version, requirements, egg name and develop flag).

        Arguments:
            distrib (pkg_resources.Distribution): A Distribution object.
        """
        self.registry.append({
            "name": distrib.project_name,
            "version": distrib.version,
            "requires": [str(item) for item in distrib.requires()],
            "egg": distrib.egg_name(),
            "develop": False,
        })

    def parse_script_source(self, script_filepath, pattern):
        """
        Parse script to get package list.

        Arguments:
            script_filepath (string): File path to the script to scan.
            pattern (string): Variable name pattern to search to get the package
                list.

        Returns:
            list: A list of package paths found in scanned script or ``None`` if no
            variable matched the pattern.
        """
        # Read the raw source, "ast.parse" accepts it as it is. Encoding it again
        # would fail under Python2 when the source contains non ASCII characters.
        with open(script_filepath, "r") as fp:
            script_content = fp.read()

        parser = PythonScriptParser()
        tree = ast.parse(script_content)
        found = parser.seek_for_packages(tree, pattern)

        # When the parser did not match any variable
        if not found:
            return None

        varname, varcontent = found

        return varcontent

    def as_requirements_file(self, registry):
        """
        Return a requirements.txt file from exact installed egg versions.

        Arguments:
            registry (list): List of packages information as collected by the scan
                method.

        Returns:
            string: Package exact versions.
        """
        today = datetime.date.today()
        lines = ["# Frozen versions on {}".format(today)]

        develop_mention = "# Versions from develop eggs or installed packages"
        develop_lines = []

        # First standard eggs
        for item in registry:
            if not item["develop"]:
                lines.append(
                    "{name}=={version}".format(
                        name=item["name"],
                        version=item["version"],
                    )
                )

        # Then develop eggs and libraries (from environment site-packages)
        for item in registry:
            if item["develop"]:
                develop_lines.append(
                    "{name}=={version}".format(
                        name=item["name"],
                        version=item["version"],
                    )
                )

        # Append develop mention with a divider white space
        if len(develop_lines) > 0:
            develop_lines = ["", develop_mention] + develop_lines

        return "\n".join(lines + develop_lines)

    def as_buildout_version(self, registry):
        """
        Return a version.cfg file from exact installed egg versions.

        This is just a wrapper around ``as_requirements_file`` method to replace
        requirements file syntax with the buildout version file.

        Arguments:
            registry (list): List of packages information as collected by the scan
                method.

        Returns:
            string: Package exact versions.
        """
        return self.as_requirements_file(registry).replace("==", " = ")

    def scan(self, script_filepath=None, pattern=None):
        """
        Main method to scan project requirements.

        Keyword Arguments:
            script_filepath (string): File path to the script to scan. Default to
                ``BuildoutPackagesCollector.SCRIPT_FILEPATH`` value.
            pattern (string): Variable name pattern to search to get the package
                list. Default to
                ``BuildoutPackagesCollector.SCRIPT_PACKAGESET_PATTERN`` value.

        Returns:
            list: Package paths found in the script, empty if nothing matched.
        """
        script_filepath = script_filepath or self.SCRIPT_FILEPATH
        pattern = pattern or self.SCRIPT_PACKAGESET_PATTERN

        # Get packages from parsed script source
        packages = self.parse_script_source(script_filepath, pattern)

        # Nothing to collect when the parser did not match any variable
        if not packages:
            return []

        if self.sort:
            packages = sorted(packages)

        # Collect information about packages
        for pkg in packages:
            if pkg.endswith(".egg"):
                distrib = Distribution.from_filename(pkg)
                self.store_package(distrib)
            else:
                # This can be either a developed package, the django project
                # directory or the Python site-packages dir. Only the develop
                # package would interest us but it needs to be correctly detected
                # and flagged as such when collected.
                pass

        return packages

    def output(self, format="json"):
        """
        Output registry to a convenient format for export.

        This directly uses content from ``BuildoutPackagesCollector.registry``
        attribute.

        Keyword Arguments:
            format (string): Either "json", "pip" or "buildout".

                * "json": Output the full registry as JSON.
                * "pip": A "requirements.txt" file.
                * "buildout": A Buildout versions file.

                Both pip and Buildout formats separate standard eggs from develop
                eggs.

        Returns:
            string: Output content, which varies depending on the given format
            argument.
        """
        if format == "pip":
            return self.as_requirements_file(self.registry)
        elif format == "buildout":
            return self.as_buildout_version(self.registry)

        return json.dumps(self.registry, indent=4)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description=(
            "Scan and output versions from a Buildout project installed packages."
        ),
    )
    parser.add_argument(
        "--script",
        default=None,
        help=(
            "Give a custom filepath for script source with package list. "
            "Default to '%s'."
        ) % BuildoutPackagesCollector.SCRIPT_FILEPATH
    )
    parser.add_argument(
        "--pattern",
        default=None,
        help=(
            "Variable name pattern to search for variable with the package list. "
            "(Remember to quote it and escape possible special characters) "
            "Default to '%s'."
        ) % BuildoutPackagesCollector.SCRIPT_PACKAGESET_PATTERN
    )
    parser.add_argument(
        "--format",
        choices=["json", "pip", "buildout"],
        default="json",
        help="Choose the format you want to export. Default to 'json'."
    )
    # NOTE: This option is accepted but not used yet since develop eggs are not
    # scanned for now (see the Todo section).
    parser.add_argument(
        "--no-develop",
        action="store_true",
        help="Disable scanning develop eggs and libraries.",
    )

    args = parser.parse_args()

    b = BuildoutPackagesCollector()
    found = b.scan(
        script_filepath=args.script,
        pattern=args.pattern,
    )

    # Check results
    if not found:
        print "/!\ No match for given script."
    else:
        print b.output(args.format)
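
The core idea of the script (read the ``sys.path[0:0]`` assignment from a Buildout generated script without executing it) can also be sketched for a recent interpreter. This is only a minimal illustration, not the script's own method: it assumes Python 3.9 or later, where ``ast.unparse`` and ``ast.literal_eval`` replace the hand-written representation methods. The function name ``find_path_list``, the ``SAMPLE_SCRIPT`` content and the egg paths in it are hypothetical and only there for the example.

```python
import ast

# Hypothetical content of a Buildout generated "bin/django-instance" script.
SAMPLE_SCRIPT = "\n".join([
    "#!/usr/bin/python",
    "import sys",
    "sys.path[0:0] = [",
    "    '/srv/project/eggs/Django-1.11.29-py2.7.egg',",
    "    '/srv/project/eggs/pytz-2020.1-py2.7.egg',",
    "    '/srv/project/src',",
    "]",
    "import djangorecipe.manage",
])


def find_path_list(source, target="sys.path[0:0]"):
    """
    Return the list of strings assigned to ``target`` at top level of ``source``,
    or None when no such assignment exists. The source is parsed, never executed.
    """
    tree = ast.parse(source)

    for node in tree.body:
        # Only simple assignments with a single target are considered
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue

        # "ast.unparse" turns the target node back into source text, which is
        # compared to the searched variable name
        if ast.unparse(node.targets[0]) != target:
            continue

        # Only literal values are accepted, anything computed is ignored
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue

        if isinstance(value, list) and all(isinstance(i, str) for i in value):
            return value

    return None


if __name__ == "__main__":
    paths = find_path_list(SAMPLE_SCRIPT) or []
    # Keep only eggs, as the collector does before reading their metadata
    for path in paths:
        if path.endswith(".egg"):
            print(path)
```

As in the original parser, the first matching top level assignment wins and any unsupported syntax is silently skipped.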