
@vodik
Last active June 2, 2018 18:42
Hy Importer

Hy Loader

This is incomplete; please consider contributing to the documentation effort.

In order for Hy and Python to work together as nicely as it does, Hy code needs to be able to import Python code, and vice versa.

This is done through Python’s import hooks. However, since the implementation and the available feature set differ between versions of Python, it’s worth noting how the system works, along with its limitations and quirks, so as not to fall into certain pitfalls.

Notes

Need to discuss PEP 302

  • State “DONE” from “TODO” [2018-03-31 Sat 22:45]

Namespace modules are relevant - PEP 420

Python 3 importer works with ModuleSpec - PEP 451

Mention __pycache__ - PEP 3147

Express explicit preference for Python 3

PEP 302

Python’s new import hooks, as specified in PEP 302, allow Hy to integrate with Python seamlessly: Hy code can import Python code, and Python code can import Hy code.

These import hooks let Hy plug into Python’s import machinery and customize it. When a module is requested, Hy gets a chance to look for its own modules, evaluate them, and load them into Python.

Module Loading

The implementation of the Python import system differs greatly between Python 3 and Python 2, which affects Hy’s ability to integrate into the wider Python ecosystem.

Python 3’s import system is backed by importlib, and most of its import mechanism is pure Python code.

The Python 2 import system, in contrast, is backed by the deprecated, lower-level imp library and a mix of exposed Python code and built-in functionality.

Import Protocol

https://docs.python.org/3/reference/import.html#finders-and-loaders

If the named module is not found in sys.modules, then Python’s import protocol is invoked to find and load the module. This protocol consists of two conceptual objects, finders and loaders. A finder’s job is to determine whether it can find the named module using whatever strategy it knows about. Objects that implement both of these interfaces are referred to as importers - they return themselves when they find that they can load the requested module.
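To make the finder/loader split concrete, here is a minimal sketch of an importer, i.e. a single object playing both roles. It assumes Python 3.4+ (the PEP 451 find_spec/exec_module API); DictImporter and the inmemory_demo module are hypothetical names, and the module source comes from a dict rather than from disk.

```python
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy importer: finder and loader in one object, serving in-memory source."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> Python source text

    # Finder half: decide whether we can handle `fullname`.
    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # let the next entry on sys.meta_path try
        # Returning ourselves as the loader is what makes this an "importer".
        return importlib.util.spec_from_loader(fullname, self)

    # Loader half: use the default module object, then run the source in it.
    def create_module(self, spec):
        return None

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, "<" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


importer = DictImporter({"inmemory_demo": "ANSWER = 6 * 7\n"})
sys.meta_path.insert(0, importer)

import inmemory_demo

print(inmemory_demo.ANSWER)  # 42
print(inmemory_demo.__spec__.loader is importer)  # True
```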

How Importing Works

Python always first tries to find a matching entry inside sys.modules, and if one is found, returns that.
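A quick way to see this first step: pre-seed the cache, and the import statement hands back that exact object without consulting any finder. The module name cached_demo is made up for illustration.

```python
import sys
import types

# Put a hand-made module object straight into the module cache.
fake = types.ModuleType("cached_demo")
fake.GREETING = "served from sys.modules"
sys.modules["cached_demo"] = fake

# No finder or loader runs here: the cache hit short-circuits the import.
import cached_demo

print(cached_demo is fake)  # True
print(cached_demo.GREETING)  # served from sys.modules
```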

Should the module not be found, the process differs slightly between Python 3 and Python 2.

Python 3

On Python 3, the process starts by looking at sys.meta_path. This variable contains a list of meta path finders. Python queries them one at a time until one of them returns a module spec for the requested module. By default, it typically contains three entries:

  1. The BuiltinImporter which handles finding builtin modules, such as builtins or sys.
  2. The FrozenImporter which handles finding frozen modules.
  3. And the PathFinder which handles finding Python modules in the filesystem.
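On a stock CPython 3 interpreter these default finders can be inspected directly. Note that the entries are the classes themselves, which expose find_spec as a class method; extra entries show up when packages or tools (six, pkg_resources, a test runner) have installed their own hooks.

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

# Each entry is asked, in order, whether it can find a requested module.
for position, finder in enumerate(sys.meta_path):
    print(position, finder)

# The three standard finders are classes, not instances.
print(BuiltinImporter in sys.meta_path)
print(FrozenImporter in sys.meta_path)
print(PathFinder in sys.meta_path)
```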

There may be a few more entries, depending on the setup, as packages like six and pkg_resources hook into this mechanism to implement some of their functionality. For example, six hooks into the import system to make Python 3 import paths work on Python 2.

Hy injects a Hy module finder into the sys.meta_path list right before the built-in PathFinder. The reasons why Hy code must be tried first are elaborated on below.
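The placement itself can be sketched as follows. LoggingFinder is a hypothetical stand-in for Hy’s finder: it only records which names it was asked about and then declines, so every request still falls through to PathFinder. colorsys is just an arbitrary standard-library module that is rarely imported at startup.

```python
import sys
from importlib.machinery import PathFinder


class LoggingFinder:
    """Hypothetical stand-in for Hy's finder: records requests, finds nothing."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None  # decline, so the finders after us get their turn


finder = LoggingFinder()
# Insert immediately before the standard PathFinder, so this finder is asked
# about every module that would otherwise be searched for on sys.path.
sys.meta_path.insert(sys.meta_path.index(PathFinder), finder)

sys.modules.pop("colorsys", None)  # make sure the import is not a cache hit
import colorsys

print(finder.seen)  # includes 'colorsys'
```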

Finders do not actually load modules. If they can find the named module, they return a module spec, an encapsulation of the module’s import-related information, which the import machinery then uses when loading the module.

The module finder is in turn split into three components:

  1. The HyPathFinder
  2. The HyFileFinder
  3. And the HyLoader

Together, the three parts produce a ModuleSpec, which Python can then turn into a module.
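The hand-off from spec to module can be reproduced with public importlib.util helpers. In this sketch SourceFileLoader stands in for HyLoader, and the file uses a made-up .hydemo suffix with plain Python inside, since no Hy compiler is involved.

```python
import importlib.util
import os
import sys
import tempfile
from importlib.machinery import SourceFileLoader

# Stand-in for a Hy source file: invented suffix, Python body.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "greeting.hydemo")
with open(path, "w") as f:
    f.write("MESSAGE = 'hello from a spec'\n")

# Finder step: describe the module as a ModuleSpec (name, loader, origin).
loader = SourceFileLoader("greeting", path)
spec = importlib.util.spec_from_file_location("greeting", path, loader=loader)

# Loader step: create the module from the spec, register it, then execute it.
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

print(module.MESSAGE)  # hello from a spec
print(module.__file__)
```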

mention import protocol

cover path_hooks and that Hy’s importer has its own on py3

cover find_spec

Python 2

Python 2 also goes through the sys.meta_path list as the first step of its import process, this time calling find_module.

By default on Python 2, this list is empty, but Hy will hook itself into the interpreter by injecting an entry into this list.

mention sys.path_hooks

mention explicitly recursive imports in Python 2

Known Issues

  • The Hy meta path finder must run before the standard PathFinder; it can’t be used as a fallback.
  • Valid Hy modules could be confused with valid Python namespace modules. This means that a Hy module could import without error but not contain any attributes.
  • We can’t support namespace modules because otherwise valid Python modules start looking like Hy namespace modules (reverse of the problem above).
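The first namespace problem is easy to reproduce on Python 3.3+ (PEP 420) without Hy installed at all: a directory whose only package marker is __init__.hy imports without error, but as a namespace package, so nothing defined in the .hy file ever shows up. hyonly_pkg is a throwaway name used only for this sketch.

```python
import importlib
import os
import sys
import tempfile

# A package directory that only contains __init__.hy (no __init__.py).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "hyonly_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.hy"), "w") as f:
    f.write("(setv answer 42)\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# No Hy hook is installed, so Python's own PathFinder handles this import.
import hyonly_pkg

print(hyonly_pkg)  # reported as a namespace package
print(hasattr(hyonly_pkg, "answer"))  # False: __init__.hy never ran
```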

Mixing Hy and Python

While it’s generally not a good idea to mix Hy and Python code in the same package, as it can lead to confusing behaviour, it technically works under Python 3, but not under Python 2.

Why meta_path instead of path_hook

Thread safety

@asmodehn

  • Valid Hy modules could be confused as valid Python namespace modules. This means that a Hy module could correctly import, but not contain any attributes.

From how I understand the reasoning behind namespace packages, they should not: having an __init__.hy in a folder should be enough for the package NOT to be a namespace package, so HyPathFinder.find_spec should take care of that. See https://github.com/asmodehn/filefinder2/blob/master/filefinder2/_filefinder2.py#L142 for background on this.
The goal here is that someone needing a PathFinder just inherits from filefinder2.machinery.PathFinder and overrides what needs to be overridden.
An example of an override (making a proper package based on content, not on the presence of an __init__.py file): https://github.com/pyros-dev/rosimport/blob/master/rosimport/_ros_directory_finder.py#L42
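As a rough sketch of the find_spec approach suggested here (not filefinder2’s actual code), a meta path finder placed before PathFinder can treat a directory containing __init__.hy as a regular package by passing submodule_search_locations explicitly. InitHyFinder and hypkg_demo are hypothetical names, and SourceFileLoader stands in for a real Hy loader, so the __init__.hy below holds Python source.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from importlib.machinery import PathFinder, SourceFileLoader


class InitHyFinder:
    """Hypothetical finder: a directory holding __init__.hy is a regular package."""

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        tail = fullname.rpartition(".")[2]
        for entry in (path if path is not None else sys.path):
            if not isinstance(entry, str):
                continue
            pkg_dir = os.path.join(entry or os.getcwd(), tail)
            init_file = os.path.join(pkg_dir, "__init__.hy")
            if os.path.isfile(init_file):
                # SourceFileLoader is only a stand-in for a real Hy loader.
                return importlib.util.spec_from_file_location(
                    fullname,
                    init_file,
                    loader=SourceFileLoader(fullname, init_file),
                    submodule_search_locations=[pkg_dir],  # => not a namespace
                )
        return None  # fall through to PathFinder and friends


sys.meta_path.insert(sys.meta_path.index(PathFinder), InitHyFinder)

# Throwaway package: __init__.hy plus an ordinary Python submodule.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "hypkg_demo")
os.mkdir(pkg)
with open(os.path.join(pkg, "__init__.hy"), "w") as f:
    f.write("answer = 42\n")
with open(os.path.join(pkg, "helper.py"), "w") as f:
    f.write("VALUE = 'py'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import hypkg_demo
import hypkg_demo.helper

print(hypkg_demo.answer, hypkg_demo.__file__)
print(hypkg_demo.helper.VALUE)
```

Submodules still go through the normal PathFinder, because the finder declines anything without an __init__.hy.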

  • We can’t support namespace modules because otherwise valid Python modules start looking like Hy namespace modules (reverse of the problem above).

The namespace logic is pretty tricky, but it is worth having (to follow Python and not introduce hard-to-spot differences), and intuitively it should be possible.

@vodik
Author

vodik commented May 31, 2018

I wrote an in-depth response on the Hy issue, in case you didn't notice it.

The problem is that:

  • The PathFinder is not what decides to create a namespace module; it's the FileFinder. (I linked to the right line of code in the Hy PR; I forgot the root of the problem in my first response. Forgive me, it's been 3 months since I dug into it.)
  • The FileFinder's decision to return a namespace module short-circuits the FileFinder chaining (as specified in PEP 302), incorrectly preventing further loaders from getting a chance to give it a shot.
import contextlib
import os
import sys
from importlib.machinery import FileFinder, SourceFileLoader

# So, this is counterintuitive, but we're going to add a new *loader*
# that we want to go *first*, before .py.
#
# It's easier to highlight the problem this way.
sys.path_hooks.insert(0, FileFinder.path_hook((SourceFileLoader, [".badpy"])))

with contextlib.suppress(FileExistsError):
    os.mkdir("good")

with contextlib.suppress(FileExistsError):
    os.mkdir("bad")

with contextlib.suppress(FileExistsError):
    os.mkdir("contrast")

# We'll always be able to directly find modules if we don't use
# packages: good/demo.py -> good.demo works. We fall through our loader
# and the standard ones afterwards do their thing.
#
# This works because the full filename matches, so the __init__ handling is
# bypassed. That's the fallback behaviour when an exact match isn't found.
with open("good/demo.py", "w") as module:
    module.write("""print("Hello World")""")
    import good.demo
    print(good.demo)  # <module 'good.demo' from '/tmp/good/demo.badpy'>

# But, we can no longer import packages!
#
# This loads as a namespace module now. Why? Because the first loader
# (the .badpy one) now looks for bad/__init__.badpy, doesn't find it, but
# decides that since we're inside a folder, it must be a namespace package.
#
# No more chaining of loaders: we shut down too early, even
# though the next one afterwards would be more than happy to do its thing.
with open("bad/__init__.py", "w") as module:
    module.write("""print("Hello World")""")
    import bad
    print(bad)  # HA! <module 'bad' (namespace)>

# But we can still load modules if we use `__init__.badpy`.
with open("contrast/__init__.badpy", "w") as module:
    module.write("""print("Hello World")""")
    import contrast
    print(contrast)  # <module 'contrast' from '/tmp/contrast/__init__.badpy'>
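The short circuit can also be observed without touching sys.path_hooks, by asking FileFinder instances directly. Each sys.path entry is handled by a single finder, created by the first hook in sys.path_hooks that accepts it and then cached in sys.path_importer_cache (so entries cached before a hook is inserted keep their old finder). A FileFinder that only knows .badpy returns a namespace portion (loader None) for bad/ even though bad/__init__.py exists, while one FileFinder given both suffixes returns a regular package spec. This is a sketch assuming Python 3; the temporary layout mirrors the example above.

```python
import os
import tempfile
from importlib.machinery import SOURCE_SUFFIXES, FileFinder, SourceFileLoader

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "bad"))
with open(os.path.join(root, "bad", "__init__.py"), "w") as f:
    f.write("print('Hello World')\n")

# Like the finder the inserted path hook creates: it only knows ".badpy".
badpy_only = FileFinder(root, (SourceFileLoader, [".badpy"]))
ns_spec = badpy_only.find_spec("bad")
# loader is None and only a search location is set: a namespace portion.
print(ns_spec.loader, ns_spec.submodule_search_locations)

# One FileFinder that handles both suffixes finds bad/__init__.py.
combined = FileFinder(
    root,
    (SourceFileLoader, [".badpy"]),
    (SourceFileLoader, SOURCE_SUFFIXES),
)
pkg_spec = combined.find_spec("bad")
print(pkg_spec.origin)
```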

@vodik
Author

vodik commented May 31, 2018

I'm going to try to open a CPython issue on Friday with a patch.

Am I missing anything?

@asmodehn

asmodehn commented Jun 2, 2018

I modified the code slightly to run it:

import contextlib
import os
import sys
import shutil
import importlib
from importlib.machinery import FileFinder, SourceFileLoader

print(sys.version_info)
print(importlib.__file__)

# So, this is counterintuitive, but we're going to add a new *loader*
# that we want to go *first*, before .py.
#
# It's easier to highlight the problem this way.
sys.path_hooks.insert(0, FileFinder.path_hook((SourceFileLoader, [".badpy"])))

# Reset external state to allow reruns
shutil.rmtree('good', ignore_errors=True)
os.mkdir("good")

shutil.rmtree('bad', ignore_errors=True)
os.mkdir("bad")

shutil.rmtree('contrast', ignore_errors=True)
os.mkdir("contrast")

# We'll always be able to directly find modules if we don't use
# packages: good/demo.py -> good.demo works. We fall through our loader
# and the standard ones afterwards do their thing.
#
# This works because the full filename matches, so the __init__ handling is
# bypassed. That's the fallback behaviour when an exact match isn't found.
with open("good/demo.badpy", "w") as module:
    module.write("""print("Hello World")""")
    import good.demo
    print(good.demo)  # <module 'good.demo' from '/tmp/good/demo.badpy'>

# But, we can no longer import packages!
#
# This loads as a namespace module now. Why? Because the first loader
# (the .badpy one) now looks for bad/__init__.badpy, doesn't find it, but
# decides that since we're inside a folder, it must be a namespace package.
#
# No more chaining of loaders: we shut down too early, even
# though the next one afterwards would be more than happy to do its thing.
with open("bad/__init__.py", "w") as module:
    module.write("""print("Hello World")""")
    import bad
    print(bad)  # HA! <module 'bad' (namespace)>

# But we can still load modules if we use `__init__.badpy`.
with open("contrast/__init__.badpy", "w") as module:
    module.write("""print("Hello World")""")
    import contrast
    print(contrast)  # <module 'contrast' from '/tmp/contrast/__init__.badpy'>

gives me:
Python3 (original importlib)

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
/usr/lib/python3.5/importlib/__init__.py
<module 'good.demo' from '/home/alexv/Projects/hy/good/demo.badpy'>
<module 'bad' from '/home/alexv/Projects/hy/bad/__init__.py'>
<module 'contrast' (namespace)>

I guess that is what you would expect to happen?
So I seem to get different behaviour than you describe (at least on my Python 3.5).

Here are a few remarks:

  • I think putting the CustomFileFinder first is not a good idea (zipimporter should likely be first).
  • I think using SourceFileLoader unmodified is not what the Python devs expect (although I agree it's what I would try first as well). The best way to know is probably to ask the Import SIG, but it might be dead... I haven't tried to contact anyone there yet.
# This loads as a namespace module now. Why? Because the first loader
# (the .badpy one) now looks for bad/__init__.badpy, doesn't find it, but
# decides that since we're inside a folder, it must be a namespace package.

This likely comes from reusing SourceFileLoader without modification. Overriding find_spec should be enough to get rid of that problem, although I'm not sure whether that behaviour is intended. I remember having that problem before (maybe when I was taking Python 3's importlib and putting it into Python 2, which won't work, since the API is different, with different behaviour expectations; think PEP 451 requires Python 3.4+), but I didn't experience it on my Python 3.5 with this code example, using Python's importlib.

I would advise you to really dig into filefinder2. There are quite a few tests to validate behaviour on different Python versions.

As a side note, if I run that code on Python 2 with filefinder2, adding a simple:

import sys
import filefinder2
from filefinder2.machinery import FileFinder, SourceFileLoader

filefinder2.Py3Importer().__enter__()

print(sys.version_info)
print(filefinder2.__file__)

at the beginning, I get:

(import_check2) alexv@alexv-XPS-Tablet:/tmp/import_check$ python vodik_import_check.py 
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
/home/alexv/.virtualenvs/import_check2/local/lib/python2.7/site-packages/filefinder2/__init__.pyc
<module 'good.demo' from '/tmp/import_check/good/demo.badpy'>
<module 'bad' (built-in)>
<module 'contrast' from '/tmp/import_check/contrast/__init__.badpy'>

which is more or less the result you described I believe.

Here is a quickly written class to fix it:

class badpyFileFinder(FileFinder):

    def __init__(self, path, *loader_details):
        super(badpyFileFinder, self).__init__(path, *loader_details)

    def __repr__(self):
        return 'badpyFileFinder({!r})'.format(self.path)

    @classmethod
    def path_hook(cls, *loader_details):
        """A class method which returns a closure to use on sys.path_hooks
        which will return an instance using the specified loaders and the path
        called on the closure.

        If the path called on the closure is not a directory, or doesn't contain
        any files with the supported extension, ImportError is raised.

        This is different from the default Python behavior,
        but prevents polluting the cache with custom finders.
        """
        def path_hook_for_badpyFileFinder(path):
            """Path hook for badpyFileFinder."""

            if not os.path.isdir(path):
                raise ImportError('only directories are supported')

            exts = [x for ld in loader_details for x in ld[1]]
            if not any(fname.endswith(ext) for fname in os.listdir(path) for ext in exts):
                raise ImportError(
                    'only directories containing {ext} files are supported'.format(ext=", ".join(exts)))
            return cls(path, *loader_details)
        return path_hook_for_badpyFileFinder

    def find_spec(self, fullname, target=None):
        """
        Try to find a spec for the specified module.
        :param fullname: the name of the package we are trying to import
        :return: the matching spec, or None if not found.
        """

        # We attempt to load a .badpy file as a module
        tail_module = fullname.rpartition('.')[2]
        base_path = os.path.join(self.path, tail_module)
        for suffix, loader_class in self._loaders:
            full_path = base_path + suffix
            if os.path.isfile(full_path):  # maybe we need more checks here (importlib filefinder checks its cache...)
                return self._get_spec(loader_class, fullname, full_path, None, target)

        # Otherwise, fall back to the standard FileFinder lookup
        return super(badpyFileFinder, self).find_spec(fullname=fullname, target=target)

# So, this is counterintuitive, but we're going to add a new *loader*
# that we want to go *first*, before .py.
#
# It's easier to highlight the problem this way.
sys.path_hooks.insert(0, badpyFileFinder.path_hook((SourceFileLoader, [".badpy"])))

[...]

which should give you the Python 3 behaviour (minus the namespace package, which becomes a built-in... there might still be some tuning to do there...):

(import_check2) alexv@alexv-XPS-Tablet:/tmp/import_check$ python vodik_import_check.py 
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
/home/alexv/.virtualenvs/import_check2/local/lib/python2.7/site-packages/filefinder2/__init__.pyc
<module 'good.demo' from '/tmp/import_check/good/demo.badpy'>
<module 'bad' from '/tmp/import_check/bad/__init__.py'>
<module 'contrast' (built-in)> 

Adding this specialized badpyFileFinder should also be enough to fix your use case, since filefinder2 is mostly a copy of importlib2, which is itself mostly a copy of Python 3's importlib code anyway.

So, looking at all this, it seems that we could even improve the filefinder2 implementation by slightly modifying the filefinder2.FileFinder class, so that the specialized class is not needed... We would need to be very careful not to break anything else, however (hence the tests that are already there; any change comes with a lot of surprises...).

Thinking about this, I believe I don't yet test using the FileFinder/PathFinder classes as an interface; I only test that various import calls behave consistently across all Python versions, where possible... PRs very welcome ;-) It would be very good to have, at least as an example of what should work and what shouldn't.
