Last active June 2, 2018 18:42
Hy Importer

Hy Loader

This is incomplete; please consider contributing to the documentation effort.

In order for Hy and Python to work together as nicely as they do, Hy code needs to be able to import Python code, and vice versa.

This is done through Python’s import hooks. However, since the implementation and the available feature set differ between versions of Python, it’s worth noting how the system works, along with its limitations and quirks, so as not to fall into certain pitfalls.


PEP 302

Python’s new import hooks, as specified in PEP 302, allow Hy to integrate with Python seamlessly, with Hy code able to import Python code and Python code able to import Hy code.

These hooks let Hy plug into Python’s import machinery and customize it. When a Hy module is requested, Hy has a chance to look for its own modules, evaluate them, and load them into Python.

Module Loading

The implementation of the Python import system differs greatly between Python 3 and Python 2, and this affects Hy’s ability to integrate into the wider Python ecosystem.

Python 3’s import system is backed by importlib, and most of its import mechanism is pure Python code.

The Python 2 import system, in contrast, is backed by the deprecated and lower level imp library and a mix of exposed Python code and built-in functionality.

Import Protocol

If the named module is not found in sys.modules, then Python’s import protocol is invoked to find and load the module. This protocol consists of two conceptual objects, finders and loaders. A finder’s job is to determine whether it can find the named module using whatever strategy it knows about. A loader’s job is to load the module that a finder has located. Objects that implement both of these interfaces are referred to as importers: they return themselves when they find that they can load the requested module.
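As a rough illustration of the finder/loader split, here is a minimal sketch of an importer, i.e. one object that plays both roles. The class name InMemoryImporter and the module name demo_in_memory are made up for this example; this is not how Hy implements its importer.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer: finder and loader in a single object."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    # Finder half: decide whether we know the module.
    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours, let the next finder try
        # The importer names itself as the loader for this module.
        return importlib.util.spec_from_loader(fullname, self)

    # Loader half: create and populate the module.
    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


importer = InMemoryImporter({"demo_in_memory": "answer = 42"})
sys.meta_path.insert(0, importer)
try:
    import demo_in_memory
finally:
    sys.meta_path.remove(importer)  # leave the interpreter as we found it

print(demo_in_memory.answer)
print(demo_in_memory.__spec__.loader is importer)
```

The second print shows the "returns itself" part: the spec handed back by the finder half points at the very same object as its loader.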

How Importing Works

Python always first tries to find a matching entry inside sys.modules, and if one is found, returns that.
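The cache lookup is easy to observe. In this small sketch a module object is planted in sys.modules by hand (the name cached_demo is invented for the example), and the import statement then returns it without consulting any finder:

```python
import sys
import types

# Build a module object by hand and register it under a made-up name.
fake = types.ModuleType("cached_demo")
fake.value = "from the cache"
sys.modules["cached_demo"] = fake

# No file called cached_demo exists anywhere; the import still succeeds
# because the sys.modules entry is found first.
import cached_demo

print(cached_demo is fake)
print(cached_demo.value)
```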

Should the module not be found, then there are slight differences between the two platforms.

Python 3

On Python 3, the process starts by looking at sys.meta_path. This variable contains a list of meta path finders. Python tries them, one at a time, until one of them finds the module. It typically contains three default entries:

  1. The BuiltinImporter which handles finding builtin modules, such as builtins or sys.
  2. The FrozenImporter which handles finding frozen modules.
  3. And the PathFinder which handles finding Python modules in the filesystem.
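The three default entries can be inspected directly. The sketch below only assumes a stock CPython 3 interpreter; extra entries may show up depending on what is installed.

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

# Print whatever finders this interpreter currently has installed.
for finder in sys.meta_path:
    print(finder)

defaults = [BuiltinImporter, FrozenImporter, PathFinder]
print(all(finder in sys.meta_path for finder in defaults))

# Each finder only answers for the modules it knows about.
builtin_spec = BuiltinImporter.find_spec("sys")    # a built-in module
not_builtin = BuiltinImporter.find_spec("json")    # lives on the filesystem
path_spec = PathFinder.find_spec("json")
print(builtin_spec.origin, not_builtin, path_spec.origin)
```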

There may be a few more entries, depending on the setup, as packages like six and pkg_resources hook into this mechanism to implement some of their functionality. For example, six hooks into the import system to make Python 3 import paths work on Python 2.

Hy injects a Hy module finder into the sys.meta_path list right before the builtin PathFinder. The reasons why Hy code must be attempted first are elaborated on below.
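Placing a finder right before PathFinder can be sketched as follows. RecordingFinder is a hypothetical stand-in that only records what it is asked for; it is not Hy's finder, and colorsys is just a convenient standard library module to import.

```python
import importlib.abc
import sys
from importlib.machinery import PathFinder


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: records each request, then declines it."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None  # None means "not mine"; the next finder is tried


finder = RecordingFinder()
# Insert right before PathFinder so we are asked before the filesystem search.
sys.meta_path.insert(sys.meta_path.index(PathFinder), finder)

sys.modules.pop("colorsys", None)  # make sure the import is not served from cache
import colorsys

print("colorsys" in finder.seen)
print(sys.meta_path.index(finder) < sys.meta_path.index(PathFinder))
```

Because the recording finder returns None, the import falls through to PathFinder and still succeeds; a real finder would return a spec for the modules it owns.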

Finders do not actually load modules. If they can find the named module, they return a module spec, an encapsulation of the module’s import-related information, which the import machinery then uses when loading the module.
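To make the finder/spec/load sequence concrete, this sketch asks the standard machinery for a spec and then performs the loading step by hand. The keyword module is used only because it is a small pure-Python standard library module.

```python
import importlib.util

# Step 1: a finder produces a spec; nothing has been executed yet.
spec = importlib.util.find_spec("keyword")
print(spec.name)
print(spec.origin)   # where the source was found
print(spec.loader)   # the loader that will execute it

# Step 2: the import machinery turns the spec into a module.
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.iskeyword("import"))
```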

The module finder then is split into three components:

  1. The HyPathFinder
  2. The HyFileFinder
  3. And the HyLoader

The three parts provide a ModuleSpec which Python is able to then convert into a module.
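A three-part split of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not Hy's code: DemoPathFinder, DemoLoader and the .demo extension are invented names, the stock importlib.machinery.FileFinder stands in for the file finder, and the loader simply treats .demo files as Python source.

```python
import os
import sys
import tempfile
from importlib.machinery import FileFinder, PathFinder, SourceFileLoader


class DemoLoader(SourceFileLoader):
    """Hypothetical loader: here .demo files are just Python source."""


class DemoPathFinder:
    """Hypothetical meta path finder: builds a file finder per path entry."""

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        entries = sys.path if path is None else path
        for entry in entries:
            directory = entry or os.getcwd()
            if not os.path.isdir(directory):
                continue
            file_finder = FileFinder(directory, (DemoLoader, [".demo"]))
            spec = file_finder.find_spec(fullname)
            # Skip loader-less (namespace) specs so later finders get a turn.
            if spec is not None and spec.loader is not None:
                return spec
        return None


# Create a throwaway module with the made-up extension.
root = tempfile.mkdtemp()
with open(os.path.join(root, "sketch_mod.demo"), "w") as handle:
    handle.write('greeting = "hi"\n')

sys.path.insert(0, root)
sys.meta_path.insert(sys.meta_path.index(PathFinder), DemoPathFinder)

import sketch_mod

print(sketch_mod.greeting)
print(type(sketch_mod.__spec__.loader).__name__)
```

Building a fresh FileFinder on every lookup keeps the sketch short but is wasteful; a real implementation would presumably cache finders per path entry.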

Python 2

Python 2 also goes through the sys.meta_path list as the first step of its import process, this time calling find_module.

By default on Python 2, this list is empty, but Hy will hook itself into the interpreter by injecting an entry into this list.

Known Issues

  • The Hy metaloader must be specified first, and can’t be fallen back on.
  • Valid Hy modules could be mistaken for valid Python namespace modules. This means that a Hy module could import without error, but not contain any attributes.
  • We can’t support namespace modules because otherwise valid Python modules start looking like Hy namespace modules (the reverse of the problem above).
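The namespace confusion can be reproduced with plain Python 3 and no Hy finder installed. In this sketch (the package name nsdemo_pkg is made up) a directory holds only an __init__.hy file, which the default finders do not recognise, so the import succeeds as an empty namespace package:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "nsdemo_pkg"))
# A source file the default Python finders know nothing about.
with open(os.path.join(root, "nsdemo_pkg", "__init__.hy"), "w") as handle:
    handle.write('(setv greeting "hi")\n')

sys.path.insert(0, root)
importlib.invalidate_caches()

import nsdemo_pkg  # no error, but nothing from __init__.hy was evaluated

print(list(nsdemo_pkg.__path__))
print(hasattr(nsdemo_pkg, "greeting"))
print(getattr(nsdemo_pkg, "__file__", None))
```

This is the failure mode the second bullet describes: if the Hy finder is only consulted after PathFinder, PathFinder has already claimed the directory.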

Mixing Hy and Python

While it’s generally not a good idea to mix Hy and Python code in the same package, as it can lead to confusing behaviour, it technically works under Python 3, but not under Python 2.

Why meta_path instead of path_hook

Thread safety

asmodehn commented Jun 2, 2018

I modified the code a tiny bit to run it:

import contextlib
import os
import sys
import shutil
import importlib
from importlib.machinery import FileFinder, SourceFileLoader


# So, this is counterintuitive, but we're going to add a new *loader*,
# that we want to go *first*, before .py.
# It's easier to highlight the problem this way
sys.path_hooks.insert(0, FileFinder.path_hook((SourceFileLoader, [".badpy"])))

# resetting external state to allow reruns
for name in ('good', 'bad', 'contrast'):
    shutil.rmtree(name, ignore_errors=True)
    os.makedirs(name)  # the open() calls below need the directory to exist

# We'll always be able to directly find modules if we don't do
# packages: good/ -> good.demo works. We fall through our loader
# and the standard ones afterwards do their thing.
# This works because the full filename matches, so __init__ stuff is bypassed.
# That's the fallback behaviour when an exact match isn't found
with open("good/demo.badpy", "w") as module:
    module.write("""print("Hello World")""")
    import good.demo
    print(good.demo)  # <module 'good.demo' from '/tmp/good/demo.badpy'>

# But, we can no longer import packages!
# This loads as a namespace module now. Why? Because the first loader
# (.badpy one) will now look for bad/__init__.badpy, doesn't find it, but
# decides that since we're inside a folder, we're a namespace.
# No more chaining of loaders anymore, we shut down too early - even
# though the next one afterwards would be more than happy to do its thing.
with open("bad/", "w") as module:
    module.write("""print("Hello World")""")
    import bad
    print(bad)  # HA! <module 'bad' (namespace)>

# But we can still load modules if we use `__init__.badpy`.
with open("contrast/__init__.badpy", "w") as module:
    module.write("""print("Hello World")""")
    import contrast
    print(contrast)  # <module 'contrast' from '/tmp/contrast/__init__.badpy'>

gives me:
Python3 (original importlib)

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
<module 'good.demo' from '/home/alexv/Projects/hy/good/demo.badpy'>
<module 'bad' from '/home/alexv/Projects/hy/bad/'>
<module 'contrast' (namespace)>

I guess that is what you would expect to happen?
So I seem to get a different behaviour from the one you describe (at least on my python 3.5).

Here are a few remarks:

  • I would think that putting the CustomFileFinder first is not good (zipimporter should likely be first).
  • I would think that using the SourceFileLoader unmodified is not what the Python devs expect (although I agree it is what I would try first as well). The best way to know is probably to ask the Import SIG, but it might be dead... I haven't actually tried to contact anyone there yet.
# This loads as a namespace module now. Why? Because the first loader
# (.badpy one) will now look for bad/__init__.badpy, doesn't find it, but
# decides that since we're inside a folder, we're a namespace.

This likely comes from reusing SourceFileLoader without modification. Overriding find_spec should be enough to get rid of that problem. I'm not sure whether that is intended or not, however. I remember having that problem before (maybe when I was taking python3 importlib and putting it in python2, which won't work, since the API used is different, with different behaviour expectations; I think PEP 451 requires python 3.4+), but I didn't experience it on my python 3.5 with that code example, using python's importlib.

I would advise you to really dig into filefinder2. There are quite a few tests to validate behaviour on different python versions.

As a side note, if I run that code on python2 with filefinder2, with a simple:

import filefinder2
from filefinder2.machinery import FileFinder, SourceFileLoader



at the beginning, I get:

(import_check2) alexv@alexv-XPS-Tablet:/tmp/import_check$ python 
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
<module 'good.demo' from '/tmp/import_check/good/demo.badpy'>
<module 'bad' (built-in)>
<module 'contrast' from '/tmp/import_check/contrast/__init__.badpy'>

which is more or less the result you described I believe.

Here is a quickly written class to fix it:

class badpyFileFinder(FileFinder):

    def __init__(self, path, *loader_details):
        super(badpyFileFinder, self).__init__(path, *loader_details)

    def __repr__(self):
        return 'badpyFileFinder({!r})'.format(self.path)

    @classmethod
    def path_hook(cls, *loader_details):
        """A class method which returns a closure to use on sys.path_hooks
        which will return an instance using the specified loaders and the path
        called on the closure.

        If the path called on the closure is not a directory, or doesn't contain
        any files with the supported extension, ImportError is raised.

        This is different from default python behavior
        but prevents polluting the cache with custom finders.
        """
        def path_hook_for_badpyFileFinder(path):
            """Path hook for importlib.machinery.FileFinder."""

            if not (os.path.isdir(path)):
                raise ImportError('only directories are supported')

            exts = [x for ld in loader_details for x in ld[1]]
            if not any(fname.endswith(ext) for fname in os.listdir(path) for ext in exts):
                raise ImportError(
                    'only directories containing {ext} files are supported'.format(ext=", ".join(exts)))
            return cls(path, *loader_details)
        return path_hook_for_badpyFileFinder

    def find_spec(self, fullname, target=None):
        """Try to find a spec for the specified module.
        :param fullname: the name of the package we are trying to import
        :return: the matching spec, or None if not found.
        """

        # We attempt to load a .badpy file as a module
        tail_module = fullname.rpartition('.')[2]
        base_path = os.path.join(self.path, tail_module)
        for suffix, loader_class in self._loaders:
            full_path = base_path + suffix
            if os.path.isfile(full_path):  # maybe we need more checks here (importlib filefinder checks its cache...)
                return self._get_spec(loader_class, fullname, full_path, None, target)

        # Otherwise, we try to find python modules
        return super(badpyFileFinder, self).find_spec(fullname=fullname, target=target)

# So, this is counterintuitive, but we're going to add a new *loader*,
# that we want to go *first*, before .py.
# It's easier to highlight the problem this way
sys.path_hooks.insert(0, badpyFileFinder.path_hook((SourceFileLoader, [".badpy"])))


which should give you the python3 behaviour (minus the namespace package, which becomes a built-in... there might still be some tuning to do there...):

(import_check2) alexv@alexv-XPS-Tablet:/tmp/import_check$ python 
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
<module 'good.demo' from '/tmp/import_check/good/demo.badpy'>
<module 'bad' from '/tmp/import_check/bad/'>
<module 'contrast' (built-in)> 

Adding this specialized badpyFileFinder should also be enough to fix your use case, since filefinder2 is mostly a copy of importlib2, which is mostly a copy of the python3 importlib code anyway.

So looking at all this, it seems that we could even make the filefinder2 implementation better by slightly modifying the filefinder2.FileFinder class, so that the specialized class is not needed... We would need to be very careful not to break anything else, however (hence the tests that are already there; any change comes with a lot of surprises...).

Thinking about this, I believe I do not yet test using the FileFinder/PathFinder classes as an interface; I only test that various import calls behave consistently across all Python versions, where possible... PRs are very welcome ;-) It would be very good to have, at least as an example of what should work and what shouldn't.

