@msarahan
Last active February 14, 2020 17:28

Conda-build rendering overview
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build import api\n",
"from pprint import pprint\n",
"def pprint_meta(m):\n",
" pprint(m.meta)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The entry point: api.render\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/api.py#L30\n",
"\n",
"Note: every CLI command ends up in api module calls. You can skip the CLI and just start with the api module when you're looking to debug stuff. Look for tests that use `api.*` functions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First stop in api.render: consolidating configuration\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/api.py#L40\n",
"\n",
"This is how default arguments get set, and how arbitrary parameters can get set. It is how you can programmatically pass a variant.\n",
"\n",
"Variant here is singular: it is a dictionary where each key has only a single value, not a collection of values."
]
},
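{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal, self-contained sketch (not conda-build's actual code) of\n",
"# the plural-vs-singular distinction: a \"variants\" input is a dict of\n",
"# lists, and expanding the matrix yields a list of dicts where each\n",
"# key maps to a single value.\n",
"from itertools import product\n",
"\n",
"variants_input = {'python': ['2.7', '3.6'], 'r_version': ['3.5.0']}\n",
"keys = sorted(variants_input)\n",
"expanded = [dict(zip(keys, combo))\n",
"            for combo in product(*(variants_input[k] for k in keys))]\n",
"expanded"
]
},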
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.config import get_or_merge_config\n",
"config = get_or_merge_config(None)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"config.build_folder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next step: render top-level recipe into metadata objects"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.render import render_recipe\n",
"\n",
"metadata_tuples = render_recipe('.', config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(metadata_tuples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When would you have more than one metadata_tuple here?\n",
"\n",
"If your variants apply to the elements of the top-level recipe (instead of only the outputs), then you'll see multiple tuples here.\n",
"\n",
"If a variant matrix applies to both top-level and outputs, it is applied/distributed at the top level, which means that the output loop does not re-distribute this variant matrix.\n",
"\n",
"In our example, our python variant applies only at the output level, so we have one top-level metadata object, and we'll see the looping over variants in the output handling."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"meta = metadata_tuples[0][0]\n",
"download = metadata_tuples[0][1]\n",
"render_in_env = metadata_tuples[0][2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(meta)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pprint_meta(meta)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if downloading the source was necessary for rendering,\n",
"# it will already be downloaded, and this will be False,\n",
"# indicating that future processes don't need to re-download it.\n",
"download"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is vestigial. It was True if the environment \n",
"# had tools in it that were needed to render the recipe. \n",
"# Currently, you must have all tools necessary for rendering\n",
"# installed before rendering.\n",
"render_in_env"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's dive into the render_recipe function that we've seen the output from.\n",
"\n",
"First, we find the file that we should build. Mostly people just point conda-build at directories containing recipes, but there are a few other options: https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L760-L782\n",
"\n",
"Next, a MetaData object is instantiated with the path to the recipe: https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L785"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.metadata import MetaData\n",
"m = MetaData('.', config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# note undefined_jinja_vars key here: creating this MetaData object\n",
"# is not a complete rendering. It is only parsing YAML for some\n",
"# initial values, such as the package name and source URLs.\n",
"# undefined_jinja_vars are rendered as blanks for this pass.\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# now that we have the package name, we can\n",
"# create the build folder (packagename_timestamp)\n",
"# This modifies the config object in-place.\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L792-L795\n",
"m.config.compute_build_id(m.name())\n",
"m.config.build_id"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# It's important to do this step before source is downloaded,\n",
"# so that source goes to the correct location\n",
"m.config.build_folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if source is necessary for rendering, download it now\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L797-L802\n",
"\n",
"from conda_build.render import try_download\n",
"\n",
"if m.needs_source_for_render and not m.source_provided:\n",
"    try_download(m, no_download_source=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When is `needs_source_for_render` True?\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1588\n",
"\n",
"`self.uses_vcs_in_meta or self.uses_setup_py_in_meta or self.uses_regex_in_meta`\n",
"\n",
"This is your first taste of conda-build's parsing of text in recipes. There's a lot of logic based on regex patterns."
]
},
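{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A toy sketch of the regex-based detection idea. The real patterns\n",
"# live in conda_build/metadata.py; this simplified pattern is an\n",
"# assumption for illustration only.\n",
"import re\n",
"\n",
"recipe_text = 'version: {{ load_setup_py_data().version }}'\n",
"uses_setup_py = bool(re.search(r'load_setup_py_data|load_setuptools', recipe_text))\n",
"uses_setup_py"
]
},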
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# \"Final\" - this means a metadata object is already fully rendered \n",
"# and should not undergo further treatment. Part of being \"final\"\n",
"# is that conda has fully resolved all of the build/host dependencies.\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L803\n",
"m.final\n",
"\n",
"# metadata might be finalized if a metadata object is passed into render (as opposed to a recipe path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# our metadata is not finalized yet, so we obtain variants and distribute them\n",
"from conda_build.variants import get_package_variants\n",
"# variants here is plural - as an argument being passed in, it is a dictionary of lists\n",
"# The return here is the opposite: the matrix evaluated and returned as a \n",
"# list of dicts, where each dict has only a single string value for each key.\n",
"variants = get_package_variants(m)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"get_package_variants(m, variants={'python': ['3.7']})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.render import distribute_variants\n",
"# \"distributing\" them means creating a new MetaData object with a specific variant combination\n",
"# and re-rendering with these values.\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L815-L819\n",
"rendered_metadata = distribute_variants(m, variants)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see our first instance of the need for recursion here. We have outputs that reference other outputs, but we haven't parsed the outputs into metadata objects yet. There are flags to tell conda-build when it's OK to ignore these, and when it's not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rendered_metadata = distribute_variants(m, variants, allow_no_other_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this is a list of the (meta, download, render_in_env) tuples we saw earlier\n",
"len(rendered_metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How does conda-build know which variants to loop over, and which to ignore or save for later?\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L662-L683\n",
"\n",
"* If noarch, ignore python\n",
"* extract raw recipe text (this looks like dead code to me): https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L673-L679\n",
"* Where recipe text is actually parsed and interpreted to find variable usage: https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L2120-L2158\n",
"* Filter out variants that don't affect the top-level recipe: https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L683"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"top_used_vars = m.get_used_loop_vars()\n",
"top_used_vars"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# here we filter out any extra variants that have no effect on the top-level recipe\n",
"# notice that our two python versions have been cut to just the one variant\n",
"m.get_reduced_variant_set(top_used_vars)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# however, the original variants are still stored\n",
"m.config.input_variants"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For top-level packages that use a variant and a subpackage that also uses that variant, we don't want to loop over that variant set twice. The outer, top loop will create all the extra stuff we need. For those instances, we simplify the stored variants, so that distributing variants for the outputs doesn't do the extra loop: https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L694-L703"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# download source code if it's necessary to render:\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L717-L721"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Finally, use the variant to render the templated top-level meta.yaml\n",
"# variant here is a dict, where values can only be strings (not lists)\n",
"m.config.variant"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`parse_until_resolved` is where the Jinja2 execution happens:\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L989-L1019\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L938-L942\n",
"\n",
"It is iterative, because the result of one template parameter or function may depend on the value of another. Parsing loops until the result no longer changes relative to the previous pass.\n",
"\n",
"The variables and functions that are available to use in the jinja2 template are populated (put in the \"context\"):\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/jinja_context.py#L507-L532\n",
"\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1499-L1506\n",
"\n",
"It is also where global clobbering or appending takes place (for example, we used to use clobbering to disable noarch for distro builds): https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L947-L962"
]
},
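{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A toy fixed-point loop mirroring the \"parse until the result stops\n",
"# changing\" idea. conda-build renders Jinja2 templates; this sketch\n",
"# just substitutes str.format placeholders iteratively, where one\n",
"# placeholder's value contains another placeholder.\n",
"defs = {'name': 'xgboost', 'pkg': '{name}-pkg'}\n",
"\n",
"text = '{pkg}'\n",
"prev = None\n",
"while text != prev:\n",
"    prev = text\n",
"    if '{' in text:\n",
"        text = text.format(**defs)\n",
"text  # two passes were needed: '{pkg}' -> '{name}-pkg' -> 'xgboost-pkg'"
]
},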
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"m.parse_until_resolved()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# oops, we still don't have our outputs.\n",
"m.parse_until_resolved(allow_no_other_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pprint_meta(m)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, we have our top-level metadata tuples. Remember that at this stage, we've been ignoring our outputs. Let's go back to api.render to see how they come in.\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/api.py#L47\n",
"\n",
"Each top-level metadata object needs to get its own outputs. If they could be shared and built once, then they would be top-level metadata objects. These outputs will depend on the state of the top-level recipe, and will then be further split out according to the remaining variants."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Here's a method you'll learn to hate:\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/api.py#L49-L52\n",
"# Let's see why!\n",
"# our bypass_env_check argument here is telling it\n",
"# not to use conda to find actual versions that\n",
"# will be installed, nor check satisfiability\n",
"len(m.get_output_metadata_set(bypass_env_check=True))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# do nothing if we're already final\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1964-L1966"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# starting out with our top-level metadata, we\n",
"# figure out what variables we need to loop over\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1970-L1971\n",
"used_variables = m.get_used_loop_vars(force_global=True)\n",
"top_loop = m.get_reduced_variant_set(used_variables) or m.config.variants[:1]\n",
"top_loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this is the inner loop, where the variants that were not used\n",
"# at the top level are distributed across the outputs\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1973\n",
"# We copy the input top-level output, because we want to re-examine it\n",
"# with different variant values:\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1974-L1975\n",
"#\n",
"# With our new, re-parsed top-level recipe, we\n",
"# parse the output information into dictionaries\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1985\n",
"from conda_build.metadata import get_output_dicts_from_metadata\n",
"outputs = get_output_dicts_from_metadata(m)\n",
"outputs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pick a specific output (let's choose the one that has python)\n",
"output = outputs[2]\n",
"output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For each output, augment requirements with version constraints from variant\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1989-L1994\n",
"from conda_build import utils\n",
"variant = m.config.variant\n",
"requirements = output.get('requirements')\n",
"if requirements:\n",
"    requirements = utils.expand_reqs(requirements)\n",
"    for env in ('build', 'host', 'run'):\n",
"        utils.insert_variant_versions(requirements, variant, env)\n",
"    output['requirements'] = requirements\n",
"output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# turn that output dictionary into a MetaData object\n",
"# https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1995\n",
"out_m = m.get_output_metadata(output)\n",
"print(out_m.name())\n",
"out_m.meta['requirements']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The transformation of that output dictionary to the MetaData object is where things can get very confusing. As a mapping of dictionary keys to object attributes, it isn't hard, but we tried to make the transition to multiple outputs as easy as possible by interpreting the top-level recipe as an implicit output. If an output shares a name with the top-level recipe, then it tries to merge things. This has been a source of many bugs and bad behavior.\n",
"\n",
"Cleaning up this particular part of conda-build would be a good improvement. Making the separation between a top-level recipe model and output models would clean up this code, but would likely break the thousands of recipes that don't specify outputs.\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1843-L1927"
]
},
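{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A sketch of the \"top-level recipe as implicit output\" merge idea,\n",
"# using plain dicts rather than MetaData objects; the merge rules here\n",
"# are a simplification, not conda-build's exact logic.\n",
"top_level = {'name': 'xgboost', 'requirements': {'build': ['make']}}\n",
"output_dict = {'name': 'xgboost', 'requirements': {'run': ['py-xgboost']}}\n",
"\n",
"merged = dict(top_level)\n",
"if output_dict['name'] == top_level['name']:\n",
"    reqs = dict(top_level.get('requirements', {}))\n",
"    reqs.update(output_dict.get('requirements', {}))\n",
"    merged.update(output_dict)\n",
"    merged['requirements'] = reqs\n",
"merged"
]
},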
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each metadata object stores a reference to itself and any other output that has been transformed into metadata so far:\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L1997-L2006\n",
"\n",
"This is how we can refer to other subpackages in our Jinja2 functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_output_metadata = {}\n",
"out_metadata_map = {}\n",
"for out in outputs:\n",
"    requirements = out.get('requirements')\n",
"    if requirements:\n",
"        requirements = utils.expand_reqs(requirements)\n",
"        for env in ('build', 'host', 'run'):\n",
"            utils.insert_variant_versions(requirements, variant, env)\n",
"        out['requirements'] = requirements\n",
"    out_metadata = m.get_output_metadata(out)\n",
"\n",
"    # keeping track of other outputs is necessary for correct functioning of the\n",
"    # pin_subpackage jinja2 function. It's important that we store all of\n",
"    # our outputs so that they can be referred to in later rendering. We\n",
"    # also refine this collection as each output metadata object is\n",
"    # finalized - see the finalize_outputs_pass function\n",
"    all_output_metadata[(out_metadata.name(),\n",
"                         utils.HashableDict({k: out_metadata.config.variant[k]\n",
"                                             for k in out_metadata.get_used_vars()}))] = out, out_metadata\n",
"    out_metadata_map[utils.HashableDict(out)] = out_metadata\n",
"    m.other_outputs = all_output_metadata\n",
"len(m.other_outputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# check the original sort order\n",
"[_['name'] for _ in out_metadata_map]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once all the output dictionaries have been used to create objects, we sort the metadata by dependency order, so that we can finalize metadata in the correct order when one subpackage depends on another. The dependency subpackage must be finalized before the subpackage that depends on it.\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L2016"
]
},
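{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A self-contained illustration of dependency-ordered sorting, using\n",
"# the stdlib graphlib (Python 3.9+) instead of conda-build's own\n",
"# toposort; the dependency edges below are hypothetical.\n",
"from graphlib import TopologicalSorter\n",
"\n",
"deps = {'libxgboost': set(),\n",
"        '_py-xgboost-mutex': set(),\n",
"        'py-xgboost': {'libxgboost', '_py-xgboost-mutex'},\n",
"        'xgboost': {'py-xgboost'}}\n",
"order = list(TopologicalSorter(deps).static_order())\n",
"order"
]
},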
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.metadata import toposort, check_circular_dependencies\n",
"render_order = toposort(out_metadata_map)\n",
"check_circular_dependencies(render_order)\n",
"[_['name'] for _ in render_order]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There may be some duplicated variants. These are filtered by collecting the output dictionaries and metadata under keys built from each output's \"used variables\" and their values, which reduces the outputs to the subset of variants that actually affect them.\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/metadata.py#L2023-L2024"
]
},
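{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A sketch of the deduplication idea with plain data: keying each\n",
"# output on (name, values of the variables it actually uses) collapses\n",
"# the variants that don't affect it. Hypothetical outputs, not\n",
"# conda-build objects.\n",
"variant_list = [{'python': '2.7'}, {'python': '3.6'}]\n",
"output_specs = [('libxgboost', ()),           # uses no variant variables\n",
"                ('py-xgboost', ('python',))]  # uses python\n",
"\n",
"deduped = {}\n",
"for v in variant_list:\n",
"    for name, used in output_specs:\n",
"        key = (name, tuple((u, v[u]) for u in used))\n",
"        deduped[key] = (name, v)\n",
"len(deduped)  # libxgboost collapsed to 1 entry, py-xgboost kept 2"
]
},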
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use a distinct loop variable (out_meta) so we don't shadow the\n",
"# top-level metadata object `m`, which later cells still need\n",
"conda_packages = {}\n",
"non_conda_packages = []\n",
"for output_d, out_meta in render_order.items():\n",
"    if not output_d.get('type') or output_d['type'] in ('conda', 'conda_v2'):\n",
"        conda_packages[out_meta.name(), utils.HashableDict({k: out_meta.config.variant[k]\n",
"                       for k in out_meta.get_used_vars()})] = (output_d, out_meta)\n",
"    elif output_d.get('type') == 'wheel':\n",
"        if (not output_d.get('requirements', {}).get('build') or\n",
"                not any('pip' in req for req in output_d['requirements']['build'])):\n",
"            build_reqs = output_d.get('requirements', {}).get('build', [])\n",
"            build_reqs.extend(['pip', 'python {}'.format(out_meta.config.variant['python'])])\n",
"            output_d['requirements'] = output_d.get('requirements', {})\n",
"            output_d['requirements']['build'] = build_reqs\n",
"            out_meta.meta['requirements'] = out_meta.meta.get('requirements', {})\n",
"            out_meta.meta['requirements']['build'] = build_reqs\n",
"        non_conda_packages.append((output_d, out_meta))\n",
"    else:\n",
"        # for wheels and other non-conda packages, just append them at the end.\n",
"        # no deduplication with hashes currently. The hard part about including\n",
"        # any part of output_d outside of this func is that it is harder to\n",
"        # obtain an exact match elsewhere\n",
"        non_conda_packages.append((output_d, out_meta))\n",
"\n",
"# no deduplication really happens in this example\n",
"len(conda_packages)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finalizing metadata resolves expressions like `{{ pin_subpackage() }}` and also uses conda to obtain the actual versions of requested dependencies that would be used in the build and host sections.\n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/render.py#L476\n",
"\n",
"This is the part of conda-build that will barf if any dependencies are not installable. It will also fall over in sometimes-helpful ways when there is a circular dependency among subpackages. This logic is pretty fragile, so don't lean too heavily on it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.metadata import finalize_outputs_pass\n",
"final_conda_pkgs = finalize_outputs_pass(m, conda_packages, pass_no=0)\n",
"final_conda_pkgs = [(out_d, m) for out_d, m in final_conda_pkgs.values()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's a bit more cleanup and deduplication that happens back in our outer loop over the top-level variants: \n",
"\n",
"https://github.com/conda/conda-build/blob/b736890b98a4761c38d3bfafe155923613a283d8/conda_build/api.py#L54-L76\n",
"\n",
"Ultimately, we return the same list of tuples of (meta, download, render_in_env) that we saw returned from `render_recipe` with the top-level metadata, except that these meta objects have come from output dictionaries, and are finalized."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from conda_build.render import finalize_metadata\n",
"from conda_build.exceptions import DependencyNeedsBuildingError\n",
"from conda_build.conda_interface import NoPackagesFoundError\n",
"\n",
"permit_unsatisfiable_variants = False\n",
"\n",
"output_metas = {}\n",
"for od, om in final_conda_pkgs:\n",
"    if not om.skip() or not config.trim_skip:\n",
"        if 'type' not in od or od['type'] == 'conda':\n",
"            if not om.final:\n",
"                try:\n",
"                    om = finalize_metadata(om,\n",
"                        permit_unsatisfiable_variants=permit_unsatisfiable_variants)\n",
"                except (DependencyNeedsBuildingError, NoPackagesFoundError):\n",
"                    if not permit_unsatisfiable_variants:\n",
"                        raise\n",
"\n",
"            # remove outputs section from output objects for simplicity\n",
"            if not om.path and om.meta.get('outputs'):\n",
"                om.parent_outputs = om.meta['outputs']\n",
"                del om.meta['outputs']\n",
"\n",
"            output_metas[om.dist(), om.config.variant.get('target_platform'),\n",
"                         tuple((var, om.config.variant[var])\n",
"                               for var in om.get_used_vars())] = \\\n",
"                (om, download, render_in_env)\n",
"        else:\n",
"            output_metas[\"{}: {}\".format(om.type, om.name()), om.config.variant.get('target_platform'),\n",
"                         tuple((var, om.config.variant[var])\n",
"                               for var in om.get_used_vars())] = \\\n",
"                (om, download, render_in_env)\n",
"\n",
"tuples = list(output_metas.values())\n",
"for t in tuples:\n",
"    pprint_meta(t[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: this is not quite right. The python version constraint is missing on the py-xgboost MetaData. I'm pretty sure that's because of the weird way that I've extracted all of this notebook code out of its proper context. Still, that is exactly indicative of the kinds of bugs to watch out for: missing or incorrect dependency data. It is almost always caused by either too much caching, or by not doing a deep enough copy and getting cross-talk between objects.\n",
"\n",
"We can check that by just going back out to the api.render call and looking at its results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"api.render('.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"{'name': 'py-xgboost', 'requirements': {'host': ['libxgboost 0.80 1', 'python'], 'run': ['libxgboost 0.80 1', '_py-xgboost-mutex 2.0 cpu_0', 'python']}}\n",
"```\n",
"\n",
"Nope, that is still missing the python pins in the host and run envs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if we use the CLI, the versions are correct.\n",
"!conda-render ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This points to some finalization not being done by default in api.render.\n",
"api.render('.', finalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don't finalize by default because it takes quite a bit of time; the API favors returning an answer fast, and users can request finalization when they need it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
# conda_build_config.yaml (the variant configuration used in this walkthrough)
python:
  - 2.7
  - 3.6
r_version:
  - 3.5.0
r_implementation:
  - 'r-base'
  - 'mro-base'  # [not osx]

# meta.yaml (the example recipe used in this walkthrough)
{% set name = "xgboost" %}
{% set version = "0.80" %}

package:
  name: {{ name|lower }}
  version: {{ version }}

build:
  number: 1
  skip: true  # [win or linux32]

requirements:
  build:
    - make

outputs:
  - name: libxgboost
  - name: _py-xgboost-mutex
    version: 2.0
    build:
      string: cpu_0
  - name: py-xgboost
    requirements:
      host:
        - {{ pin_subpackage('libxgboost', exact=True) }}
        - python
      run:
        - {{ pin_subpackage('libxgboost', exact=True) }}
        - {{ pin_subpackage('_py-xgboost-mutex', exact=True) }}
        - python
  - name: py-xgboost-cpu
    requirements:
      run:
        - {{ pin_subpackage('py-xgboost', exact=True) }}
  - name: xgboost
    requirements:
      run:
        - {{ pin_subpackage('py-xgboost', exact=True) }}
  - name: fake_r_{{ r_implementation }}
    version: {{ r_version }}
  - name: _r-xgboost-mutex
    version: 2.0
    build:
      string: cpu_0
  - name: r-xgboost
    build:
      rpaths:
        - lib/R/lib
    requirements:
      host:
        - {{ pin_subpackage('libxgboost', exact=True) }}
        - fake_r_{{ r_implementation }}
      run:
        - {{ pin_subpackage('libxgboost', exact=True) }}
        - {{ pin_subpackage('_r-xgboost-mutex', exact=True) }}
        - fake_r_{{ r_implementation }}
  - name: r-xgboost-cpu
    requirements:
      host:
        - fake_r_{{ r_implementation }}
      run:
        - fake_r_{{ r_implementation }}
        - {{ pin_subpackage('r-xgboost', exact=True) }}

about:
  home: https://github.com/dmlc/xgboost
  license: Apache-2.0
  summary: |
    Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for
    Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink
    and DataFlow
  description: |
    XGBoost is an optimized distributed gradient boosting library designed to be highly efficient,
    flexible and portable. It implements machine learning algorithms under the Gradient Boosting
    framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many
    data science problems in a fast and accurate way. The same code runs on major distributed
    environment (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
  doc_url: https://xgboost.readthedocs.io/
  dev_url: https://github.com/dmlc/xgboost/

extra:
  recipe-maintainers:
    - beckermr
    - aldanor