@0xbe7a
Last active January 30, 2024 21:45

Proposal: Introduction of Feature Sets

Objective

The aim is to introduce a feature set mechanism in the pixi package manager. This mechanism will enable clear, conflict-free management of dependencies tailored to specific environments, while also maintaining the integrity of fixed lockfiles.

Motivating Example: Test Dependencies and Multiple Python Versions

Consider a scenario where a project needs to be tested across multiple Python versions, each requiring a different set of dependencies. In this case, defining separate feature sets for each Python version (like py39, py310, etc.) allows for easy switching between environments without conflicts. Similarly, for development purposes, a test feature set can include dependencies necessary for testing and linting, which are not required in the production environment.

Design Considerations

  1. Non-Combinatorial: To ensure the dependency resolution process remains manageable, the solution should avoid a combinatorial explosion of dependency sets.
  2. Single Feature Activation: The design should allow only one feature set to be active at any given time, simplifying the resolution process and preventing conflicts.
  3. Fixed Lockfiles: It's crucial to preserve fixed lockfiles for consistency and predictability. Solutions must ensure reliability not just for authors but also for end-users, particularly at the time of lockfile creation.

Proposed Solution

Feature Set Definitions

Introduce feature sets in the pixi.toml configuration file, with each set comprising dependencies specific to a given environment or use case. For instance, a test feature set may include dependencies like pytest and pre-commit, essential for development but not for production.

pixi.toml Example:

[features]
test = ["pytest", "pre-commit"]
py39 = ["py39"]
py310 = ["py310"]

[dependencies]
requests = "*"
pytest = { version = ">= 1.2", optional = true }
pre-commit = { version = ">= 2", optional = true }
py39 = { package = "python", version = "3.9", optional = true}
py310 = { package = "python", version = "3.10", optional = true}

Lockfile Structure

Within the pixi.lock file, a package may now include an additional feature field, specifying the feature set to which it belongs. This structure ensures clarity and prevents unnecessary duplication of dependencies across different environments.

Feature Set Activation

Users can manually activate the desired feature set via command line or configuration. This approach guarantees a conflict-free environment by allowing only one feature set to be active at a time.

Command Configuration

Commands defined in pixi.toml can specify the supported feature sets. For example, a testing command can be linked to the test feature set, ensuring it runs with the correct dependencies.

Command Configuration Examples:

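# Example 1: a task supported by the test feature set
[tasks.test]
cmd = "pytest"
feature_set = ["test"]

# Example 2 (alternative definition): a task supported by both Python feature sets
[tasks.test]
cmd = "pytest"
feature_set = ["py39", "py310"]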

Benefits

  • Simplicity: Clear and straightforward dependency management is achieved by making each feature set mutually exclusive.
  • Consistency: The solution upholds the principle of fixed lockfiles, ensuring stable and predictable dependency management across different project stages.

Drawbacks

In the proposed feature set mechanism, users can activate only one feature set at a time. This design decision simplifies the dependency resolution process and prevents conflicts. However, it can be limiting in certain scenarios:

Scenario with Orthogonal Feature Sets

  • Orthogonal feature sets are sets that could theoretically be combined because they don't interfere with each other. For example, one set might pertain to linting tools, while another might specify a Python version.
  • The limitation arises when a user wants to activate multiple such orthogonal sets simultaneously. For instance, they might want to run linting tools (lint feature set) under a specific Python version (py39 feature set).
  • In the current design, they would have to define a new feature set that combines both sets of dependencies. This becomes cumbersome if there are many such orthogonal sets, as it requires defining every possible combination.
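
To make the cost of this drawback concrete, here is a minimal Python sketch (not part of the proposal) that counts how many combined feature sets would have to be written by hand. The group names and the `combined_feature_sets` helper are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical orthogonal groups; the names are illustrative only.
tooling = ["lint", "test"]
python_versions = ["py39", "py310", "py311"]


def combined_feature_sets(*groups):
    """Return one combined feature-set name per choice of one member from each group."""
    return ["_".join(choice) for choice in product(*groups)]


combos = combined_feature_sets(tooling, python_versions)
print(len(combos))  # 2 tooling sets x 3 Python versions
print(combos)
```

Each additional orthogonal group multiplies the number of feature sets that need an explicit definition.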
@ruben-arts

This is amazing, thanks for the elaborate explanation on how you would like this feature to work. This is going to be extremely helpful for us!
I really like the examples you gave; it was immediately clear from the configuration what you wanted to create.

Just a few questions that I would like to hear your ideas on:

  • In the case of lint you might only need pre-commit. Would you define everything as optional and use the features to arrange the dependency sets?
  • What should the lockfile do when a feature changes a package whose own dependencies then require different versions? For example, py27 and py39 need significantly different dependencies; what should the lockfile do with that?
  • Do you see the features handling more than just dependencies, e.g. tasks, activation.scripts, build-dependencies and all its target variants? Or would you want something different for that?
  • What would be needed for the test + py39 and test + py310 matrix-like environments?

You don't have to have an answer, but I'm just curious about your ideas, as you have clearly already given this a good deal of thought!

@travishathaway

travishathaway commented Nov 21, 2023

Hi @0xbe7a,

I just wanted to give you a few of my thoughts and some feedback on the proposal.

I like the idea of feature sets as a way to arbitrarily group dependencies which can then be run for various tasks. It seems like this could be pretty ergonomic too:

$ pixi run test --feature=py310

Maybe if we had that type of CLI interface, we could avoid the need for defining these feature sets in the tasks themselves in the pixi.toml?

I also think the drawback of not being able to combine feature sets is not good in terms of developer experience. I would naturally expect to be able to combine these like so:

$ pixi run test --feature=py39 --feature=test

Another thing about the proposal that I'm not too crazy about is the mixing of feature definitions in the dependencies section. I think it would be easier to reason about if that were all located in the features section itself:

[dependencies]
python = ">=3.8"
requests = "*"

[features.test]
pytest = "*"
"pre-commit" = "*"

[features.py39]
python = { version = "3.9"}

[features.py310]
python = { version = "3.10"}

The dependencies section then becomes the default set of dependencies that are used when no --feature options are passed to the pixi run command.

This way, you also don't need to mark things as optional. For example, if your project requires python it feels a little weird to have it marked as optional = true.

Anyhow, these are just my thoughts. I really appreciate you taking the time to write a proposal for this! This is definitely the feature I'd like to see the most in pixi 😄 .

@dhirschfeld

I agree with @travishathaway that you'd ideally like to be able to combine different sets of dependencies.

I might also just be too used to the pip way of doing things, but I'm not sure we need a new concept of "features" for this - to my mind they're just named groups of dependencies:

[project]
name = "mypkg"

[dependencies.run]
python = "*"
requests = "*"

[dependencies.test]
pytest = "*"
pre-commit = "*"

[dependencies.dev]
"mypkg[deps=test]" = "*"
ruff = ">=1.15"


[tasks.test-py39]
cmd = "pytest"
dependencies = ["python=3.9", "mypkg[deps=test]"]

@0xbe7a
Author

0xbe7a commented Nov 21, 2023

RE @ruben-arts

1. Regarding the Use of Features for Dependency Sets

In the case of lint you might only need pre-commit. Would you define everything as optional and use the features to arrange the dependency sets?

Yes, I suggest adding syntactic sugar to simplify this process:

[features]
pytest = ["pytest", "pytest-xdist", "pytest-emoji"]
lint = ["pre-commit"]
development = ["feature:pytest", "feature:lint"]

# This is just syntactic sugar for:
# development = ["pytest", "pytest-xdist", "pytest-emoji", "pre-commit"]

This allows users to build up environments of sub-features without duplicating dependencies for higher-level environments and addresses the issue with orthogonal feature sets to some extent.
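
As a rough sketch of how such `feature:` references could be flattened, the Python below expands a feature recursively into plain dependency names. The `expand_feature` function and its cycle check are assumptions for illustration, not pixi behaviour.

```python
def expand_feature(name, features, _stack=()):
    """Flatten a feature into dependency names, following "feature:<name>" references."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"cyclic feature reference: {chain}")
    resolved = []
    for entry in features[name]:
        if entry.startswith("feature:"):
            # A reference to another feature: expand it first.
            items = expand_feature(entry[len("feature:"):], features, _stack + (name,))
        else:
            items = [entry]
        for item in items:
            if item not in resolved:  # keep first occurrence, drop duplicates
                resolved.append(item)
    return resolved


features = {
    "pytest": ["pytest", "pytest-xdist", "pytest-emoji"],
    "lint": ["pre-commit"],
    "development": ["feature:pytest", "feature:lint"],
}
print(expand_feature("development", features))
# ['pytest', 'pytest-xdist', 'pytest-emoji', 'pre-commit']
```

Because the expansion happens before any solve, each named feature set still corresponds to exactly one flat dependency list.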

2. Handling Different Versions for Dependencies

What should the lockfile do when a feature changes a package whose own dependencies then require different versions? For example, py27 and py39 need significantly different dependencies; what should the lockfile do with that?

In this case, the pixi.lock file would have multiple entries for a package like python, each with a different set of dependencies. This is similar to the current approach for platform-specific dependencies. We could create one entry for each feature set in the lockfile, possibly deduplicating to combine all equal packages into a single record containing all platforms and features.
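
The deduplication idea can be sketched in Python as follows. The record layout, the version numbers, and the `deduplicate` helper are all hypothetical; the actual pixi.lock format is not specified in this thread.

```python
from collections import defaultdict


def deduplicate(records):
    """Merge identical package records, collecting the platforms and features that use them."""
    merged = defaultdict(lambda: {"platforms": set(), "features": set()})
    for rec in records:
        key = (rec["name"], rec["version"])
        merged[key]["platforms"].add(rec["platform"])
        merged[key]["features"].add(rec["feature"])
    return dict(merged)


# Illustrative solver output: one record per (package, platform, feature set).
records = [
    {"name": "python", "version": "3.9.18", "platform": "linux-64", "feature": "py39"},
    {"name": "python", "version": "3.10.13", "platform": "linux-64", "feature": "py310"},
    {"name": "requests", "version": "2.31.0", "platform": "linux-64", "feature": "py39"},
    {"name": "requests", "version": "2.31.0", "platform": "linux-64", "feature": "py310"},
]

locked = deduplicate(records)
for (name, version), usage in sorted(locked.items()):
    print(name, version, sorted(usage["features"]))
```

Here python keeps two entries (one per feature set), while the identical requests record is stored once and tagged with both feature sets.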

3. Scope of Features Beyond Dependencies

Do you see the features handle more than just dependencies? E.g., tasks, activation scripts, build dependencies and all its target variants? Or would you want something different for that?

The tasks definition already includes a field to specify supported feature sets. I'll discuss with @pavelzw how to include activation scripts, build dependencies, and targets in this system.

4. Matrix-Like Environments

What would be needed for the test + py39 and test + py310 matrix-like environments?

With the syntactic sugar mentioned earlier, we could define it like this:

[features]
test = ["pytest", "pre-commit"]
test_py_39 = ["py39", "feature:test"]
test_py_310 = ["py310", "feature:test"]

[dependencies]
requests = "*"
pytest = { version = ">= 1.2", optional = true }
pre-commit = { version = ">= 2", optional = true }
py39 = { package = "python", version = "3.9", optional = true}
py310 = { package = "python", version = "3.10", optional = true}

[tasks.test]
cmd = "pytest"
feature_set = ["test", "test_py_39", "test_py_310"]

Note: We could also add a quality of life feature where the “test” task automatically supports all sets containing the test subset, but we can discuss this later.

RE @travishathaway

Subfeatures and CLI Interface

  • Combining Feature Sets: I understand the appeal of allowing users to combine feature sets directly via the CLI. However, this approach loses the ability to create a single lockfile covering all use cases: with n features, it would require up to 2^n solves, one for each possible subselection of features.
  • Hierarchical Features: Agreeing with your point on ergonomics, I believe allowing hierarchical features (as mentioned in my earlier reply to Ruben) could be a good compromise.
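
To illustrate the 2^n figure, this small Python sketch enumerates every subselection of a hypothetical feature list; the feature names are taken from the examples above and `feature_subsets` is an illustrative helper.

```python
from itertools import combinations


def feature_subsets(features):
    """Yield every subselection of features, from the empty set up to all of them."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


features = ["test", "lint", "py39", "py310"]
subsets = list(feature_subsets(features))
print(len(subsets))  # 16, i.e. 2 ** 4
```

Some of these subselections, such as the one containing both py39 and py310, could never be solved at all, which is another reason to lock only explicitly named sets.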

Syntax and Structure in the pixi.toml

  • Separating Feature Definitions: I really like your idea of separating feature definitions from the dependencies section. I think this approach is functionally equivalent and can be decided later in the development process.

RE @dhirschfeld

  • Coverage of Ideas: I believe the point about hierarchical features (see my response to Travis above) covers your idea as well.
  • Terminology: Regarding the naming, "dependency groups" is indeed more descriptive than "feature-sets." However, for the sake of consistency in this discussion, I'll continue using "feature-sets" until we finalize other details.

@pavelzw

pavelzw commented Nov 21, 2023

Awesome! A few annotations:

I agree with Bela on allowing only a single feature at a time, since that makes for better lockfiles. IMO it's more important to have pinned environments than to have the ability to select multiple features which could potentially not be solvable (pixi install --feature=py39 --feature=py38) and lead to runtime errors. I think the hierarchical feature proposition would cover this as well with a bit of overhead in the initial pixi.toml configuration.
Also, you would let the recipe author "feel" how the lockfile size grows with each feature set that they add.

Default features

I think it would be nice to have a "default feature". Something like

default-feature = "development"

[features]
test = ["test"] # references dependency groups
lint = ["lint"]
development = ["feature:test", "feature:lint"] # or subfeatures

[dependencies.test]
pytest = "*"
pytest-xdist = "*"
pytest-emoji = "*"

[dependencies.lint]
pre-commit = "*"
ruff = "*"

$ pixi install
# equivalent to `pixi install --feature=development`
$ pixi install --disable-default-feature
# equivalent to `pixi install` but with no features enabled (i.e. for production)

Don't require everything to be solvable

What I really dislike about poetry is the following use case:

You have a package that wants to support pydantic=1 as well as pydantic=2. But when you use pydantic=2, you also need pydantic-settings which itself needs pydantic=2.
Poetry strictly requires that the environment must be solvable when all "features" are turned on.
With poetry, this is not possible to do (I struggled with this in dmontagu/fastapi-utils#277 (comment))

Being able to support conflicting features is a core ability that pixi should have.
This pydantic use case would look something like this:

[dependencies]
pydantic = ">=1,<3"

[dependencies.pdc1]
pydantic = ">=1,<2"

[dependencies.pdc2]
pydantic = ">=2,<3"
pydantic-settings = "*"

Activation scripts

I think it would make sense to incorporate features into activation scripts as well. Then, you could also create env variables that are only relevant for testing etc.

[features]
test = ["test"]

[activation.test]
scripts = ["env-vars.sh"]

If you run pixi run --feature=test, then (and only then) env-vars.sh is executed.

For build dependencies etc., I think I still need a clearer picture of what the build process with pixi should actually look like.

@0xbe7a
Author

0xbe7a commented Nov 21, 2023

If we split features into environments and features, it aligns pretty well with Travis's syntax and simplifies the management.
If we define environments as a list of features, and features as a list of dependencies, build-dependencies, and activation-scripts, we can build the following example:

[project]
name = "polarify"
description = "Simplifying conditional Polars Expressions with Python 🐍 🐻‍❄️"
platforms = ["linux-64", "osx-64"]

[dependencies]
python = ">=3.9"
pip = "*"
polars = ">=0.14.24,<0.20"
hatchling = "*"

[dependencies.test]
pytest = "*"
pytest-md = "*"
pytest-emoji = "*"
hypothesis = "*"

[dependencies.lint]
pre-commit = "*"

[environments]
development = ["lint", "test"]
test = ["test"]

[tasks.test]
cmd = "pytest"
feature = "test"

[tasks.lint]
cmd = "pre-commit run --all"
feature = "lint"

[tasks]
postinstall = "pip install --no-build-isolation --no-deps --disable-pip-version-check -e ."

Key Concepts

  1. Implicit Feature Definition: The “test” and “lint” features are implicitly defined within dependencies.test and dependencies.lint. This reduces redundancy and prevents unwanted environment creation.

  2. Environment and Feature Distinction: Only environments defined in [environments] get created and locked. This distinction separates the concept of an environment from a feature.

  3. Generated Locked Environments:

    • development: Includes everything from lint and test, i.e., pytest, pytest-md, pytest-emoji, hypothesis, pre-commit, plus all base dependencies.
    • test: Includes pytest, pytest-md, pytest-emoji, and hypothesis in addition to base dependencies.
  4. Environment-Feature Task Alignment: Environments can execute tasks that require included features. For instance, the development environment can run the lint task as it includes the lint feature.
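
The concepts above can be sketched in Python as follows. This only models the polarify example as plain dictionaries; `environment_packages` and `runnable_tasks` are hypothetical helpers, not pixi APIs.

```python
# Base dependencies and per-feature dependency groups from the polarify example.
base = ["python", "pip", "polars", "hatchling"]
feature_deps = {
    "test": ["pytest", "pytest-md", "pytest-emoji", "hypothesis"],
    "lint": ["pre-commit"],
}
# Only environments listed here get created and locked.
environments = {"development": ["lint", "test"], "test": ["test"]}
# Task name -> feature the task requires.
tasks = {"test": "test", "lint": "lint"}


def environment_packages(env):
    """Base dependencies plus the dependencies of every feature in the environment."""
    packages = list(base)
    for feature in environments[env]:
        packages.extend(feature_deps[feature])
    return packages


def runnable_tasks(env):
    """Tasks whose required feature is part of the environment."""
    return sorted(name for name, feature in tasks.items() if feature in environments[env])


print(environment_packages("test"))
print(runnable_tasks("development"))  # ['lint', 'test']
print(runnable_tasks("test"))         # ['test']
```

With flat (non-hierarchical) features, checking whether a task can run in an environment is a single membership test.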

I'm not sure hierarchical/composite features are really necessary, and I think we can find a compromise between simplicity and flexibility. If we disallow them, it's clear and easy to see which tasks can be executed in which environment based on the features included. If we allowed them, a feature might only be included as a sub-feature, and it would be difficult for the end-user to reason about which tasks are supported and executable by which environments.

@baszalmstra

baszalmstra commented Nov 23, 2023

Nice work guys! Here are my 2 cents:

I like @0xbe7a's suggestion to split environments and features. That way the user is indeed aware (and in control) of which environments are resolved and usable.

TOML issues

There is an issue with the TOML syntax though because these two things are equal:

[dependencies.test]
pre-commit = "*"

and

[dependencies]
test = { pre-commit = "*" }

This might make it difficult to implement/confusing given that the syntax for defining a dependency and a dependency group is very similar.

We have also discussed implementing a more complex "selection" syntax (similar to what cargo does). E.g.

[target.'cfg(feature="test")'.dependencies]
pre-commit = "*"

which we should support regardless since it provides maximum flexibility (also in combination with platform etc).

But maybe a more UX-friendly option might be:

[feature-dependencies.test]
pre-commit = "*"

(or call it [dependency-group] instead of feature?)

Although I'm not sure what we would do with build-dependencies etc. (Do we even need build-dependencies for features??)

Or perhaps simply:

[feature.test.dependencies]
pre-commit = "*"

@pavelzw

pavelzw commented Nov 23, 2023

I like the

[feature.test.dependencies]
pre-commit = "*"

syntax. This way we could also easily incorporate activation scripts etc

[feature.test.activation]
scripts = ["env-vars-testing.sh"]

@ruben-arts

I like the idea of creating environments from a set of features. Here are a few syntax ideas to define the created environments.
Note that this doesn't work with allowing to choose features from the CLI as we can't lock what isn't defined in the pixi.toml.

Defining multiple environments:

[dependencies]
numpy = "*"

[feature.py39.dependencies]
python = "3.9"

[feature.py310.dependencies]
python = "3.10"

[feature.test.dependencies]
pytest = "*"

[environments]
py39 = ["default", "py39"]
py310 = ["default", "py310"]
test39 = ["py39", "test"]

The cli would look like:

pixi run -e py39 python foo.py
pixi run -e test39 pytest

Making the "default" environment be overwritten by multiple features is not super clear to me yet. Here is an idea:

[environments]
# Rename it to the "main" environment and use the nameless configuration as "default"
main = ["default", "py39"]
py310 = ["default", "py310"]
test39 = ["main", "test"]

Defining environments specific configuration

Setting the platforms and system-requirements can be really important for supporting multiple ways of running the project without having to support all machines at all times. This matters especially if you are developing a project that will run on a different machine in production (ML and robotics).

[feature.cuda.dependencies]
pytorch-cuda = "*"

[environments.cuda]
# cuda can only work on these platforms (hypothetically)
platforms = ["osx-64", "osx-arm64", "linux-64"]
features = ["default", "cuda"]
system-requirements = { cuda = "12.0" }

Long name problem

This is not an opinion per se, just a list of ideas for how we could name the tables:

# Option 1
[target.linux-64.feature.test.activation]
scripts = ["env_vars.sh"]

# Option 2: More like Cargo.toml
[target.'cfg(feature=test, linux-64)'.activation]
scripts = ["env_vars.sh"]

# Option 3: Have a way to configure/predefine names
[target]
linux-cuda = {platform=["linux-64"], feature="cuda"}
[target.linux-cuda.activation]
scripts = ["env_vars.sh"]

Defining tasks

We should also have an ergonomic way to overwrite and define tasks in combination with environments.
Would we define them per environment or per feature? I think it would be hard to integrate the "environments" in the nameless task configuration.

[tasks]
train = "python train.py"

[feature.cuda.tasks]
# Overwrites the nameless "train" task if cuda feature is in the environment
train = "python train.py --gpu"

[feature.test.tasks]
test = "pytest"
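
A minimal Python sketch of the override rule described in the comment above: tasks from the nameless configuration form the baseline, and each feature in the environment may overwrite or add tasks. The `resolve_tasks` helper and the "later feature wins" ordering are assumptions for illustration.

```python
def resolve_tasks(default_tasks, feature_tasks, env_features):
    """Start from the nameless tasks, then let each feature in the environment override them."""
    resolved = dict(default_tasks)
    for feature in env_features:
        # Features without a tasks table (such as "default") contribute nothing.
        resolved.update(feature_tasks.get(feature, {}))
    return resolved


default_tasks = {"train": "python train.py"}
feature_tasks = {
    "cuda": {"train": "python train.py --gpu"},
    "test": {"test": "pytest"},
}

print(resolve_tasks(default_tasks, feature_tasks, ["default"]))
print(resolve_tasks(default_tasks, feature_tasks, ["default", "cuda"]))
```

An environment without the cuda feature keeps the plain train task, while one that includes it gets the GPU variant.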

@pavelzw

pavelzw commented Nov 23, 2023

[target.'cfg(feature=test, linux-64)'.activation]

I personally am not really a fan of the cfg syntax 😅 looks a bit odd

Would we define them per environment or per feature?

I would suggest per feature since otherwise you would need to redefine it for every environment:

[environments]
py39 = ["py39"]
py39test = ["test", "py39"]
py310 = ["py310"]
py310test = ["test", "py310"]
py311 = ["py311"]
py311test = ["test", "py311"]

[feature.py39.dependencies]
python = "3.9.*"

[feature.py310.dependencies]
python = "3.10.*"

[feature.py311.dependencies]
python = "3.11.*"

[feature.test.dependencies]
pre-commit = "*"
pytest = "*"

[feature.test.tasks]
test = "pytest --color"

If this were per environment, we would need to duplicate

[environment.<env-name>.tasks]
test = "pytest --color"

for each environment.

If you really want to have a specific task per environment, you could create a 1-to-1 mapping of features to environments.

@pavelzw

pavelzw commented Nov 24, 2023

Another idea that came to my mind which could be useful:

A use case that occurs quite often for me is that I want my dev dependencies to be a strict superset of my prod dependencies. This way, one can make sure that the tests one executed actually matter in production and don't produce different results just because the prod environment was solved differently.

I'm not sure what the best way to integrate this into pixi.toml is. Does anybody have suggestions?
Maybe something like this?

[environments]
py39 = ["py39"]
py39test = ["test", "py39"]
py310 = ["py310"]
py310test = ["test", "py310"]
py311 = ["py311"]
py311test = ["test", "py311"]
prod = ["py311"]
prod-test = [{environment = "prod"}, "test"]
# or alternatively
# prod-test = ["environment:prod", "test"]
# or
# prod = [{constraints = "env:prod-test"}, "py311"]
# prod-test = ["py311", "test"]

[feature.py39.dependencies]
python = "3.9.*"

[feature.py310.dependencies]
python = "3.10.*"

[feature.py311.dependencies]
python = "3.11.*"

[feature.test.dependencies]
pre-commit = "*"
pytest = "*"

[feature.test.tasks]
test = "pytest --color"
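
Whatever syntax is chosen, the property being asked for can be stated as a simple check on the two solved environments. The Python sketch below assumes each locked environment is a mapping from package name to pinned version; the version numbers and the `is_strict_superset` helper are illustrative only.

```python
def is_strict_superset(dev_locked, prod_locked):
    """True if dev pins every prod package at the same version and adds at least one more."""
    same_pins = all(dev_locked.get(name) == version for name, version in prod_locked.items())
    return same_pins and len(dev_locked) > len(prod_locked)


# Illustrative pins; real values would come from the lockfile.
prod = {"python": "3.11.7", "numpy": "1.26.3"}
prod_test = {"python": "3.11.7", "numpy": "1.26.3", "pytest": "7.4.4"}
drifted = {"python": "3.11.7", "numpy": "1.26.2", "pytest": "7.4.4"}

print(is_strict_superset(prod_test, prod))  # True
print(is_strict_superset(drifted, prod))    # False: numpy was solved differently
```

Solving prod and prod-test independently does not guarantee this property, which is why the comment suggests tying one environment to the other.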

@majidaldo

is this the official discussion thread? i've created "hydraconda" so i have much to contribute.

@pavelzw

pavelzw commented Jan 30, 2024

There were also some discussions on prefix-dev/pixi#584.

By now, quite a lot is already implemented in the latest pixi build on main (not released yet, check the build artifacts).

@ruben-arts

ruben-arts commented Jan 30, 2024

Yeah, by the end of this week we should have the feature released in an MVP state. If you have additional ideas, I think it's better to start a PR on the design docs or a discussion on pixi's discussion board.
