@asmacdo
Created April 15, 2019 14:51
Overview of Pulp Python plugin features, and their readiness for integration
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.collapsible {
background-color: #FFF;
color: blue;
cursor: pointer;
padding: 8px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
}
.active, .collapsible:hover {
background-color: #FFF;
}
.content {
padding: 0 18px;
max-height: 0;
overflow: hidden;
transition: max-height 0.2s ease-out;
background-color: #f1f1f1;
}
</style>
</head>
<body>
<h2>Katello: Integration with `pulp_python`</h2>
<h4>Organization of document</h4>
<p>
This document is a high-level overview of the state of the Python plugin. Additional detail is
nested beneath most items; click an item's title to expand it.
</p>
<h4>
Color Legend:
</h4>
<ol>
<li>
<p style="background-color:Chartreuse;">Green</p>
<p>Ready to go.</p>
</li>
<li>
<p style="background-color:Yellow;">Yellow</p>
<p>Does not block integration; additive (backwards-compatible) changes are expected.</p>
</li>
<li>
<p style="background-color:Red;">Red</p>
<p>Does not <em>have</em> to block integration, but will probably have backwards-incompatible
changes before it is complete.</p>
</li>
</ol>
<ol>
<li>
<b>Implemented Workflows</b>
<ol>
<li>
<button style="background-color:Chartreuse;" class="collapsible">Whitelist/Blacklist Sync</button>
<div class="content">
Sync by specifying which packages should be retrieved, using syntax similar to
`requirements.txt`.
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">Upload</button>
<div class="content">
Source tarballs, wheels (to be verified), and possibly eggs, the same distribution files that
are uploaded to PyPI with `twine`, can be uploaded to Pulp as `Artifacts`. A POST to
`PythonContent` referencing the artifact will generate a PythonContent unit, which can then be
added to or removed from `Repositories`. <br>
There has been some discussion on whether we should change how this works. We need to
decide whether the current implementation is acceptable, given the recent pulpcore changes in
the Content serializers. (@dalley)
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">Publish</button>
<div class="content">
The Python Publisher is currently functional, but is vestigial and will be removed.
This is covered in the gap analysis.
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">Lifecycle</button>
<div class="content">
Lifecycling is done entirely with the `pulpcore` feature `Distributions`, which pulp-python
uses as-is. Because the plugin's use of `Distributions` is completely vanilla, the `pulpcore`
docs cover it. <br>
<b>GAP?</b> One potential pain point for Katello is if we later need to take more control of
Distributions and implement some Python-specific behavior. To keep Katello from implementing
this feature twice, the pulp-python team needs to decide whether or not we will need custom
Distributions. Why would we? Because a custom Distribution gives us the power to fully
implement an API that can be used by a client. For instance, it may be necessary for `twine`
uploads to go into an existing "package index", which for us is served by a distribution.
However, I think we can probably get away with vanilla Distributions long term if we decide to
publish the JSON API as custom REST endpoints (Django views).
</div>
</li>
</ol>
</li>
<li>
<b>Overview of Objects</b>
<ol>
<li>
<button style="background-color:Chartreuse;" class="collapsible">Content</button>
<div class="content">
A single content type, pulp-python's `PythonPackageContent`, represents all available Python
package formats (wheel, sdist, egg??), which all contain similar metadata. The metadata is
generated by an upstream Python project called `packaging`, which is the same library used by
PyPI, so this is unlikely to change soon. <br>
The Content object appears relatively complete and stable. It includes many fields that our
users probably don't care about, but they are populated automatically during the sync and
upload workflows.
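<p>To illustrate how one content type can cover several formats, here is a hypothetical helper that guesses the PyPI-style `packagetype` (`sdist`, `bdist_wheel`, `bdist_egg`), name, and version from a distribution filename alone. It is a sketch, not pulp-python code: real metadata comes from inside the distribution, and filename parsing is only a heuristic (wheel names follow PEP 427; sdist versions are assumed to contain no hyphens).</p>

```python
from typing import Dict

# Sdist archive suffixes this sketch recognizes.
_SDIST_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def classify(filename: str) -> Dict[str, str]:
    """Guess packagetype, name, and version from a distribution filename (heuristic)."""
    if filename.endswith(".whl"):
        # PEP 427: name-version[-build]-pythontag-abitag-platformtag.whl
        parts = filename[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"not a PEP 427 wheel name: {filename!r}")
        return {"packagetype": "bdist_wheel", "name": parts[0], "version": parts[1]}
    if filename.endswith(".egg"):
        # Eggs look like name-version[-pyX.Y[-platform]].egg
        parts = filename[: -len(".egg")].split("-")
        if len(parts) < 2:
            raise ValueError(f"not an egg name: {filename!r}")
        return {"packagetype": "bdist_egg", "name": parts[0], "version": parts[1]}
    for suffix in _SDIST_SUFFIXES:
        if filename.endswith(suffix):
            # Sdist names may contain hyphens, so split on the last one.
            name, sep, version = filename[: -len(suffix)].rpartition("-")
            if not sep:
                raise ValueError(f"not an sdist name: {filename!r}")
            return {"packagetype": "sdist", "name": name, "version": version}
    raise ValueError(f"unrecognized distribution filename: {filename!r}")
```
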
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">Remote</button>
<div class="content">
The Remote contains all of the features necessary to sync from PyPI. It is yellow because it
needs some very light cleanup. In an open PR, I've flagged some pulpcore fields that I don't
think we use. If they are indeed dead, we will remove them and add them back as necessary. <br>
If any of the following features are necessary, please let us know and we will prioritize
them:
'validate', 'ssl_validation', <b>'proxy_url'</b>, 'username', 'password', <b>'policy'</b> <br>
Does a plugin have to implement `proxy_url`, or is this covered by pulpcore?
Policy is the lazy sync feature, and is covered in the Gap analysis.
</div>
</li>
</ol>
</li>
<li>
<b>Gap analysis:</b> (not implemented but necessary)
<ol>
<li>
<button style="background-color:Yellow;" class="collapsible">GAP: Full PyPI sync</button>
<div class="content">
Currently, the user must provide a list of the packages they want to sync. Full PyPI sync is
planned, has relatively low complexity and risk, and can probably be implemented quickly.
Unfortunately, it will probably not perform well. Using PyPI's `simple API`, a full sync of
175,954 projects will require 175,954 metadata requests and an order of magnitude more package
requests. As of June 2018, the total download size was 2 TB, not including the database.
Thus, even after lazy sync is implemented, a full sync will be slow.
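<p>The scale argument above can be made concrete with a back-of-the-envelope estimate. The project count and the 2 TB total come from the text; the bandwidth, per-request latency, concurrency, and ten files per project (standing in for "an order of magnitude more") are assumptions chosen only for illustration, so the outputs are not measurements.</p>

```python
from dataclasses import dataclass


@dataclass
class SyncEstimate:
    metadata_requests: int
    package_requests: int
    avg_bytes_per_project: float
    transfer_hours: float
    metadata_hours: float


def estimate_full_sync(
    projects: int,
    total_bytes: float,
    bandwidth_bytes_per_s: float,
    request_latency_s: float,
    concurrency: int,
    files_per_project: int = 10,
) -> SyncEstimate:
    """Rough cost of a full simple-API sync; every rate here is a caller-supplied assumption."""
    metadata_requests = projects  # one simple API page per project
    package_requests = projects * files_per_project
    # Time to move every byte at a sustained bandwidth (ignores per-request overhead).
    transfer_s = total_bytes / bandwidth_bytes_per_s
    # Metadata requests are latency-bound; `concurrency` requests run in parallel.
    metadata_s = metadata_requests * request_latency_s / concurrency
    return SyncEstimate(
        metadata_requests=metadata_requests,
        package_requests=package_requests,
        avg_bytes_per_project=total_bytes / projects,
        transfer_hours=transfer_s / 3600,
        metadata_hours=metadata_s / 3600,
    )
```

<p>Called as `estimate_full_sync(175_954, 2e12, bandwidth_bytes_per_s=100e6, request_latency_s=0.2, concurrency=10)`, the sketch gives about 11.4 MB per project, roughly 1.76 million package requests, about 5.6 hours of raw transfer, and just under an hour of metadata requests. Lazy sync would defer the transfer but not the metadata requests.</p>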
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">GAP: Incremental sync</button>
<div class="content">
This is yellow because it is important, but not strictly necessary for the first
implementation; it is only needed once full PyPI sync exists. <br>
Incremental sync is not implemented or planned, and is (semi-)blocked upstream.
PyPI (Warehouse) will need to move to its next-generation REST API, which will improve the
situation for syncing. That API will be cached (fast) upstream and will open the door for
incremental sync: the initial sync will still be slow, but subsequent syncs will only update
projects that have changed upstream. I have a functional skeleton of this API implemented, but
I need to spend some time on it before I can merge it and get it adopted on PyPI itself. This
work is definitely doable, but the upstream community can be slow to respond, so <b>if this is
a priority, it should be raised now.</b>
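<p>A minimal sketch of the incremental idea, assuming the upstream can report a change counter per project. PyPI's JSON API, for instance, includes a `last_serial` value for each project, but how Pulp would obtain these serials in bulk is exactly the open upstream question described here. The function and its inputs are hypothetical.</p>

```python
from typing import Dict, Set, Tuple


def plan_incremental_sync(
    local_serials: Dict[str, int], upstream_serials: Dict[str, int]
) -> Tuple[Set[str], Set[str]]:
    """Return (projects to re-sync, projects removed upstream).

    A project needs re-syncing if it is new upstream or its serial has advanced
    since the serial recorded at the previous sync.
    """
    changed = {
        name
        for name, serial in upstream_serials.items()
        if serial > local_serials.get(name, -1)
    }
    removed = set(local_serials) - set(upstream_serials)
    return changed, removed
```

<p>Only the projects in the first set would need their metadata fetched again, instead of every project on each sync.</p>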
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">GAP: Lazy Sync</button>
<div class="content">
Yellow, because it should not block the initial implementation, but it will be a high-value
feature and should be completed before integration is complete.
Lazy sync has not been implemented. The Python plugin's first stage may need to be refactored
to accommodate it; the Python team will need to investigate how much work is necessary.
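<p>A minimal sketch of what deferring downloads means, assuming sync records only each file's URL and SHA-256 digest and the bytes are fetched the first time a client asks for them. This is a hypothetical illustration of the on-demand idea, not pulpcore's implementation; `DeferredArtifact` and the injected `download` callable are made-up names.</p>

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DeferredArtifact:
    """Sync stores only where the file lives and what it should hash to."""

    url: str
    sha256: str
    _data: Optional[bytes] = field(default=None, repr=False)

    def read(self, download: Callable[[str], bytes]) -> bytes:
        """Fetch on first access, verify the digest, and cache the bytes."""
        if self._data is None:
            data = download(self.url)
            if hashlib.sha256(data).hexdigest() != self.sha256:
                raise ValueError(f"digest mismatch for {self.url}")
            self._data = data
        return self._data
```
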
</div>
</li>
<li>
<button style="background-color:Red;" class="collapsible">Refactor: Remove Publisher</button>
<div class="content">
The Publish workflow will change, which will affect the initial implementation. This should not
be too difficult to fix later, but preferably this work will be complete before Katello starts.
<br>
The publisher contains no extra fields for the Python plugin and is only used to kick off a
publish task. We currently intend to make some `pulpcore` changes to the publish workflow. When
they are complete, publishing will be done with a POST to pulp-python-defined `Publications`.
The above change will probably be finished soon (for pulpcore) and will affect most or all
plugins, but it should not take very much effort for each plugin.
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">GAP: Pulp-Pulp sync (capsule sync)</button>
<div class="content">
This is yellow because it will block full integration, but does not need to block the initial
implementation. <br>
The Python plugin currently publishes the `simple API`, which is consumed by CLIs like `pip`
and `pipenv`. We do NOT currently implement the `JSON API`, which is necessary for syncing from
a repository published by another instance of Pulp. AFAIK, this feature will be necessary for
Katello users, but it is not a blocker for beginning integration.
This feature is complete for Pulp 2, so it should be of relatively low complexity.
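<p>For a sense of what the missing piece involves, here is a hypothetical sketch that renders a per-project document shaped like a small subset of PyPI's JSON API (`info`, `releases`, `urls`, with per-file `filename`, `url`, `packagetype`, and `digests`). The input record format, `base_url`, and the choice of fields are assumptions; a real implementation would need to match whatever fields the syncing Pulp instance reads.</p>

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def project_json(
    name: str, latest_version: str, packages: Iterable[Dict[str, str]], base_url: str
) -> str:
    """Render a PyPI-JSON-API-like document for one project (subset of fields)."""
    releases: Dict[str, List[dict]] = defaultdict(list)
    for pkg in packages:
        releases[pkg["version"]].append(
            {
                "filename": pkg["filename"],
                "url": f"{base_url.rstrip('/')}/{pkg['filename']}",
                "packagetype": pkg["packagetype"],
                "digests": {"sha256": pkg["sha256"]},
            }
        )
    document = {
        "info": {"name": name, "version": latest_version},
        "releases": dict(releases),
        # `urls` lists the files of the latest release, as on PyPI.
        "urls": releases.get(latest_version, []),
    }
    return json.dumps(document, indent=2, sort_keys=True)
```
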
</div>
</li>
</ol>
</li>
<li>
<b>Wishlist:</b> (not implemented but probably not necessary)
<p>
These are considered low priority for now. If Katello believes any of these are
important for initial implementation, please let us know now.
</p>
<ol>
<li>
<button class="collapsible">Pipenv integration</button>
<div class="content">
This might "just work" for free; we need to check. If it does, we need to add functional
tests.
</div>
</li>
<li>
<button class="collapsible">Twine upload</button>
<div class="content">
In the future, we are interested in supporting the Python packaging tool `twine`,
which is how users upload their packages to PyPI. Ideally, they could use twine to
upload into Pulp as well. This feature is relatively unresearched though, and could
be tricky. However, I would guess that Katello users would rather just click an
"Upload" button like any other plugin. (Do we want to behave like other content
types, or do we want to behave like PyPI?)
</div>
</li>
</ol>
</li>
<li>
<b>Documentation State</b>
<p>
Overall, the pulp-python docs are in a very good place. They have good structure and are a
huge improvement on Pulp 2. Because they are published on Read the Docs along with their REST
API reference docs, they should be significantly easier to use than the pulp-file docs were.
</p>
<ol>
<li>
<button style="background-color:Chartreuse;" class="collapsible">Quickstarts (Green)</button>
<div class="content">
Every "major feature" (sync, upload, add/remove, publish, and host) is covered by a
quickstart-style workflow. Each workflow includes links to the relevant REST API reference
docs.
</div>
</li>
<li>
<button style="background-color:Yellow;" class="collapsible">REST API Reference</button>
<div class="content">
The REST API docs have been covered, but many options are not well explained and there are
not enough examples. asmacdo is currently working on this and hopes to move it to green this
week.
</div>
</li>
</ol>
</li>
</ol>
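<script>
// Toggle each collapsible section open and closed when its button is clicked.
var coll = document.getElementsByClassName("collapsible");
var i;
for (i = 0; i < coll.length; i++) {
coll[i].addEventListener("click", function() {
this.classList.toggle("active");
// The content panel is the element immediately after the button.
var content = this.nextElementSibling;
if (content.style.maxHeight){
// Panel is open: clear the inline value so the CSS max-height of 0 applies again.
content.style.maxHeight = null;
} else {
// Panel is closed: expand it to the full height of its contents.
content.style.maxHeight = content.scrollHeight + "px";
}
});
}
</script>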
</body>
</html>