Created July 23, 2019 06:06
Slack Conversation about Reproducible Requests/Resolves

Marcus Ottosson [Jul 17th at 10:50 AM] Hi all, if I'd like to perform a render on a farm, I'd like to somehow "send" the current context there. How would you guys do this at the moment?

I'm considering either:

  1. Replicate the request, but include all versions; e.g. rez env packageA==1.0.2 packageB==5.13 ... but it could get quite long and is a little tedious to put together
  2. Export the REZ_RXT_FILE from e.g. Maya, and use that on the server as rez env -i context.rxt. But does it handle being exported from Windows and run on Linux?
  3. Or some other way? (edited)

Dhruv Govil [2 days ago] There's a rez var, I believe, that lists all the resolved package versions; that can be reused as a request.

Use that.

Otherwise you may have dependencies that aren't in your original request that can shift out from under you
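Dhruv's variable is presumably REZ_USED_RESOLVE; a minimal sketch of the idea, assuming it holds the resolved packages as space-separated name-version tokens (the package names below are made up):

```python
import os

def resolve_as_request(environ=None):
    """Turn a context's resolved package list into an explicit
    request that pins every package to its exact version."""
    environ = os.environ if environ is None else environ
    resolve = environ.get("REZ_USED_RESOLVE", "")
    # Each token is already a "name-version" string, which can be
    # passed straight back to rez-env as a request.
    return ["rez", "env"] + resolve.split()

# Hypothetical resolve from a workstation session:
cmd = resolve_as_request({"REZ_USED_RESOLVE": "maya-2019.2 core_pipeline-1.1.1"})
```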

Blazej Floch [2 days ago] 2. Not really, due to the implicit requests. I believe an idiomatic way is rez-env $REZ_USED_REQUEST os-XXX arch-XXX platform-XXX -t $REZ_USED_TIMESTAMP --no-implicit

Blazej Floch [2 days ago] This pretty much re-resolves at the state of the current context but replaces the implicit requests.
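Composing Blazej's command programmatically might look like the sketch below; `FARM_IMPLICITS`, the request, and the timestamp value are all illustrative, and the env var names assume a standard rez context:

```python
import os

# Hypothetical implicit packages for the farm's platform.
FARM_IMPLICITS = ["os-CentOS-7", "arch-x86_64", "platform-linux"]

def farm_resolve_command(environ=None):
    """Build the farm-side command: re-resolve the original request
    with the farm's implicits swapped in, frozen at the
    workstation's resolve time."""
    environ = os.environ if environ is None else environ
    request = environ["REZ_USED_REQUEST"].split()
    timestamp = environ["REZ_USED_TIMESTAMP"]
    return (["rez", "env"] + request + FARM_IMPLICITS
            + ["-t", timestamp, "--no-implicit"])

cmd = farm_resolve_command({
    "REZ_USED_REQUEST": "maya-2019 core_pipeline-1",
    "REZ_USED_TIMESTAMP": "1563355800",
})
```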

Blazej Floch [2 days ago] You need to re-resolve because you don't know whether the other platform has the same requirements/variants available. So this could fail.

Blazej Floch [2 days ago] As opposed to Dhruv's answer, this does not make use of the resolved packages - again because you don't know about availability. But if you can guarantee it, you could also strip the implicit packages and use that. I think I prefer my method, because if it wasn't part of your request, it wasn't important for you. So technically our requests are more specific and we bail on the details of the resolve. Interested to hear other opinions.

Dhruv Govil [2 days ago] I disagree with the "if it wasn't in your request it wasn't important"

That could be cultural but there's a lot of implicit dependencies that are very important but are not specified

Dhruv Govil [2 days ago] If I listed every dependency I actually wanted to pin I'd have a very very long request

Blazej Floch [2 days ago] I personally think this is a philosophical disagreement. It's like going into a bar and ordering a sandwich. Which sandwich? I don't care. You get peanut butter. Now you say: my buddy wants the exact same sandwich. Which is ok. Only he is allergic and dies. I instead say: my buddy also wants any sandwich from the kitchen stock at the same time, but he has other requirements - like being allergic. He gets marmalade. And that is ok! Why? Because it was not important to us to begin with. If you wanted to enforce this then we would go into a bar and say: I want a peanut butter sandwich. The thing I don't like about stuffing the resolve into a request is that, in my experience, it is nearly impossible to get a successful resolve as soon as your platforms diverge in variants - and they have to. So my rule is: you care about it? Request it. And everything that makes up a stable environment must be "pinned" in the request. Else nothing stops you from getting something completely different the moment someone introduces a new package for the next resolve. (edited)

Blazej Floch [2 days ago] As a sanity check we do a diff, but it is the toolset manager's responsibility to make sure the requests are "as specific as necessary, as undefined as possible" for a flexible environment. (edited)

Blazej Floch [2 days ago] The alternative is something like Docker Compose, where you control everything but also pay the cost of maintaining the differences. That's ok - but I believe that's not the mission of rez.

Blazej Floch [2 days ago] At least in my setup :)

Dhruv Govil [2 days ago] Imho the sandwich analogy doesn't hold water. For example I'd order a club sandwich. The menu lets me know some of the ingredients but not all. I can make specific requests for some of them but I may not know I cared about all the tiny choices till later.

Dhruv Govil [2 days ago] It also means my config to launch the DCC needs to mandate every single thing as a version lock unless I want floating.

The goal of the farm is to replicate my users' setup as closely as possible.

Blazej Floch [2 days ago] Not really. Don't forget about the timestamp.

Dhruv Govil [2 days ago] So that means pinning every version of every package they use.

If some dependencies aren't possible then I'd rather let them fail at start than waste time on the farm with the incorrect setup

Dhruv Govil [2 days ago] Even with a timestamp, imho it's better to be explicit. It also helps debugging for my TDs when they can just look at a job's launch arguments and know right away what they're getting.

Blazej Floch [2 days ago] I totally respect that and I believe it has benefits. In all my testing on a production repo, it just showed that in the end you end up with a huge list of packages. Then some don't match and you remove them. Etc. etc. But essentially you are solving manually. You are reading package definitions to work out why things did not work. It becomes tedious. (edited)

Blazej Floch [2 days ago] But even if you leave the farm out: I still strongly believe that everything important needs to be in the request. The docs clearly state that the resolve can change overnight with a new rez version. The only thing that is guaranteed to be stable is the request.

Blazej Floch [2 days ago] So our process is a bit different.

Blazej Floch [2 days ago] We have, what we call a deployment.

Blazej Floch [2 days ago] This would resolve all the environments for all OSes upfront, and we can verify they resolved ok via diff (either implemented via a baked rxt, or timestamp+request+rez-version+rezconfig). Obviously it does not account for development envs but that's ok. (edited)
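That verification step can be as simple as diffing the per-platform resolves; a toy sketch (package names and versions are invented):

```python
def resolve_diff(a, b):
    """Compare two resolves (package name -> version) and report
    every package that differs, appears, or disappears."""
    return {name: (a.get(name), b.get(name))
            for name in set(a) | set(b)
            if a.get(name) != b.get(name)}

linux = {"core_pipeline": "1.1.1", "unix-process-tool": "2.0"}
windows = {"core_pipeline": "1.1.1", "win32all": "224"}

# core_pipeline matches, so only the platform-specific packages differ.
diff = resolve_diff(linux, windows)
```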

Blazej Floch [2 days ago] Like I said, different schools of thought :)

Dhruv Govil [2 days ago] Oh if you're resolving up front the entire thing, then we're on the same page. You're just paying that cost up front

But if you don't do that, then yes, agree to disagree

Marcus Ottosson [1 day ago] Interesting. @jc What are your thoughts on this? You mentioned going the REZ_USED_RESOLVE route, have you encountered the issues raised by Blazej?

jc [1 day ago] @Marcus Ottosson @Blazej Floch I am fortunate enough to have to deal with only one platform, and it is Linux. So for me I've rarely seen issues with package conflicts. We had such problems with osx, but I use the implicit packages env var to avoid that

Marcus Ottosson [1 day ago] Thanks @jc

Blazej Floch [1 day ago] Makes sense.

Marcus Ottosson [1 day ago] If I were to try and reformulate the problem, we've got:

  1. Overly specific
  2. Insufficiently specific

With (1), a specific request from a Windows workstation can result in a Windows-only package being included on Linux.

With (2), you run the risk of a later version of a package being released in between, say, a Maya file being opened and later submitted to a farm; a not-specific-enough request such as core_pipeline-1 could then get core_pipeline-1.1.1 on the local machine but core_pipeline-1.2.0 on the farm.

Is that about it?

Blazej Floch [1 day ago] The risk in (2) does not happen, due to the timestamp

Marcus Ottosson [1 day ago] Ah, yes, that makes sense. So the timestamp, along with the request from REZ_USED_REQUEST, should suffice for a render submission?
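The timestamp mechanism can be illustrated with a toy model (versions and release times below are invented): resolving "as of" a time hides any package released after it, so the farm sees the same candidates the workstation did.

```python
# Hypothetical release history for core_pipeline: (version, release time).
RELEASES = [
    ("1.1.0", 100),
    ("1.1.1", 200),
    ("1.2.0", 300),  # released after the scene was opened locally
]

def pick_version(releases, timestamp):
    """Mimic timestamped resolution: ignore packages released after
    the given time, then take the latest remaining version."""
    visible = [(v, t) for v, t in releases if t <= timestamp]
    return max(visible, key=lambda vt: tuple(map(int, vt[0].split("."))))[0]

# A farm resolve frozen at time 250 still picks 1.1.1, not 1.2.0:
assert pick_version(RELEASES, 250) == "1.1.1"
```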

Marcus Ottosson [1 day ago] Are there any other examples of where (2) is at risk?

Blazej Floch [1 day ago] My point is just that you end up with potentially more + wrong packages in the resolve for the particular platform, and it affects the result. It's really hard to demonstrate without a whiteboard, but say you manage processes with a Python package: on Windows you need a win32all-python package, while on Linux you use subprocess or other unix tool-packages. So you end up creating a variant per platform.

name = "processmanager"
variants = [
    ['platform-windows', 'win32all'],
    ['platform-linux', 'unix-process-tool'],
]

Note: each variant might introduce a chain of other dependencies! Maybe generic ones that work on both platforms. Now you resolve on Windows and you get win32all and all its dependencies. You throw the resolve at Linux as a request. It will fail on win32all. So you remove win32all from the list. It will obviously choose the right variant now, but you kept all the dependencies of win32all for no reason. And these dependencies might affect the resolve of the unix-process-tool chain. To me this is more wrong than the alternative, which means that over time you get stricter and better requests, while leaving the implementation details out for the resolver to figure out.
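That failure mode can be mimicked with a toy availability table (all names hypothetical): replaying the Windows resolve verbatim as a Linux request fails, while the original request lets the resolver pick the Linux variant.

```python
# Which packages exist on each platform (a drastic simplification of rez).
AVAILABLE = {
    "windows": {"processmanager", "win32all"},
    "linux": {"processmanager", "unix-process-tool"},
}

def can_resolve(request, platform):
    """A 'resolve' succeeds only if every requested package
    exists on the target platform."""
    return all(pkg in AVAILABLE[platform] for pkg in request)

# The Windows resolve, replayed verbatim on Linux, fails on win32all:
windows_resolve = ["processmanager", "win32all"]
assert not can_resolve(windows_resolve, "linux")

# The original request re-resolves fine on either platform:
assert can_resolve(["processmanager"], "linux")
assert can_resolve(["processmanager"], "windows")
```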

Again, my 50 cents :)

Blazej Floch [1 day ago] Also, just the step of removing win32all AND the implicit packages is, to me, very fragile. You are doing the resolver's job, as I mentioned before.

Blazej Floch [1 day ago] This can quickly get out of hand.

Blazej Floch [1 day ago] I know people are a bit afraid of moving platforms during production but this is why you either pre-bake and verify the differences or you guarantee the resolve input is the same (request+timestamp+rez-version+rezconfig+...). Either way this IMHO is the right method to guarantee stability. But note that the different platforms are still resolved with the same request. And if there is an unwanted fluctuation you need to adjust your request.

Marcus Ottosson [1 day ago] Sounds sensible to me

Marcus Ottosson [1 day ago] Thanks Blazej

Marcus Ottosson [1 day ago] @Dhruv Govil Do you have any additional thoughts on this?

Dhruv Govil [1 day ago] My production concerns have always been to match the farm system as closely as possible to the users setup.

If Blazej pre-resolves the entire request, then that covers what I'm saying, since it solves the same issue at a different point.

But generally I'm for making the user-side request as simple as possible, so potentially very implicit, and the farm side explicit to match the user's configuration.

Marcus Ottosson [1 day ago] Ah, you would avoid submitting from a Windows machine to a Linux machine?

Marcus Ottosson [1 day ago] To ensure the environment is as exact of a match as possible? (edited)

Dhruv Govil [1 day ago] I would avoid it if possible, but if not then I'd have another stage that would try and figure out what kind of resolve would most closely match the source while still working on the target OS.

My concern is always when implicit dependencies change. It could even be something as simple as removing failed parts of the resolve until the resolve succeeds, and then making sure to log the difference as part of the job.

Dhruv Govil [1 day ago] Basically I think it's better that a job not start at all than for a job to start and be wrong.

Marcus Ottosson [1 day ago] Fail fast, that makes sense.

Marcus Ottosson [1 day ago] Thanks gents, this was exactly what I was looking for. 💞

Allan Johns [1 day ago] So I can tell you what we do at Method. Here we re-resolve the same(ish) env on the farm. However, we do this with timestamping as well, so typically we'll get the exact same env resolved on the farm node. In theory you could get a different one if the distro (for example) of the node were different, but in practice we don't see this very often (sometimes though, e.g. when we're in the process of migrating to a newer CentOS)

Allan Johns [1 day ago] I don't think there's one approach that fits all with this stuff. Needs will differ depending on the studio. It's just as valid to send a context to your farm and re-source it to configure the same env, but that obviously comes with constraints - i.e., the assumption that the same env is valid on the farm node (so we're probably assuming no os/distro differences to workstations, for example)

Allan Johns [1 day ago] If you do re-resolve on the farm though, also take into account that this may absolutely slam your memcache. At Method we can hit 800-1500 rps on memcache, for example (luckily it and redis are beasts that can deal with this kind of load)

Allan Johns [1 day ago] And actually there's a lot more to consider too. At Method we have this whole "live resolve" system, which we're going to explain at SIG if you get the chance to come to the talk. So it's not about replicating one env on the farm per se. For example, your farm job might subproc out to maya... so then THAT env needs to be the same also! Getting that to work is pretty complex, but we do have that at Method

Allan Johns [1 day ago] When you combine that scenario with the other capabilities we have - package and profile locking - it gets fairly involved
