
@NickLaMuro
Created January 5, 2018 22:18
ManageIQ/manageiq 15757 Pull Request review comments moved to cut down on clutter

@Fryguy so before I work on this further, I wanted to clarify a few things based on your "review" (it wasn't an official #GitHubPullRequestReview2017™, but the content of it makes me want to treat it as such).

First, your definition of "risk" still leaves some questions for me, but I would like to clarify my earlier attempt to unpack it, which I did poorly. I can see "risk" falling into two categories:

  • A "security risk", in which a user could exploit this to execute parts of our code base we don't want them to. Since the eval statements are used strictly to write the methods for this class when the class/module is loaded, and not at execution time, this shouldn't be a problem here if I did things right. That said, it seems like at the time of writing your comments you might not have fully reviewed this, so it is entirely possible I did something wrong and poked a hole in the app that wasn't there previously, so I am not suggesting you shouldn't check my work for introduced vulnerabilities.
  • The other form of "risk" I could see you referring to is something I would classify as "complexity"/"stability"/"maintainability". This is a concern I can get behind, and I would very much like an opinion on it. I went down the route of evals specifically because it would DRY up the implementation by sharing as much as possible of the code already being done in the case statement of the original implementation, while gaining a performance boost at the same time (more on performance below). I could also have achieved this by defining all of these methods manually and avoiding the evals entirely (something I suggested in the PR description), but I figured keeping the code DRY was more desirable for when changes become necessary. That said, I am actually cool with that route if it is preferred; it just makes maintenance a little more involved in a different way.
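
To make that trade-off concrete, here is a minimal sketch of the eval-based pattern I am describing (the names `DerivedColsExample`, `DERIVED_COLS`, and `process_*` are made up for illustration; this is not the actual PR code): the methods are generated once from a constant when the class is loaded, so no eval happens at call time.

```ruby
# Minimal, hypothetical sketch of generating methods from a constant via a
# string eval at class-load time. Names are illustrative only.
class DerivedColsExample
  # Maps a generated method name suffix to the Ruby source of its body
  DERIVED_COLS = {
    cpu_usage: "attrs[:cpu]",
    mem_usage: "attrs[:mem]"
  }.freeze

  DERIVED_COLS.each do |col, expr|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def process_#{col}(attrs)   # e.g. def process_cpu_usage(attrs)
        #{expr}                   #        attrs[:cpu]
      end
    RUBY
  end
end
```

Each generated method ends up as a plain method, exactly as if it had been written out by hand, e.g. `DerivedColsExample.new.process_cpu_usage(cpu: 42)` returns `42`.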

The other part I wanted to clarify was why I didn't use define_method instead of eval, which you suggested (I think...) when pointing me to https://github.com/ManageIQ/manageiq/blob/68e7457/app/models/mixins/inter_region_api_method_relay.rb#L21-L38 . The reason is probably best illustrated by this gist I put together comparing the two:

https://gist.github.com/NickLaMuro/b9a84ae813ffb3932ac3b293eeb5b4bc

I didn't test the "non-string version" of module_eval, but my guess is it would be roughly as slow as the define_method versions. I believe the reason for this (without actually staring at the Ruby codebase for a week trying to figure it out) is partially due to define_method having to pass around a proc for its execution, while the eval forms act almost exactly as if you had required a file with this code hard-coded in it. The other part, based on how I implemented the example, is that in the eval form I am able to hard-code in the values from the constant, versus the define_method version, which has to do a hash lookup each time.

That second bit was actually intentional in the example's implementation, since the code I have here is similar in nature: I am able to take some shortcuts because I can hard-code some of the values when using eval, as they are derived from the DERIVED_COLS constant. This is something I can't do using define_method. That said, again, I could have hard-coded all of these methods by hand with the same performance, the only downside being it wouldn't be very DRY.
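
A minimal sketch of that shortcut (with made-up names, not the gist's actual code): the eval form bakes the per-direction values into the generated source as literals, while the define_method form has to capture the hash and look the values up on every call.

```ruby
# Hypothetical comparison of the two definition styles. MODS, EvalPlayer, and
# DefinePlayer are illustrative names only.
MODS = { up: { x: 0, y: 1 }, down: { x: 0, y: -1 } }.freeze

class EvalPlayer
  attr_accessor :x, :y
  def initialize; @x = @y = 0; end

  MODS.each do |dir, mod|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def move_#{dir}(amount = 1)
        @x += #{mod[:x]} * amount   # literal 0/1/-1 baked in at load time
        @y += #{mod[:y]} * amount
      end
    RUBY
  end
end

class DefinePlayer
  attr_accessor :x, :y
  def initialize; @x = @y = 0; end

  MODS.each_key do |dir|
    define_method("move_#{dir}") do |amount = 1|
      @x += MODS[dir][:x] * amount  # hash lookups happen on every call
      @y += MODS[dir][:y] * amount
    end
  end
end
```

Both classes behave identically; the difference is purely in when the values from the constant get resolved.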

So, I am mostly asking for your opinion on what you would prefer, rather than trying to convince you that this way is the absolute #rightWay™. That said, of the possibilities, the only one I would personally prefer NOT to implement is the define_method route, strictly for performance reasons; anything else should be on the table (my 2 cents).

tl;dr: Skip to the next section (past the line break) for my answer to your main question above (the last one in your comment); the parts in between correct some things you are incorrectly assuming or concluding from my previous comment.

I don't think your gist is actually testing running the define_method on the fly.

That is intentional, because all of the methods defined using eval in this PR do just that: nothing is defined on the fly (well... besides when the class is loaded... but that is the point). They are defined at class-load time based on the DERIVED_COLS constants and never change (this is not meant to be something like an ActiveRecord::Association from has_many... though that too uses a form of eval under the hood). I was specifically trying to emphasize this point, since I think the risk from a security perspective is not as relevant here.

I am, however, trying to emphasize in that gist the execution-time difference between eval and define_method once those methods are used after being defined, and to argue that there is a performance penalty in the generated method when using the define_method form (more on that below).
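
For anyone who wants to eyeball that gap without pulling down the gist, here is a rough stdlib-only sketch of the comparison (the gist itself uses the benchmark-ips gem for more stable numbers; `ViaEval`, `ViaDefineMethod`, and `MOD` are made-up names):

```ruby
require "benchmark"

# Both methods compute the same thing; only how they were defined differs.
MOD = { up: 1 }.freeze

class ViaEval
  # The constant's value is interpolated into the source as a literal
  class_eval "def bump(n); #{MOD[:up]} * n; end"
end

class ViaDefineMethod
  # The block is kept as a proc and does the hash lookup on every call
  define_method(:bump) { |n| MOD[:up] * n }
end

iterations = 500_000
Benchmark.bm(15) do |x|
  x.report("eval:")          { e = ViaEval.new;         iterations.times { e.bump(2) } }
  x.report("define_method:") { d = ViaDefineMethod.new; iterations.times { d.bump(2) } }
end
```

Stdlib Benchmark is noisier than benchmark-ips, so treat the output as a rough indicator of the gap rather than a precise measurement.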

We aren't actually executing even close to a million calls in practice.

A million in a single call of the more top-level perf_capture? Well... no, probably not, but it also isn't just one iteration through the DERIVED_COLS, or specifically this line:

DERIVED_COLS.each { |col, method| send(method, col, state, attrs, result) }

either, and the quick addition in that gist wasn't at all supposed to fully represent the amount of processing actually done in the code I am changing.

That said, this is called MANY times during a single perf_capture call, which is probably best illustrated with a collection of screenshots of a flamegraph generated by stackprof in "object allocation" mode:

  1. Can you see a single call of this yet?
  2. How about now?
  3. Starting to see it now...
  4. A little more...
  5. There we go!

Sorry for the snark (I may have had a little too much fun making that...), but I think it illustrates just how many times process_derived_cols can be called within a single call of perf_capture_realtime, which would be one MiqQueue job calling this on a single VM, and that is done MANY times per minute, so I think it is still a significant figure to be discussing.

Of note: this was specifically taken using the test data in spec/tools/vim_data/vim_data_test_helper.rb from the manageiq-providers-vmware repo, with some minor modifications I make to tweak the returned timestamps from that data, so the above might not be a 100% replication of a prod environment execution of this code. That said, I will try to put this benchmarking script up for viewing at some point soon (when I get a chance).

EDIT: Actually, the gist of what I was just referring to is linked in the gist in the description. I don't have the stackprof functionality in there, but the process by which I capture the metrics is effectively the same.


If you CAN'T do something in define_method, then that is your entire argument. Can you clarify that part a bit because I don't understand why you can't do it, or perhaps what you are doing in eval that is so much easier than doing it with define_method.

So I think this was your main question in the last comment; I tried to answer it more tersely before, but I will try to explain in more detail in case I was not clear.

Going back to my gist example: with the meta-programming of the UsingEval class (and the others, I guess), I was effectively writing code to "write code", so if I did it by hand, it would end up like this:

class UsingEval < BasePlayer
  def move_up(amount=1)
    @x += 0 * amount
    @y += 1 * amount
    @z += 0 * amount
  end

  def move_down(amount=1)
    @x += 0 * amount
    @y += -1 * amount
    @z += 0 * amount
  end
  
  # ...
end

So doing it by hand would be just as performant, just not DRY in the slightest. In contrast, and what I think leads to part of the speed difference between eval and define_method, the eval basically converts the code straight into Ruby bytecode, whereas define_method most likely stores the block as a proc somewhere, along with the binding of the block when the method was created, and effectively runs .call on it when the method is executed.

So if we were to actually write that out by hand, it would probably look something like this (I think... taking some guesses here... I did read some of the Ruby source to get an idea, but this isn't actually practical, just for demonstration purposes, and I didn't test it in full):

class UsingDefineMethod < BasePlayer
  # This next line was really hard to figure out how to do... just so you know...
  # Basically, it stores a `Binding` object with the context of what is inside the `proc`
  # to be used later:
  # 
  #   https://ruby-doc.org/core-2.0.0/Binding.html
  # 
  UpDirectionBinding = proc { dir = :up; binding }.call
  MoveUpProc = proc do |amount=1|
    dir = UpDirectionBinding.eval("dir") # there are newer methods available for this without eval, but not in 2.0.0... FYI (though, I think eval is used under the hood for those as well... basically)
    @x += METHODS_MODS[dir][:x] * amount
    @y += METHODS_MODS[dir][:y] * amount
    @z += METHODS_MODS[dir][:z] * amount
  end

  def move_up(amount=1)
    # `instance_exec` re-binds `self`, so the `@x`/`@y`/`@z` inside the proc
    # refer to this instance's ivars (a bare `MoveUpProc.call` would target
    # the class instead, since the proc was created in the class body)
    instance_exec(amount, &MoveUpProc)
  end
  
  # ...
end

Two things of note:

  • You will notice specifically that I wrote this version with the METHODS_MODS[dir][:x] intact, and this is what I was referring to as a "shortcut I can't do in define_method". In the eval forms, it is simply an amount * positive/negative/zero modifier, where in this form it has to be an amount * hash lookup to find the positive/negative/zero modifier. While not a straight 1-to-1 with the changes in this PR, I am swapping out the "hash lookup to find an integer" demonstrated in the example gist for a "hash lookup to find a string to be generated". In the eval form in this code base, I can just "hard code" that into the evaluated method definition, so no hash lookup is required at runtime. It is not possible ("CAN'T") to do that in the define_method form.
  • The binding business is important, because it is probably what makes this slower under the hood. Basically, the dir value used for the lookup has to come from somewhere, and it is only in scope for the duration of defining the method. This means that the call to define_method has to store off the binding somewhere and switch to it when evaluating this method; otherwise it wouldn't work.

Now, if you were observant, you would notice a flaw in my implementation in the previous gist that skews the metrics pretty significantly in my favor. The "hash lookup" portion done inside the method could actually be done outside the block and assigned to local variables; then loading that scope to execute the method when called (which, as we discussed, has to happen anyway) allows the already-looked-up values to be used in the method, which should speed it up. And it did (shhhh! don't tell Jason):

# new version of UsingDefineMethod in the gist

class UsingDefineMethod < BasePlayer
  METHODS_TYPES.each do |type|
    METHODS_MODS.keys.each do |dir|
      x = METHODS_MODS[dir][:x]
      y = METHODS_MODS[dir][:y]
      z = METHODS_MODS[dir][:z]
      define_method "#{type}_#{dir}" do |amount=1|
        @x += x * amount
        @y += y * amount
        @z += z * amount
      end
    end
  end
end

# Sample output when avoiding the hash lookup outside the block
#
#   $ ruby compare_define_method_with_eval.rb
#   Warming up --------------------------------------
#    using_define_method    27.592k i/100ms
#             using_eval    31.447k i/100ms
#      using_module_eval    31.492k i/100ms
#       using_class_eval    30.949k i/100ms
#   Calculating -------------------------------------
#    using_define_method    310.293k (± 3.8%) i/s -      1.573M in   5.076165s
#             using_eval    354.655k (± 4.3%) i/s -      1.792M in   5.064345s
#      using_module_eval    365.363k (± 4.3%) i/s -      1.827M in   5.008565s
#       using_class_eval    366.565k (± 4.3%) i/s -      1.857M in   5.075415s
#
#   Comparison:
#       using_class_eval:   366564.9 i/s
#      using_module_eval:   365362.7 i/s - same-ish: difference falls within error
#             using_eval:   354655.2 i/s - same-ish: difference falls within error
#    using_define_method:   310292.8 i/s - 1.18x  slower

But, even then, it is still ~15% slower than eval. Hopefully that all makes sense.

(heavy breath and wipe of the brow 😩 )

Again, I am fine with any of the following at this point (minus the last one):

  1. Keeping it as is (well... kinda)
  2. Reverting the last commit (again, I offered this in the PR description originally)
  3. Hard coding these methods in manually (code climate will probably whine... but meh)
  4. Use define_method instead

Yes, not liking define_method because it is slightly slower makes me pedantic... sue me! Again, I would like a decision on one of the above from you prior to me rebasing this onto master, so I can do the rebase and implement whatever that might be at the same time.
