@Uradamus
Last active November 3, 2023 09:39
A simple array shuffle function for Lua implementing the Fisher-Yates shuffle.
function shuffle(tbl)
  for i = #tbl, 2, -1 do
    local j = math.random(i)
    tbl[i], tbl[j] = tbl[j], tbl[i]
  end
  return tbl
end
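For reference, a minimal usage sketch (the deck variable and the example output are illustrative; seeding the generator once at startup with math.randomseed is assumed):

math.randomseed(os.time())

local deck = { "A", "K", "Q", "J", "10" }
shuffle(deck)                    -- shuffles the table in place (and returns it)
print(table.concat(deck, ", "))  -- e.g. "Q, 10, A, K, J"
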
@offe

offe commented Oct 15, 2018

I just stumbled upon this by accident. This is NOT Fisher-Yates, and it will not generate all permutations of the list with equal probability.

But the fix is easy. It should be:
local rand = math.random(i)

@Dollar-500

so if I replace math.random(size) with math.random(i) then it’s a truly random shuffle?

@hjpotter92

so if I replace math.random(size) with math.random(i) then it’s a truly random shuffle?

Yes. Check my reply on a similar question here. (https://stackoverflow.com/q/35572435/1190388)

@Sleitnick

Sleitnick commented Nov 8, 2018

For the sake of completeness, here are the fixes from the above comments. This is a proper implementation of the Fisher-Yates shuffle:

function shuffle(tbl)
  for i = #tbl, 2, -1 do
    local j = math.random(i)
    tbl[i], tbl[j] = tbl[j], tbl[i]
  end
  return tbl
end

I renamed rand to j for consistency with most implementations of this algorithm.

@Cicolas

Cicolas commented Feb 5, 2019

I get better results with this

shuffle = function (tbl)
  maxSize = #tbl
  for i = maxSize, 1, -1 do
    tbl[i] = tbl[love.math.random(maxSize)]
  end
end

@jardon-u

@nickelodeon0077 careful, this is destroying the data
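To see why, a quick sketch (math.random is used here in place of love.math.random so it runs outside LÖVE; the example output is illustrative): the loop assigns instead of swapping, so it samples with replacement, duplicating some values and losing others.

local t = { "a", "b", "c", "d" }
local maxSize = #t
for i = maxSize, 1, -1 do
  t[i] = t[math.random(maxSize)]  -- overwrite, no swap
end
print(table.concat(t, " "))       -- e.g. "a b a b": duplicates appear, "c" and "d" are gone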

@Uradamus
Author

Uradamus commented Jul 3, 2019

Wow, had no idea anyone ever looked at this besides the folks in the Love IRC shortly after I tossed this together to share it there. For whatever reason I never received any notifications on this before now, so sorry to those who commented on this over the past year or so. Seems I did make a silly mistake though; going to update it to match Sleitnick's cleaned-up version.

@XeduR

XeduR commented Jan 22, 2020

For the sake of completeness, here are the fixes from the above comments. This is a proper implementation of the Fisher-Yates shuffle:

function shuffle(tbl)
  for i = #tbl, 2, -1 do
    local j = math.random(i)
    tbl[i], tbl[j] = tbl[j], tbl[i]
  end
  return tbl
end

I renamed rand to j for consistency with most implementations of this algorithm.

I'm arriving a bit late to this particular party, but there is an issue with this code. In Lua, tables are passed by reference, so the code above edits the original table directly. That means there is no need to return the table: the function already shuffles the original table in place, and any table it returns is only a reference to that same original table.

Therefore, if you only need to shuffle an existing table in place, you can remove the "return tbl" from the end. If instead you need to create and return a new shuffled table, then you should do the following:

function shuffle(t)
  local tbl = {}
  for i = 1, #t do
    tbl[i] = t[i]
  end
  for i = #tbl, 2, -1 do
    local j = math.random(i)
    tbl[i], tbl[j] = tbl[j], tbl[i]
  end
  return tbl
end
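A quick usage sketch of this copying version (variable names and the example output are illustrative), showing that the input table is left untouched:

local original = { 1, 2, 3, 4, 5 }
local copy = shuffle(original)       -- the copying version above
print(table.concat(original, ", "))  -- still "1, 2, 3, 4, 5"
print(table.concat(copy, ", "))      -- e.g. "3, 1, 5, 2, 4"
print(copy == original)              -- false: copy is a brand new table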

@X-Raym

X-Raym commented Jan 17, 2021

@XeduR
It seems that this shuffle (and the other versions above) can return a table in the same order as the original one.

Here is a small trick to check whether the shuffled table matches the original, and to reshuffle if it does:

-- SHUFFLE TABLE FUNCTION
function shuffle(t)
  local tbl = {}
  for i = 1, #t do
    tbl[i] = t[i]
  end
  for i = #tbl, 2, -1 do
    local j = math.random(i)
    tbl[i], tbl[j] = tbl[j], tbl[i]
  end
  if do_tables_match( t, tbl ) then
    tbl = shuffle(t)
  end
  return tbl
end

function do_tables_match( a, b )
  return table.concat(a) == table.concat(b)
end

@XeduR

XeduR commented Jan 26, 2021

@X-Raym

True, but shuffling something doesn't necessarily mean that the result has to be different from the previous order. For instance, if you were to shuffle a deck of cards, there's always a chance that the cards will end up in the same order as before. By preventing this, you'd be making the shuffle less random.

If you had an array {a,b} and you were to shuffle it, there would be a lot of repetition, sure. However, the moment you enforce that the result must always differ from the input, each shuffle has only a single guaranteed outcome: first {a,b}, then {b,a}, then back again, repeating.

There are other issues with using table.concat() for comparison. First, you could enter an infinite loop: with an array like {1,11,111}, every ordering concatenates to the same string, so the check never passes. Sure, you could add a separator between the entries, but there could still be issues with certain arrays. Secondly, this approach only ensures that at least two entries have changed positions. You could have an array with 10,000 entries where only two entries have swapped places and it would pass the check; arguably that isn't much more random than just reusing the initial array. Then there's the question of how much you gain in randomness versus how much you lose in time from the table.concat() operations.

All in all, I wouldn't worry about whether the input and output tables match. The likelihood of them matching, especially for lengthier arrays, is small and even if said tables did match at times, that'd be expected.
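To make the {1,11,111} example concrete, every ordering of that array concatenates to the same string, so the do_tables_match() check above can never report a difference and the recursive reshuffle never terminates:

print(table.concat({ 1, 11, 111 }))  --> 111111
print(table.concat({ 11, 111, 1 }))  --> 111111
print(table.concat({ 111, 1, 11 }))  --> 111111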

@X-Raym

X-Raym commented Jan 26, 2021

@XeduR
You are right to point out the limitations of my function, and by no means does it replace the ones above; it is just a particular integration for particular cases (which could be pretty common), so I want to highlight a few things.

The "no initial order" behavior would be preferable in some circumstances, especially if:

  • the number of elements in the set is very low (even 2 or 3), to mimic the fact that large sets will usually produce a different order anyway (as you mentioned). So I don't see the predictable output (in the case of a two-entry set) as an issue but as a feature.
  • the initial order and the post-shuffle order are directly visible to the user, and shuffling is the result of one of their actions. In this case, we can assume that by "shuffling" they mean "I want this, but in a different order"; otherwise they may not want to perform a shuffle at all. In a game we might still leave open the possibility of landing on the initial order, but in other circumstances, making the user repeat the action 2 or 3 times (for example if the set is very small) wouldn't be very user friendly.
  • of course, using table.concat assumes the values have predictable formatting, like a list of numbers, for which concatenation with a separator character is no issue at all, and it could be more CPU friendly than other methods of array comparison (at least in terms of code writing :P - just a single line).

So maybe there is an English word for this kind of "shuffle that never returns the initial order" that would be a more appropriate name for my function, but I didn't find one.
Regarding the cases I mentioned, I still think it is a valid proposition. What I needed to implement was exactly that: an algorithm for small sets of formatted values where a different order is a necessity. It may not be necessary for large sets, and may be undesirable in other fields (gaming, maybe).

@daveola

daveola commented Feb 18, 2021

@X-Raym says:

You are right to point out the limitations of my function, and by no means does it replace the ones above; it is just a particular integration for particular cases (which could be pretty common), so I want to highlight a few things.

Your code has problems other than the ones that @XeduR correctly pointed out.

Consider what happens when your code is called to shuffle {1}, or even {1,1,1}, or, best of all, {}.

Perhaps you don't have those cases in your code, but checks for them are wise. You'd be better off doing something like this (in pseudocode):

newlist = shuffle(list)
foreach element in newlist/list {
  if different, then return newlist
}
-- somehow we got the same list!
if size(newlist)>2 then swap any two random elements of newlist one time
return newlist

That way, for simple arrays of unique values that can be put in a different order, you'll get one with minimal overhead. But if a different order can't be produced, you won't get stuck in an infinite loop.
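A rough Lua sketch of that pseudocode (shuffle_different is a made-up name here; it reuses the copying shuffle() posted earlier in the thread, so the input list is not modified):

function shuffle_different(list)
  local newlist = shuffle(list)      -- the copying Fisher-Yates shuffle above
  for i = 1, #list do
    if newlist[i] ~= list[i] then    -- any difference means we are done
      return newlist
    end
  end
  -- somehow we got the same list!
  if #newlist > 2 then
    -- swap two distinct random elements one time
    local a = math.random(#newlist)
    local b = math.random(#newlist - 1)
    if b >= a then b = b + 1 end     -- ensure b ~= a
    newlist[a], newlist[b] = newlist[b], newlist[a]
  end
  return newlist
end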

@X-Raym

X-Raym commented Feb 18, 2021

@daveola
In simple scripts (not framework or shared file functions) I tend to prefer making the check before calling the function, for readability.

if IsValid(t) then
  t = shuffle(t)
  -- do other stuff
end

rather than

t_temp = shuffle(t)  -- store in a temporary variable because we don't want our table to be erased if it fails
if IsValid(t_temp) then -- if the shuffle is good
  t = t_temp -- commit the changes to t
  t_temp = nil -- clean up this variable
  -- do other stuff
end

Maybe there is a more concise way to write it, but you get the idea. It is mostly about code style, I suppose.

@daveola

daveola commented Feb 18, 2021

While perhaps in your code {2,2} is "invalid", I would be surprised if you could guarantee that {2} is invalid.

More to the point, I would recommend always writing more general purpose code, rather than more specific code.

@X-Raym

X-Raym commented Feb 18, 2021

@daveola
That is good advice, though remember my code is just a minimal variation on the original simple Gist snippet, which doesn't have any kind of sanitization either.
The idea here is just to show a certain logic, not a fully proofed function to be dropped as-is into a solid framework.
Indeed, some might want errors reported in different ways (a second return value holding a string, an array with a code and a message, just nil, passing the original table by reference, etc.). These requirements are project specific and so are left to the user. 😃

@daveola

daveola commented Feb 19, 2021

Well, we could spin on this forever, but I'm not talking about project specifics. I'm talking about making your code robust so that it works without built-in requirements that cause infinite loops when violated. It's easy to do, it's good programming practice, and it is likely to expose faulty assumptions you may be making.

You can claim otherwise and I won't fight you further on it, but if you're looking to write cleaner, better code, it would probably serve you well to heed this advice.

@X-Raym

X-Raym commented Feb 20, 2021

@daveola Understood, I got your point. My snippet has an issue with special cases indeed.

What would be your take on this, then? (In Lua, not pseudocode) 😄

@daveola

daveola commented Feb 21, 2021

I think my pseudocode is pretty simple to write in Lua; it's almost line-by-line, so that exercise is left for the reader. :)

@X-Raym

X-Raym commented Feb 21, 2021

@daveola
The devil is in the details :P
But it is surely way more efficient than the table.concat approach, as it doesn't need to process the whole arrays.

I also thought about avoiding the value comparison and shuffling the table indices themselves, as we are sure they can't be duplicated there. That could be interesting in some cases, like a non-contiguous table of objects.

@asbestosbill

Hi, I'm new to Lua, and came here because I'm writing code to shuffle a deck of cards. If I'm reading this section correctly:

for i = #tbl, 2, -1 do
  local j = math.random(i)
  tbl[i], tbl[j] = tbl[j], tbl[i]
end

It matches the last "card" (in my case) with another random card and swaps them. Then it narrows the range by one and swaps the second-to-last card with any of the remaining 51 cards (including itself), and so forth. My main question is: why limit the possible swap candidates as it goes? Instead of math.random(i), why not math.random(#tbl)? It seems like that would introduce greater changes, but does it just not matter much?

@daveola

daveola commented Mar 19, 2021

It doesn't introduce greater changes, and it does matter much.

Your suggestion actually makes the final list less random because it's less likely to give every card a chance to be moved.

Here's a way to look at Fisher-Yates that will help explain.

Say you start with 52 cards. When you replace card 52 with a random card from the deck, you are now building a new deck, with the card you've selected (the new card in location #52). You can think of card 52 in a new stack, separate from the other 51 cards.

Now you pick another card from that 51, and put it right next to the first card in the new stack. Now they sit in positions 51, 52; and all the cards from 1-50 are still the original deck.

So you keep going - pick a card from the original deck and put it at the beginning of the new deck - the new deck keeps getting bigger, and the original deck eventually disappears.

Another way to think about doing this is to basically create a new array, and pick items from the deck randomly to put in the new array. The problem is that you need to keep resizing the original deck as you remove items, and you need to construct a new array.

The genius of the Fisher Yates is to realize that the original deck and the new deck combined will always contain no more than the 52 (or whatever number) of cards you start with, and can fit in the original array, so you don't need to resize/move items that get removed. You simply move them from the "original" portion of the array (the "left" side) to the "new" portion of the array (the "right" side) until the original side is gone and all you are left with is the new deck.

If you instead use math.random(#tbl) then you are sometimes picking your new card from the new deck as well as the original deck, and that means many of the original deck will likely not get shuffled, and your final deck will have many similarities to your original deck instead of being truly shuffled.
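To tie that picture back to the code, here is the same loop again with comments that simply restate the two-pile view (a sketch, not a change to the algorithm):

function shuffle(tbl)
  for i = #tbl, 2, -1 do
    -- positions i+1 .. #tbl already hold the "new deck";
    -- positions 1 .. i are what is left of the "original deck"
    local j = math.random(i)         -- pick only from the original portion
    tbl[i], tbl[j] = tbl[j], tbl[i]  -- move the pick onto the new deck
  end
  return tbl
end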

@asbestosbill

asbestosbill commented Apr 20, 2021

your final deck will have many similarities to your original deck instead of being truly shuffled

I would say that truly shuffled decks can and do have similarities to unshuffled decks, and that the true Fisher Yates method also leaves the possibility of cards remaining where they were. To determine whether the chances are lesser or greater would require a lot of statistical math, though, so I'll just have to assume Fisher and Yates did that math.

For example, with a two-object array, the odds of it staying the same are 50% for either method—for the original method because there's only one iteration, and for my method because there are four outcomes (swap-swap, swap-noswap...) that are split in results. For a three-object array, the original method has a 1/6 chance to leave the whole thing untouched, but my method has a 4/27 chance—slightly lower.
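For anyone who wants to check those figures, here is a small brute-force sketch for a 3-element array. It enumerates every equally likely sequence of math.random results for the original loop and for the variant being discussed (assumed here to visit every position and draw from the full range, which is what gives the 27 outcomes above), and counts how often the array comes back unchanged:

local function is_identity(t)
  return t[1] == 1 and t[2] == 2 and t[3] == 3
end

-- Fisher-Yates: i runs 3, 2 and j is drawn from 1..i (2 * 3 = 6 outcomes)
local fy_same, fy_total = 0, 0
for j3 = 1, 3 do
  for j2 = 1, 2 do
    local t = { 1, 2, 3 }
    t[3], t[j3] = t[j3], t[3]
    t[2], t[j2] = t[j2], t[2]
    fy_total = fy_total + 1
    if is_identity(t) then fy_same = fy_same + 1 end
  end
end

-- Variant: i runs 3, 2, 1 and j is always drawn from 1..3 (3^3 = 27 outcomes)
local nv_same, nv_total = 0, 0
for j3 = 1, 3 do
  for j2 = 1, 3 do
    for j1 = 1, 3 do
      local t = { 1, 2, 3 }
      t[3], t[j3] = t[j3], t[3]
      t[2], t[j2] = t[j2], t[2]
      t[1], t[j1] = t[j1], t[1]
      nv_total = nv_total + 1
      if is_identity(t) then nv_same = nv_same + 1 end
    end
  end
end

print(fy_same .. "/" .. fy_total)  --> 1/6
print(nv_same .. "/" .. nv_total)  --> 4/27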

@daveola

daveola commented Apr 20, 2021

The math is not complicated. It's a truly random shuffle, so the odds of getting the same deck again (or any specific deck) are 1/N!, since N! is the full number of possible decks.

To demonstrate: if you have 52 cards and you pick the first one randomly, there are 52 possibilities. Then you pick the next card from 51 possibilities. Then from 50... So 52 x 51 x 50 x ... is 52!. And that is pretty much exactly what Fisher-Yates is doing. It looks complicated, but it's not; it's doing a random selection of a new card from all the cards that are still available.

@asbestosbill

asbestosbill commented Apr 20, 2021

Correct. But for my method, the math is much more complicated because cards that have already been passed can be changed by later swaps. So I wrote some code to brute-force it, and the odds of getting exactly the same deck are slightly lower with my method.

[edit] Now, the average difference is a different question, and if you have a quick way to calculate it, I'd love to hear it.
[edit2] I did a four-card deck by hand and got 76.042% vs 76.368% change: my method is slightly more random. I'll have to write code for the original F-Y method if I want to go beyond that, because there's no way I'm doing a 5+ card deck by hand...
[edit3] I redacted my 3-card findings because I made an error. The real figures were 66.6…% vs 67.9% between original Fisher-Yates and my variation.

@asbestosbill

So, to summarize, my variation does introduce greater changes, but it does not matter much—the difference is very slight.

@daveola

daveola commented Apr 22, 2021

I don't know what you mean by 'greater changes' but your variation is less random.

@asbestosbill

For either method, there is a finite number of outcomes. My script tallied up every outcome for deck sizes up to 7 and compared each to the original deck to get a count of indices with different values. I took the average of those and divided by the deck size to get a percentage of change. Those are the figures compared a few posts ago, which are very slightly higher than the average percentage of change using the original method. I also proved that the chance of ending up with an unchanged deck is smaller. So if you want to claim that it's less random, you'll have to come up with another metric to judge it by, but those are the two that make sense to me.

You've done a lot of rambling about how the original F-Y method operates, which was obvious from the start, but you haven't said one word, let alone shown any math, to prove that it is more random by limiting its swaps as it goes.

@daveola

daveola commented Apr 26, 2021

First we can define randomness. If you look at the outcome of a shuffle given a specific initial deck, what are the odds of getting every possible deck? If the odds are the same for every possible deck (say 1/N for N possible decks), and this holds regardless of the starting deck, then we know we have a fully random shuffle. You can't get "more" random than that. Anything that doesn't accomplish that is a less random shuffle.

In the simplest case, a coin flip is fully random if the two possible outcomes have equal probability (50%), regardless of the coin's initial state. Anything that differs from that is less random.

If you want to have a different definition of random, then that's up to you, but you're going to be disagreeing with the rest of the world, and we probably don't need to bother with this conversation further.

F-Y is fully random. I believe I proved that above, but if you need me to clarify how, I'd be happy to try.

The reason it works is by limiting the swap candidates, and I believe I clarified that above as well.

There are other possible ways to shuffle a deck, that's for sure. But if you take the F-Y method and simply stop limiting the swap candidates, so that the already-placed cards can be picked as well, then it cannot possibly "introduce greater changes" in terms of randomness, since F-Y is already as random as a shuffle can be.

The reason your suggestion is problematic requires fully appreciating what FY is doing.

FY is essentially taking a deck of cards and then picking a random card, and putting it in a new pile. Then pick another random card and add it to the top of that pile, and continue until you've gone through all the cards.

Your suggestion is to take a deck of cards, pick a random card, and put it in a new pile. Then you pick another random card, either from the old pile or the new pile, and put it in the new pile. And you do this the same number of times as F-Y, which is once per card in the deck.

You'll find that a couple of things happen. In F-Y, every card is "touched" - in the sense that every card has a chance of being moved to every possible location with equal probability. Just what we want from a "shuffle". In your version, you are more likely to leave a card in its original location.

To see that with math, consider the cards you are picking as candidates for the swap, looking at a 52-card deck.

In FY, we pick a random card for the first swap. Then we pick one of the remaining 51. Then one of the remaining 50. And so on. Every single card is guaranteed to be selected for one of the 52 swaps. And each of those swaps has an equal chance of going to one of the 52 final locations. Genius.

In your version, when you select a card to swap, you are randomly selecting from the original deck as well as the new deck.

But you want math, so I'll give you math. And it turns out, I don't even have to write it out myself, because all it took was a quick search. Here's a fantastic article on the topic:

https://blog.codinghorror.com/the-danger-of-naivete/
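In short, the counting argument behind that article, sketched out here in symbols: the variant takes n independent draws from all n positions, while Fisher-Yates narrows the range by one each pass, so

\underbrace{n \times n \times \cdots \times n}_{n\ \text{draws}} = n^{n} \quad \text{(variant)}, \qquad
n \times (n-1) \times \cdots \times 1 = n! \quad \text{(Fisher-Yates)}.

A fair shuffle must give each of the n! orderings equal probability, and since n! does not divide n^n for n > 2 (for example, 3^3 = 27 cannot be split evenly into 3! = 6 groups), the variant's equally likely paths cannot land on the orderings uniformly, while the Fisher-Yates paths correspond one-to-one with them.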

@daveola

daveola commented Apr 26, 2021

And here's a great article explaining why FY is brilliant and has some animation to show it:

https://bost.ocks.org/mike/shuffle/

@asbestosbill

Oh, nice! They explained it way better than you did! Shoulda posted that first. Thanks!
