Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save jxxcarlson/2b028fe107e0668d30a15b560d7ce3c3 to your computer and use it in GitHub Desktop.
Save jxxcarlson/2b028fe107e0668d30a15b560d7ce3c3 to your computer and use it in GitHub Desktop.
Schelling's model
For the Schelling model, there is one file of interest, https://github.com/jxxcarlson/schelling/blob/master/src/Schelling.elm.
The main data structures are Cell, an opaque type, and Array Cell.
I use 1D arrays, but process them as if they were 2D arrays using
the functions
location :(Int, Int) -> Int
location (row, col) =
nRows * row + col
matrixIndex : Int -> (Int, Int)
matrixIndex n =
(n // nCols, modBy nCols n)
Seems retro, but actually works well for this problem. I don't
know if it has bad performance implications.
The main function is
updateCells : List (Int, Int) -> Array Cell -> Array Cell
updateCells tupleList cellArray =
List.foldl updateCell cellArray tupleLis
The functon of the tupleList : List (Int, Int) to (a) determine the order in which
cells are updated and (b) to furnish a random number which is used
in the update process for the given cell. In practie, (via Main.elm),
the tupleList is a list or pairs where the projection to the first
coordinate is a random permutation of the list 0, 1, ... (N^2 - 1),
where the array is NxN. The projection to the second coordinate is a
list of random integers.
The function upDateCells is just a driver for
updateCell : (Int, Int) -> Array Cell -> Array Cell
updateCell (idx, rand) cellArray =
let
(row, col) = matrixIndex idx
updatedCell = updateEmotionalStateOfCellAtIndex (row, col) cellArray
in
if emotionalState updatedCell == Unsatisfied then
swapWithUnoccupiedCell rand updatedCell cellArray
else
replace updatedCell cellArray
While the app works just fine for 40x40 arrays with 500 ms delay between updates,
I would like for it perform as well as possible. In general, I am interested
in places where the code can be improved.
@folkertdev
Copy link

The main data structures are Cell, an opaque type, and Array Cell.
I use 1D arrays, but process them as if they were 2D arrays using
the functions

I don't think this is any worse than a "proper" 2D array, and it might actually be a little better because you only do one Array.get to get a value instead of 2. The reason for the might is that I'm not a 100% sure how parts of the array are reused when you update only a small part of it. In any case, a 1D array is a valid choice here

I would use a case near line 40 instead of an if.

Other than that the code looks good to me. To optimize it further I think you need to do some profiling. I usually use the chrome devtools for this, I would go about it as follows:

  • set up elm-explorations/benchmark, and create a benchmark for a relatively small input. The goal is for the benchmark to run in about 7 seconds. (in this case 10x10 would probably be a good starting point, but experiment a little).
  • run the benchmark. It is important that you use the --optimize flag to remove some noise in the profiling report, so you actually have to compile to an html file and run that (for instance with elm-reactor).
  • while the benchmark is running, profile the browser: chrome devtools has a performance tab, click the big dot to start recording. Make sure you end the recording before the benchmark finishes.

What we have now is a profile of the browser trying to run your function as fast as it can. This makes the data reasonably robust.
The profiler also keeps track of exactly where the execution time was spent. In this case the bottom-up tab of the results is most interesting.
The list does not give the function names (most are anonymous), but you can click on them to go to the source location and find the name of the function in question.

@folkertdev
Copy link

going through the code from the repository, I think a simple optimization can be made at

numberOccupied : Array Cell -> Int
numberOccupied cellArray =
   cellArray
     |> Array.filter (\cell -> occupied cell)
     |> Array.length

This will

  • allocate a new array
  • walk over cellArray
  • put the element into the new array if it satisfies the predicate
  • determine the length of the new array (this is stored at the root, so O(1))

So, it will allocate a whole new array that is possibly large, only to determine its length. A fold is more efficient

numberOccupied : Array Cell -> Int
numberOccupied cellArray =
    let folder cell count = 
            if occupied cell then 
                1 + count
            else
                count
    in
        Array.foldl folder 0 cellArray

When writing fast code, perhaps more so in garbage collected languages than otherwise, allocation is something to avoid.

for completeness:

fractionSatisfied : Array Cell -> Float
fractionSatisfied cellArray =
    let folder cell (occupied, satisfied) = 
            case cell of 
                Unoccupied -> 
                    -- no changes
                    ( occupied, satisfied )

                Occupied _ _ _ emotion -> 
                    ( occupied + 1
                    , if emotion == Satisfied then satisfied + 1 else satisfied 
                    )
    in
        Array.foldl folder (0, 0) cellArray
            |> (\(occupied, satisfied) -> toFloat satisfied / toFloat occupied)

What I've done here is some manual inlining of the helpers. I think the readability doesn't suffer too much here but in general that is something to watch out for. On the other hand, the elm compiler doesn't do much inlining (I think arithmetic and bitwise operators are the only things) and function calls are expensive, so inlining helpers can make code go faster sometimes.

@jxxcarlson
Copy link
Author

jxxcarlson commented Apr 5, 2019

I see (I think) --- is it Array.filter that is allocating. new array? Indeed the arrays can be large.

@jxxcarlson
Copy link
Author

jxxcarlson commented Apr 5, 2019

The profiling revealed the culprit -- or a culprit. I found that the function indexAux was being called more than any other of "my" functions -- 3963 times and 4158 ms, accounting for 8.68% of the execution time, compared to the most called function, F9 at 4722 ms and 9.09%. The code is below. The index function is O(N), and is used all the time. The problem is that cells can move in the array (like chess pieces on the board), so the implementation in the code below uses a linear search. I found a good solution which is O(1) and a new data structure. I'll describe it in the next post.

index : Cell -> Array Cell -> Int
index cell cellArray =
    let
      (status, idx) = Array.foldl (indexAux cell) (False, -1) cellArray
    in
      if status == True then
        idx
      else
         -1

indexAux : Cell -> Cell -> (Bool, Int) -> (Bool, Int)
indexAux givenCell cell state =
    if (id givenCell) == (id cell) then
      (True, (Tuple.second state) + 1)
    else if Tuple.first state == False then
      (False, (Tuple.second state) + 1)
    else
      state

@jxxcarlson
Copy link
Author

jxxcarlson commented Apr 6, 2019

Here is the relevant part of the old data structure:

type Cell
    = Occupied Id Threshold Identity EmotionalState
    | Unoccupied Id

type Id = Id Int

I changed it to this:

type Cell
    = Occupied CellIndex Id Threshold Identity EmotionalState
    | Unoccupied CellIndex Id

type CellIndex = CellIndex Int

type Id = Id Int

Then the index function is

idx : Cell -> Int
idx cell =
    case cell of
        Unoccupied (CellIndex k) _ -> k
        Occupied (CellIndex k) _ _ _ _ -> k

Of course one now has to be VERY careful about how cells are updated. If they are not "moved", all is OK. In this project, the only way cells move is by swapping two cells. So we run all changes through this function,
which swaps the indices when the cells are swapped. Then cells always
carry a valid index.

With Evan's amazing compiler, it took exactly one hour to conceive and implement
all the changes. The result (using the benchmark program) is a 47% decrease in
in execution time. This carries over the app. I had reduced the update interval
from 500 ms to 250 ms. But it was somewhat jerky. I've reduced the update
interval to 50ms and it is still very smooth. This is better than can be explained
by the above.

swapCells : Cell -> Cell -> Array Cell -> Array Cell
swapCells cell1 cell2 cellArray =
    let
       idx1 = idx cell1
       idx2 = idx cell2
    in
    cellArray
      |> setWithIndex idx2 (setCellIndex idx2 cell1)
      |> setWithIndex idx1 (setCellIndex idx1 cell2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment