@GDKO
Last active July 14, 2017 21:03
Optimising the LookerUp strategy for an Iterated Prisoner's Dilemma tournament

Preface

Before continuing, I suggest reading the excellent blog post by Martin Jones.

The LookerUp strategy

The LookerUp strategy uses a 64-key lookup table to decide whether to cooperate (C) or defect (D). Each key is a 3-tuple consisting of the opponent's first two actions, our own two most recent actions, and the opponent's two most recent actions, which gives 4 x 4 x 4 = 64 keys. The action for each key was generated using an evolutionary algorithm.

The idea

Instead of having a binary action for each key, we could have a number between 0 and 1 that gives the probability of cooperating. We will use the function random_choice from the Axelrod library, which returns C with the given probability, so that random_choice(0)=D and random_choice(1)=C.
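A minimal sketch of the behaviour described above, written with a hypothetical stand-in called random_choice_sketch rather than the library's own function. It only illustrates the assumed semantics: a value of 0 always defects, a value of 1 always cooperates, and anything in between cooperates with that probability.

```python
import random

C, D = 'C', 'D'

def random_choice_sketch(p, rng=random):
    """Return C with probability p and D otherwise (hypothetical stand-in)."""
    # Handle the endpoints explicitly so that 0 and 1 are deterministic.
    if p <= 0:
        return D
    if p >= 1:
        return C
    return C if rng.random() < p else D

# Rough empirical check: about 70% of these draws should be C.
rng = random.Random(0)
draws = [random_choice_sketch(0.7, rng) for _ in range(10000)]
cooperation_rate = draws.count(C) / float(len(draws))
```

With a table that contains only 0.0 and 1.0, this reduces to the original deterministic LookerUp behaviour.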

Changing some code

To accommodate the change, some code in the LookerUp strategy created by Martin Jones needed to change. The pattern in the EvolvedLookerUp class needed to become a list of numbers.

# Original pattern
pattern_original        = 'CDCCDCCCDCDDDDDCCDCCDDDDDCDCDDDCDDDDCCCDDCCDDDDDCDCDDDCDCDDDDDDD'

# Changed into numbers
pattern_original_number = '1011011101000001101100000101000100001110011000001010001010000000'

# Changed into a list
pattern_original_list   = [1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,
                           1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,
                           0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,
                           1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
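The conversion above does not have to be done by hand. A short sketch, assuming only that C maps to 1.0 and D maps to 0.0 (the variable names are illustrative):

```python
pattern_original = 'CDCCDCCCDCDDDDDCCDCCDDDDDCDCDDDCDDDDCCCDDCCDDDDDCDCDDDCDCDDDDDDD'

# Map every C to a cooperation probability of 1.0 and every D to 0.0.
pattern_original_list = [1.0 if action == 'C' else 0.0
                         for action in pattern_original]

# The intermediate string of digits, shown only for comparison with the text.
pattern_original_number = ''.join(str(int(p)) for p in pattern_original_list)
```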

Now, instead of returning an action directly, we pass the looked-up value to the random_choice function.

# Original
action = self.lookup_table[key]
return action

# Changed
action = float(self.lookup_table[key])
return random_choice(action)

Both the original (Evolved) and the changed (EvolvedG) were run against the other strategies to test if anything changed. Because some of the opponents' strategies contain random elements, the output should look similar but not identical.

[Figure: Axelrod_pso_1]

Optimising the lookup table with Particle Swarm Optimisation (PSO)

We change the score_for function from Martin's blog to accept a pattern, which is then passed to the factory when creating our strategy.

import axelrod

# score_single is the helper defined in Martin's blog post.
def score_for_pattern(my_strategy_factory, pattern, iterations=200):
    """
    Given a function that returns a strategy when called with a pattern,
    calculate the average score per turn
    against all ordinary strategies. If the
    opponent is classified as stochastic, then
    run 100 repetitions and take the average to get
    a good estimate.
    """
    scores_for_all_opponents = []
    for opponent in axelrod.ordinary_strategies:

        # decide whether we need to sample or not
        if opponent.classifier['stochastic']:
            repetitions = 100
        else:
            repetitions = 1
        scores_for_this_opponent = []

        # calculate an average for this opponent
        for _ in range(repetitions):
            me = my_strategy_factory(pattern)
            other = opponent()
            # make sure that both players know what length the match will be 
            me.set_tournament_attributes(length=iterations)
            other.set_tournament_attributes(length=iterations)

            scores_for_this_opponent.append(score_single(me, other, iterations))

        mean_vs_opponent = sum(scores_for_this_opponent) / len(scores_for_this_opponent)
        scores_for_all_opponents.append(mean_vs_opponent)

    # calculate the average for all opponents
    overall_average_score = sum(scores_for_all_opponents) / len(scores_for_all_opponents)
    return overall_average_score
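The helper score_single comes from Martin's blog post and is not reproduced here. As a rough illustration of what "average score per turn" means, the sketch below assumes the standard Prisoner's Dilemma payoffs (3 for mutual cooperation, 1 for mutual defection, 5 for defecting against a cooperator, 0 for cooperating against a defector). The function name and the way histories are passed in are assumptions made for this example only.

```python
# Payoff to "me" for each (my action, opponent action) pair.
# These are the standard values, assumed here for illustration.
PAYOFFS = {
    ('C', 'C'): 3,
    ('C', 'D'): 0,
    ('D', 'C'): 5,
    ('D', 'D'): 1,
}

def average_score_per_turn(my_actions, opponent_actions):
    """Average payoff per turn for the first player over one match."""
    total = sum(PAYOFFS[pair] for pair in zip(my_actions, opponent_actions))
    return total / float(len(my_actions))

# Three turns: mutual cooperation, being exploited, mutual defection.
example = average_score_per_turn('CCD', 'CDD')  # (3 + 0 + 1) / 3
```

score_for_pattern then averages such per-match values twice: first over the repetitions against one opponent, then over all opponents, so every opponent carries the same weight regardless of how many repetitions it needed.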

We then create our strategy, Gambler, together with a subclass called TestGambler that requires a pattern.

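# Imports assumed for the Axelrod version in use when this was written;
# adjust them to match your installed version.
from itertools import product

from axelrod import Actions, Player, init_args, random_choice

C, D = Actions.C, Actions.D

class Gambler(Player):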

    name = 'Gambler'
    classifier = {
        'memory_depth': float('inf'),
        'stochastic': True,
        'makes_use_of': set(),
        'inspects_source': False,
        'manipulates_source': False,
        'manipulates_state': False
    }

    @init_args
    def __init__(self, lookup_table=None):
        """
        If no lookup table is provided to the constructor, then use the TFT one.
        """
        Player.__init__(self)

        if not lookup_table:
            # Keys are (opponent start, my last action, opponent's last action).
            lookup_table = {
                ('', 'C', 'D'): 0,
                ('', 'D', 'D'): 0,
                ('', 'C', 'C'): 1,
                ('', 'D', 'C'): 1,
            }

        self.lookup_table = lookup_table
        # Rather than pass the number of previous turns (m) to consider in as a
        # separate variable, figure it out. The number of turns is the length
        # of the second element of any given key in the dict.
        self.plays = len(list(self.lookup_table.keys())[0][1])
        # The number of opponent starting actions is the length of the first
        # element of any given key in the dict.
        self.opponent_start_plays = len(list(self.lookup_table.keys())[0][0])
        # If the table dictates to ignore the opening actions of the opponent
        # then the memory classification is adjusted
        if self.opponent_start_plays == 0:
            self.classifier['memory_depth'] = self.plays

        # Ensure that table is well-formed
        for k, v in lookup_table.items():
            if (len(k[1]) != self.plays) or (len(k[0]) != self.opponent_start_plays):
                raise ValueError("All table elements must have the same size")


    def strategy(self, opponent):
        # If there isn't enough history to lookup an action, cooperate.
        if len(self.history) < max(self.plays, self.opponent_start_plays):
            return C
        # Count backward m turns to get my own recent history.
        history_start = -1 * self.plays
        my_history = ''.join(self.history[history_start:])
        # Do the same for the opponent.
        opponent_history = ''.join(opponent.history[history_start:])
        # Get the opponent's first n actions.
        opponent_start = ''.join(opponent.history[:self.opponent_start_plays])
        # Put these three strings together in a tuple.
        key = (opponent_start, my_history, opponent_history)
        # Look up the action associated with that tuple in the lookup table.
        action = float(self.lookup_table[key])
        return random_choice(action)



class TestGambler(Gambler):
    """
    A Gambler strategy that uses the pattern supplied when initialised.
    """

    name = "TestGambler"

    def __init__(self, pattern):
        plays = 2
        opponent_start_plays = 2

        # Generate the list of possible tuples, i.e. all possible combinations
        # of m actions for me, m actions for opponent, and n starting actions
        # for opponent.
        self_histories = [''.join(x) for x in product('CD', repeat=plays)]
        other_histories = [''.join(x) for x in product('CD', repeat=plays)]
        opponent_starts = [''.join(x) for x in
                           product('CD', repeat=opponent_start_plays)]
        lookup_table_keys = list(product(opponent_starts, self_histories,
                                         other_histories))

        # Zip together the keys and the action pattern to get the lookup table.
        lookup_table = dict(zip(lookup_table_keys, pattern))
        Gambler.__init__(self, lookup_table=lookup_table)
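To make the key layout concrete, the sketch below rebuilds the 64 keys in the same order as TestGambler and derives the key for a made-up pair of histories in the same way as Gambler.strategy. The example histories are invented purely for illustration.

```python
from itertools import product

plays = 2
opponent_start_plays = 2

# Same ordering as in TestGambler: CC, CD, DC, DD for each component.
histories = [''.join(x) for x in product('CD', repeat=plays)]
opponent_starts = [''.join(x) for x in product('CD', repeat=opponent_start_plays)]
keys = list(product(opponent_starts, histories, histories))

# Invented histories after five turns.
my_history = ['C', 'C', 'D', 'D', 'C']
opponent_history = ['C', 'D', 'D', 'C', 'C']

# (opponent's first two actions, my last two actions, opponent's last two actions)
key = (''.join(opponent_history[:opponent_start_plays]),
       ''.join(my_history[-plays:]),
       ''.join(opponent_history[-plays:]))

# Position of this key in the pattern list passed to TestGambler.
index = keys.index(key)
```

Because each component cycles through CC, CD, DC, DD, the position of a key works out to 16 times the opponent-start index plus 4 times the own-history index plus the opponent-history index.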

Running the PSO

We use a Python library called pyswarm to perform the PSO. We constrain each of the 64 parameters to lie between 0 and 1 using the lower and upper bounds lb and ub. Because pso minimises its objective, the function optimizepso returns the negated score. Running pso outputs the best parameters found (xopt) and the corresponding value of the objective (fopt).

from pyswarm import pso

# One lower and one upper bound per lookup-table entry (64 in total).
lb = [0] * 64
ub = [1] * 64

def optimizepso(x):
    # pso minimises, so negate the score in order to maximise it
    return -score_for_pattern(TestGambler, x)

# These values of phip, phig and omega will lead to slower convergence
xopt, fopt = pso(optimizepso, lb, ub, swarmsize=100, maxiter=20, processes=60, debug=True,
                 phip=0.8, phig=0.8, omega=0.8)
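For readers unfamiliar with PSO, the following is a simplified, standard-library-only sketch of the kind of update a particle swarm performs. It is not pyswarm's implementation, and the function pso_sketch and the toy objective are hypothetical names introduced here. It only illustrates the roles of omega (how much of the previous velocity is kept), phip (pull toward a particle's own best position) and phig (pull toward the swarm's best position).

```python
import random

def pso_sketch(objective, lb, ub, swarmsize=40, maxiter=100,
               omega=0.5, phip=0.5, phig=0.5, seed=0):
    """Minimise objective inside the box [lb, ub] with a basic particle swarm."""
    rng = random.Random(seed)
    dims = len(lb)
    # Random starting positions inside the bounds, random starting velocities.
    positions = [[rng.uniform(lb[d], ub[d]) for d in range(dims)]
                 for _ in range(swarmsize)]
    velocities = [[rng.uniform(-(ub[d] - lb[d]), ub[d] - lb[d])
                   for d in range(dims)] for _ in range(swarmsize)]
    best_positions = [p[:] for p in positions]       # best seen per particle
    best_values = [objective(p) for p in positions]
    g = min(range(swarmsize), key=lambda i: best_values[i])
    swarm_best, swarm_best_value = best_positions[g][:], best_values[g]

    for _ in range(maxiter):
        for i in range(swarmsize):
            for d in range(dims):
                rp, rg = rng.random(), rng.random()
                velocities[i][d] = (
                    omega * velocities[i][d]
                    + phip * rp * (best_positions[i][d] - positions[i][d])
                    + phig * rg * (swarm_best[d] - positions[i][d]))
                # Move the particle and clamp it back into the bounds.
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(ub[d], max(lb[d], moved))
            value = objective(positions[i])
            if value < best_values[i]:
                best_positions[i], best_values[i] = positions[i][:], value
                if value < swarm_best_value:
                    swarm_best, swarm_best_value = positions[i][:], value
    return swarm_best, swarm_best_value

def toy_objective(x):
    # Smooth bowl with its minimum at (0.3, 0.7), inside the unit square.
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

toy_xopt, toy_fopt = pso_sketch(toy_objective, lb=[0, 0], ub=[1, 1])
```

In the real run the objective is the negated tournament score over 64 dimensions, which is noisy and far more expensive to evaluate than this toy function.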

Running the optimisation with pyswarm led to the following numbers:

pattern_pso = [1.0 ,0.0,1.0,1.0 ,0.0 ,1.0,1.0,1.0,0.0 ,1.0 ,0.0,0.0,0.0,0.0,0.0,1.0 ,
               0.93,0.0,1.0,0.67,0.42,0.0,0.0,0.0,0.0 ,1.0 ,0.0,1.0,0.0,0.0,0.0,0.48,
               0.0 ,0.0,0.0,0.0 ,1.0 ,1.0,1.0,0.0,0.19,1.0 ,1.0,0.0,0.0,0.0,0.0,0.0 ,
               1.0 ,0.0,1.0,0.0 ,0.0 ,0.0,1.0,0.0,1.0 ,0.36,0.0,0.0,0.0,0.0,0.0,0.0 ]

# The parameters are the same as the EvolvedLookerUp except for

# OpStart, SelfLast2, OpLast2
#('CD', 'DD', 'DD'): 0.48   # Occurs 0.9%
#('CD', 'CC', 'DD'): 0.67   # Occurs 0.3%
#('DC', 'DC', 'CC'): 0.19   # Occurs 0.4% 
#('CD', 'CC', 'CC'): 0.93   # Occurs 1.0%
#('CD', 'CD', 'CC'): 0.42   # Occurs 0.1%
#('DD', 'DC', 'CD'): 0.36   # Occurs 0.0%
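The list of changed entries can be recovered by comparing pattern_pso with the original pattern position by position. The sketch below assumes the key ordering used in TestGambler (opponent start, own last two actions, opponent's last two actions); the variable names are illustrative.

```python
from itertools import product

pattern_original = 'CDCCDCCCDCDDDDDCCDCCDDDDDCDCDDDCDDDDCCCDDCCDDDDDCDCDDDCDCDDDDDDD'

pattern_pso = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,
               0.93, 0.0, 1.0, 0.67, 0.42, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.48,
               0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.19, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.36, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Rebuild the keys in the same order as TestGambler.
histories = [''.join(x) for x in product('CD', repeat=2)]
keys = list(product(histories, histories, histories))

original = [1.0 if action == 'C' else 0.0 for action in pattern_original]

# Keep only the entries where the optimised value differs from the original.
changed = {key: new
           for key, old, new in zip(keys, original, pattern_pso)
           if new != old}

for key in sorted(changed):
    print(key, changed[key])
```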

We now run our new strategy (Gambler) with the pattern_pso list against the other strategies.

[Figure: Axelrod_pso_2]

Backstabbing the other strategies

If we specify that the strategy always defects on the last two turns, we get better results.

[Figure: Axelrod_pso_3]
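The code for the backstabbing variant is not shown in this write-up. One possible way to express the rule is sketched below, assuming the player knows the match length (score_for_pattern sets it through set_tournament_attributes). The helper backstab_sketch and its parameters are hypothetical and are not taken from the Axelrod library.

```python
C, D = 'C', 'D'

def backstab_sketch(table_action, turns_played, match_length, backstab_turns=2):
    """Return D on the final backstab_turns turns, else the table's action."""
    # turns_played counts completed turns, so the turn being decided
    # is included in turns_remaining.
    turns_remaining = match_length - turns_played
    if turns_remaining <= backstab_turns:
        return D
    return table_action

# A player whose table would always cooperate, over a 200-turn match.
match_length = 200
actions = [backstab_sketch(C, turns_played, match_length)
           for turns_played in range(match_length)]
```

Inside a Gambler subclass, turns_played would presumably be len(self.history), and table_action would be the result of the usual random_choice lookup.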

MIT License

Copyright (c) 2016 Georgios Koutsovoulos

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

@RedXan
RedXan commented Jul 14, 2017

Could someone put the entire improved code for GamblerBS in a reply? Thanks!