Game of AGI

Play the game in Colab

View simulator source

Game of AGI (GOA) is a socio-technological Monte Carlo simulation that estimates the existential risk posed by AGI. You play by setting various factors, e.g. AI spending and AGI safety spending, along with their uncertainties; GOA then samples 1M times from Gaussian distributions built from your guesses and returns the associated existential risk, corruption risk, and value alignment probability.

N.B. When referring to AGIs as separate entities, I am referring to autonomous groups of humans (e.g. companies, governments, groups, organizations) that are in control of some tech stack capable of AGI. There may be several autonomous AGIs within any given stack, but we assume they have an umbrella objective defined by the controlling org.

Inputs

Each input parameter is modeled by a Gaussian distribution with a set mean and standard deviation. When setting the standard deviation, it's helpful to remember that roughly 68% of samples will fall within one standard deviation of the mean.
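A minimal sketch of what this means in practice, using NumPy and the agi_safe_by_default input described below (the values are illustrative, not recommendations):

```python
import numpy as np

N_SAMPLES = 1_000_000  # the simulator draws 1M samples per run

# Hypothetical mean and standard deviation for agi_safe_by_default.
mean, std = 0.3, 0.1
samples = np.random.normal(mean, std, N_SAMPLES)

# Roughly 68% of samples land within one standard deviation of the mean.
within_one_std = np.mean(np.abs(samples - mean) <= std)
print(f"Fraction within +/- 1 std: {within_one_std:.3f}")  # ~0.683
```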

AGI safe by default

This is the probability that AGI will be aligned with humanity for reasons other than a general solution to long-term alignment. In this "winging it" scenario, we solve safety issues as they come up in various types of applications, e.g. disinformation on Facebook or a self-driving car hitting pedestrians. Then, as a result of iterating on more and more capable AI systems designed to safely meet our evolving needs, safe AGI emerges by default.

When setting this, one should consider the likelihood of AGI emerging from Google/OpenAI and their focus on safety relative to governments. Such labs are perhaps also more amenable to oversight than other well-funded, mostly state-based groups, as these labs will be accountable to member states' laws, and so are likely safer in this respect as well.

We make the optimistic assumption that safety by default trumps corruption, i.e. a corrupt lab originating AGI will still be benevolent if AGI is safe by default (see the sketch below).
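A minimal sketch of how that assumption could play out for a single sample, ignoring the spending and oversight factors described later; this is my reading of the text, not the simulator's actual logic:

```python
# Hypothetical per-sample outcome logic (an illustrative assumption only).
def sample_is_existential_risk(safe_by_default: bool,
                               value_aligned: bool,
                               corrupt: bool) -> bool:
    if safe_by_default:
        return False          # safety by default trumps corruption
    if corrupt:
        return True           # corrupt == x-risk in this initial model
    return not value_aligned  # otherwise a general alignment solution is needed
```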

Params

agi_safe_by_default

Value alignment probability

This is the probability that we will solve long-term AGI alignment generally. For example, if iterated amplification and safety via debate provide a general solution with which we could align any AI, then this would be 1.

See the control problem.

Params

value_alignment_prob

AGI Safety Spending

Here we consider dollars as a proxy for effectiveness of AGI safety efforts.

Params

dollars_agi_safety Global annual spending in USD directly on ensuring AGI is safe, cf. the field of AGI safety

dollars_ai Global annual spending in USD on AI, eventually culminating in AGI

ideal_safety_to_ai_spend_ratio Ideal ratio of dollars_agi_safety to dollars_ai. If you think AGI safety spending should equal 1% of total AI spending, set this to 0.01 (see the sketch below)

ignore_spend Run the model without considering spending
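One plausible way these spending inputs could combine into a single factor, purely as an illustration of the ratio above (the simulator's actual formula is not shown here):

```python
import numpy as np

def safety_spend_adequacy(dollars_agi_safety: np.ndarray,
                          dollars_ai: np.ndarray,
                          ideal_safety_to_ai_spend_ratio: np.ndarray) -> np.ndarray:
    """Illustrative adequacy factor in [0, 1]: 1 means safety spending
    meets or exceeds the ideal fraction of total AI spending."""
    actual_ratio = dollars_agi_safety / dollars_ai
    return np.clip(actual_ratio / ideal_safety_to_ai_spend_ratio, 0.0, 1.0)

# Example: $50M on safety vs. $50B on AI, with a 1% ideal ratio
print(safety_spend_adequacy(np.array([50e6]),
                            np.array([50e9]),
                            np.array([0.01])))  # -> [0.1]
```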

Oversight

There are currently two far-and-away leaders in the race to AGI: Google and OpenAI. These two orgs have seemingly beneficent goals; nonetheless, such a concentration of potential power poses a major risk of those in control of AGI becoming corrupt. One solution to corruption could be some form of oversight. Oversight can also help ensure safety standards are being upheld in labs working on AGI. Such an oversight organization could:

  • Produce a safety score for top AI labs
  • Prepare training courses for employees on spotting safety violations and reporting them anonymously if they don't feel safe addressing them internally
  • Provide a secure hotline for concerned scientists to raise safety concerns
  • Hold inspections that include private, anonymous interviews with employees for feedback on AGI safety and AGI progress (and share AGI safety breakthroughs, similar to the IAEA's Technical Cooperation Programme)

For now, the exact form of this organization (e.g. an international agency, a governmental agency, or an industry standards group) is kept nebulous. Be optimistic that oversight will lead to less corruption and higher safety (one possible shape for this effect is sketched after the parameters below).

Params

oversight_dollars Global dollars currently spent on AGI safety oversight

ideal_oversight_dollars Ideal spending on oversight for AGI safety

ignore_oversight Whether to ignore the effects of oversight
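As an illustration of how oversight spending could discount corruption risk; this linear form is an assumption, not the simulator's formula:

```python
import numpy as np

def oversight_adjusted_corruption(base_corruption_prob: np.ndarray,
                                  oversight_dollars: np.ndarray,
                                  ideal_oversight_dollars: np.ndarray) -> np.ndarray:
    """Illustrative: scale corruption risk down as oversight spending
    approaches its ideal level."""
    oversight_factor = np.clip(oversight_dollars / ideal_oversight_dollars, 0.0, 1.0)
    return base_corruption_prob * (1.0 - oversight_factor)

# Example: 90% base corruption risk, oversight funded at 10% of its ideal level
print(oversight_adjusted_corruption(np.array([0.9]),
                                    np.array([10e6]),
                                    np.array([100e6])))  # -> [0.81]
```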

Originating organizations

One key concept in measuring corruption is understanding how many autonomous groups of humans will be in control of AGI. The more such originating organizations there are, the less power will be concentrated in the hands of the few, and the more competition and checks and balances there will be between humans in control of AGI before the singularity. Also, the more people in each organization, the more likely there will be someone who acts as a whistleblower or mutineer in the name of humanity.

On the flip side, a larger number of AGI-controlling entities introduces risk in the form of "bad apples" who are able to cause outsized destruction using AGI. This could be in the form of hackers, rogue states, and possibly militaries with short-sighted goals. This is not currently modeled, as the x-risk is already so alarmingly high that increasing it further doesn't seem to warrant any qualitative changes in the actions that should be taken as a result of the model, i.e. we need oversight and safety spending. Also, it seems that only a small number of labs are currently in contention to originate AGI, so large numbers of originators aren't realistic. However, in terms of action items around promoting openness (e.g. should OpenAI be more open?), this type of modeling would help clarify that. The nuances of such a model should include the relative destructive vs. constructive power of a rogue AGI organization (i.e. it is easier to destroy than to create) and the counterbalance to such destructive power offered by the higher total number of AGI-controlling organizations.

Another issue not covered by the model is that of warring AGIs, i.e. if more than one AGI exists, it makes sense that they would have competing goals and therefore may attempt to eliminate each other to achieve those goals. For the purposes of setting this input, consider that AIs and the humans designing them will favor cooperation over conflict.

Note that we currently consider a single entity controlling AGI to be almost certainly corrupt, and that this corruption will be an existential threat. This is extreme, but it's not obvious how extreme. For example, how much will alignment work matter if the groups developing AGI are corrupt? The model currently assumes not at all. However, it obviously depends on how corrupt the humans are. Again, this is not modeled; corrupt == x-risk in this initial model.

Params

number_of_originators Number of AGI originating organizations

ppl_per_originator Number of people in the AGI originating organization

snowden_prob Probability that a single person in a corrupt originator becomes a whistleblower or even mutineer, cf. Fritz Houtermans https://docs.google.com/document/d/164O4fmp-zsbeIenq3l-slo-VGfYUagNY/edit
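The per-person whistleblower probability compounds across the people in an organization. A minimal sketch of that compounding, assuming each person acts independently (the simulator's exact treatment isn't shown here):

```python
def prob_at_least_one_whistleblower(snowden_prob: float,
                                    ppl_per_originator: int) -> float:
    """Probability that at least one person in a corrupt originator
    blows the whistle, assuming independent actors."""
    return 1.0 - (1.0 - snowden_prob) ** ppl_per_originator

# Example: a 0.1% per-person chance across a 1,000-person org
print(prob_at_least_one_whistleblower(0.001, 1000))  # ~0.632
```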

Takeoff and replication

It's not just about origination; quick followers can be just as important. For this we need to consider things like the number of breakthroughs needed, on top of what's openly available, for replication to occur. If replication can occur early within the takeoff window, we get some of the effects of additional originators. When setting this parameter, one should consider the amount of open source work that will be available and the additional data, hardware, people, and software that will be needed to replicate an AGI. For multiple originations to occur simultaneously, we'd need some cross-originator coordination, e.g. Google trains an AI seemingly more capable than its best AI researchers at their jobs, then coordinates with OpenAI to share the system and decentralize control over it. Without such coordination, there will be a replication delay between the first AGI and the next, which this parameter models (see the sketch after the parameters below).

cf. https://www.lesswrong.com/tag/ai-takeoff

Params

takeoff_in_days Number of days from AGI to the singularity

days_to_replicate Number of days for an independent group to replicate AGI

ignore_takeoff_and_replication Whether to ignore this and the replication delay
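One way the replication delay could interact with the takeoff window, again as an illustrative assumption rather than the simulator's actual formula:

```python
def effective_originators(number_of_originators: int,
                          takeoff_in_days: float,
                          days_to_replicate: float) -> float:
    """Illustrative: if replication fits inside the takeoff window,
    followers count as additional (partial) originators."""
    if days_to_replicate >= takeoff_in_days:
        return 1.0  # no one else catches up before the singularity
    # Earlier replication -> closer to the full number of originators
    catch_up_fraction = 1.0 - days_to_replicate / takeoff_in_days
    return 1.0 + (number_of_originators - 1) * catch_up_fraction

print(effective_originators(3, takeoff_in_days=365, days_to_replicate=90))  # ~2.5
```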

Model limitations

The model is currently a DAG and has no recurrence to model time. Obviously this is not the case in reality, as variables will have recurrent effects on each other across time, but I've tried to simplify things as much as possible while still being able to express relationships between variables adequately. I'm also working with someone on a simulation across time, but this is in its early stages.

Inputs are assumed to be i.i.d. Again, I've tried my best to strike the right balance between expressiveness and simplicity here. I will try to keep this version simple, while also working on another, more complex version for comparison between approaches.

All distributions are Gaussian, whereas probabilities are better modeled with beta distributions. There are also some long-tail effects that need to be accounted for, depending on how much uncertainty you provide in your inputs. I don't anticipate the outputs varying that widely with more accurate distributions, but we will see!
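To see why this matters, compare a Gaussian and a beta distribution with the same mean and variance for a probability-like input near the edge of [0, 1] (the values are illustrative):

```python
import numpy as np

N = 1_000_000
mean, std = 0.9, 0.1  # a probability-like input near the edge of [0, 1]

gaussian = np.random.normal(mean, std, N)
print(f"Gaussian samples above 1.0: {np.mean(gaussian > 1.0):.1%}")  # ~15.9%

# A beta distribution with the same mean and variance stays inside [0, 1].
var = std ** 2
common = mean * (1 - mean) / var - 1
alpha, beta = mean * common, (1 - mean) * common
beta_samples = np.random.beta(alpha, beta, N)
print(f"Beta samples above 1.0: {np.mean(beta_samples > 1.0):.1%}")  # 0.0%
```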
