Sorted TODO reading list

Original list on the MIRI website:

https://intelligence.org/all-publications/


2019

E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “Risks from Learned Optimization in Advanced Machine Learning Systems.” arXiv:1906.01820 [cs.AI].

39 pages ++

V Kosoy. 2019. “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” Forthcoming at the Safe Machine Learning workshop at ICLR.

22 pages +

A Demski and S Garrabrant. 2019. “Embedded Agency.” arXiv:1902.09469 [cs.AI].

32 pages +

2018

S Armstrong and S Mindermann. 2018. “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents.” In Advances in Neural Information Processing Systems 31.

12 pages ++

D Manheim and S Garrabrant. 2018. “Categorizing Variants of Goodhart’s Law.” arXiv:1803.04585 [cs.AI].

10 pages ++

R Carey. 2018. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

9 pages +

2017

A Critch. 2017. “Toward Negotiable Reinforcement Learning: Shifting Priorities in Pareto Optimal Sequential Decision-Making.” arXiv:1701.01302 [cs.AI].

19 pages +

S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2017. “A Formal Approach to the Problem of Logical Non-Omniscience.” Paper presented at the 16th conference on Theoretical Aspects of Rationality and Knowledge.

15 pages +++

K Grace, J Salvatier, A Dafoe, B Zhang, and O Evans. 2017. “When Will AI Exceed Human Performance? Evidence from AI Experts.” arXiv:1705.08807 [cs.AI].

21 pages ++

V Kosoy. 2017. “Forecasting Using Incomplete Models.” arXiv:1705.04630 [cs.LG].

29 pages +

N Soares and B Levinstein. 2017. “Cheating Death in Damascus.” Paper presented at the 14th Annual Formal Epistemology Workshop.

19 pages ++

E Yudkowsky and N Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].

36 pages +++

2016

T Benson-Tilsen and N Soares. 2016. “Formalizing Convergent Instrumental Goals.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.

9 pages ++

A Critch. 2016. “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.” arXiv:1602.04184 [cs.GT].

16 pages ++

S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction.” arXiv:1609.03543 [cs.AI].

131 pages ++++

S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction (Abridged).” MIRI technical report 2016–2.

20 pages ++++

S Garrabrant, B Fallenstein, A Demski, and N Soares. 2016. “Inductive Coherence.” arXiv:1604.05288 [cs.AI]. Previously published as “Uniform Coherence.”

8 pages +++

S Garrabrant, N Soares, and J Taylor. 2016. “Asymptotic Convergence in Online Learning with Unbounded Delays.” arXiv:1604.05280 [cs.LG].

16 pages +

V Kosoy. 2016. “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” arXiv:1608.04112 [cs.CC].

86 pages ++

J Leike, J Taylor, and B Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.

10 pages ++

L Orseau and S Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.

10 pages +++

K Sotala. 2016. “Defining Human Values for Value Learners.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.

11 pages +

J Taylor. 2016. “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.

8 pages ++

J Taylor, E Yudkowsky, P LaVictoire, and A Critch. 2016. “Alignment for Advanced Machine Learning Systems.” MIRI technical report 2016–1.

25 pages +++

2015

B Fallenstein and R Kumar. 2015. “Proof-Producing Reflection for HOL: With an Application to Model Polymorphism.” In Interactive Theorem Proving: 6th International Conference, ITP 2015, Nanjing, China, August 24-27, 2015, Proceedings. Springer.

16 pages +

B Fallenstein and N Soares. 2015. “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” MIRI technical report 2015–2.

11 pages, arXiv ++

B Fallenstein, N Soares, and J Taylor. 2015. “Reflective Variants of Solomonoff Induction and AIXI.” In Proceedings of AGI 2015. Springer. Previously published as MIRI technical report 2015–8.

6 pages ++

B Fallenstein, J Taylor, and P Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI]. Previously published as MIRI technical report 2015–7. Published in abridged form as “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” in Proceedings of LORI 2015.

8 pages +++

S Garrabrant, S Bhaskar, A Demski, J Garrabrant, G Koleszarik, and E Lloyd. 2016. “Asymptotic Logical Uncertainty and the Benford Test.” arXiv:1510.03370 [cs.LG]. Paper presented at the Ninth Conference on Artificial General Intelligence. Previously published as MIRI technical report 2015–11.

6 pages, arXiv +

K Grace. 2015. “The Asilomar Conference: A Case Study in Risk Mitigation.” MIRI technical report 2015–9.

68 pages (see also the one right below) +++

K Grace. 2015. “Leó Szilárd and the Danger of Nuclear Weapons: A Case Study in Risk Mitigation.” MIRI technical report 2015–10.

72 pages +++

P LaVictoire. 2015. “An Introduction to Löb’s Theorem in MIRI Research.” MIRI technical report 2015–6.

27 pages ++++

N Soares. 2015. “Aligning Superintelligence with Human Interests: An Annotated Bibliography.” MIRI technical report 2015–5.

8 pages +

N Soares. 2015. “Formalizing Two Problems of Realistic World-Models.” MIRI technical report 2015–3.

8 pages ++

N Soares. 2018. “The Value Learning Problem.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop, and published earlier as MIRI technical report 2015–4.

7 pages +

N Soares and B Fallenstein. 2015. “Questions of Reasoning under Logical Uncertainty.” MIRI technical report 2015–1.

8 pages ++

N Soares and B Fallenstein. 2015. “Toward Idealized Decision Theory.” arXiv:1507.01986 [cs.AI]. Previously published as MIRI technical report 2014–7. Published in abridged form as “Two Attempts to Formalize Counterpossible Reasoning in Deterministic Settings” in Proceedings of AGI 2015.

15 pages +++ & 6 pages ++

K Sotala. 2015. “Concept Learning for Safe Autonomous AI.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.

4 pages +

2014

S Armstrong, K Sotala, and S Ó hÉigeartaigh. 2014. “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.” Journal of Experimental & Theoretical Artificial Intelligence 26 (3): 317–342.

31 pages ++

M Bárász, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, and E Yudkowsky. 2014. “Robust Cooperation on the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” arXiv:1401.5577 [cs.GT].

18 pages, arXiv +++

T Benson-Tilsen. 2014. “UDT with Known Search Order.” MIRI technical report 2014–4.

8 pages +

N Bostrom and E Yudkowsky. 2018. “The Ethics of Artificial Intelligence.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously published in The Cambridge Handbook of Artificial Intelligence (2014).

21 pages +++

P Christiano. 2014. “Non-Omniscience, Probabilistic Inference, and Metamathematics.” MIRI technical report 2014–3.

51 pages (may or may not be superseded by later papers) ++++

B Fallenstein. 2014. “Procrastination in Probabilistic Logic.” Working paper.

3 pages +++

B Fallenstein and N Soares. 2014. “Problems of Self-Reference in Self-Improving Space-Time Embedded Intelligence.” In Proceedings of AGI 2014. Springer.

12 pages ++

B Fallenstein and N Stiennon. 2014. “‘Loudness’: On Priors over Preference Relations.” Brief technical note.

6 pages ++

P LaVictoire, B Fallenstein, E Yudkowsky, M Bárász, P Christiano, and M Herreshoff. 2014. “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem.” Paper presented at the AAAI 2014 Multiagent Interaction without Prior Coordination Workshop.

6 pages +

L Muehlhauser and N Bostrom. 2014. “Why We Need Friendly AI.” Think 13 (36): 42–47.

7 pages +

L Muehlhauser and B Hibbard. 2014. “Exploratory Engineering in AI.” Communications of the ACM 57 (9): 32–34.

7 pages +

C Shulman and N Bostrom. 2014. “Embryo Selection for Cognitive Enhancement: Curiosity or Game-Changer?” Global Policy 5 (1): 85–92.

8 pages +

N Soares. 2014. “Tiling Agents in Causal Graphs.” MIRI technical report 2014–5.

8 pages +

N Soares and B Fallenstein. 2014. “Botworld 1.1.” MIRI technical report 2014–2.

37 pages ++

N Soares and B Fallenstein. 2017. “Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda.” In The Technological Singularity: Managing the Journey. Springer. Previously published as MIRI technical report 2014–8 under the name “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”

13 pages ++

N Soares, B Fallenstein, E Yudkowsky, and S Armstrong. 2015. “Corrigibility.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop. Previously published as MIRI technical report 2014–6.

9 pages ++

E Yudkowsky. 2014. “Distributions Allowing Tiling of Staged Subjective EU Maximizers.” MIRI technical report 2014–1.

6 pages +

2013

In addition to papers of the earlier types (i.e., "soft" essays), this year has a few "Löbian" logic papers.

A Altair. 2013. “A Comparison of Decision Algorithms on Newcomblike Problems.” Working paper. MIRI.

17 pages ++

S Armstrong, N Bostrom, and C Shulman. 2015. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & Society (DOI 10.1007/s00146-015-0590-7): 1-6. Previously published as Future of Humanity Institute technical report 2013–1.

9 pages +

P Christiano, E Yudkowsky, M Herreshoff, and M Bárász. 2013. “Definability of ‘Truth’ in Probabilistic Logic.” Draft. MIRI.

7 pages ++

B Fallenstein. 2013. “The 5-and-10 Problem and the Tiling Agents Formalism.” MIRI technical report 2013–9.

2 pages +

B Fallenstein. 2013. “Decreasing Mathematical Strength in One Formalization of Parametric Polymorphism.” Brief technical note. MIRI.

1 page +

B Fallenstein. 2013. “An Infinitely Descending Sequence of Sound Theories Each Proving the Next Consistent.” MIRI technical report 2013–6.

2 pages +

B Fallenstein and A Mennen. 2013. “Predicting AGI: What Can We Say When We Know So Little?” Working paper. MIRI.

7 pages +

K Grace. 2013. “Algorithmic Progress in Six Domains.” MIRI technical report 2013–3.

60 pages ++++

J Hahn. 2013. “Scientific Induction in Probabilistic Metamathematics.” MIRI technical report 2013–4.

3 pages +

L Muehlhauser. 2013. “Intelligence Explosion FAQ.” Working paper. MIRI.

22 pages +++

L Muehlhauser and L Helm. 2013. “Intelligence Explosion and Machine Ethics.” In Singularity Hypotheses. Springer.

29 pages +

L Muehlhauser and A Salamon. 2013. “Intelligence Explosion: Evidence and Import.” In Singularity Hypotheses. Springer.

27 pages +

L Muehlhauser and C Williamson. 2013. “Ideal Advisor Theories and Personal CEV.” Working paper. MIRI.

8 pages +++

W Sawin and A Demski. 2013. “Computable Probability Distributions Which Converge on Believing True Π1 Sentences Will Disbelieve True Π2 Sentences.” MIRI technical report 2013–10.

5 pages +

N Soares. 2013. “Fallenstein’s Monster.” MIRI technical report 2013–7.

9 pages +

K Sotala and R Yampolskiy. 2014. “Responses to Catastrophic AGI Risk: A Survey.” Physica Scripta 90 (1): 1-33. Previously published as MIRI technical report 2013–2.

35 pages +++

N Stiennon. 2013. “Recursively-Defined Logical Theories Are Well-Defined.” MIRI technical report 2013–8.

3 pages +

R Yampolskiy and J Fox. 2013. “Artificial General Intelligence and the Human Mental Model.” In Singularity Hypotheses. Springer.

19 pages +++

R Yampolskiy and J Fox. 2013. “Safety Engineering for Artificial General Intelligence.” Topoi 32 (2): 217–226.

21 pages ++

E Yudkowsky. 2013. “Intelligence Explosion Microeconomics.” MIRI technical report 2013–1.

96 pages ++

E Yudkowsky. 2013. “The Procrastination Paradox.” Brief technical note. MIRI.

5 pages +

E Yudkowsky and M Herreshoff. 2013. “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Draft. MIRI.

40 pages logic paper (Löb, etc.) ++

2012

S Armstrong and K Sotala. 2012. “How We’re Predicting AI – or Failing To.” In Beyond AI: Artificial Dreams. Pilsen: University of West Bohemia.

23 pages Survey of past predictions ++

B Hibbard. 2012. “Avoiding Unintended AI Behaviors.” In Proceedings of AGI 2012. Springer.

13 pages ++

B Hibbard. 2012. “Decision Support for Safe AI Design.” In Proceedings of AGI 2012. Springer.

11 pages +

L Muehlhauser. 2012. “AI Risk Bibliography 2012.” Working paper. MIRI.

11 pages Bibliography +

A Salamon and L Muehlhauser. 2012. “Singularity Summit 2011 Workshop Report.” Working paper. MIRI.

8 pages +

C Shulman and N Bostrom. 2012. “How Hard Is Artificial Intelligence? Evolutionary Arguments and Selection Effects.” Journal of Consciousness Studies 19 (7–8): 103–130.

27 pages On evolution ++

K Sotala. 2012. “Advantages of Artificial Intelligences, Uploads, and Digital Minds.” International Journal of Machine Consciousness 4 (1): 275-291.

20 pages survey paper +++

K Sotala and H Valpola. 2012. “Coalescing Minds: Brain Uploading-Related Group Mind Scenarios.” International Journal of Machine Consciousness 4 (1): 293–312.

22 pages +

2011

P de Blanc. 2011. “Ontological Crises in Artificial Agents’ Value Systems.” arXiv:1105.3821 [cs.AI]

7 pages, arXiv

D Dewey. 2011. “Learning What to Value.” In Proceedings of AGI 2011. Springer.

8 pages +

E Yudkowsky. 2011. “Complex Value Systems Are Required to Realize Valuable Futures.” In Proceedings of AGI 2011. Springer.

16 pages ++

2010

J Fox and C Shulman. 2010. “Superintelligence Does Not Imply Benevolence.” In Proceedings of ECAP 2010. Verlag Dr. Hut.

7 pages +

S Kaas, S Rayhawk, A Salamon, and P Salamon. 2010. “Economic Implications of Software Minds.” In Proceedings of ECAP 2010. Verlag Dr. Hut.

8 pages +

A Salamon, S Rayhawk, and J Kramár. 2010. “How Intelligible Is Intelligence?” In Proceedings of ECAP 2010. Verlag Dr. Hut.

8 pages ++

C Shulman. 2010. “Omohundro’s ‘Basic AI Drives’ and Catastrophic Risks.” Working paper. MIRI.

11 pages +

C Shulman. 2010. “Whole Brain Emulation and the Evolution of Superorganisms.” Working paper. MIRI.

10 pages +

C Shulman and A Sandberg. 2010. “Implications of a Software-Limited Singularity.” In Proceedings of ECAP 2010. Verlag Dr. Hut.

7 pages Hardware and software aspects of AGI development +

K Sotala. 2010. “From Mostly Harmless to Civilization-Threatening.” In Proceedings of ECAP 2010. Verlag Dr. Hut.

8 pages Pathways to intelligence ++

N Tarleton. 2010. “Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics.” Working paper. MIRI.

10 pages Follow-up to the 2004 paper; an approach for deciding which AI approach to take +

E Yudkowsky. 2010. “Timeless Decision Theory.” Working paper. MIRI.

120 pages +++

E Yudkowsky, C Shulman, A Salamon, R Nelson, S Kaas, S Rayhawk, and T McCabe. 2010. “Reducing Long-Term Catastrophic Risks from Artificial Intelligence.” Working paper. MIRI.

7 pages +

Pre-2010

P de Blanc. 2009. “Convergence of Expected Utility for Universal Artificial Intelligence.” arXiv:0907.5598 [cs.AI].

7 pages, arXiv. The earlier papers are all essays, comments, and/or literature studies. +

S Rayhawk, A Salamon, M Anissimov, T McCabe, and R Nelson. 2009. “Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions.” Paper presented at ECAP 2009.

7 pages +

C Shulman and S Armstrong. 2009. “Arms Control and Intelligence Explosions.” Paper presented at ECAP 2009.

6 pages +

C Shulman, H Jonsson, and N Tarleton. 2009. “Machine Ethics and Superintelligence.” In Proceedings of AP-CAP 2009. University of Tokyo.

7 pages ++

C Shulman, N Tarleton, and H Jonsson. 2009. “Which Consequentialism? Machine Ethics and Moral Divergence.” In Proceedings of AP-CAP 2009. University of Tokyo.

5 pages ++++

E Yudkowsky. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Oxford University Press. Published in abridged form as “Friendly Artificial Intelligence” in Singularity Hypotheses.

46 pages +

E Yudkowsky. 2008. “Cognitive Biases Potentially Affecting Judgement of Global Risks.” In Global Catastrophic Risks. Oxford University Press.

31 pages +

E Yudkowsky. 2007. “Levels of Organization in General Intelligence.” In Artificial General Intelligence (Cognitive Technologies). Springer.

102 pages ++++

E Yudkowsky. 2004. “Coherent Extrapolated Volition.” Working paper. MIRI.

38 pages. Add-on to the 2001 book; very casually written (not like an academic paper). ++

E Yudkowsky. 2001. “Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures.” Working paper. MIRI.

282 pages. Book on future AI and its relation to humans. ++
