Created August 25, 2023 13:56
Techniques which increase statistical power by leveraging formal models are particularly relevant when human interaction (or limited datapoints from scaled compute) is the binding constraint on running these experiments ("field experiments"). Because such interaction or compute is limited, statistical power can be amplified by pairing statistical techniques with formal models derived from AI internals or from models of incentivized behavior (of the human and of the AI system). Within statistics, this has long been the purview of econometrics.
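As a minimal sketch of the "pair a statistical estimator with a formal model" idea: if a model can predict each unit's outcome before the experiment, a CUPED-style regression adjustment can subtract the predictable component of the outcome, shrinking variance and raising power for the same number of datapoints. The data, effect size, and model score below are all invented for illustration.

```python
import random
import statistics

random.seed(0)
n = 400
treatment_effect = 0.5  # hypothetical true effect

# model_score: a pre-experiment prediction from a formal model of the system,
# assumed to be correlated with the eventual outcome (invented here).
model_score = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 0.5) for _ in range(n)]
assign = [i % 2 for i in range(n)]  # alternate treatment / control

outcome = [s + e + treatment_effect * a
           for s, e, a in zip(model_score, noise, assign)]

# Unadjusted estimator: plain difference in means.
treat = [y for y, a in zip(outcome, assign) if a == 1]
ctrl = [y for y, a in zip(outcome, assign) if a == 0]
raw_est = statistics.mean(treat) - statistics.mean(ctrl)

# CUPED-style adjustment: y' = y - theta * (score - mean(score)),
# with theta = cov(y, score) / var(score).
mean_y = statistics.mean(outcome)
mean_s = statistics.mean(model_score)
cov_sy = sum((y - mean_y) * (s - mean_s)
             for y, s in zip(outcome, model_score)) / n
var_s = sum((s - mean_s) ** 2 for s in model_score) / n
theta = cov_sy / var_s

adjusted = [y - theta * (s - mean_s) for y, s in zip(outcome, model_score)]
adj_treat = [y for y, a in zip(adjusted, assign) if a == 1]
adj_ctrl = [y for y, a in zip(adjusted, assign) if a == 0]
adj_est = statistics.mean(adj_treat) - statistics.mean(adj_ctrl)

print(f"raw estimate {raw_est:.3f}, outcome variance {statistics.pvariance(outcome):.3f}")
print(f"adjusted estimate {adj_est:.3f}, adjusted variance {statistics.pvariance(adjusted):.3f}")
```

The adjusted series has much lower variance while the effect estimate stays unbiased, which is exactly the power gain one would want when each datapoint costs human time or compute.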
One organization that already does research in this space is Apollo Research:
https://www.lesswrong.com/posts/MrdFL38Zi3DwTDkKS/apollo-research-is-hiring-evals-and-interpretability
Other orgs also indicate that experiments and causal models play a productive role (DARPA XAI, https://arxiv.org/abs/2106.05506), including in deception (https://arxiv.org/pdf/2307.10569.pdf).
Transparency methods (https://newsletter.mlsafety.org/p/ml-safety-newsletter-6) such as analysing circuits, weight masking, and perturbation (https://newsletter.mlsafety.org/p/ml-safety-newsletter-2) will be investigated as candidates for building out empirical perspectives.
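Of the transparency methods listed, perturbation is the simplest to sketch: ablate one input feature at a time and record how much the model's output moves. The toy linear scorer below stands in for a real network; its weights and the feature vector are invented for the example.

```python
def model(x):
    # Toy stand-in for a trained model: a fixed linear scorer
    # in which feature 2 dominates (weights are invented).
    weights = [0.1, -0.2, 2.0, 0.05]
    return sum(w * xi for w, xi in zip(weights, x))

def perturbation_importance(model, x):
    """Occlusion-style attribution: zero out each feature in turn
    and measure the absolute change in the model's output."""
    base = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = 0.0  # ablate one feature
        scores.append(abs(base - model(perturbed)))
    return scores

x = [1.0, 1.0, 1.0, 1.0]
scores = perturbation_importance(model, x)
print(scores)  # feature 2 receives the largest attribution
```

The same loop applies to any black-box scorer, which is why perturbation is a natural first candidate when building an empirical (rather than mechanistic) perspective on a model.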
- A by-product of the above investigation is to document the state of "empirics-for-AI-safety".
- On use-cases, I would focus on those that trigger an audit, involving situational awareness, goal-directedness, and long-term planning.