@danlaudk · Created August 25, 2023 13:56
Techniques that increase statistical power by leveraging formal models are particularly relevant when these experiments ("field experiments") must run on human interaction or on limited datapoints from scaled compute. Because such interaction and compute are scarce, statistical power can be amplified by pairing statistical techniques with formal models derived from AI internals, or from models of incentivized behavior (of both the human and the AI system). Within statistics, this has long been the purview of econometrics.
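As a minimal sketch of one such econometric technique (covariate adjustment, a standard variance-reduction method), suppose a formal model of the system supplies a prediction of each unit's baseline outcome. Regressing the experimental outcome on treatment plus that model-derived covariate shrinks the residual variance, and so the standard error on the treatment effect, relative to a raw difference in means. Everything below (sample size, effect size, the covariate itself) is illustrative, not a specific proposal:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60  # few datapoints, as in a field experiment

# Hypothetical setup: a formal model predicts each unit's baseline
# outcome; the experiment randomizes a treatment on top of that.
model_prediction = rng.normal(size=n)        # covariate from the formal model
treatment = rng.integers(0, 2, size=n)       # randomized assignment
outcome = 0.5 * treatment + 2.0 * model_prediction + rng.normal(size=n)

# Naive estimate: difference in means, ignoring the formal model.
diff = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
se_naive = np.sqrt(
    outcome[treatment == 1].var(ddof=1) / (treatment == 1).sum()
    + outcome[treatment == 0].var(ddof=1) / (treatment == 0).sum()
)

# Covariate-adjusted estimate: OLS of outcome on treatment plus the
# model-derived covariate; residual variance (and hence the standard
# error on the treatment coefficient) shrinks, boosting power.
X = np.column_stack([np.ones(n), treatment, model_prediction])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
resid = outcome - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
se_adjusted = np.sqrt(cov[1, 1])

print(f"naive:    {diff:.3f} +/- {se_naive:.3f}")
print(f"adjusted: {beta[1]:.3f} +/- {se_adjusted:.3f}")  # tighter interval
```

With the same 60 datapoints, the adjusted estimator typically reports a much smaller standard error, which is the sense in which a formal model "amplifies" statistical power.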
One organization that already does research in this space is Apollo Research: https://www.lesswrong.com/posts/MrdFL38Zi3DwTDkKS/apollo-research-is-hiring-evals-and-interpretability
Other orgs also indicate that experiments and causal models play a productive role (DARPA XAI: https://arxiv.org/abs/2106.05506; and in deception research too: https://arxiv.org/pdf/2307.10569.pdf).
Transparency methods (https://newsletter.mlsafety.org/p/ml-safety-newsletter-6), such as analysing circuits, weight masking, and perturbation (https://newsletter.mlsafety.org/p/ml-safety-newsletter-2), will be investigated as candidates for building out empirical perspectives.
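To make the weight-masking flavour of perturbation concrete, here is a minimal sketch on a toy network: zero out a group of weights, re-run the forward pass, and read the loss change as an importance signal. The architecture, data, and the choice to ablate whole hidden units are all illustrative assumptions, not any particular paper's method:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a trained model; in practice this would be the
# network under audit.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
inputs = torch.randn(32, 8)
labels = torch.randint(0, 2, (32,))
loss_fn = nn.CrossEntropyLoss()

@torch.no_grad()
def loss_with_mask(weight: torch.Tensor, mask: torch.Tensor) -> float:
    """Temporarily zero out the masked weights and return the loss."""
    original = weight.clone()
    weight.mul_(mask)                       # ablate the selected weights
    loss = loss_fn(model(inputs), labels).item()
    weight.copy_(original)                  # restore the weights
    return loss

with torch.no_grad():
    baseline = loss_fn(model(inputs), labels).item()

w = model[0].weight
for unit in range(3):                       # ablate a few units as a demo
    mask = torch.ones_like(w)
    mask[unit] = 0.0                        # knock out one hidden unit's fan-in
    delta = loss_with_mask(w, mask) - baseline
    print(f"hidden unit {unit}: loss change {delta:+.4f}")
```

A large loss change under ablation flags the masked weights as load-bearing for the behavior being measured, which is the empirical handle these transparency methods offer.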
- A by-product of the above investigation is to document the state of "empirics-for-AI-safety".
- On use-cases, I would prioritize those that trigger an audit, involving situational awareness, goal-directedness, and long-term planning.