Created August 25, 2023 13:56
Techniques which increase statistical power by leveraging formal models are particularly relevant when human interaction (or limited datapoints from scaled compute) is the binding constraint on running these experiments ("field experiments"). Because such interaction or compute is limited, statistical power can be amplified by pairing statistical techniques with formal models derived from AI internals or from models of incentivized behavior (of the human and of the AI system). Within statistics, this has long been the purview of econometrics.
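As a minimal sketch of the "pair a statistical estimator with a formal model" idea: if a model can predict each unit's outcome before the experiment, a CUPED-style regression adjustment can subtract the predictable component of the outcome, shrinking variance and raising power for the same number of datapoints. The data, effect size, and model score below are all invented for illustration.

```python
import random
import statistics

random.seed(0)
n = 400
treatment_effect = 0.5  # hypothetical true effect

# model_score: a pre-experiment prediction from a formal model of the system,
# assumed to be correlated with the eventual outcome (invented here).
model_score = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 0.5) for _ in range(n)]
assign = [i % 2 for i in range(n)]  # alternate treatment / control

outcome = [s + e + treatment_effect * a
           for s, e, a in zip(model_score, noise, assign)]

# Unadjusted estimator: plain difference in means.
treat = [y for y, a in zip(outcome, assign) if a == 1]
ctrl = [y for y, a in zip(outcome, assign) if a == 0]
raw_est = statistics.mean(treat) - statistics.mean(ctrl)

# CUPED-style adjustment: y' = y - theta * (score - mean(score)),
# with theta = cov(y, score) / var(score).
mean_y = statistics.mean(outcome)
mean_s = statistics.mean(model_score)
cov_sy = sum((y - mean_y) * (s - mean_s)
             for y, s in zip(outcome, model_score)) / n
var_s = sum((s - mean_s) ** 2 for s in model_score) / n
theta = cov_sy / var_s

adjusted = [y - theta * (s - mean_s) for y, s in zip(outcome, model_score)]
adj_treat = [y for y, a in zip(adjusted, assign) if a == 1]
adj_ctrl = [y for y, a in zip(adjusted, assign) if a == 0]
adj_est = statistics.mean(adj_treat) - statistics.mean(adj_ctrl)

print(f"raw estimate {raw_est:.3f}, outcome variance {statistics.pvariance(outcome):.3f}")
print(f"adjusted estimate {adj_est:.3f}, adjusted variance {statistics.pvariance(adjusted):.3f}")
```

The adjusted series has much lower variance while the effect estimate stays unbiased, which is exactly the power gain one would want when each datapoint costs human time or compute.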
One organization that already does research in this space is Apollo Research:
https://www.lesswrong.com/posts/MrdFL38Zi3DwTDkKS/apollo-research-is-hiring-evals-and-interpretability
Other orgs also indicate that experiments and causal models play a productive role (DARPA XAI, https://arxiv.org/abs/2106.05506), including in deception (https://arxiv.org/pdf/2307.10569.pdf).
Transparency methods (https://newsletter.mlsafety.org/p/ml-safety-newsletter-6) such as analysing circuits, weight masking, and perturbation (https://newsletter.mlsafety.org/p/ml-safety-newsletter-2) will be investigated as candidates for building out empirical perspectives.
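Of the transparency methods listed, perturbation is the simplest to sketch: ablate one input feature at a time and record how much the model's output moves. The toy linear scorer below stands in for a real network; its weights and the feature vector are invented for the example.

```python
def model(x):
    # Toy stand-in for a trained model: a fixed linear scorer
    # in which feature 2 dominates (weights are invented).
    weights = [0.1, -0.2, 2.0, 0.05]
    return sum(w * xi for w, xi in zip(weights, x))

def perturbation_importance(model, x):
    """Occlusion-style attribution: zero out each feature in turn
    and measure the absolute change in the model's output."""
    base = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = 0.0  # ablate one feature
        scores.append(abs(base - model(perturbed)))
    return scores

x = [1.0, 1.0, 1.0, 1.0]
scores = perturbation_importance(model, x)
print(scores)  # feature 2 receives the largest attribution
```

The same loop applies to any black-box scorer, which is why perturbation is a natural first candidate when building an empirical (rather than mechanistic) perspective on a model.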
- A by-product of the above investigation is to document the state of "empirics-for-AI-safety".
- On use-cases, I would focus on those that trigger an audit, involving situational awareness, goal-directedness, and long-term planning.