Skip to content

Instantly share code, notes, and snippets.

@lgarrison
Last active July 24, 2020 12:52
Show Gist options
  • Save lgarrison/1efabe4430429996733a9d29397423d2 to your computer and use it in GitHub Desktop.
Save lgarrison/1efabe4430429996733a9d29397423d2 to your computer and use it in GitHub Desktop.
A note on the RR term in autocorrelations
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The RR term in autocorrelations\n",
"Author: Lehman Garrison (http://lgarrison.github.io)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\newcommand{\\RR}{\\mathrm{RR}}$\n",
"$\\newcommand{\\DD}{\\mathrm{DD}}$\n",
"When computing a two-point correlation function estimator like\n",
"$\\xi(r) = \\frac{\\DD}{\\RR} - 1$,\n",
"the $\\RR$ term can be computed analytically if the domain is a periodic box.\n",
"In 2D, this is often done as\n",
"$$\\begin{align}\n",
"\\RR_i &= N A_i \\bar\\rho \\\\\n",
"&= N A_i \\frac{N}{L^2}\n",
"\\end{align}$$\n",
"where $\\RR_i$ is the expected number of random-random pairs in bin $i$, $N$ is the total number of points, $A_i$ is the area (or volume if 3D) of bin $i$, $L$ is the box size, and $\\bar\\rho$ is the average density in the box.\n",
"\n",
"However, using $\\bar\\rho = \\frac{N}{L^2}$ is not quite right. When sitting on a particle, only $N-1$ particles are available to be in a bin some distance away. The other particle is the particle you're sitting on, which is always at distance $0$. Thus, the correct expression is\n",
"$$\\RR_i = N A_i \\frac{N-1}{L^2}.$$\n",
"\n",
"The following notebook empirically demonstrates that using $N-1$ is correct, and that using $N$ introduces bias of order $\\frac{1}{N}$ into the estimator. This is a tiny correction for large $N$ problems, but important for small $N$.\n",
"\n",
"Cross-correlations of two different particle sets don't suffer from this problem; the particle you're sitting on is never part of the set of particles under consideration for pair-making.\n",
"\n",
"This $N-1$ correction is implemented in the [Corrfunc](https://github.com/manodeep/Corrfunc) code."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Generates a set of N uniform random points in boxsize L\n",
"# and returns binned pair counts\n",
"# Pair counts are doubled, and self-pairs are counted\n",
"def rand_DD(N, L, bins):\n",
" # Make a 2D random set of N particles in [0,L)\n",
" x = np.random.random((2,N))*L\n",
" \n",
" # Form the periodic distance array\n",
" dist = x[:,None] - x[...,None]\n",
" dist -= np.round(dist/L)*L\n",
" dist = (dist**2).sum(axis=0)**.5\n",
" \n",
" # Count pairs in 2 bins\n",
" DD = np.histogram(dist, bins=bins)[0]\n",
" \n",
" return DD"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DD/RR using density (N-1) = [ 1.00022627 0.99934544 0.99956887 1.00006867]\n",
"DD/RR using density (N) = [ 0.96817988 0.8994109 0.89961199 0.9000618 ]\n"
]
}
],
"source": [
"# DD parameters\n",
"N = 10\n",
"L = 1.\n",
"bins = np.linspace(0.0,0.49, 5)\n",
"n_iter = 100000 # number of times to repeat the experiment\n",
"seed = 42\n",
"np.random.seed(seed)\n",
"\n",
"# First, calculate RR analytically\n",
"dA = (bins[1:]**2 - bins[:-1]**2)*np.pi # The area of each 2D annular bin\n",
"\n",
"RR = np.empty((2, len(dA)))\n",
"RR[:] = N/L**2*dA\n",
"\n",
"# Calculate RR using N-1 and N\n",
"RR[0] *= N - 1\n",
"RR[1] *= N\n",
"\n",
"# If the first bin includes 0, the random term must include the self-pairs\n",
"if bins[0] == 0.:\n",
" RR[:, 0] += N\n",
"\n",
"xi_mean = np.zeros((2, len(bins)-1))\n",
"for i in xrange(n_iter):\n",
" DD = rand_DD(N, L, bins)\n",
" xi_mean += DD/RR\n",
" \n",
"xi_mean /= n_iter\n",
"print 'DD/RR using density (N-1) = {}'.format(xi_mean[0])\n",
"print 'DD/RR using density (N) = {}'.format(xi_mean[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we expected, using $N$ in the density leads to a $\\frac{1}{N} = 10\\%$ bias in $\\frac{\\DD}{\\RR}$, when averaging over many realizations of $\\DD$.\n",
"\n",
"The first bin has a smaller bias because we chose to include a bin that starts at $0$ separation, and thus includes self-pairs, which are indepdendent of the $N$ or $N-1$ density term."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [Root]",
"language": "python",
"name": "Python [Root]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment