Skip to content

Instantly share code, notes, and snippets.

@teju85
Last active April 5, 2020 22:27
Show Gist options
  • Save teju85/0c9a16b8ccf31d6155211d0e276672a3 to your computer and use it in GitHub Desktop.
Save teju85/0c9a16b8ccf31d6155211d0e276672a3 to your computer and use it in GitHub Desktop.
Simulate the number of tests needed if we decide to "pool" the samples
// Compilation and running:
// g++ -std=c++11 covid-pooled-testing.cpp && ./a.out
#include <cstdio>
#include <stdint.h>
#include <cmath>
#include <cstdlib>
#include <algorithm>
/**
* @brief Computes number of ones in the input number
* @param num the input number
* @return the number of ones
*/
int __popcnt64(uint64_t num) {
int cnt = 0;
while (num > 0) {
cnt += num & 0x1;
num >>= 1;
}
return cnt;
}
/**
* @brief Computes the probability of the occurence of a particular combination of
* bits in the input number
* @param num input number
* @param prob probability of a bit having a value 1
* @param poolSize max number of bits to consider in the input number
* @return the probability
*
* @note this assumes IID across the bits!
*/
double probability(uint64_t num, double prob, int poolSize) {
auto numPositive = __popcnt64(num);
return std::pow(prob, numPositive) *
std::pow(1.0 - prob, poolSize - numPositive);
}
/**
* @brief total number of tests needed to ascertain the results of each
* individual for a given infection pattern. ie. i-th bit in the input
* number having a value of 1 means that i-th individual is positive.
* @param num input number containing the infection pattern
* @param poolSize number of individuals in the pool to be tested for
* @return total number of tests needed
*/
int numTestsNeeded(uint64_t num, int poolSize) {
if (num == 0 || (num == 1 && poolSize == 1)) return 1;
int ps = poolSize >> 1;
uint64_t psMask = (1ULL << ps) - 1ULL;
auto left = (num >> ps) & psMask;
auto right = num & psMask;
return numTestsNeeded(left, ps) + numTestsNeeded(right, ps) + 1;
}
/**
* @brief average number of tests needed to ascertain the results of a pool
* where averaging is done across the whole population and also across
* whole possibility of infection patterns in a pool. For a pool of size
* 4, there will be 2^4 patterns.
* @param prob probability of a person getting infected
* @param poolSize number of individuals in the pool to be tested for
* @return average number of tests needed per pool
*/
double averageTestsNeeded(double prob, int poolSize) {
auto totalTests = 0.0;
auto cnt = 1ULL << poolSize;
for (uint64_t i = 0LL; i < cnt; ++i) {
auto p = probability(i, prob, poolSize);
auto nTests = numTestsNeeded(i, poolSize);
totalTests += nTests * p;
}
return totalTests;
}
/**
* @brief average number of tests needed to ascertain the results of a pool
* when it is given that atleast one of the member in the pool is
* positive.
* @param prob probability of a person getting infected
* @param poolSize number of individuals in the pool to be tested for
* @return average number of tests needed per pool
*/
double averageTestsNeededIfAtleastOnePositive(double prob, int poolSize) {
auto totalTests = 0.0, totalProb = 0.0;
auto cnt = 1ULL << poolSize;
for (uint64_t i = 1LL; i < cnt; ++i) {
auto p = probability(i, prob, poolSize);
auto nTests = numTestsNeeded(i, poolSize);
totalProb += p;
totalTests += nTests * p;
}
return totalTests / totalProb;
}
/**
* @brief total number of *sequence* of tests needed to ascertain the results
* of each individual for a given infection pattern, when it is given
* that atleast one of them is positive.
* @param num input number containing the infection pattern
* @param poolSize number of individuals in the pool to be tested for
* @return total number of steps needed
*/
int numStepsNeeded(uint64_t num, int poolSize) {
if (num == 0 || (num == 1 && poolSize == 1)) return 1;
int ps = poolSize >> 1;
uint64_t psMask = (1ULL << ps) - 1ULL;
auto left = (num >> ps) & psMask;
auto right = num & psMask;
return std::max(numStepsNeeded(left, ps), numStepsNeeded(right, ps)) + 1;
}
/**
* @brief average number of *sequence* of tests needed to ascertain the
* results of a pool when it is given that atleast one of the member
* in the pool is positive.
* @param prob probability of a person getting infected
* @param poolSize number of individuals in the pool to be tested for
* @return average number of steps needed per pool
*/
double averageStepsNeededIfAtleastOnePositive(double prob, int poolSize) {
auto totalSteps = 0.0, totalProb = 0.0;
auto cnt = 1ULL << poolSize;
for (uint64_t i = 1LL; i < cnt; ++i) {
auto p = probability(i, prob, poolSize);
auto nTests = numStepsNeeded(i, poolSize);
totalProb += p;
totalSteps += nTests * p;
}
return totalSteps / totalProb;
}
constexpr auto TotalPopulation = 8000000000ULL;
void checkForOneCase(uint64_t totalInfected) {
const auto infectionProbability = totalInfected * 1.0 / TotalPopulation;
std::printf("Infection probability = %lf\n", infectionProbability);
for (int ps = 2; ps <= 16; ps <<= 1) {
auto nPosTests = averageTestsNeededIfAtleastOnePositive(
infectionProbability, ps);
auto nPosSteps = averageStepsNeededIfAtleastOnePositive(
infectionProbability, ps);
std::printf(" PoolSize=%2d avgTestsNeeded=%le avgPositiveTests=%le"
" avgStepsForPositives=%le\n", ps,
averageTestsNeeded(infectionProbability, ps), nPosTests,
nPosSteps);
}
}
int main(int argc, char** argv) {
for (uint64_t infected = 1000000ULL; infected <= 100000000ULL; infected += 10000000ULL)
checkForOneCase(infected);
return 0;
}
@teju85
Copy link
Author

teju85 commented Apr 4, 2020

Infection probability = 0.000125
  PoolSize= 2  avgTestsNeeded=1.000500e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.002000e+00 avgPositiveTests=5.000250e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.005999e+00 avgPositiveTests=7.001250e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.015994e+00 avgPositiveTests=9.004251e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.001375
  PoolSize= 2  avgTestsNeeded=1.005496e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.021970e+00 avgPositiveTests=5.002752e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.065834e+00 avgPositiveTests=7.013767e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.175217e+00 avgPositiveTests=9.046865e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.002625
  PoolSize= 2  avgTestsNeeded=1.010486e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.041890e+00 avgPositiveTests=5.005257e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.125396e+00 avgPositiveTests=7.026312e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.333158e+00 avgPositiveTests=9.089669e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.003875
  PoolSize= 2  avgTestsNeeded=1.015470e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.061760e+00 avgPositiveTests=5.007765e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.184686e+00 avgPositiveTests=7.038885e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.489833e+00 avgPositiveTests=9.132662e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.005125
  PoolSize= 2  avgTestsNeeded=1.020447e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.081581e+00 avgPositiveTests=5.010276e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.243706e+00 avgPositiveTests=7.051486e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.645256e+00 avgPositiveTests=9.175843e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.006375
  PoolSize= 2  avgTestsNeeded=1.025419e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.101352e+00 avgPositiveTests=5.012791e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.302457e+00 avgPositiveTests=7.064115e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.799444e+00 avgPositiveTests=9.219211e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.007625
  PoolSize= 2  avgTestsNeeded=1.030384e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.121073e+00 avgPositiveTests=5.015308e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.360940e+00 avgPositiveTests=7.076771e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=1.952410e+00 avgPositiveTests=9.262765e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.008875
  PoolSize= 2  avgTestsNeeded=1.035342e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.140745e+00 avgPositiveTests=5.017829e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.419157e+00 avgPositiveTests=7.089456e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=2.104172e+00 avgPositiveTests=9.306505e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.010125
  PoolSize= 2  avgTestsNeeded=1.040295e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.160368e+00 avgPositiveTests=5.020353e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.477110e+00 avgPositiveTests=7.102168e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=2.254741e+00 avgPositiveTests=9.350429e+00 avgStepsForPositives=5.000000e+00
Infection probability = 0.011375
  PoolSize= 2  avgTestsNeeded=1.045241e+00 avgPositiveTests=3.000000e+00 avgStepsForPositives=2.000000e+00
  PoolSize= 4  avgTestsNeeded=1.179941e+00 avgPositiveTests=5.022879e+00 avgStepsForPositives=3.000000e+00
  PoolSize= 8  avgTestsNeeded=1.534800e+00 avgPositiveTests=7.114908e+00 avgStepsForPositives=4.000000e+00
  PoolSize=16  avgTestsNeeded=2.404135e+00 avgPositiveTests=9.394536e+00 avgStepsForPositives=5.000000e+00

@teju85
Copy link
Author

teju85 commented Apr 4, 2020

  • avgTestsNeeded = average number of tests needed to ascertain all positive/negative cases across the whole pool
  • avgPositiveTests = average number of tests needed to ascertain all positive/negative cases across the whole pool, when it is given that atleast one of them is positive.
  • avgStepsForPositives = same as the previous case, but measures the number of "sequence" of tests needed to ascertain the result of the whole pool (eg: worst case, it is log2(PoolSize) + 1)

@chaithyagr
Copy link

Having a look at this now,
However, Just posting for the purpose of context:
Some results of the study (its german and I havent quite got down to the details of it): https://healthcare-in-europe.com/en/news/corona-pool-testing-increases-worldwide-capacities-many-times-over.html
https://eurekalert.org/pub_releases/2020-03/guf-pto033020.php

@teju85
Copy link
Author

teju85 commented Apr 4, 2020

ah... if the first pool test comes positive, they seem to be doing individual testing. Then, the above code doesn't apply to this scenario! (above code assumes a full binary-tree-like search)

@chaithyagr
Copy link

Quick random question, is there any particular reason to split the populations into halves for testing?
I am not aware if it is known to perform better.

You were right and I think we both can agree on the fact that the testing must in some form encode the probablities of a set of people having the disease.
I do agree with the fact that you would need to club the people with the least probability of having the disease, so mostly your idea of clubbing families together seems like a good idea in this case.

@chaithyagr
Copy link

chaithyagr commented Apr 4, 2020

(above code assumes a full binary-tree-like search)

Yes, which is why I was asking, is there any advantage of doing binary search? We could subsample the population into again N subsets..

A point here on the fact that we need enough buffer solution to carry out such multiple repeated testing also. However, one advantage of RT PCR is that it works by replication, so even having traces of virus, will show a positive.

@chaithyagr
Copy link

As of 3rd April, in India, 69245 tests are done and 2653 are positive, so the Probability of positive comes to around 0.03 :P
However, it is important to note that right now the testing is limited to suspicious cases alone

@chaithyagr
Copy link

chaithyagr commented Apr 4, 2020

Quick random question, is there any particular reason to split the populations into halves for testing?
Just putting some thoughts and math, please correct me if I am wrong..

Worse case analysis:
Say our population is N.
If we split the population into J subsets
Then in worse case, for testing we would need = > N ( 1 + 1/J + 1/J**2 + and soo on till we reach 1/N)
So in worse case, we would nearly need (making above sum as sum to infinity) J / (J-1) * N samples to test.
Now, choosing J = 2, we would need nearly 2N tests
As J tends to infinity, the number of tests tend to N

So I think the above research does a good job from worst case analysis alone, by shifting to indivitual testing, which is same as literally J = N

@chaithyagr
Copy link

@teju85 ,
As an extension to above, here is a python code for cases if we split the population to J subsegments: https://github.com/chaithyagr/sample_pool_testing/blob/master/split_to_j_subsets.ipynb

However, it's quite slow for now given all the loop and lack of vectorization. It follows quite similar principles as your code except, one major difference is that due to the speed, I rely on monte carlo simulation, which isnt exactly a good way to get accurate results.
I have added ways to add infection probablity per individual and also a way to kind off sort samples based on these probabilities.

Thoughts:

  1. Currently the code still just does hierachial method, or direct testing, based on how high J is, basically we could extend this to an array of J's at each level
  2. In real world we would never know the so called infection probablity of each indivtual, so the best way forward is to group spatially with respect to families and communities. For this I think https://github.com/3b1b/manim/blob/shaders/from_3b1b/active/sir.py is a reallly good simulation of how covid spreads, and maybe we can add some way to pool test into this based on coordinates. I would really recomment you to watch his visualizations, he is one of my favorite math simulator guys if thats a thing :P : https://www.youtube.com/watch?v=gxAaO2rsdIs&t=1185s . He tests literally all kinds of possibilies, with/out social distancing, if there are common points in communities, borders and barriers etc..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment