Last active January 15, 2021 07:10
How many to sample among suspected population of COVID-19
Wan Nor Arifin


Research question: How many people do we need to sample to have a 90% probability of detecting at least one positive COVID-19 case among the suspected population? This question relates to one of my posts from a few years back at

Suppose the prevalence of COVID-19 is,

# prevalence of covid-19 in Malaysia
# ref:
p = 452.43/100000

Prevalence = 0.0045243

Suppose we set a cutoff of 0.9 for the detection probability. A commonly quoted rule of thumb is to sample n = 20 if N < 50, or n = 30 or 10% of N (whichever is larger) if N > 50, where N is the size of the suspected population. To check whether this is reasonable, we can compute the detection probability for the sample sizes proposed by the rule of thumb.

Rule of thumb solution

# If we sample up to 30
n = numeric(1)
n.low = 20 # lower limit of n
n.high = 30 # upper limit of n
pr = numeric(0)
n_ = numeric(0)
pr_ = numeric(0)
for(i in n.low:n.high) {
  n = i
  # P(at least one positive among n) = 1 - P(zero positives)
  pr = 1 - pbinom(0, n, p)
  n_[i] = n
  pr_[i] = pr
}
# keep only the rows for n.low..n.high, then pick the largest probability
det_p = cbind(n=n_[n.low:n.high], Probability=pr_[n.low:n.high])
det_p_max = det_p[which.max(det_p[,2]),]
det_p_max

The maximum probability is only 0.1271896 for n = 30.
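For comparison, the required sample size can also be solved for directly. Under the same binomial assumption used above (independent draws with a common prevalence p), the detection probability is 1 - (1 - p)^n, so the smallest n reaching a target probability is the ceiling of log(1 - target) / log(1 - p). The following is a minimal sketch; the function name `min_n_detect` and the variable names are illustrative and not part of the original code.

```r
# Smallest n such that P(at least one positive) >= target,
# assuming independent draws with a common prevalence p.
# `min_n_detect` is a hypothetical helper name used for illustration.
min_n_detect <- function(p, target = 0.9) {
  stopifnot(p > 0, p < 1, target > 0, target < 1)
  ceiling(log(1 - target) / log(1 - p))
}

p <- 452.43 / 100000                       # prevalence used above
n_min <- min_n_detect(p)                   # smallest n reaching 90%
prob_at_n_min <- 1 - pbinom(0, n_min, p)   # check with the same formula as the loop
c(n = n_min, Probability = prob_at_n_min)
```

With the prevalence above this gives n = 508, far more than the 20 to 30 suggested by the rule of thumb.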

10 percent rule

Now let n be 10% of N (n = 0.1*N), where N is the size of the suspected population, and vary N.

N = c(50, 100, 500, 1000, 5000, 10000)
det_df = data.frame(N = rep(0,6), n = rep(0,6), Probability = rep(0,6))
for(j in 1:length(N)) {
  n = numeric(1)
  if(N[j] == 50) {n.low = 20; n.high = 30}
  if(N[j] > 50) {
    n.low = 30 # lower limit of n
    n.high = max(.1*N[j], 30) # upper limit of n at 10% N
  }
  pr = numeric(0)
  p = 0.0045243 # prevalence of covid-19 in Malaysia
  n_ = numeric(0)
  pr_ = numeric(0)
  for(i in n.low:n.high) {
    n = i
    pr = 1 - pbinom(0, n, p)
    n_[i] = n
    pr_[i] = pr
  }
  # for each N, keep the n with the highest detection probability
  det_p = cbind(N=N[j], n=n_[n.low:n.high], Probability=pr_[n.low:n.high])
  det_df[j, ] = det_p[which.max(det_p[,2]),]
}
det_df
    N    n Probability
   50   30   0.1271896
  100   30   0.1271896
  500   50   0.2028626
 1000  100   0.3645720
 5000  500   0.8964067
10000 1000   0.9892684
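We can see that the 10% rule gives a detection probability of about 90% only once N reaches 5000 (0.896 at N = 5000, 0.989 at N = 10000). For smaller N the probability stays well below the 0.9 cutoff.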
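Because the detection probability 1 - (1 - p)^n increases with n, the maximum for each N always falls at the upper limit of n. The same table can therefore be obtained without loops. The sketch below assumes the same prevalence and the same upper limits as the rule of thumb (30, or 10% of N when that is larger); the name `det_df_vec` is illustrative.

```r
# Vectorised version of the 10% rule check.
# Assumes the same prevalence and the same upper limit of n per N as above.
p <- 0.0045243                             # prevalence of covid-19 in Malaysia
N <- c(50, 100, 500, 1000, 5000, 10000)    # suspected population sizes
n <- pmax(0.1 * N, 30)                     # upper limit of n: 10% of N, at least 30

# P(at least one positive among n) = 1 - (1 - p)^n,
# which equals 1 - pbinom(0, n, p)
det_df_vec <- data.frame(N = N, n = n, Probability = 1 - (1 - p)^n)
det_df_vec
```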


At this prevalence, it is not sensible to apply the rule of thumb of sampling n = 20 if N < 50, or n = 30 or 10% of N if N > 50. However, if the prevalence of COVID-19 is assumed to be higher among the suspected population, the rule may be reasonable. The code above can be changed to test that assumption.
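One way to test that assumption is to repeat the calculation over a range of prevalence values. The prevalences below (1%, 5% and 10%) are hypothetical values chosen only for illustration; they are not estimates for any real suspected population. The names `min_n` and `sens_df` are likewise illustrative.

```r
# Sensitivity of the required sample size to the assumed prevalence.
# The prevalence values are hypothetical, for illustration only.
min_n <- function(p, target = 0.9) ceiling(log(1 - target) / log(1 - p))

p_assumed <- c(0.01, 0.05, 0.10)           # hypothetical prevalences
sens_df <- data.frame(
  prevalence = p_assumed,
  n_required = min_n(p_assumed),           # smallest n for 90% detection
  prob_n30   = 1 - (1 - p_assumed)^30      # detection probability at n = 30
)
sens_df
```

Under these assumed values, n = 30 passes the 0.9 cutoff only at the 10% prevalence.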
