Skip to content

Instantly share code, notes, and snippets.

@Heril
Last active May 26, 2021 21:56
Show Gist options
  • Save Heril/23e20a3210ec530299a2b41f834f03fb to your computer and use it in GitHub Desktop.
Save Heril/23e20a3210ec530299a2b41f834f03fb to your computer and use it in GitHub Desktop.
Multivariate Statistic Process Control method comparison Part 2: Generating control limits and initial look

Earlier this year, I posted the first part of a write-up for a project I worked on during a project I worked on last year.

As a reminder, my project involved taking 4 different types of correlation, test the baseline and 8 different data shifts, generating 1000 simulations with 100 points of data each. I then compared how three different multivariate control charts, Hotelling T^2, MEWMA, and MCUSUM, were in detecting a shift while holding the false rate fixed over all simulations.

For this I use the pretty standard for in control average run length (ARL) of 300. This means, on average when the process is in control, you can expect to see an out of control data point in any 300 sequential points. The simple way to do this is to calculate the control limits for each control chart using the in-control data so that this constraint is kept. For this code, the variable cARL is set to 300 to represent constraining the ARL to be 300 in this case.

t2ucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLt2 = quantile(t2, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLt2) %>%
  spread(rho, UCLt2)%>%
  column_to_rownames(var="copula")

mewmaucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLme = quantile(mewma, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLme) %>%
  spread(rho, UCLme)%>%
  column_to_rownames(var="copula")

mcusumucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLmc = quantile(mcusum, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLmc) %>%
  spread(rho, UCLmc)%>%
  column_to_rownames(var="copula")

From these control limits, we can test the out of control data, and determine the ARL, where the lower the value, the quicker a shift is detected. For ease of use, if a run has zero points out of control, I will treat that as an ARL equal to the size of the run, in each of these cases that being 1000.

tmpData <- list()
index <- 1
for(i in 1:Ncopula) {
  for(j in 1:Nrho) {
    tmpData[[index]] <- splitData[[index]] %>%
      group_by(copula, rho, shift, iteration) %>%
      summarize(t2ARL = ifelse(shift == "0/0", size/sum(t2 > t2ucl[i,j]),
                              detect_index(t2, function(z)(z>t2ucl[i,j]))),
                meARL = ifelse(shift == "0/0", size/sum(mewma > mewmaucl[i,j]),
                              detect_index(mewma, function(z)(z>mewmaucl[i,j]))),
                mcARL = ifelse(shift == "0/0", size/sum(mcusum > mcusumucl[i,j]),
                              detect_index(mcusum, function(z)(z>mcusumucl[i,j])))) %>%
      mutate(t2ARL = ifelse(is.infinite(t2ARL), size, t2ARL),
             meARL = ifelse(is.infinite(meARL), size, meARL),
             mcARL = ifelse(is.infinite(mcARL), size, mcARL))
    index <- index + 1
  }
}
rm(splitData)

All that remains is to get those ARL values in a way that is easy to visualize, as well as generate confidence intervals for the simulation as a whole.

ARL <- bind_rows(tmpData) %>%
  group_by(copula, rho, shift) %>%
  summarize(t2ci = list(mean_cl_normal(t2ARL) %>%
                          rename(t2ARLmean=y, t2ARLlwr=ymin, t2ARLupr=ymax)),
            meci = list(mean_cl_normal(meARL) %>%
                          rename(meARLmean=y, meARLlwr=ymin, meARLupr=ymax)),
            mcci = list(mean_cl_normal(mcARL) %>%
                          rename(mcARLmean=y, mcARLlwr=ymin, mcARLupr=ymax))) %>%
  unnest(cols = c(t2ci, meci, mcci))

Now that we have our control chart ARL metrics, let's check that the in-control ARL is approximately 300. This corresponds when the column shift is 0/0, meaning the mean for both variables is 0. I'm selecting columns to exclude the upper and lower values for the confidence intervals.

ARL %>%
	filter(shift == "0/0") %>%
	select(c(1:4,7,10))
# A tibble: 16 x 6
# Groups:   copula, rho [16]
   copula  rho   shift t2ARLmean meARLmean mcARLmean
   <fct>   <fct> <fct>     <dbl>     <dbl>     <dbl>
 1 Normal  0.6   0/0        300.      299.      299.
 2 Normal  0.2   0/0        300.      300.      299.
 3 Normal  -0.2  0/0        300.      300.      299.
 4 Normal  -0.6  0/0        300.      299.      301.
 5 Frank   0.6   0/0        300.      299.      300.
 6 Frank   0.2   0/0        300.      299.      300.
 7 Frank   -0.2  0/0        300.      300.      298.
 8 Frank   -0.6  0/0        299.      299.      301.
 9 Clayton 0.6   0/0        300.      300.      300.
10 Clayton 0.2   0/0        300.      299.      299.
11 Clayton -0.2  0/0        299.      300.      299.
12 Clayton -0.6  0/0        299.      300.      299.
13 Gumbel  0.6   0/0        300.      301.      299.
14 Gumbel  0.2   0/0        300.      299.      300.
15 Gumbel  -0.2  0/0        299.      299.      301.
16 Gumbel  -0.6  0/0        299.      300.      299.

Things are looking good, with everything being right around 300. We've got a lot of plots we can make to visualize this, but for illustrating this simply, let's first look at the simplest, the Hotelling T^2 control chart, looking at a sample control chart as well as the ARL values for both in control and out of control for the different correlation.

Let's take a look at two of the runs for the Normal copula with a correlation of 0.6, one for when there is no shift, and one where just one of the two variables has shifted by 3.

t2run1

t2run1t2

We see in this case with no shift, that only 3 points fall above the upper control limit (UCL), which for 1000 points, that is what we would roughly expect to encounter.

t2run1shifted

t2run1shiftedt2

When simulating one of the variables being out of control with a shift of 3, we see we get a lot of points above the UCL and registered as out of control. This is exactly what we would like to happen, as in a production environment, we would want to detect this change and take action to correct it.

Finally, let's look at an overall summary of the performance for Hotelling T^2 charts for these different copulas and correlations.

t2poshigh

t2poslow

t2neglow

t2neghigh

The curious thing we see with these four graphs is that how well the T^2 performs for detecting different shifts varies between copula type and the strength of the correlation. For my next and final post on this project, I want to delve into the different types of correlations we're working with here and how they are structured and how that changes the performance of the Hotelling T^2 control chart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment