dmbates/Issue611.ipynb

## Issue611.ipynb
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "52407268",
   "metadata": {},
   "source": [
    "---\n",
    "title: Issue611\n",
    "author: Douglas Bates\n",
    "date: today\n",
    "date-format: iso\n",
    "execute:\n",
    "  keep-ipynb: true\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a5dbdf2",
   "metadata": {},
   "source": [
    "## Load packages and data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8f6b34c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "using CSV, MixedModels, ProgressMeter, Tables\n",
    "ProgressMeter.ijulia_behavior(:clear);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ca7944e",
   "metadata": {},
   "source": [
    "Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b344592e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2110-element Vector{Float64}:\n",
       "  274.388611\n",
       " 1015.990089\n",
       " 2197.288998\n",
       " 3367.868333\n",
       " 2415.089921\n",
       " 2980.472642\n",
       " 1621.304086\n",
       " 1452.363879\n",
       " 1612.956475\n",
       " 1836.63395\n",
       " 2548.059624\n",
       " 3075.149043\n",
       " 2846.338504\n",
       "    ⋮\n",
       "   90.037501\n",
       "   64.327124\n",
       "   29.44757\n",
       "   29.084692\n",
       "   27.125033\n",
       "  409.445973\n",
       "   12.593273\n",
       "  431.874531\n",
       "   37.276392\n",
       "   45.440296\n",
       "   75.674115\n",
       "   18.573065"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat = CSV.read(\"./data/model_ready_data_small.csv\", columntable)\n",
    "wts = CSV.read(\"./data/weights.csv\", columntable).weight_vector"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3289eed6",
   "metadata": {},
   "source": [
    "Also, assign a `contrasts` specification of `Grouping` to the grouping factors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "89442b55",
   "metadata": {},
   "outputs": [],
   "source": [
    "contrasts = Dict(\n",
    "    :brand_category_channel_type => Grouping(),\n",
    "    :channel_type_sub_brand => Grouping(),\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7111184e",
   "metadata": {},
   "source": [
    "With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.\n",
    "In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.\n",
    "\n",
    "For example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "142f98f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2110-element Vector{String}:\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " \"B00\"\n",
       " ⋮\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\"\n",
       " \"B15\""
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbf43ef0",
   "metadata": {},
   "source": [
    "Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.\n",
    "\n",
    "## Initial model fits\n",
    "\n",
    "Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.\n",
    "I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d9a1e689",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\r",
      "\u001b[32mMinimizing 17 \t Time: 0:00:00 (23.36 ms/it)\u001b[39m\n"
     ]
    },
    {
     "data": {
      "text/latex": [
       "\\begin{tabular}\n",
       "{l | r | r | r | r | r}\n",
       " & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
       "\\hline\n",
       "(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 \\\\\n",
       "ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 &   \\\\\n",
       "base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 &   \\\\\n",
       "Residual & 33.0010 &  &  &  &  \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "|             |    Est. |     SE |      z |      p | σ_channel_type_sub_brand |\n",
       "|:----------- | -------:| ------:| ------:| ------:| ------------------------:|\n",
       "| (Intercept) | -0.0016 | 0.0072 |  -0.22 | 0.8254 |                   0.0000 |\n",
       "| ppi         | -0.8022 | 0.0718 | -11.18 | <1e-28 |                          |\n",
       "| base_price  | -0.6277 | 0.0725 |  -8.66 | <1e-17 |                          |\n",
       "| Residual    | 33.0010 |        |        |        |                          |\n"
      ],
      "text/plain": [
       "Linear mixed model fit by maximum likelihood\n",
       " volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n",
       "   logLik   -2 logLik     AIC       AICc        BIC    \n",
       " -2279.1400  4558.2800  4568.2800  4568.3085  4596.5522\n",
       "\n",
       "Variance components:\n",
       "                          Column   Variance Std.Dev.\n",
       "channel_type_sub_brand (Intercept)     0.000  0.000\n",
       "Residual                            1089.065 33.001\n",
       " Number of obs: 2110; levels of grouping factors: 81\n",
       "\n",
       "  Fixed-effects parameters:\n",
       "─────────────────────────────────────────────────────\n",
       "                  Coef.  Std. Error       z  Pr(>|z|)\n",
       "─────────────────────────────────────────────────────\n",
       "(Intercept)  -0.0015897  0.00720718   -0.22    0.8254\n",
       "ppi          -0.802217   0.0717856   -11.18    <1e-28\n",
       "base_price   -0.627739   0.0724693    -8.66    <1e-17\n",
       "─────────────────────────────────────────────────────"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m1 = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        (1|channel_type_sub_brand)\n",
    "    )\n",
    "    fit(MixedModel, form, dat; contrasts, wts)\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06c5e97c",
   "metadata": {},
   "source": [
    "As you can see, it converges on the boundary with a variance component of zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "099da5eb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear mixed model fit by maximum likelihood\n",
      " volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n",
      "   logLik   -2 logLik     AIC       AICc        BIC    \n",
      " -2279.1400  4558.2800  4568.2800  4568.3085  4596.5522\n",
      "\n",
      "Variance components:\n",
      "                          Column   Variance Std.Dev.\n",
      "channel_type_sub_brand (Intercept)     0.000  0.000\n",
      "Residual                            1089.065 33.001\n",
      " Number of obs: 2110; levels of grouping factors: 81\n",
      "\n",
      "  Fixed-effects parameters:\n",
      "─────────────────────────────────────────────────────\n",
      "                  Coef.  Std. Error       z  Pr(>|z|)\n",
      "─────────────────────────────────────────────────────\n",
      "(Intercept)  -0.0015897  0.00720718   -0.22    0.8254\n",
      "ppi          -0.802217   0.0717856   -11.18    <1e-28\n",
      "base_price   -0.627739   0.0724693    -8.66    <1e-17\n",
      "─────────────────────────────────────────────────────\n"
     ]
    }
   ],
   "source": [
    "println(m1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ca38e2db",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\r",
      "\u001b[32mMinimizing 58 \t Time: 0:00:00 ( 7.66 ms/it)\u001b[39m\n"
     ]
    },
    {
     "data": {
      "text/latex": [
       "\\begin{tabular}\n",
       "{l | r | r | r | r | r}\n",
       " & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
       "\\hline\n",
       "(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & 0.0000 \\\\\n",
       "ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 \\\\\n",
       "base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 &   \\\\\n",
       "Residual & 31.4902 &  &  &  &  \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "|             |    Est. |     SE |     z |      p | σ_channel_type_sub_brand |\n",
       "|:----------- | -------:| ------:| -----:| ------:| ------------------------:|\n",
       "| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 |                   0.0000 |\n",
       "| ppi         | -0.3071 | 0.2650 | -1.16 | 0.2465 |                   1.3776 |\n",
       "| base_price  | -0.6212 | 0.0695 | -8.94 | <1e-18 |                          |\n",
       "| Residual    | 31.4902 |        |       |        |                          |\n"
      ],
      "text/plain": [
       "Linear mixed model fit by maximum likelihood\n",
       " volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n",
       "   logLik   -2 logLik     AIC       AICc        BIC    \n",
       " -2211.8904  4423.7809  4435.7809  4435.8208  4469.7075\n",
       "\n",
       "Variance components:\n",
       "                          Column    Variance Std.Dev.   Corr.\n",
       "channel_type_sub_brand (Intercept)    0.00000  0.00000\n",
       "                       ppi            1.89789  1.37764   .  \n",
       "Residual                            991.63246 31.49020\n",
       " Number of obs: 2110; levels of grouping factors: 81\n",
       "\n",
       "  Fixed-effects parameters:\n",
       "────────────────────────────────────────────────────\n",
       "                  Coef.  Std. Error      z  Pr(>|z|)\n",
       "────────────────────────────────────────────────────\n",
       "(Intercept)  -0.0011511   0.0068892  -0.17    0.8673\n",
       "ppi          -0.3071      0.264971   -1.16    0.2465\n",
       "base_price   -0.621192    0.0695056  -8.94    <1e-18\n",
       "────────────────────────────────────────────────────"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m2 = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        zerocorr(1 + ppi | channel_type_sub_brand)\n",
    "    )\n",
    "    fit(MixedModel, form, dat; contrasts, wts)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f6006725",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear mixed model fit by maximum likelihood\n",
      " volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand)\n",
      "   logLik   -2 logLik     AIC       AICc        BIC    \n",
      " -2211.8904  4423.7809  4433.7809  4433.8094  4462.0531\n",
      "\n",
      "Variance components:\n",
      "                       Column Variance Std.Dev. \n",
      "channel_type_sub_brand ppi    1.89789  1.37764\n",
      "Residual                    991.63246 31.49020\n",
      " Number of obs: 2110; levels of grouping factors: 81\n",
      "\n",
      "  Fixed-effects parameters:\n",
      "────────────────────────────────────────────────────\n",
      "                  Coef.  Std. Error      z  Pr(>|z|)\n",
      "────────────────────────────────────────────────────\n",
      "(Intercept)  -0.0011511   0.0068892  -0.17    0.8673\n",
      "ppi          -0.3071      0.264971   -1.16    0.2465\n",
      "base_price   -0.621192    0.0695056  -8.94    <1e-18\n",
      "────────────────────────────────────────────────────\n"
     ]
    }
   ],
   "source": [
    "m2a = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        (0 + ppi | channel_type_sub_brand)\n",
    "    )\n",
    "    fit(MixedModel, form, dat; contrasts, wts)\n",
    "end\n",
    "println(m2a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "450c3c31",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear mixed model fit by maximum likelihood\n",
      " volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n",
      "   logLik   -2 logLik     AIC       AICc        BIC    \n",
      " -2211.8904  4423.7809  4435.7809  4435.8208  4469.7075\n",
      "\n",
      "Variance components:\n",
      "                          Column    Variance Std.Dev.   Corr.\n",
      "channel_type_sub_brand (Intercept)    0.00000  0.00000\n",
      "                       ppi            1.89789  1.37764   .  \n",
      "Residual                            991.63246 31.49020\n",
      " Number of obs: 2110; levels of grouping factors: 81\n",
      "\n",
      "  Fixed-effects parameters:\n",
      "────────────────────────────────────────────────────\n",
      "                  Coef.  Std. Error      z  Pr(>|z|)\n",
      "────────────────────────────────────────────────────\n",
      "(Intercept)  -0.0011511   0.0068892  -0.17    0.8673\n",
      "ppi          -0.3071      0.264971   -1.16    0.2465\n",
      "base_price   -0.621192    0.0695056  -8.94    <1e-18\n",
      "────────────────────────────────────────────────────\n"
     ]
    }
   ],
   "source": [
    "println(m2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a2043b8a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\r",
      "\u001b[32mMinimizing 34 \t Time: 0:00:00 (16.69 ms/it)\u001b[39m\n"
     ]
    },
    {
     "data": {
      "text/latex": [
       "\\begin{tabular}\n",
       "{l | r | r | r | r | r | r}\n",
       " & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
       "\\hline\n",
       "(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 & 0.0000 \\\\\n",
       "ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 &   &   \\\\\n",
       "base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 &   &   \\\\\n",
       "Residual & 33.0010 &  &  &  &  &  \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "|             |    Est. |     SE |      z |      p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n",
       "|:----------- | -------:| ------:| ------:| ------:| ------------------------:| -----------------------------:|\n",
       "| (Intercept) | -0.0016 | 0.0072 |  -0.22 | 0.8254 |                   0.0000 |                        0.0000 |\n",
       "| ppi         | -0.8022 | 0.0718 | -11.18 | <1e-28 |                          |                               |\n",
       "| base_price  | -0.6277 | 0.0725 |  -8.66 | <1e-17 |                          |                               |\n",
       "| Residual    | 33.0010 |        |        |        |                          |                               |\n"
      ],
      "text/plain": [
       "Linear mixed model fit by maximum likelihood\n",
       " volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand) + (1 | brand_category_channel_type)\n",
       "   logLik   -2 logLik     AIC       AICc        BIC    \n",
       " -2279.1400  4558.2800  4570.2800  4570.3199  4604.2066\n",
       "\n",
       "Variance components:\n",
       "                               Column   Variance Std.Dev.\n",
       "channel_type_sub_brand      (Intercept)     0.000  0.000\n",
       "brand_category_channel_type (Intercept)     0.000  0.000\n",
       "Residual                                 1089.065 33.001\n",
       " Number of obs: 2110; levels of grouping factors: 81, 16\n",
       "\n",
       "  Fixed-effects parameters:\n",
       "─────────────────────────────────────────────────────\n",
       "                  Coef.  Std. Error       z  Pr(>|z|)\n",
       "─────────────────────────────────────────────────────\n",
       "(Intercept)  -0.0015897  0.00720718   -0.22    0.8254\n",
       "ppi          -0.802217   0.0717856   -11.18    <1e-28\n",
       "base_price   -0.627739   0.0724693    -8.66    <1e-17\n",
       "─────────────────────────────────────────────────────"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m3 = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        (1 | channel_type_sub_brand) +\n",
    "        (1 | brand_category_channel_type)\n",
    "    )\n",
    "    fit(MixedModel, form, dat; contrasts, wts)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b87b114a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "┌ Warning: Cholesky factorization failed\n",
      "└ @ Main In[12]:12\n"
     ]
    }
   ],
   "source": [
    "m4 = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        (1 | channel_type_sub_brand) +\n",
    "        zerocorr(1 + ppi | brand_category_channel_type)\n",
    "    )\n",
    "    LinearMixedModel(form, dat; contrasts, wts)\n",
    "end\n",
    "try\n",
    "    updateL!(m4)\n",
    "catch\n",
    "    @warn \"Cholesky factorization failed\"\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "473d9216",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "\\begin{tabular}\n",
       "{l | r | r | r | r | r | r}\n",
       " & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
       "\\hline\n",
       "(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 &   &   \\\\\n",
       "ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 & 0.0000 \\\\\n",
       "base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 &   &   \\\\\n",
       "Residual & 31.4902 &  &  &  &  &  \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "|             |    Est. |     SE |     z |      p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n",
       "|:----------- | -------:| ------:| -----:| ------:| ------------------------:| -----------------------------:|\n",
       "| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 |                          |                               |\n",
       "| ppi         | -0.3071 | 0.2650 | -1.16 | 0.2465 |                   1.3776 |                        0.0000 |\n",
       "| base_price  | -0.6212 | 0.0695 | -8.94 | <1e-18 |                          |                               |\n",
       "| Residual    | 31.4902 |        |       |        |                          |                               |\n"
      ],
      "text/plain": [
       "Linear mixed model fit by maximum likelihood\n",
       " volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n",
       "   logLik   -2 logLik     AIC       AICc        BIC    \n",
       " -2211.8904  4423.7809  4435.7809  4435.8208  4469.7075\n",
       "\n",
       "Variance components:\n",
       "                            Column Variance Std.Dev. \n",
       "channel_type_sub_brand      ppi    1.89789  1.37764\n",
       "brand_category_channel_type ppi    0.00000  0.00000\n",
       "Residual                         991.63246 31.49020\n",
       " Number of obs: 2110; levels of grouping factors: 81, 16\n",
       "\n",
       "  Fixed-effects parameters:\n",
       "────────────────────────────────────────────────────\n",
       "                  Coef.  Std. Error      z  Pr(>|z|)\n",
       "────────────────────────────────────────────────────\n",
       "(Intercept)  -0.0011511   0.0068892  -0.17    0.8673\n",
       "ppi          -0.307101    0.264971   -1.16    0.2465\n",
       "base_price   -0.621192    0.0695056  -8.94    <1e-18\n",
       "────────────────────────────────────────────────────"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m5 = let\n",
    "    form = @formula(\n",
    "        volume ~ 1 + ppi + base_price +\n",
    "        (0 + ppi | channel_type_sub_brand) +\n",
    "        (0 + ppi | brand_category_channel_type)\n",
    "    )\n",
    "    fit(MixedModel, form, dat; contrasts, wts)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "bd45c758",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear mixed model fit by maximum likelihood\n",
      " volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n",
      "   logLik   -2 logLik     AIC       AICc        BIC    \n",
      " -2211.8904  4423.7809  4435.7809  4435.8208  4469.7075\n",
      "\n",
      "Variance components:\n",
      "                            Column Variance Std.Dev. \n",
      "channel_type_sub_brand      ppi    1.89789  1.37764\n",
      "brand_category_channel_type ppi    0.00000  0.00000\n",
      "Residual                         991.63246 31.49020\n",
      " Number of obs: 2110; levels of grouping factors: 81, 16\n",
      "\n",
      "  Fixed-effects parameters:\n",
      "────────────────────────────────────────────────────\n",
      "                  Coef.  Std. Error      z  Pr(>|z|)\n",
      "────────────────────────────────────────────────────\n",
      "(Intercept)  -0.0011511   0.0068892  -0.17    0.8673\n",
      "ppi          -0.307101    0.264971   -1.16    0.2465\n",
      "base_price   -0.621192    0.0695056  -8.94    <1e-18\n",
      "────────────────────────────────────────────────────\n"
     ]
    }
   ],
   "source": [
    "println(m5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c27042ae",
   "metadata": {},
   "source": [
    "So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.7.2",
   "language": "julia",
   "name": "julia-1.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

## Issue611.qmd
---
title: "Issue611"
author: "Douglas Bates"
jupyter: julia-1.7
date: today
date-format: iso
execute:
  keep-ipynb: true
---

## Load packages and data

```{julia}
using CSV, MixedModels, ProgressMeter, Tables
ProgressMeter.ijulia_behavior(:clear);
```

Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,
```{julia}
dat = CSV.read("./data/model_ready_data_small.csv", columntable)
wts = CSV.read("./data/weights.csv", columntable).weight_vector
```

Also, assign a `contrasts` specification of `Grouping` to the grouping factors
```{julia}
contrasts = Dict(
    :brand_category_channel_type => Grouping(),
    :channel_type_sub_brand => Grouping(),
);
```

With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.
In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.

For example,

```{julia}
brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))
```

Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.

## Initial model fits

Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.
I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace.

```{julia}
m1 = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        (1|channel_type_sub_brand)
    )
    fit(MixedModel, form, dat; contrasts, wts)
end
```

As you can see, it converges on the boundary with a variance component of zero.

```{julia}
println(m1)
```


```{julia}
m2 = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        zerocorr(1 + ppi | channel_type_sub_brand)
    )
    fit(MixedModel, form, dat; contrasts, wts)
end
```

```{julia}
m2a = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        (0 + ppi | channel_type_sub_brand)
    )
    fit(MixedModel, form, dat; contrasts, wts)
end
println(m2a)
```


```{julia}
println(m2)
```

```{julia}
m3 = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        (1 | channel_type_sub_brand) +
        (1 | brand_category_channel_type)
    )
    fit(MixedModel, form, dat; contrasts, wts)
end
```

```{julia}
m4 = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        (1 | channel_type_sub_brand) +
        zerocorr(1 + ppi | brand_category_channel_type)
    )
    LinearMixedModel(form, dat; contrasts, wts)
end
try
    updateL!(m4)
catch
    @warn "Cholesky factorization failed"
end
```

```{julia}
m5 = let
    form = @formula(
        volume ~ 1 + ppi + base_price +
        (0 + ppi | channel_type_sub_brand) +
        (0 + ppi | brand_category_channel_type)
    )
    fit(MixedModel, form, dat; contrasts, wts)
end
```


```{julia}
println(m5)
```

So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else.

## Project.toml
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
	{
	"cells": [
	{
	"cell_type": "raw",
	"id": "52407268",
	"metadata": {},
	"source": [
	"---\n",
	"title: Issue611\n",
	"author: Douglas Bates\n",
	"date: today\n",
	"date-format: iso\n",
	"execute:\n",
	" keep-ipynb: true\n",
	"---"
	]
	},
	{
	"cell_type": "markdown",
	"id": "8a5dbdf2",
	"metadata": {},
	"source": [
	"## Load packages and data"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"id": "8f6b34c7",
	"metadata": {},
	"outputs": [],
	"source": [
	"using CSV, MixedModels, ProgressMeter, Tables\n",
	"ProgressMeter.ijulia_behavior(:clear);"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7ca7944e",
	"metadata": {},
	"source": [
	"Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"id": "b344592e",
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"2110-element Vector{Float64}:\n",
	" 274.388611\n",
	" 1015.990089\n",
	" 2197.288998\n",
	" 3367.868333\n",
	" 2415.089921\n",
	" 2980.472642\n",
	" 1621.304086\n",
	" 1452.363879\n",
	" 1612.956475\n",
	" 1836.63395\n",
	" 2548.059624\n",
	" 3075.149043\n",
	" 2846.338504\n",
	" ⋮\n",
	" 90.037501\n",
	" 64.327124\n",
	" 29.44757\n",
	" 29.084692\n",
	" 27.125033\n",
	" 409.445973\n",
	" 12.593273\n",
	" 431.874531\n",
	" 37.276392\n",
	" 45.440296\n",
	" 75.674115\n",
	" 18.573065"
	]
	},
	"execution_count": 3,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"dat = CSV.read(\"./data/model_ready_data_small.csv\", columntable)\n",
	"wts = CSV.read(\"./data/weights.csv\", columntable).weight_vector"
	]
	},
	{
	"cell_type": "markdown",
	"id": "3289eed6",
	"metadata": {},
	"source": [
	"Also, assign a `contrasts` specification of `Grouping` to the grouping factors"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"id": "89442b55",
	"metadata": {},
	"outputs": [],
	"source": [
	"contrasts = Dict(\n",
	" :brand_category_channel_type => Grouping(),\n",
	" :channel_type_sub_brand => Grouping(),\n",
	");"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7111184e",
	"metadata": {},
	"source": [
	"With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.\n",
	"In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.\n",
	"\n",
	"For example,"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"id": "142f98f1",
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"2110-element Vector{String}:\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" \"B00\"\n",
	" ⋮\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\"\n",
	" \"B15\""
	]
	},
	"execution_count": 5,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))"
	]
	},
	{
	"cell_type": "markdown",
	"id": "cbf43ef0",
	"metadata": {},
	"source": [
	"Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.\n",
	"\n",
	"## Initial model fits\n",
	"\n",
	"Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.\n",
	"I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"id": "d9a1e689",
	"metadata": {},
	"outputs": [
	{
	"name": "stderr",
	"output_type": "stream",
	"text": [
	"\r",
	"\u001b[32mMinimizing 17 \t Time: 0:00:00 (23.36 ms/it)\u001b[39m\n"
	]
	},
	{
	"data": {
	"text/latex": [
	"\\begin{tabular}\n",
	"{l \| r \| r \| r \| r \| r}\n",
	" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
	"\\hline\n",
	"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 \\\\\n",
	"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & \\\\\n",
	"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & \\\\\n",
	"Residual & 33.0010 & & & & \\\\\n",
	"\\end{tabular}\n"
	],
	"text/markdown": [
	"\| \| Est. \| SE \| z \| p \| σ_channel_type_sub_brand \|\n",
	"\|:----------- \| -------:\| ------:\| ------:\| ------:\| ------------------------:\|\n",
	"\| (Intercept) \| -0.0016 \| 0.0072 \| -0.22 \| 0.8254 \| 0.0000 \|\n",
	"\| ppi \| -0.8022 \| 0.0718 \| -11.18 \| <1e-28 \| \|\n",
	"\| base_price \| -0.6277 \| 0.0725 \| -8.66 \| <1e-17 \| \|\n",
	"\| Residual \| 33.0010 \| \| \| \| \|\n"
	],
	"text/plain": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (1 \| channel_type_sub_brand)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev.\n",
	"channel_type_sub_brand (Intercept) 0.000 0.000\n",
	"Residual 1089.065 33.001\n",
	" Number of obs: 2110; levels of grouping factors: 81\n",
	"\n",
	" Fixed-effects parameters:\n",
	"─────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"─────────────────────────────────────────────────────\n",
	"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
	"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
	"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
	"─────────────────────────────────────────────────────"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"m1 = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" (1\|channel_type_sub_brand)\n",
	" )\n",
	" fit(MixedModel, form, dat; contrasts, wts)\n",
	"end"
	]
	},
	{
	"cell_type": "markdown",
	"id": "06c5e97c",
	"metadata": {},
	"source": [
	"As you can see, it converges on the boundary with a variance component of zero."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"id": "099da5eb",
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (1 \| channel_type_sub_brand)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev.\n",
	"channel_type_sub_brand (Intercept) 0.000 0.000\n",
	"Residual 1089.065 33.001\n",
	" Number of obs: 2110; levels of grouping factors: 81\n",
	"\n",
	" Fixed-effects parameters:\n",
	"─────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"─────────────────────────────────────────────────────\n",
	"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
	"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
	"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
	"─────────────────────────────────────────────────────\n"
	]
	}
	],
	"source": [
	"println(m1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"id": "ca38e2db",
	"metadata": {},
	"outputs": [
	{
	"name": "stderr",
	"output_type": "stream",
	"text": [
	"\r",
	"\u001b[32mMinimizing 58 \t Time: 0:00:00 ( 7.66 ms/it)\u001b[39m\n"
	]
	},
	{
	"data": {
	"text/latex": [
	"\\begin{tabular}\n",
	"{l \| r \| r \| r \| r \| r}\n",
	" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
	"\\hline\n",
	"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & 0.0000 \\\\\n",
	"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 \\\\\n",
	"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & \\\\\n",
	"Residual & 31.4902 & & & & \\\\\n",
	"\\end{tabular}\n"
	],
	"text/markdown": [
	"\| \| Est. \| SE \| z \| p \| σ_channel_type_sub_brand \|\n",
	"\|:----------- \| -------:\| ------:\| -----:\| ------:\| ------------------------:\|\n",
	"\| (Intercept) \| -0.0012 \| 0.0069 \| -0.17 \| 0.8673 \| 0.0000 \|\n",
	"\| ppi \| -0.3071 \| 0.2650 \| -1.16 \| 0.2465 \| 1.3776 \|\n",
	"\| base_price \| -0.6212 \| 0.0695 \| -8.94 \| <1e-18 \| \|\n",
	"\| Residual \| 31.4902 \| \| \| \| \|\n"
	],
	"text/plain": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi \| channel_type_sub_brand))\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev. Corr.\n",
	"channel_type_sub_brand (Intercept) 0.00000 0.00000\n",
	" ppi 1.89789 1.37764 . \n",
	"Residual 991.63246 31.49020\n",
	" Number of obs: 2110; levels of grouping factors: 81\n",
	"\n",
	" Fixed-effects parameters:\n",
	"────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"────────────────────────────────────────────────────\n",
	"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
	"ppi -0.3071 0.264971 -1.16 0.2465\n",
	"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
	"────────────────────────────────────────────────────"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"m2 = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" zerocorr(1 + ppi \| channel_type_sub_brand)\n",
	" )\n",
	" fit(MixedModel, form, dat; contrasts, wts)\n",
	"end"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"id": "f6006725",
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (0 + ppi \| channel_type_sub_brand)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2211.8904 4423.7809 4433.7809 4433.8094 4462.0531\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev. \n",
	"channel_type_sub_brand ppi 1.89789 1.37764\n",
	"Residual 991.63246 31.49020\n",
	" Number of obs: 2110; levels of grouping factors: 81\n",
	"\n",
	" Fixed-effects parameters:\n",
	"────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"────────────────────────────────────────────────────\n",
	"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
	"ppi -0.3071 0.264971 -1.16 0.2465\n",
	"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
	"────────────────────────────────────────────────────\n"
	]
	}
	],
	"source": [
	"m2a = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" (0 + ppi \| channel_type_sub_brand)\n",
	" )\n",
	" fit(MixedModel, form, dat; contrasts, wts)\n",
	"end\n",
	"println(m2a)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"id": "450c3c31",
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi \| channel_type_sub_brand))\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev. Corr.\n",
	"channel_type_sub_brand (Intercept) 0.00000 0.00000\n",
	" ppi 1.89789 1.37764 . \n",
	"Residual 991.63246 31.49020\n",
	" Number of obs: 2110; levels of grouping factors: 81\n",
	"\n",
	" Fixed-effects parameters:\n",
	"────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"────────────────────────────────────────────────────\n",
	"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
	"ppi -0.3071 0.264971 -1.16 0.2465\n",
	"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
	"────────────────────────────────────────────────────\n"
	]
	}
	],
	"source": [
	"println(m2)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"id": "a2043b8a",
	"metadata": {},
	"outputs": [
	{
	"name": "stderr",
	"output_type": "stream",
	"text": [
	"\r",
	"\u001b[32mMinimizing 34 \t Time: 0:00:00 (16.69 ms/it)\u001b[39m\n"
	]
	},
	{
	"data": {
	"text/latex": [
	"\\begin{tabular}\n",
	"{l \| r \| r \| r \| r \| r \| r}\n",
	" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
	"\\hline\n",
	"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 & 0.0000 \\\\\n",
	"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & & \\\\\n",
	"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & & \\\\\n",
	"Residual & 33.0010 & & & & & \\\\\n",
	"\\end{tabular}\n"
	],
	"text/markdown": [
	"\| \| Est. \| SE \| z \| p \| σ_channel_type_sub_brand \| σ_brand_category_channel_type \|\n",
	"\|:----------- \| -------:\| ------:\| ------:\| ------:\| ------------------------:\| -----------------------------:\|\n",
	"\| (Intercept) \| -0.0016 \| 0.0072 \| -0.22 \| 0.8254 \| 0.0000 \| 0.0000 \|\n",
	"\| ppi \| -0.8022 \| 0.0718 \| -11.18 \| <1e-28 \| \| \|\n",
	"\| base_price \| -0.6277 \| 0.0725 \| -8.66 \| <1e-17 \| \| \|\n",
	"\| Residual \| 33.0010 \| \| \| \| \| \|\n"
	],
	"text/plain": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (1 \| channel_type_sub_brand) + (1 \| brand_category_channel_type)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2279.1400 4558.2800 4570.2800 4570.3199 4604.2066\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev.\n",
	"channel_type_sub_brand (Intercept) 0.000 0.000\n",
	"brand_category_channel_type (Intercept) 0.000 0.000\n",
	"Residual 1089.065 33.001\n",
	" Number of obs: 2110; levels of grouping factors: 81, 16\n",
	"\n",
	" Fixed-effects parameters:\n",
	"─────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"─────────────────────────────────────────────────────\n",
	"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
	"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
	"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
	"─────────────────────────────────────────────────────"
	]
	},
	"execution_count": 11,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"m3 = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" (1 \| channel_type_sub_brand) +\n",
	" (1 \| brand_category_channel_type)\n",
	" )\n",
	" fit(MixedModel, form, dat; contrasts, wts)\n",
	"end"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"id": "b87b114a",
	"metadata": {},
	"outputs": [
	{
	"name": "stderr",
	"output_type": "stream",
	"text": [
	"┌ Warning: Cholesky factorization failed\n",
	"└ @ Main In[12]:12\n"
	]
	}
	],
	"source": [
	"m4 = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" (1 \| channel_type_sub_brand) +\n",
	" zerocorr(1 + ppi \| brand_category_channel_type)\n",
	" )\n",
	" LinearMixedModel(form, dat; contrasts, wts)\n",
	"end\n",
	"try\n",
	" updateL!(m4)\n",
	"catch\n",
	" @warn \"Cholesky factorization failed\"\n",
	"end"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"id": "473d9216",
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/latex": [
	"\\begin{tabular}\n",
	"{l \| r \| r \| r \| r \| r \| r}\n",
	" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
	"\\hline\n",
	"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & & \\\\\n",
	"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 & 0.0000 \\\\\n",
	"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & & \\\\\n",
	"Residual & 31.4902 & & & & & \\\\\n",
	"\\end{tabular}\n"
	],
	"text/markdown": [
	"\| \| Est. \| SE \| z \| p \| σ_channel_type_sub_brand \| σ_brand_category_channel_type \|\n",
	"\|:----------- \| -------:\| ------:\| -----:\| ------:\| ------------------------:\| -----------------------------:\|\n",
	"\| (Intercept) \| -0.0012 \| 0.0069 \| -0.17 \| 0.8673 \| \| \|\n",
	"\| ppi \| -0.3071 \| 0.2650 \| -1.16 \| 0.2465 \| 1.3776 \| 0.0000 \|\n",
	"\| base_price \| -0.6212 \| 0.0695 \| -8.94 \| <1e-18 \| \| \|\n",
	"\| Residual \| 31.4902 \| \| \| \| \| \|\n"
	],
	"text/plain": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (0 + ppi \| channel_type_sub_brand) + (0 + ppi \| brand_category_channel_type)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev. \n",
	"channel_type_sub_brand ppi 1.89789 1.37764\n",
	"brand_category_channel_type ppi 0.00000 0.00000\n",
	"Residual 991.63246 31.49020\n",
	" Number of obs: 2110; levels of grouping factors: 81, 16\n",
	"\n",
	" Fixed-effects parameters:\n",
	"────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"────────────────────────────────────────────────────\n",
	"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
	"ppi -0.307101 0.264971 -1.16 0.2465\n",
	"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
	"────────────────────────────────────────────────────"
	]
	},
	"execution_count": 13,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"m5 = let\n",
	" form = @formula(\n",
	" volume ~ 1 + ppi + base_price +\n",
	" (0 + ppi \| channel_type_sub_brand) +\n",
	" (0 + ppi \| brand_category_channel_type)\n",
	" )\n",
	" fit(MixedModel, form, dat; contrasts, wts)\n",
	"end"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"id": "bd45c758",
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Linear mixed model fit by maximum likelihood\n",
	" volume ~ 1 + ppi + base_price + (0 + ppi \| channel_type_sub_brand) + (0 + ppi \| brand_category_channel_type)\n",
	" logLik -2 logLik AIC AICc BIC \n",
	" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
	"\n",
	"Variance components:\n",
	" Column Variance Std.Dev. \n",
	"channel_type_sub_brand ppi 1.89789 1.37764\n",
	"brand_category_channel_type ppi 0.00000 0.00000\n",
	"Residual 991.63246 31.49020\n",
	" Number of obs: 2110; levels of grouping factors: 81, 16\n",
	"\n",
	" Fixed-effects parameters:\n",
	"────────────────────────────────────────────────────\n",
	" Coef. Std. Error z Pr(>\|z\|)\n",
	"────────────────────────────────────────────────────\n",
	"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
	"ppi -0.307101 0.264971 -1.16 0.2465\n",
	"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
	"────────────────────────────────────────────────────\n"
	]
	}
	],
	"source": [
	"println(m5)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "c27042ae",
	"metadata": {},
	"source": [
	"So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Julia 1.7.2",
	"language": "julia",
	"name": "julia-1.7"
	},
	"language_info": {
	"file_extension": ".jl",
	"mimetype": "application/julia",
	"name": "julia",
	"version": "1.7.2"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 5
	}
	---
	title: "Issue611"
	author: "Douglas Bates"
	jupyter: julia-1.7
	date: today
	date-format: iso
	execute:
	keep-ipynb: true
	---

	## Load packages and data

	```{julia}
	using CSV, MixedModels, ProgressMeter, Tables
	ProgressMeter.ijulia_behavior(:clear);
	```

	Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,
	```{julia}
	dat = CSV.read("./data/model_ready_data_small.csv", columntable)
	wts = CSV.read("./data/weights.csv", columntable).weight_vector
	```

	Also, assign a `contrasts` specification of `Grouping` to the grouping factors
	```{julia}
	contrasts = Dict(
	:brand_category_channel_type => Grouping(),
	:channel_type_sub_brand => Grouping(),
	);
	```

	With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.
	In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.

	For example,

	```{julia}
	brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))
	```

	Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.

	## Initial model fits

	Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.
	I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace.

	```{julia}
	m1 = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	(1\|channel_type_sub_brand)
	)
	fit(MixedModel, form, dat; contrasts, wts)
	end
	```

	As you can see, it converges on the boundary with a variance component of zero.

	```{julia}
	println(m1)
	```


	```{julia}
	m2 = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	zerocorr(1 + ppi \| channel_type_sub_brand)
	)
	fit(MixedModel, form, dat; contrasts, wts)
	end
	```

	```{julia}
	m2a = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	(0 + ppi \| channel_type_sub_brand)
	)
	fit(MixedModel, form, dat; contrasts, wts)
	end
	println(m2a)
	```


	```{julia}
	println(m2)
	```

	```{julia}
	m3 = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	(1 \| channel_type_sub_brand) +
	(1 \| brand_category_channel_type)
	)
	fit(MixedModel, form, dat; contrasts, wts)
	end
	```

	```{julia}
	m4 = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	(1 \| channel_type_sub_brand) +
	zerocorr(1 + ppi \| brand_category_channel_type)
	)
	LinearMixedModel(form, dat; contrasts, wts)
	end
	try
	updateL!(m4)
	catch
	@warn "Cholesky factorization failed"
	end
	```

	```{julia}
	m5 = let
	form = @formula(
	volume ~ 1 + ppi + base_price +
	(0 + ppi \| channel_type_sub_brand) +
	(0 + ppi \| brand_category_channel_type)
	)
	fit(MixedModel, form, dat; contrasts, wts)
	end
	```


	```{julia}
	println(m5)
	```

	So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else.
	[deps]
	CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
	CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
	Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
	DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
	MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
	ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
	StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
	Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"