Fixed-Effects panel OLS
@darribas · Last active January 10, 2021 18:41
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Panel fixed effects: dummy and demeaning estimation\n",
"\n",
"- **Author**: Dani Arribas-Bel [@darribas](http://twitter.com/darribas)\n",
"\n",
"This notebook is a mental note about issues coming up when trying to estimate a panel with one and two-ways fixed effects. Any comments, suggestions or ideas are more than welcome.\n",
"\n",
"You can access the notebook in two flavours:\n",
"\n",
"* Online (read-only): [link](http://nbviewer.ipython.org/urls/gist.github.com/darribas/7805381/raw/a68c68232800895e9da6d593e9fea38b953b5073/panel_FE.ipynb)\n",
"* Raw file (as a gist): [link](https://gist.github.com/darribas/7805381#file-panel_fe-ipynb)\n",
" \n",
"Last, the notebook relies on two functions I provide in a separate Python file in order not to clutter and make this text more readable. You get the additional file when you pull the `gist`, but you can also check it online [here](https://gist.github.com/darribas/7805381#file-panel_ols-py). All the code examples are written in Python and rely on the following dependencies that you will need to reproduce the code in your machine:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import panel_ols\n",
"import pandas as pd\n",
"print \"Pandas: \", pd.__version__\n",
"import pysal as ps\n",
"print \"PySAL: \", ps.version\n",
"import numpy as np\n",
"print \"Numpy: \", np.__version__\n",
"\n",
"seed = np.random.seed(1234)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Pandas: 0.12.0\n",
"PySAL: 1.7.0dev\n",
"Numpy: 1.7.1\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Theoretical preliminars\n",
"\n",
"### One-way\n",
"\n",
"The idea is that we have a panel with two dimensions (you can think of individuals $i$ over time $t$, but there can be other dimensions too). In formal notation, we begin with the following baseline model:\n",
"\n",
"$$ y_{it} = \\alpha + X_{it} \\beta + \\epsilon_{it} $$\n",
"\n",
"Assuming the usual stuff about the good behaviour of $\\epsilon_{it}$$, this can be estimated by OLS. If we want to control for time-invariant characteristics of individuals, we can include what is called a fixed-effect:\n",
"\n",
"$$ y_{it} = X_{it} \\beta + \\mu_i + \\epsilon_{it} $$\n",
"\n",
"where $\\mu_i$ absorbs every individual effect that does not vary over time. If we want to estimate this model, we can simply include a dummy of each $i$ and run OLS. The problem is that if the number of individuals is very large, one quickly runs out of degrees of freedom and increases multicollinearity. An equivalent way to work around this is suggested by [Baltagi](http://www.amazon.com/Econometric-Analysis-Panel-Data-Baltagi/dp/0470518863) on p. 34 (of Third edition at least). We can *demean* $y_{it} and $X_{it}$ as follows:\n",
"\n",
"$$ y_{it}* = y_{it} - \\bar{y}_{i.} $$\n",
"\n",
"where $\\bar{y}_{i.}$ is the average of values for each individual. Running a straight OLS (without intercept) on the transformed variables results in the same $\\beta$ estimates as we would by using dummy variables.\n",
"\n",
"### Two-way\n",
"\n",
"If we also want to account for the unboserved individual-invariant heterogeneity, we can also include a time effect, converting the model in the so called two-way fixed-effects modes:\n",
"\n",
"$$ y_{it} = X_{it} \\beta + \\mu_i + \\nu_t + \\epsilon_{it} $$\n",
"\n",
"In theory, you could also estimate this using dummy variables, but the problems that arise in the one-way model are even more acute. Fortunately, Baltagi proposes another transformation that gets around this:\n",
"\n",
"$$ y_{it}* = y_{it} - \\bar{y}_{i.} - \\bar{y}_{.t} + \\bar{y}_{..} $$\n",
"\n",
"If you apply this transformation to both $y$ and $X$, you can run a straight OLS (without intercept again) and obtain the $\\beta$ estimates of interest."
]
},
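{
"cell_type": "markdown",
"metadata": {},
"source": [
"To fix ideas, here is a minimal sketch of the two-way transformation in pandas, again on made-up data (`i` and `t` are illustrative names for the two dimensions):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# A small balanced panel: every (i, t) pair occurs exactly once\n",
"idx = pd.MultiIndex.from_product([range(5), range(4)], names=['i', 't'])\n",
"df = pd.DataFrame({'y': np.random.random(20)}, index=idx).reset_index()\n",
"\n",
"# y* = y - average within i - average within t + grand average\n",
"y_star = (df['y']\n",
"          - df.groupby('i')['y'].transform('mean')\n",
"          - df.groupby('t')['y'].transform('mean')\n",
"          + df['y'].mean())\n",
"```\n",
"\n",
"The same transformation is applied, column by column, to $X$ before running OLS without an intercept."
]
},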
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ilustration in Python\n",
"\n",
"Let's now get our hands dirty and check this alternative ways of especifying the models are indeed equivalent. First we need some synthetic data to run the experiments:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"n = 1000\n",
"d1 = 100\n",
"d2 = 10\n",
"seed = np.random.seed(1234)\n",
"x = np.random.random((n, 3))\n",
"fe1 = np.random.random_integers(0, d1, n)\n",
"fe1_d = pd.get_dummies(fe1, 'fe1')\n",
"fe2 = np.random.random_integers(0, d2, n)\n",
"fe2_d = pd.get_dummies(fe2, 'fe2')\n",
"\n",
"e = np.random.normal(0, 1, (n, 1))\n",
"y = np.dot(x, np.array([[1., 1., 1.]]).T) \\\n",
" + fe1_d.ix[:, :-1].values.sum(axis=1)[:, None] \\\n",
" + fe2_d.values.sum(axis=1)[:, None] \\\n",
" + e"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### One-way"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can estimate the model including the dummies for the first effect:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"xone = np.hstack((x, fe1_d.values))\n",
"m1_dummy = ps.spreg.BaseOLS(y, xone)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or use the transformation, built into the method `one_way_fe`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"m1_demean = panel_ols.one_way_fe(y, x, fe1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now compare the estimates of $\\beta$:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.hstack((m1_dummy.betas[:3], m1_demean.betas))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"array([[ 1.05322727, 1.05322727],\n",
" [ 0.94063773, 0.94063773],\n",
" [ 1.08095942, 1.08095942]])"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you check, the difference between the two is negligible:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"m1_dummy.betas[:3] - m1_demean.betas"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"array([[ -4.15223411e-14],\n",
" [ 1.57651669e-14],\n",
" [ -1.82076576e-14]])"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, you can see how the multicollinearity issue starts to creep up although, in this case, it is not unacceptable:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ci = ps.spreg.diagnostics.condition_index\n",
"\n",
"print \"Dummy: %.4f | Demean: %.4f\"%(ci(m1_dummy), ci(m1_demean))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Dummy: 7.0236 | Demean: 1.0713\n"
]
},
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/home/dani/code/pysal_darribas/pysal/spreg/diagnostics.py:619: ComplexWarning: Casting complex values to real discards the imaginary part\n",
" ci_result = sqrt(max_eigval/min_eigval)\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Two-way\n",
"\n",
"Here there are three alternatives and, in theory, they all give the same estimates although, as we are going to see, differences are a bit larger.\n",
"\n",
"We can estimate with all the dummies directly:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"xtwo = np.hstack((xone, fe2_d.values[:, 1:]))\n",
"m2_dummies = ps.spreg.BaseOLS(y, xtwo)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can estimate the fully transformed model:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"m2_demean = panel_ols.two_way_fe(y, x, fe1, fe2)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or you can have a middle way using dummies for the effect with less categories (`fe2` in this case) and the demeaner for the other one:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"m2_mix = panel_ols.one_way_fe(y, xone[:, :-1], fe2)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now check the coefficients of the three methods:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pd.DataFrame({'Dummies': m2_dummies.betas[:3].flatten(), \\\n",
" 'Mix': m2_mix.betas[:3].flatten(), \\\n",
" 'Demean': m2_demean.betas.flatten()})"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Demean</th>\n",
" <th>Dummies</th>\n",
" <th>Mix</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 1.051114</td>\n",
" <td> 1.051047</td>\n",
" <td> 1.051047</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.963858</td>\n",
" <td> 0.964616</td>\n",
" <td> 0.964616</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 1.072238</td>\n",
" <td> 1.070982</td>\n",
" <td> 1.070982</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
" Demean Dummies Mix\n",
"0 1.051114 1.051047 1.051047\n",
"1 0.963858 0.964616 0.964616\n",
"2 1.072238 1.070982 1.070982"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And their multicollinearity:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pd.Series({'Dummies': ci(m2_dummies), \\\n",
" 'Mix': ci(m2_mix), \\\n",
" 'Demean': ci(m2_demean)})"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"Demean 1.073158\n",
"Dummies 11.582533\n",
"Mix 11.704122\n",
"dtype: float64"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, estimates for `Dummies` and `Mix` are exactly the same (at least at the sixth decimal), while those for the `Demean` are slightly different. This is probably related to precission issues, but if you have any suggestion as to why this happens, please drop me a line, I'll be happy to learn!\n",
"\n",
"**Update**\n",
"\n",
"It turns out that the two-way fixed effects differ in the previous cells because the way I have randomly set up the indices for both dimensions. Demeaning a-la Baltagi only works in a true panel setting in which you only have one occurrence of a pair of indices $it$. When you indeed have this case, the numbers match."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"n = 1000\n",
"d1 = 10\n",
"d2 = 100\n",
"seed = np.random.seed(1234)\n",
"x = np.random.random((n, 3))\n",
"# Balanced panel\n",
"d2 = int(n * 1. / d1)\n",
"fe1 = np.arange(d1)\n",
"fe2 = np.arange(d2)\n",
"fe = np.array([(i, j) for i in fe1 for j in fe2])\n",
"fe1 = fe[:, 0]\n",
"fe2 = fe[:, 1]\n",
"#\n",
"fe1_d = pd.get_dummies(fe1, 'fe1')\n",
"fe2_d = pd.get_dummies(fe2, 'fe2')\n",
" \n",
"e = np.random.normal(0, 1, (n, 1))\n",
"y = np.dot(x, np.array([[1., 1., 1.]]).T) \\\n",
" + fe1_d.ix[:, :-1].values.sum(axis=1)[:, None] \\\n",
" + e\n",
" #+ fe2_d.values.sum(axis=1)[:, None] \\\n",
" \n",
" # One-way\n",
"m = panel_ols.one_way_fe(y, x, fe1)\n",
" \n",
"xone = np.hstack((x, fe1_d.values))\n",
"md = ps.spreg.BaseOLS(y, xone)\n",
" \n",
"print \"De-meaned | Dummy\"\n",
"for i in range(3):\n",
" print np.round(m.betas[i][0], 9), \" | \", \\\n",
" np.round(md.betas[i][0], 9)\n",
"print 'Condition index'\n",
"print ci(m), \" | \", ci(md)\n",
" \n",
" # Two-way\n",
"m = panel_ols.two_way_fe(y, x, fe1, fe2)\n",
" \n",
"malt = panel_ols.one_way_fe(y, xone[:, :-1], fe2)\n",
" \n",
"xtwo = np.hstack((xone, fe2_d.values[:, 1:]))\n",
"md = ps.spreg.BaseOLS(y, xtwo)\n",
" \n",
"print \"\\nDe-meaned | 1 demeaned-1 dummy | Dummy\"\n",
"for i in range(3):\n",
" print np.round(m.betas[i][0], 9), \" | \", \\\n",
" np.round(malt.betas[i][0], 9), \" | \", \\\n",
" np.round(md.betas[i][0], 9)\n",
"print 'Condition index'\n",
"print ci(m), \" | \",ci(malt), \" | \", ci(md)\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"De-meaned | Dummy\n",
"1.001466538 | 1.001466538\n",
"0.823691091 | 0.823691091\n",
"0.988729449 | 0.988729449\n",
"Condition index\n",
"1.09144305033 | 6.66571288935\n",
"\n",
"De-meaned | 1 demeaned-1 dummy | Dummy"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"0.971197304 | 0.971197304 | 0.971197304\n",
"0.867112279 | 0.867112279 | 0.867112279\n",
"0.88178915 | 0.88178915 | 0.88178915\n",
"Condition index\n",
"1.06881037371 | 3.32261011076 | 29.8077107922\n"
]
}
],
"prompt_number": 15
}
],
"metadata": {}
}
]
}
panel_ols.py

'''
Fixed-effects OLS Panel models
Author: Dani Arribas-Bel <@darribas>
Copyright (c) 2013, Dani Arribas-Bel
All rights reserved.
LICENSE
-------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or
promote products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
'''
import pandas as pd
import pysal as ps
import numpy as np
from pysal.spreg.diagnostics import condition_index as ci
def one_way_fe(y, x, fe1, robust=None, name_y=None, name_x=None, model_name=None):
    '''
    One-way Fixed-Effects panel
    ...

    Arguments
    ---------
    y           : ndarray
                  Flat array or list with dependent variable
    x           : ndarray
                  Array with exogenous variables
    fe1         : ndarray/list
                  Sequence with memberships to the fixed effect (e.g.
                  individual or neighborhood)
    robust      : str
                  Argument for PySAL OLS. If None (default), standard OLS
                  inference is performed. If 'white', White
                  heteroskedasticity-consistent standard errors are used.
    name_y      : str
                  [Optional] Name of dependent variable
    name_x      : list
                  [Optional] Names of exogenous variables
    model_name  : str
                  [Optional] Name of model

    Returns
    -------
    m           : ps.spreg.BaseOLS
                  Barebones PySAL OLS model object
    '''
    yx = pd.DataFrame(x)
    yx['y'] = y
    yx['fe1'] = fe1
    # Average of y and every column of x within each category of the effect
    demeaner1 = yx.groupby('fe1').mean()
    # Align the group averages back with the original observations
    demeaner1 = yx[['fe1']].join(demeaner1, on='fe1')\
                           .drop(['fe1'], axis=1)
    yx = yx.drop(['fe1'], axis=1)
    # Within transformation: subtract the group average from every variable
    yxd = yx - demeaner1
    m = ps.spreg.BaseOLS(yxd[['y']].values, yxd.drop('y', axis=1).values,
                         robust=robust)
    m.name_y = name_y if name_y else 'y'
    m.name_x = name_x if name_x else ['X' + str(i) for i in range(x.shape[1])]
    from pysal.spreg.diagnostics import t_stat, r2
    m.t_stat = t_stat(m)
    m.r2 = r2(m)
    m.model_name = model_name if model_name else m.name_y
    m.fe = 'FE-1'
    m.fe2 = None
    return m
def two_way_fe(y, x, fe1, fe2, robust=None, name_y=None, name_x=None, model_name=None):
    '''
    Two-way Fixed-Effects panel
    ...

    Arguments
    ---------
    y           : ndarray
                  Flat array or list with dependent variable
    x           : ndarray
                  Array with exogenous variables
    fe1         : ndarray/list
                  Sequence with memberships to the first fixed effect (e.g.
                  individual or neighborhood)
    fe2         : ndarray/list
                  Sequence with memberships to the second fixed effect (e.g.
                  time)
    robust      : str
                  Argument for PySAL OLS. If None (default), standard OLS
                  inference is performed. If 'white', White
                  heteroskedasticity-consistent standard errors are used.
    name_y      : str
                  [Optional] Name of dependent variable
    name_x      : list
                  [Optional] Names of exogenous variables
    model_name  : str
                  [Optional] Name of model

    Returns
    -------
    m           : ps.spreg.BaseOLS
                  Barebones PySAL OLS model object
    '''
    yx = pd.DataFrame(x)
    yx['y'] = y
    yx['fe1'] = fe1
    yx['fe2'] = fe2
    # Group averages for every category of the first fixed effect
    demeaner1 = yx.drop('fe2', axis=1).groupby('fe1').mean()
    demeaner1 = yx[['fe1']].join(demeaner1, on='fe1')\
                           .drop(['fe1'], axis=1)
    # Group averages for every category of the second fixed effect
    demeaner2 = yx.drop('fe1', axis=1).groupby('fe2').mean()
    demeaner2 = yx[['fe2']].join(demeaner2, on='fe2')\
                           .drop(['fe2'], axis=1)
    yx = yx.drop(['fe1', 'fe2'], axis=1)
    # Two-way within transformation: subtract both group averages and add
    # back the overall average
    yxd = yx - demeaner1 - demeaner2 + yx.mean()
    m = ps.spreg.BaseOLS(yxd[['y']].values, yxd.drop('y', axis=1).values,
                         robust=robust)
    m.name_y = name_y if name_y else 'y'
    m.name_x = name_x if name_x else ['X' + str(i) for i in range(x.shape[1])]
    from pysal.spreg.diagnostics import t_stat, r2
    m.t_stat = t_stat(m)
    m.r2 = r2(m)
    m.model_name = model_name if model_name else m.name_y
    m.fe = 'FE-1'
    m.fe2 = 'FE-2'
    return m
if __name__ == '__main__':
    # DGP
    n = 1000
    d1 = 10
    d2 = 100
    seed = np.random.seed(1234)
    x = np.random.random((n, 3))
    # Unbalanced panel
    fe1 = np.random.random_integers(0, d1, n).astype(str)
    fe2 = np.random.random_integers(0, d2, n).astype(str)
    # Balanced panel
    d2 = int(n * 1. / d1)
    fe1 = np.arange(d1)
    fe2 = np.arange(d2)
    fe = np.array([(i, j) for i in fe1 for j in fe2])
    fe1 = fe[:, 0]
    fe2 = fe[:, 1]
    #
    fe1_d = pd.get_dummies(fe1, 'fe1')
    fe2_d = pd.get_dummies(fe2, 'fe2')
    e = np.random.normal(0, 1, (n, 1))
    y = np.dot(x, np.array([[1., 1., 1.]]).T) \
        + fe1_d.ix[:, :-1].values.sum(axis=1)[:, None] \
        + e
        #+ fe2_d.values.sum(axis=1)[:, None] \
    # One-way
    m = one_way_fe(y, x, fe1)
    xone = np.hstack((x, fe1_d.values))
    md = ps.spreg.BaseOLS(y, xone)
    print "De-meaned | Dummy"
    for i in range(3):
        print np.round(m.betas[i][0], 9), " | ", \
            np.round(md.betas[i][0], 9)
    print 'Condition index'
    print ci(m), " | ", ci(md)
    # Two-way
    m = two_way_fe(y, x, fe1, fe2)
    malt = one_way_fe(y, xone[:, :-1], fe2)
    xtwo = np.hstack((xone, fe2_d.values[:, 1:]))
    md = ps.spreg.BaseOLS(y, xtwo)
    print "\nDe-meaned | 1 demeaned-1 dummy | Dummy"
    for i in range(3):
        print np.round(m.betas[i][0], 9), " | ", \
            np.round(malt.betas[i][0], 9), " | ", \
            np.round(md.betas[i][0], 9)
    print 'Condition index'
    print ci(m), " | ", ci(malt), " | ", ci(md)
@jpiabrantes

Does this code compute the standard errors of the coefficients used in the regression?

@MarlaWillemse

At what point does the following warning arise, and how can it be prevented?

ComplexWarning: Casting complex values to real discards the imaginary part
  ci_result = sqrt(max_eigval / min_eigval)

@darribas
Author

Hello @MarlaWillemse, I'm not sure; that sounds more like a numpy/scipy message.

More generally, this code is a bit old; for user access to panel econometrics, I'd suggest statsmodels, which I believe now supports these methods:

https://www.statsmodels.org/stable/index.html
