
@TaylorOshan
Last active April 12, 2023 09:16
GSoC Proposal - Taylor Oshan- PySAL: Spatial Interaction Modeling

Python Software Foundation 2016 Google Summer of Code Application

Sub-organization Information

PySAL: Python Spatial Analysis Library

Student Information

Code Samples

Link to a patch/code sample, preferably one you have submitted to your sub-org (*)

  1. Check regression constant bug*

  2. Initialize SpInt (Spatial Interaction) modeling Package with family of max-entropy "gravity" models*

  3. Sketch of a generalized linear model class for Gaussian, Poisson, and logistic models, to serve as the base for modeling counts of flows such as commuting-to-work data

  4. Spatial autocorrelation statistic for vectors

Project Information

Proposal Title:

PySAL: Integrating Poisson count models and spatial effects for spatial interaction modeling.

Proposal Abstract:

Spatial interaction modeling involves the analysis of flows from an origin to a destination, either over physical space (e.g., migration) or through abstract space (e.g., telecommunication). While many models have been developed and proposed, there is little to no software available to carry out spatial interaction modeling and the analysis of flow data. This is especially true for open source software and within the Python ecosystem. Therefore, a comprehensive Python package that draws on and extends existing PySAL infrastructure would fill an important gap in the current set of available spatial analysis tools.

PySAL is intended to support the development of high-level spatial analysis. As such, it currently provides a rich set of tools for modeling spatial effects within a regression framework, which is typically applied to areal units. While it is possible to extend some of these models to spatial interaction data, new spatial weight structures will be necessary to capture the unique spatial dependence that arises when each observation has both an origin and a destination, rather than corresponding to a single areal unit. Furthermore, the existing spatial regression models are designed for continuous data, whereas many spatial interaction phenomena (e.g., commuting, migration) are more properly modeled as counts.
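The sketch below illustrates what such a flow-based weight structure can look like. It is not PySAL code. It assumes an n x n binary contiguity matrix between regions and orders the n*n flows origin-major, so that flow i -> j sits at index i * n + j. It then builds three sparse weights matrices as Kronecker products, in the spirit of the origin-destination weights of LeSage and Pace [8]. The function names `row_standardize` and `od_weights` are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def row_standardize(w):
    """Divide each row of a weights matrix by its row sum; rows with no neighbors stay zero."""
    w = sp.csr_matrix(w, dtype=float)
    sums = np.asarray(w.sum(axis=1)).ravel()
    inv = np.zeros_like(sums)
    inv[sums > 0] = 1.0 / sums[sums > 0]
    return sp.diags(inv) @ w


def od_weights(w):
    """Return (w_o, w_d, w_w) for the n * n flows implied by an n x n contiguity matrix w.

    Assumption of this sketch: flows are ordered origin-major, so flow i -> j is at index i * n + j.
    """
    w = row_standardize(w)
    n = w.shape[0]
    eye = sp.identity(n, format="csr")
    w_o = sp.kron(w, eye, format="csr")  # neighboring origins, same destination
    w_d = sp.kron(eye, w, format="csr")  # same origin, neighboring destinations
    w_w = sp.kron(w, w, format="csr")    # neighboring origins and neighboring destinations
    return w_o, w_d, w_w


# Three regions on a line: 0 - 1 - 2
w = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
w_o, w_d, w_w = od_weights(w)
print(w_o.shape)  # (9, 9)
```

Because each matrix is built with `scipy.sparse.kron`, memory grows with the number of non-zero links rather than with the dense (n^2) x (n^2) size. Each product of row-standardized matrices is itself row-standardized.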

Finally, there are several paradigms for incorporating spatial effects into spatial interaction models (competing destinations, spatial autoregressive, eigenvector spatial filter). Therefore, the primary goals of this GSoC proposal are to:

  • Implement new structures and algorithms to capture dependence in flow data.
  • Develop Poisson count models that incorporate spatial effects.

A generalized linear model (GLM) approach will be adopted for modeling counts, so developing this framework would be the first outcome of this project, and it could be used more widely throughout PySAL. While this functionality currently exists in another Python library (statsmodels), the newly developed GLM framework would a) accommodate the sparse data structures often used in PySAL's spatial regression module and b) be lightweight, so that it is simple to extend to various models with spatial effects. The other major outcome of this project would be a comprehensive module focused on spatial interaction modeling, which would include exploratory statistics, data structures, and models, as well as documentation and educational materials. In terms of exploratory statistics, the module will include tests of overdispersion and spatial dependence to detect potential problems when modeling counts of flows. New data structures will include origin-destination weights and network-based weights, which are needed to capture spatial dependence in flows. Finally, the Poisson GLM framework will be extended to include models that account for overdispersion or spatial dependence. The final module could exist within the core of PySAL or as a contributor module, depending on which dependencies are necessary.
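A minimal sketch of the iteratively re-weighted least squares (IRLS) estimator that a Poisson GLM base class would rest on is shown below. It assumes only NumPy and SciPy. The names `poisson_irls`, `poisson_deviance`, and `_weighted_cross_products` are hypothetical and are not part of PySAL or statsmodels. The point of the sketch is that the same loop accepts either a dense array or a SciPy sparse matrix as the design matrix, such as one dominated by origin or destination indicator columns.

```python
import numpy as np
import scipy.sparse as sp


def _weighted_cross_products(X, w, z):
    """Return X'WX (dense p x p) and X'Wz for diagonal weights w; X may be dense or sparse."""
    if sp.issparse(X):
        Xw = sp.diags(w) @ X            # scale row i of X by w[i]; stays sparse
        xtwx = (X.T @ Xw).toarray()     # p x p is small, so densify it
    else:
        xtwx = X.T @ (X * w[:, None])
    xtwz = X.T @ (w * z)
    return xtwx, np.asarray(xtwz).ravel()


def poisson_irls(y, X, tol=1e-8, max_iter=100):
    """Fit a Poisson GLM with a log link by IRLS; returns (coefficients, fitted means)."""
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                        # crude start that avoids log(0) for zero flows
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu         # working response for the log link
        w = mu                          # working weights: Var(y) = mu, d(eta)/d(mu) = 1/mu
        xtwx, xtwz = _weighted_cross_products(X, w, z)
        beta_new = np.linalg.solve(xtwx, xtwz)
        eta = np.asarray(X @ beta_new).ravel()
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu


def poisson_deviance(y, mu):
    """Residual deviance 2 * sum(y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


# Simulated example: the dense and sparse paths should give the same coefficients.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 1.2 * x))
beta_dense, mu_dense = poisson_irls(y, X)
beta_sparse, _ = poisson_irls(y, sp.csr_matrix(X))
print(beta_dense, beta_sparse, poisson_deviance(y, mu_dense))
```

With the canonical log link, IRLS coincides with Newton-Raphson, so convergence is usually fast. Only the small p x p cross-product matrix is ever densified, which keeps the sparse path cheap when X has many indicator columns.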

Proposal Description/Timeline

  • Generalized linear model (GLM) base class for modeling count data (Poisson model) [1]. (Week 1 & 2; ~ May 23rd - June 3rd)

    • Coefficient estimation via an iteratively re-weighted least squares (IRLS) routine (see code sample 3 above)
    • Coefficient estimation via maximum likelihood and gradient optimization (using scipy and/or autograd)
    • Include support for sparse matrix data structure
    • Poisson GLM diagnostics such as AIC, BIC, deviance, log-likelihood, null deviance, deviance residuals, working residuals, etc.
    • Unit tests/documentation
  • Zero flows, zero-inflation, overdispersion, and heteroskedasticity. (Week 3 & 4; ~ June 6th - June 17th)

    • Tests for overdispersion [4,5,6]
    • Poisson Pseudo Maximum Likelihood (PPML) estimator [2]
    • Zero-inflated Poisson model [3]
    • Unit tests/documentation
  • Exploratory tools. (Week 5 & 6; ~ June 20th - July 1st)

    • Vector-based spatial autocorrelation statistic [7] (see code sample 4 above)
    • Vector randomization for permutation-based hypothesis testing of vector spatial autocorrelation
    • Automate origin/destination specific calibration to investigate non-stationary processes
    • Unit tests/documentation
  • Flow-based spatial weight specifications. (Week 7 & 8; ~ July 4th - July 15th)

    • Origin-destination weights [8]
    • Network origin-destination weights [9]
    • Unit tests/documentation

  • Spatial autoregressive (SAR) specifications. (Week 9, 10 & 11; ~ July 18th - August 5th)

    • Log-normal SAR [8]
    • Poisson SAR model [10]
    • Poisson SAR gravity model [11]
    • Unit tests/documentation
  • Wrap up and prepare module for release. (Week 12 & 13; ~ August 8th - August 23rd)

    • Optimize code
    • Double check tests/documentation
    • Finalize educational materials and provide sample analysis workflow using exploratory tools, diagnostic tests, and formal models
  • Additional goals if there is extra time and the project is ahead of schedule:

    • Competing destinations specifications [16,17]
    • Spatial eigenvector filter (SF) specifications [9]
    • Non-parametric "universal" model varieties [12,13,14]
    • Non-parametric neural network routines for calibrating spatial interaction models [15]
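The overdispersion tests planned for Weeks 3 and 4 can be prototyped cheaply. The sketch below implements the regression-based test of Cameron and Trivedi [6] under stated assumptions. Given fitted means mu from a Poisson model, it regresses ((y - mu)^2 - y) / mu on mu without an intercept. Under equidispersion the slope is zero and its t-statistic is asymptotically standard normal. A positive slope points to overdispersion of the form Var(y) = mu + alpha * mu^2. Reporting a one-sided p-value is a choice of this sketch. The function name `ct_overdispersion_test` is hypothetical. To keep the block self-contained, the demo uses an intercept-only Poisson fit, whose maximum likelihood fitted mean is simply the sample mean.

```python
import numpy as np
from scipy import stats


def ct_overdispersion_test(y, mu):
    """Regression-based test of Poisson equidispersion against Var(y) = mu + alpha * mu**2.

    Returns (alpha_hat, t_statistic, one_sided_p_value).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    lhs = ((y - mu) ** 2 - y) / mu              # auxiliary dependent variable
    alpha = np.sum(mu * lhs) / np.sum(mu ** 2)  # OLS slope on mu, no intercept
    resid = lhs - alpha * mu
    sigma2 = np.sum(resid ** 2) / (y.size - 1)  # one estimated parameter
    se = np.sqrt(sigma2 / np.sum(mu ** 2))
    t = alpha / se
    return alpha, t, stats.norm.sf(t)           # overdispersion corresponds to alpha > 0


# Demo: intercept-only Poisson fits to equidispersed and overdispersed counts.
rng = np.random.default_rng(0)
y_pois = rng.poisson(4.0, size=5000)
y_nb = rng.negative_binomial(2, 1 / 3, size=5000)  # mean 4, Var = 4 + 0.5 * 4**2
for label, y in [("Poisson", y_pois), ("negative binomial", y_nb)]:
    mu_hat = np.full(y.size, y.mean())
    alpha, t, p = ct_overdispersion_test(y, mu_hat)
    print(f"{label}: alpha={alpha:.3f}, t={t:.2f}, p={p:.3g}")
```

In practice `mu` would come from the fitted Poisson GLM rather than from the sample mean. The same auxiliary regression with a constant in place of mu tests the alternative Var(y) = mu + alpha * mu.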

Citations

  1. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135(3), 370–384. http://doi.org/10.2307/2344614
  2. Santos Silva, J. M. C., & Tenreyro, S. (2006). The Log of Gravity. The Review of Economics and Statistics, 88(4), 641–658.
  3. Burger, M., Van Oort, F., & Linders, G.-J. (2009). On the specification of the gravity model of trade: zeros, excess zeros and zero-inflated estimation. Spatial Economic Analysis, 4(2), 167–190.
  4. Dean, C., & Lawless, J. F. (1989). Tests for Detecting Overdispersion in Poisson Regression Models. Journal of the American Statistical Association, 84(406), 467–472. http://doi.org/10.2307/2289931
  5. Dean, C. B. (1992). Testing for Overdispersion in Poisson and Binomial Regression Models. Journal of the American Statistical Association, 87(418), 451–457. http://doi.org/10.2307/2290276
  6. Cameron, A. C., & Trivedi, P. K. (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46(3), 347–364. http://doi.org/10.1016/0304-4076(90)90014-K
  7. Liu, Y., Tong, D., & Liu, X. (2014). Measuring Spatial Autocorrelation of Vectors. Geographical Analysis, n/a–n/a. http://doi.org/10.1111/gean.12069
  8. LeSage, J. P., & Pace, R. K. (2008). Spatial Econometric Modeling of Origin-Destination Flows. Journal of Regional Science, 48(5), 941–967. http://doi.org/10.1111/j.1467-9787.2008.00573.x
  9. Chun, Y. (2008). Modeling network autocorrelation within migration flows by eigenvector spatial filtering. Journal of Geographical Systems, 10(4), 317–344. http://doi.org/10.1007/s10109-008-0068-2
  10. Lambert, D. M., Brown, J. P., & Florax, R. J. G. M. (2010). A two-step estimator for a spatial lag model of counts: Theory, small sample performance and an application. Regional Science and Urban Economics, 40(4), 241–252. http://doi.org/10.1016/j.regsciurbeco.2010.04.001
  11. Sellner, R., Fischer, M. M., & Koch, M. (2013). A Spatial Autoregressive Poisson Gravity Model. Geographical Analysis, 45(2), 180–201. http://doi.org/10.1111/gean.12007
  12. Simini, F., González, M. C., Maritan, A., & Barabási, A.-L. (2012). A universal model for mobility and migration patterns. Nature, 484(7392), 96–100. http://doi.org/10.1038/nature10856
  13. Yan, X.-Y., Zhao, C., Fan, Y., Di, Z., & Wang, W.-X. (2013). Universal Predictability of Mobility Patterns in Cities. arXiv:1307.7502 [physics]. Retrieved from http://arxiv.org/abs/1307.7502
  14. Lenormand, M., Huet, S., Gargiulo, F., & Deffuant, G. (2012). A Universal Model of Commuting Networks. PLoS ONE, 7(10), e45985. http://doi.org/10.1371/journal.pone.0045985
  15. Fischer, M. M. (2006). Neural Networks: A General Framework for Non-Linear Function Approximation. Transactions in GIS, 10(4), 521–533. http://doi.org/10.1111/j.1467-9671.2006.01010.x
  16. Fotheringham, A. S. (1983). A new set of spatial-interaction models: the theory of competing destinations. Environment and Planning A, 15(1), 15–36.
  17. Fotheringham, A. S. (1985). Spatial competition and agglomeration in urban modelling. Environment and Planning A, 17(2), 213–230.

Other Commitments

Other commitments during the main GSoC time period:

I currently have some academic papers under review at journals, so I may need to make revisions during this period.

Exams or classes that overlap with this period:

None

Other jobs or internships:

None

Other short term commitments:

I may attend the SciPy (Scientific Computing with Python) conference from July 11th to July 17th. The conference includes coding sprints, which I could use to work on the GSoC project, as well as workshops and presentations about Python programming.

Other organizations:

None - I am only applying to PySAL during the 2016 cycle of GSoC.

Extra Information

Link to resume:

Resume

University Information:

  • University name: Arizona State University

  • Major: Geography

  • Current year: second year out of three

  • Expected graduation: August/2017

  • Degree: PhD

Other Contact Information:

  • Alternative email: toshan@asu.edu

  • Homepage: tayloroshan.github.io

  • Instant messaging: @TaylorOshan on Gitter

  • Twitter: @TaylorOshan
