Skip to content

Instantly share code, notes, and snippets.

@darribas
Created January 29, 2022 18:47
Show Gist options
  • Save darribas/1be34f836d2b513931123ad833ff446d to your computer and use it in GitHub Desktop.
Save darribas/1be34f836d2b513931123ad833ff446d to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 31,
"id": "a3a628df-d380-449e-ad28-8c92a5d6c02c",
"metadata": {},
"outputs": [],
"source": [
"from libpysal.weights import Queen, raster\n",
"from libpysal import weights\n",
"import xarray\n",
"import esda\n",
"import warnings"
]
},
{
"cell_type": "markdown",
"id": "82cf7d00-520f-464f-99db-1d0fda52fea4",
"metadata": {},
"source": [
"- Read data"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "aa016ff3-01d2-4057-92af-5f1475851f21",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float32')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pop = xarray.open_rasterio(\n",
" 'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'\n",
").sel(\n",
" x=slice(-4436000, -4427000), y=slice(-2875000, -2886000), band=1\n",
")\n",
"pop.dtype"
]
},
{
"cell_type": "markdown",
"id": "27aeb606-05d2-480a-aa0b-f0b3f5e24d2e",
"metadata": {},
"source": [
"- Build weights"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9bc3cac6-07c8-403d-9ad6-43e6767011cd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('int8')"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"w = Queen.from_xarray(pop, sparse=False)\n",
"w.sparse.dtype"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "1d635c9b-6fdf-4371-861c-f31e78d37e95",
"metadata": {},
"outputs": [],
"source": [
"w.transform = 'r'"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "1d31a4c9-b080-4ef6-a9f2-d691cd685f9b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"w.sparse.dtype"
]
},
{
"cell_type": "markdown",
"id": "40860384-4954-446f-98bc-a58ffae5768b",
"metadata": {},
"source": [
"- Attempt LISA"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2e69de8-7305-4bfe-ab3e-4516887963e6",
"metadata": {},
"outputs": [],
"source": [
"# Fails\n",
"lisa = esda.Moran_Local(pop.to_series(), w)"
]
},
{
"cell_type": "markdown",
"id": "45372cf5-e75f-4bee-a7e3-d72b03fa10b1",
"metadata": {},
"source": [
"- Aligner"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "3fc8d8dc-aed9-4f90-9293-c767d2cd360f",
"metadata": {},
"outputs": [],
"source": [
"def aligner(y, w, how='y'):\n",
" '''\n",
" Align `dtype` between `y` and `w`\n",
" ...\n",
" \n",
" Arguments\n",
" ---------\n",
" y\n",
" w\n",
" how : str/dtype\n",
" [Optional. Default='y'] Alignment policy:\n",
" - 'y': use `y.dtype`\n",
" - 'w': use `dtype` in `w`\n",
" - `dtype`: different `dtype` target\n",
" \n",
" Returns\n",
" -------\n",
" ya : ndarray/pandas.Series\n",
" wa : W\n",
" '''\n",
" if type(how) is not str:\n",
" ya = y.astype(how)\n",
" wa = _convert_w_dtype(w, how)\n",
" return ya, wa\n",
" elif how == 'y':\n",
" wa = _convert_w_dtype(w, y.dtype)\n",
" return y, wa\n",
" elif how == 'w':\n",
" ya = y.astype(w.sparse.dtype)\n",
" return ya, w\n",
" else:\n",
" warnings.warn(\"Please pass 'y', 'w', or an explicit dtype for `how`\")\n",
" \n",
"\n",
"def _convert_w_dtype(w, dtype):\n",
" nw = w.sparse.astype(dtype)\n",
" nw = weights.WSP(nw, id_order=w.id_order)\n",
" nw = weights.WSP2W(nw)\n",
" return nw"
]
},
{
"cell_type": "markdown",
"id": "bef71e46-9159-4dd4-b4a9-87990e74cafe",
"metadata": {},
"source": [
"- Use aligner to convert pre-LISA computation"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "5eec14a0-ac23-4556-be07-d08accefa860",
"metadata": {},
"outputs": [],
"source": [
"ya1, wa1 = aligner(pop.to_series(), w, float)\n",
"\n",
"lisa1 = esda.Moran_Local(ya1, wa1)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "2561e72a-68e1-49ba-9b2d-96498301ab36",
"metadata": {},
"outputs": [],
"source": [
"ya2, wa2 = aligner(pop.to_series(), w, 'w')\n",
"\n",
"lisa2 = esda.Moran_Local(ya2, wa2)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "fe1f400f-ac11-472d-ba0e-57ae58bd09a5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(lisa1.Is != lisa2.Is).sum()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "479bbdb0-02b4-4948-8621-d0aedb395e34",
"metadata": {},
"outputs": [],
"source": [
"ya3, wa3 = aligner(pop.to_series(), w, 'y')\n",
"\n",
"#lisa3 = esda.Moran_Local(ya3, wa3)"
]
},
{
"cell_type": "markdown",
"id": "39ec7a52-f4ef-4fe4-bc1b-99a8f0a6a26e",
"metadata": {},
"source": [
"The above still fails because, after `w` is converted into `wa3`, expressed as `float32`, when transforming it for row-standardisation (default in `Moran_Local`), it is converted into `float64`."
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "6e5c78f5-b3ac-4a37-86a5-9d12ccab30bd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'O'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wa3.transform"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "ef87586f-90db-43c6-ab1f-a1d05e0e8d70",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float32')"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wa3.sparse.dtype"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "3110c2ab-ade0-4d87-aa79-1857eeacfc44",
"metadata": {},
"outputs": [],
"source": [
"wa3.transform = 'r'"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "131d0520-86fe-4c39-8ff0-d42d0c6155a5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wa3.sparse.dtype"
]
},
{
"cell_type": "markdown",
"id": "31573814-601f-489d-91b2-04a698cc5af5",
"metadata": {},
"source": [
"Ways forward/thoughts:\n",
"\n",
"1. Something like `aligner` could be embedded in `Moran_Local` and all methods that rely on `numba`, but this would need to be added _after_ any transformations and on each of the classes that currently rely on accelerated randomisation\n",
"1. My general sense is to have a sensible default (either `y.dtype` or `float64`) and leave flexibility as an option if folks need it\n",
"1. Something to keep in mind that I've not thought too much about beyond seeing it might be an issue is whether there are more memory efficient ways of doing all this. If `w` is a large matrix, the approach taken in `_convert_w_dtype` might not be feasible/desirable. Do we have other ways?\n",
"1. At any rate, if `aligner` was called within something like `Moran_Local`, warnings would need to be raised so the user is aware. In fact, one option in `Moran_local` could select what to do, with the option of raising an error if they're different, or converting `dtype`s automatically."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment