@rsignell-usgs
Created March 1, 2023 21:56
naip_orig_tutorial.ipynb
{
"cells": [
{
"cell_type": "markdown",
"id": "f7bff4ee-c9c1-4a19-a76c-8e8a27431996",
"metadata": {},
"source": [
"# NAIP Segmentation using scikit-image Quickshift and SLIC Algorithms\n",
"Based on the tutorial at https://opensourceoptions.com/blog/python-geographic-object-based-image-analysis-geobia/\n",
"\n",
"This notebook needs at least 16 GB of RAM to run."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8f38887d-28ef-4162-9281-2c201bc73923",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from osgeo import gdal\n",
"from skimage import exposure\n",
"from skimage.segmentation import quickshift, slic\n",
"import time\n",
"import scipy.stats\n",
"import fsspec"
]
},
{
"cell_type": "markdown",
"id": "3eca32c6-7354-4d91-b6df-adb3c50c658c",
"metadata": {},
"source": [
"#### Download data so we have a local copy"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "add26381-a5c8-4451-bb7d-18870013cdd2",
"metadata": {},
"outputs": [],
"source": [
"naip_fn = 'https://mghp.osn.xsede.org/rsignellbucket1/obia/m_4107027_se_19_060_20210904.tif'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c033ed33-dedd-45ec-a6bd-914fbe069b62",
"metadata": {},
"outputs": [],
"source": [
"fs_https = fsspec.filesystem('https')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9ebb676b-5a18-4c91-8dc9-e538d6ad3598",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': 'https://mghp.osn.xsede.org/rsignellbucket1/obia/m_4107027_se_19_060_20210904.tif',\n",
" 'size': 505021950,\n",
" 'ETag': '\"ba8c8e2c8e6aed3943214a5fa641552d-61\"',\n",
" 'type': 'file'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fs_https.info(naip_fn)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a6aa1afb-2543-4502-b462-b60d576e457c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.29 s, sys: 806 ms, total: 2.1 s\n",
"Wall time: 15.1 s\n"
]
},
{
"data": {
"text/plain": [
"[None]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"fs_https.download(naip_fn, 'naip.tif')"
]
},
{
"cell_type": "markdown",
"id": "e6b32c19-f683-43aa-b4c1-071184800d0d",
"metadata": {},
"source": [
"#### Load the data"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "26e00612-1992-436d-872c-c9077dea6dc9",
"metadata": {},
"outputs": [],
"source": [
"naip_fn = 'naip.tif'"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "471ec08b-9e93-4707-963a-eda750aaf951",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bands 4 rows 12823 columns 9844\n",
"CPU times: user 442 ms, sys: 580 ms, total: 1.02 s\n",
"Wall time: 2.17 s\n"
]
}
],
"source": [
"%%time\n",
"driverTiff = gdal.GetDriverByName('GTiff')\n",
"naip_ds = gdal.Open(naip_fn)\n",
"nbands = naip_ds.RasterCount\n",
"band_data = []\n",
"print('bands', naip_ds.RasterCount, 'rows', naip_ds.RasterYSize, 'columns',\n",
"      naip_ds.RasterXSize)\n",
"for i in range(1, nbands + 1):\n",
"    band = naip_ds.GetRasterBand(i).ReadAsArray()\n",
"    band_data.append(band)\n",
"band_data = np.dstack(band_data)"
]
},
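{
"cell_type": "markdown",
"id": "example-band-stacking-md",
"metadata": {},
"source": [
"The loop above reads each band as a 2-D `(rows, columns)` array and stacks the bands with `np.dstack` into a `(rows, columns, bands)` array, the channels-last layout scikit-image uses by default. GDAL's dataset-level `ReadAsArray()` instead returns a `(bands, rows, columns)` array for a multiband file. The small synthetic check below (no GDAL or NAIP data needed; the `demo_` names are illustrative) sketches how `np.moveaxis` turns that band-first layout into the same result as the `np.dstack` loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-band-stacking",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative stand-in for a 4-band raster read band-first: (bands, rows, columns)\n",
"demo_bands_first = np.arange(4 * 3 * 2, dtype=np.uint8).reshape(4, 3, 2)\n",
"\n",
"# Per-band stacking, as in the loop above: result is (rows, columns, bands)\n",
"demo_stacked = np.dstack([demo_bands_first[i] for i in range(4)])\n",
"\n",
"# Moving the band axis to the end gives the same channels-last array\n",
"demo_moved = np.moveaxis(demo_bands_first, 0, -1)\n",
"\n",
"print(demo_stacked.shape)  # (3, 2, 4)\n",
"assert np.array_equal(demo_stacked, demo_moved)"
]
},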
{
"cell_type": "markdown",
"id": "dcb2c6ce-f69d-4872-b4d2-b6776ce04604",
"metadata": {},
"source": [
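"#### Stretch image values to the full range of the data type\n",
"\n",
"With the default `out_range='dtype'`, `exposure.rescale_intensity` keeps the input dtype, so for 8-bit NAIP bands the result is still `uint8`, stretched to 0 - 255 rather than 0.0 - 1.0. `slic` itself converts the image to floating point internally."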
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d2cb6332-198d-42c1-b7a0-2dfb15e44131",
"metadata": {},
"outputs": [],
"source": [
"img = exposure.rescale_intensity(band_data)"
]
},
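{
"cell_type": "markdown",
"id": "example-rescale-dtype-md",
"metadata": {},
"source": [
"A quick way to check what `rescale_intensity` returns is to run it on a tiny synthetic `uint8` array (the `demo_` values are made up). With the default `out_range='dtype'`, the output keeps the `uint8` dtype and is stretched to 0 - 255. Converting to floating point first and passing `out_range=(0.0, 1.0)` is one way to get values between 0.0 and 1.0 instead; this sketch assumes that is the goal and is not what the cell above does."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-rescale-dtype",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from skimage import exposure\n",
"\n",
"demo_band = np.array([[10, 20, 30]], dtype=np.uint8)  # illustrative 8-bit values\n",
"\n",
"# Default out_range='dtype': stays uint8, stretched so min -> 0 and max -> 255\n",
"demo_stretched = exposure.rescale_intensity(demo_band)\n",
"print(demo_stretched.dtype, demo_stretched.min(), demo_stretched.max())  # uint8 0 255\n",
"\n",
"# Float input with an explicit output range: values between 0.0 and 1.0\n",
"demo_unit = exposure.rescale_intensity(demo_band.astype(np.float64), out_range=(0.0, 1.0))\n",
"print(demo_unit)  # [[0.  0.5 1. ]]"
]
},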
{
"cell_type": "markdown",
"id": "2e29b211-c718-4a3d-afc0-c07ed3018642",
"metadata": {},
"source": [
"#### Run the segmentation using SLIC"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d2dfaf63-5387-4831-8536-0dd676d115c8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2min 53s, sys: 3.62 s, total: 2min 57s\n",
"Wall time: 2min 57s\n"
]
}
],
"source": [
"%%time\n",
"segments = slic(img, n_segments=500000, compactness=0.1)"
]
},
{
"cell_type": "markdown",
"id": "caf08272-3a6a-4827-90d1-445f3fefe0b0",
"metadata": {},
"source": [
"#### Spectral Properties of Segments\n",
"\n",
"Now we need to describe each segment based on its spectral properties, because those properties are the variables used to classify each segment as a land cover type.\n",
"\n",
"First, write a function that, given an array of pixel values, calculates the min, max, mean, variance, skewness, and kurtosis for each band. The function below takes all the pixels in a segment (an array of shape `(npixels, nbands)`), computes these six statistics for each band with `scipy.stats.describe`, and returns them as a flat `features` list of `6 * nbands` values. The following cells then collect the pixels of each segment and store the returned features."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0b5ef5fe-1a25-4371-8821-234284f15a71",
"metadata": {},
"outputs": [],
"source": [
"def segment_features(segment_pixels):\n",
"    features = []\n",
"    npixels, nbands = segment_pixels.shape\n",
"    for b in range(nbands):\n",
"        stats = scipy.stats.describe(segment_pixels[:, b])\n",
"        # [min, max] followed by [mean, variance, skewness, kurtosis]\n",
"        band_stats = list(stats.minmax) + list(stats)[2:]\n",
"        if npixels == 1:\n",
"            # with a single pixel the sample variance (ddof=1) is NaN, so set it to 0.0\n",
"            band_stats[3] = 0.0\n",
"        features += band_stats\n",
"    return features"
]
},
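{
"cell_type": "markdown",
"id": "example-describe-order-md",
"metadata": {},
"source": [
"The index arithmetic in `segment_features` relies on the field order of `scipy.stats.describe`, which returns `nobs`, `minmax`, `mean`, `variance`, `skewness`, and `kurtosis`. The sketch below, using a made-up three-pixel, two-band segment, shows that `band_stats` ends up ordered as min, max, mean, variance, skewness, kurtosis, so `band_stats[3]` is the variance. `describe` uses the sample variance (`ddof=1`), which is NaN for a single pixel; NumPy or SciPy may emit a RuntimeWarning in that case."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-describe-order",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"\n",
"# Illustrative segment: 3 pixels x 2 bands\n",
"demo_pixels = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])\n",
"\n",
"demo_stats = scipy.stats.describe(demo_pixels[:, 0])\n",
"# [min, max] + [mean, variance, skewness, kurtosis]\n",
"demo_band_stats = list(demo_stats.minmax) + list(demo_stats)[2:]\n",
"print(len(demo_band_stats))  # 6\n",
"print(demo_band_stats[2], demo_band_stats[3])  # 2.0 1.0 (mean, sample variance)\n",
"\n",
"# A one-pixel segment: the sample variance is NaN\n",
"demo_single = scipy.stats.describe(np.array([5.0]))\n",
"print(np.isnan(demo_single.variance))  # True"
]
},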
{
"cell_type": "markdown",
"id": "8e7057be-61f0-45df-b119-4d33652e86c0",
"metadata": {},
"source": [
"In object-based image analysis, each segment represents an object. Objects represent buildings, roads, trees, fields, or pieces of those features, depending on how the segmentation is done.\n",
"\n",
"The code below first gets the list of segment ID numbers. It then sets up a list to hold the statistics describing each object (i.e., each segment ID), as returned by the `segment_features` function above. The pixels for each segment are identified and passed to `segment_features`, which returns the statistics describing the spectral properties of that segment/object.\n",
"\n",
"The statistics are saved to the `objects` list, and each object ID is stored in a separate `object_ids` list."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2832b5c2-85f2-4427-ac11-f0d4798201d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"375291\n",
"CPU times: user 4.21 s, sys: 244 ms, total: 4.45 s\n",
"Wall time: 4.45 s\n"
]
}
],
"source": [
"%%time\n",
"segment_ids = np.unique(segments)\n",
"print(len(segment_ids))"
]
},
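{
"cell_type": "markdown",
"id": "example-slic-n-segments-md",
"metadata": {},
"source": [
"SLIC returned 375,291 unique segment IDs even though `n_segments=500000` was requested. `n_segments` is only an approximate target: SLIC seeds its cluster centers on a regular grid, and post-processing such as `enforce_connectivity` (on by default) can merge or relabel segments, so the final count can differ. A small sketch on a synthetic random 4-band image (not NAIP data; the sizes are arbitrary) shows how to compare the requested and actual counts."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-slic-n-segments",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from skimage.segmentation import slic\n",
"\n",
"demo_rng = np.random.default_rng(0)\n",
"demo_image = demo_rng.random((200, 200, 4))  # illustrative 4-band image in [0, 1)\n",
"\n",
"demo_labels = slic(demo_image, n_segments=400, compactness=0.1)\n",
"demo_ids = np.unique(demo_labels)\n",
"print('requested 400, got', len(demo_ids))\n",
"print('label range', demo_ids.min(), demo_ids.max())"
]
},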
{
"cell_type": "markdown",
"id": "088e96d0-67e3-4703-a54c-ea08dc50469c",
"metadata": {},
"source": [
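"The following cell loops over only the first 80 segments as a test, and it seems to require only about 2.5 GB of RAM. Each iteration compares the full `segments` array against the current ID, so it takes about 0.35 s per segment (27.7 s / 80); at that rate, all 375,291 segments would take roughly 36 hours on a single core."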
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "81df98fb-dbd0-460e-818b-cdcc8dd8c43c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 26.1 s, sys: 1.67 s, total: 27.7 s\n",
"Wall time: 27.7 s\n"
]
}
],
"source": [
"%%time\n",
"objects = []\n",
"object_ids = []\n",
"for seg_id in segment_ids[:80]:  # test with the first 80 segments\n",
"    segment_pixels = img[segments == seg_id]\n",
"    object_features = segment_features(segment_pixels)\n",
"    objects.append(object_features)\n",
"    object_ids.append(seg_id)"
]
},
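{
"cell_type": "markdown",
"id": "example-bincount-stats-md",
"metadata": {},
"source": [
"Each pass through the loop above compares the whole `segments` array against one ID, so every segment costs a scan of all ~126 million pixels. One alternative, which is not part of the original tutorial, is to accumulate per-segment sums in a single pass with `np.bincount`. The hypothetical helper `segment_mean_var` below is a minimal sketch that computes only the pixel count, mean, and sample variance per segment and band, assuming non-negative integer labels; min and max could be added with `scipy.ndimage.minimum` and `scipy.ndimage.maximum`, while skewness and kurtosis would need higher-order sums. The sum-of-squares form can lose precision for large values, which should be acceptable for 8-bit data. It is checked against the masking approach on a tiny synthetic image."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-bincount-stats",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def segment_mean_var(image, labels):\n",
"    # Hypothetical helper (not from the original tutorial): per-segment pixel count,\n",
"    # mean and sample variance (ddof=1) for each band of a (rows, columns, bands) image.\n",
"    flat_labels = labels.ravel()\n",
"    n_labels = int(flat_labels.max()) + 1\n",
"    counts = np.bincount(flat_labels, minlength=n_labels).astype(np.float64)\n",
"    has_pixels = counts > 0\n",
"    multi = counts > 1\n",
"    nbands = image.shape[-1]\n",
"    means = np.full((n_labels, nbands), np.nan)  # NaN for label IDs with no pixels\n",
"    variances = np.full((n_labels, nbands), np.nan)\n",
"    for b in range(nbands):\n",
"        values = image[..., b].ravel().astype(np.float64)\n",
"        # One pass over the pixels accumulates sums and sums of squares per label\n",
"        sums = np.bincount(flat_labels, weights=values, minlength=n_labels)\n",
"        sq_sums = np.bincount(flat_labels, weights=values ** 2, minlength=n_labels)\n",
"        means[has_pixels, b] = sums[has_pixels] / counts[has_pixels]\n",
"        variances[multi, b] = ((sq_sums[multi] - counts[multi] * means[multi, b] ** 2)\n",
"                               / (counts[multi] - 1))\n",
"    # Match segment_features: a one-pixel segment gets variance 0.0 instead of NaN\n",
"    variances[counts == 1] = 0.0\n",
"    return counts, means, variances\n",
"\n",
"\n",
"# Tiny synthetic check against the masking approach used in the loop above\n",
"demo_img = np.array([[[1.0], [3.0]], [[10.0], [4.0]]])  # 2x2 image, 1 band\n",
"demo_labels = np.array([[1, 1], [2, 3]])\n",
"demo_counts, demo_means, demo_vars = segment_mean_var(demo_img, demo_labels)\n",
"for seg in (1, 2, 3):\n",
"    demo_seg_pixels = demo_img[demo_labels == seg][:, 0]\n",
"    expected_var = demo_seg_pixels.var(ddof=1) if demo_seg_pixels.size > 1 else 0.0\n",
"    assert np.isclose(demo_means[seg, 0], demo_seg_pixels.mean())\n",
"    assert np.isclose(demo_vars[seg, 0], expected_var)\n",
"print(demo_counts)  # [0. 2. 1. 1.]"
]
},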
{
"cell_type": "markdown",
"id": "fbc87d69-0ce6-4828-840c-9b90e889eab8",
"metadata": {},
"source": [
"Okay, let's parallelize!\n",
"\n",
"First, define a function that takes a segment ID as input and returns the statistics for that segment:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "1f729894-9cef-4c6e-ab2f-4a686471c9cc",
"metadata": {},
"outputs": [],
"source": [
"def get_features(seg_id):\n",
"    segment_pixels = img[segments == seg_id]\n",
"    return segment_features(segment_pixels)"
]
},
{
"cell_type": "markdown",
"id": "327fdaad-64e4-46e5-9410-24643efcc43d",
"metadata": {},
"source": [
"Verify that it takes the same amount of time and gives the same results:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "d298546f-91ca-4ba3-978e-c020d75fa5b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 26 s, sys: 1.75 s, total: 27.7 s\n",
"Wall time: 27.7 s\n"
]
}
],
"source": [
"%%time \n",
"objects2 = [get_features(seg_id) for seg_id in segment_ids[:80]]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f107eca9-a831-482e-96b0-bd7c0cbd90ec",
"metadata": {},
"outputs": [],
"source": [
"assert objects == objects2"
]
},
{
"cell_type": "markdown",
"id": "5d9ab608-f6ef-4be6-bb34-636236d472d8",
"metadata": {},
"source": [
"Use a Dask Bag to parallelize the loop over segments, running on a local cluster of 4 single-threaded worker processes."
]
},
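{
"cell_type": "markdown",
"id": "example-dask-bag-basics-md",
"metadata": {},
"source": [
"A Dask Bag splits a Python sequence into partitions and applies a function to every element, with each partition processed as a separate task. A minimal sketch with a toy function (`demo_square` is illustrative) shows the same `from_sequence` / `map` / `compute` pattern used in the cells that follow; `scheduler='synchronous'` runs it in the current process so it does not depend on the cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "example-dask-bag-basics",
"metadata": {},
"outputs": [],
"source": [
"import dask.bag as db\n",
"\n",
"\n",
"def demo_square(x):\n",
"    # Toy per-element function standing in for get_features\n",
"    return x * x\n",
"\n",
"\n",
"demo_bag = db.from_sequence(range(8), npartitions=4)\n",
"print(demo_bag.npartitions)  # 4\n",
"demo_result = demo_bag.map(demo_square).compute(scheduler='synchronous')\n",
"print(demo_result)  # [0, 1, 4, 9, 16, 25, 36, 49]"
]
},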
{
"cell_type": "code",
"execution_count": 16,
"id": "cd9daac5-81cd-41d9-a055-8a425d4d0ec2",
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "72b4c602-2f7e-4637-8ff2-3042c445486e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"coiled.analytics.computation.interval is set to '10m'. Ignoring this old default value, using '15s' instead. To override, use any value other than '10m'.\n"
]
}
],
"source": [
"client = Client(n_workers=4, threads_per_worker=1)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8eb7995d-78ad-4496-aa44-8583914b5184",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
" <div style=\"width: 24px; height: 24px; background-color: #e1e1e1; border: 3px solid #9D9D9D; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <h3 style=\"margin-bottom: 0px;\">Client</h3>\n",
" <p style=\"color: #9D9D9D; margin-bottom: 0px;\">Client-d75f07df-b876-11ed-a03a-ba4f5d667a1f</p>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
"\n",
" <tr>\n",
" \n",
" <td style=\"text-align: left;\"><strong>Connection method:</strong> Cluster object</td>\n",
" <td style=\"text-align: left;\"><strong>Cluster type:</strong> distributed.LocalCluster</td>\n",
" \n",
" </tr>\n",
"\n",
" \n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard: </strong> <a href=\"http://127.0.0.1:8787/status\" target=\"_blank\">http://127.0.0.1:8787/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\"></td>\n",
" </tr>\n",
" \n",
"\n",
" </table>\n",
"\n",
" \n",
"\n",
" \n",
" <details>\n",
" <summary style=\"margin-bottom: 20px;\"><h3 style=\"display: inline;\">Cluster Info</h3></summary>\n",
" <div class=\"jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output\">\n",
" <div style=\"width: 24px; height: 24px; background-color: #e1e1e1; border: 3px solid #9D9D9D; border-radius: 5px; position: absolute;\">\n",
" </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <h3 style=\"margin-bottom: 0px; margin-top: 0px;\">LocalCluster</h3>\n",
" <p style=\"color: #9D9D9D; margin-bottom: 0px;\">dac2423a</p>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard:</strong> <a href=\"http://127.0.0.1:8787/status\" target=\"_blank\">http://127.0.0.1:8787/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Workers:</strong> 4\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads:</strong> 4\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total memory:</strong> 30.91 GiB\n",
" </td>\n",
" </tr>\n",
" \n",
" <tr>\n",
" <td style=\"text-align: left;\"><strong>Status:</strong> running</td>\n",
" <td style=\"text-align: left;\"><strong>Using processes:</strong> True</td>\n",
"</tr>\n",
"\n",
" \n",
" </table>\n",
"\n",
" <details>\n",
" <summary style=\"margin-bottom: 20px;\">\n",
" <h3 style=\"display: inline;\">Scheduler Info</h3>\n",
" </summary>\n",
"\n",
" <div style=\"\">\n",
" <div>\n",
" <div style=\"width: 24px; height: 24px; background-color: #FFF7E5; border: 3px solid #FF6132; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <h3 style=\"margin-bottom: 0px;\">Scheduler</h3>\n",
" <p style=\"color: #9D9D9D; margin-bottom: 0px;\">Scheduler-c08cecfe-3739-4626-86f8-2d58d68624d2</p>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Comm:</strong> tcp://127.0.0.1:38875\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Workers:</strong> 4\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard:</strong> <a href=\"http://127.0.0.1:8787/status\" target=\"_blank\">http://127.0.0.1:8787/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads:</strong> 4\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Started:</strong> Just now\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total memory:</strong> 30.91 GiB\n",
" </td>\n",
" </tr>\n",
" </table>\n",
" </div>\n",
" </div>\n",
"\n",
" <details style=\"margin-left: 48px;\">\n",
" <summary style=\"margin-bottom: 20px;\">\n",
" <h3 style=\"display: inline;\">Workers</h3>\n",
" </summary>\n",
"\n",
" \n",
" <div style=\"margin-bottom: 20px;\">\n",
" <div style=\"width: 24px; height: 24px; background-color: #DBF5FF; border: 3px solid #4CC9FF; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <details>\n",
" <summary>\n",
" <h4 style=\"margin-bottom: 0px; display: inline;\">Worker: 0</h4>\n",
" </summary>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Comm: </strong> tcp://127.0.0.1:37529\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads: </strong> 1\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard: </strong> <a href=\"http://127.0.0.1:41235/status\" target=\"_blank\">http://127.0.0.1:41235/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Memory: </strong> 7.73 GiB\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Nanny: </strong> tcp://127.0.0.1:43073\n",
" </td>\n",
" <td style=\"text-align: left;\"></td>\n",
" </tr>\n",
" <tr>\n",
" <td colspan=\"2\" style=\"text-align: left;\">\n",
" <strong>Local directory: </strong> /tmp/dask-worker-space/worker-yrr4h01v\n",
" </td>\n",
" </tr>\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" </table>\n",
" </details>\n",
" </div>\n",
" </div>\n",
" \n",
" <div style=\"margin-bottom: 20px;\">\n",
" <div style=\"width: 24px; height: 24px; background-color: #DBF5FF; border: 3px solid #4CC9FF; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <details>\n",
" <summary>\n",
" <h4 style=\"margin-bottom: 0px; display: inline;\">Worker: 1</h4>\n",
" </summary>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Comm: </strong> tcp://127.0.0.1:41711\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads: </strong> 1\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard: </strong> <a href=\"http://127.0.0.1:41201/status\" target=\"_blank\">http://127.0.0.1:41201/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Memory: </strong> 7.73 GiB\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Nanny: </strong> tcp://127.0.0.1:45517\n",
" </td>\n",
" <td style=\"text-align: left;\"></td>\n",
" </tr>\n",
" <tr>\n",
" <td colspan=\"2\" style=\"text-align: left;\">\n",
" <strong>Local directory: </strong> /tmp/dask-worker-space/worker-r3fh4puo\n",
" </td>\n",
" </tr>\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" </table>\n",
" </details>\n",
" </div>\n",
" </div>\n",
" \n",
" <div style=\"margin-bottom: 20px;\">\n",
" <div style=\"width: 24px; height: 24px; background-color: #DBF5FF; border: 3px solid #4CC9FF; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <details>\n",
" <summary>\n",
" <h4 style=\"margin-bottom: 0px; display: inline;\">Worker: 2</h4>\n",
" </summary>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Comm: </strong> tcp://127.0.0.1:43525\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads: </strong> 1\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard: </strong> <a href=\"http://127.0.0.1:43753/status\" target=\"_blank\">http://127.0.0.1:43753/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Memory: </strong> 7.73 GiB\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Nanny: </strong> tcp://127.0.0.1:45767\n",
" </td>\n",
" <td style=\"text-align: left;\"></td>\n",
" </tr>\n",
" <tr>\n",
" <td colspan=\"2\" style=\"text-align: left;\">\n",
" <strong>Local directory: </strong> /tmp/dask-worker-space/worker-ujln9gls\n",
" </td>\n",
" </tr>\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" </table>\n",
" </details>\n",
" </div>\n",
" </div>\n",
" \n",
" <div style=\"margin-bottom: 20px;\">\n",
" <div style=\"width: 24px; height: 24px; background-color: #DBF5FF; border: 3px solid #4CC9FF; border-radius: 5px; position: absolute;\"> </div>\n",
" <div style=\"margin-left: 48px;\">\n",
" <details>\n",
" <summary>\n",
" <h4 style=\"margin-bottom: 0px; display: inline;\">Worker: 3</h4>\n",
" </summary>\n",
" <table style=\"width: 100%; text-align: left;\">\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Comm: </strong> tcp://127.0.0.1:40001\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Total threads: </strong> 1\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Dashboard: </strong> <a href=\"http://127.0.0.1:45131/status\" target=\"_blank\">http://127.0.0.1:45131/status</a>\n",
" </td>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Memory: </strong> 7.73 GiB\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"text-align: left;\">\n",
" <strong>Nanny: </strong> tcp://127.0.0.1:33087\n",
" </td>\n",
" <td style=\"text-align: left;\"></td>\n",
" </tr>\n",
" <tr>\n",
" <td colspan=\"2\" style=\"text-align: left;\">\n",
" <strong>Local directory: </strong> /tmp/dask-worker-space/worker-q6q7bb69\n",
" </td>\n",
" </tr>\n",
"\n",
" \n",
"\n",
" \n",
"\n",
" </table>\n",
" </details>\n",
" </div>\n",
" </div>\n",
" \n",
"\n",
" </details>\n",
"</div>\n",
"\n",
" </details>\n",
" </div>\n",
"</div>\n",
" </details>\n",
" \n",
"\n",
" </div>\n",
"</div>"
],
"text/plain": [
"<Client: 'tcp://127.0.0.1:38875' processes=4 threads=4, memory=30.91 GiB>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "aceeb9ea-414b-4e7b-b2cb-6bfa9f43c92e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: <Nanny: tcp://127.0.0.1:37529, threads: 1>,\n",
" 1: <Nanny: tcp://127.0.0.1:41711, threads: 1>,\n",
" 2: <Nanny: tcp://127.0.0.1:43525, threads: 1>,\n",
" 3: <Nanny: tcp://127.0.0.1:40001, threads: 1>}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.cluster.workers"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ce4e6e4a-b528-42ed-97ac-029f584f00e7",
"metadata": {},
"outputs": [],
"source": [
"import dask.bag as db"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "b7868d75-8424-4189-a610-708f1c099344",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 21.7 s, sys: 9.69 s, total: 31.4 s\n",
"Wall time: 40.6 s\n"
]
}
],
"source": [
"%%time\n",
"b = db.from_sequence(segment_ids[:80], npartitions=4)\n",
"b1 = b.map(get_features).compute()"
]
},
{
"cell_type": "markdown",
"id": "c2620139-89ed-4030-9d28-0c5cf7590a0e",
"metadata": {},
"source": [
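"#### Try scattering the image and segment data to the workers\n",
"\n",
"The Dask Bag run above was slower than the serial loop (40.6 s vs. 27.7 s wall time). A likely reason is that `get_features` refers to the global `img` and `segments` arrays, so these large arrays (about 505 MB for the `uint8` image alone) have to be serialized and shipped to the workers along with the tasks. Scattering them once with `broadcast=True` places a copy on every worker up front."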
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d65746b-b41d-4f6a-a08f-f78837f6c40a",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"scattered_img = client.scatter(img, broadcast=True)\n",
"scattered_segments = client.scatter(segments, broadcast=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "447c590f-5bd2-4e27-aa16-3b11db769390",
"metadata": {},
"outputs": [],
"source": [
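"# Futures stored as default arguments are not resolved on the workers, so take the\n",
"# arrays as regular arguments and let client.map substitute the scattered data\n",
"def get_features_scattered(seg_id, img, segments):\n",
"    segment_pixels = img[segments == seg_id]\n",
"    return segment_features(segment_pixels)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06609825-0aa2-4394-9f6e-14b5c0b6a3c6",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"futures = client.map(get_features_scattered, segment_ids[:80],\n",
"                     img=scattered_img, segments=scattered_segments)\n",
"b1 = client.gather(futures)"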
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "780b2001-fa88-41b5-a87b-4003e8ee961f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "users-users-pangeo",
"language": "python",
"name": "conda-env-users-users-pangeo-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}