JGravier/Distances simulations between linear located elements and random point patterns in R.md

## Distances simulations between linear located elements and random point patterns in R.md

      
    Raw
  

              Distances simulations between linear located elements and random point patterns in R.md
            
          
    Distances simulations between linear located elements and random point patterns in R

Julie Gravier
2022-09-29
Librairies required:
library(tidyverse) # or at least tidyr, dplyr and ggplot2 librairies
library(sf)
library(maptools)
library(spData)
Initial (fake) question

We have two datasets in spData package:

the current regions of New Zeland,
the top 101 heighest points in New Zealand (2017).

The question is whether these highest points have had an impact on the delimitation of the country's regions. For this purpose, we study the distances between these highest points and the regional limits that we will compare with a random situation.
Linear located elements

Initial regional dataset
spData::nz
#>                 Name Island  Land_area Population Median_income Sex_ratio
#> 1          Northland  North 12500.5611     175500         23400 0.9424532
#> 2           Auckland  North  4941.5726    1657200         29600 0.9442858
#> 3            Waikato  North 23900.0364     460100         27900 0.9520500
#> 4      Bay of Plenty  North 12071.1447     299900         26200 0.9280391
#> 5           Gisborne  North  8385.8266      48500         24400 0.9349734
#> 6        Hawke's Bay  North 14137.5244     164000         26100 0.9238375
#> 7           Taranaki  North  7254.4804     118000         29100 0.9569363
#> 8  Manawatu-Wanganui  North 22220.6084     234500         25000 0.9387734
#> 9         Wellington  North  8048.5528     513900         32700 0.9335524
#> 10        West Coast  South 23245.4559      32400         26900 1.0139072
Tranforming polygons as lines
nz_lines <- st_cast(x = spData::nz, to = "MULTILINESTRING")
Lines dataset
Simple feature collection with 16 features and 6 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 10 features:
                Name Island Land_area Population Median_income Sex_ratio                           geom
1          Northland  North 12500.561     175500         23400 0.9424532 MULTILINESTRING ((1745493 6...
2           Auckland  North  4941.573    1657200         29600 0.9442858 MULTILINESTRING ((1803822 5...
3            Waikato  North 23900.036     460100         27900 0.9520500 MULTILINESTRING ((1860345 5...
4      Bay of Plenty  North 12071.145     299900         26200 0.9280391 MULTILINESTRING ((2049387 5...
5           Gisborne  North  8385.827      48500         24400 0.9349734 MULTILINESTRING ((2024489 5...
6        Hawke's Bay  North 14137.524     164000         26100 0.9238375 MULTILINESTRING ((2024489 5...
7           Taranaki  North  7254.480     118000         29100 0.9569363 MULTILINESTRING ((1740438 5...
8  Manawatu-Wanganui  North 22220.608     234500         25000 0.9387734 MULTILINESTRING ((1866732 5...
9         Wellington  North  8048.553     513900         32700 0.9335524 MULTILINESTRING ((1881590 5...
10        West Coast  South 23245.456      32400         26900 1.0139072 MULTILINESTRING ((1557042 5...
Point pattern dataset
spData::nz_height

Simple feature collection with 101 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 10 features:
   t50_fid elevation nearest_line_id  snap_dist       X       Y                geometry
1  2353944      2723              13 15439.8492 1188706 5050278 POINT (1204143 5049971)
2  2354404      2820              10 14414.9990 1233682 5062686 POINT (1234725 5048309)
3  2354405      2830              10 14066.4000 1234896 5062775 POINT (1235915 5048745)
4  2369113      3033              10  2760.7402 1261779 5074752 POINT (1259702 5076570)
5  2362630      2749              10 13102.4487 1365475 5161736 POINT (1378170 5158491)
6  2362814      2822              10 11830.1136 1389413 5180579 POINT (1389460 5168749)
7  2362817      2778              10 11138.4179 1389413 5180579 POINT (1390166 5169466)
8  2363991      3004              10  1433.0696 1371424 5173816 POINT (1372357 5172729)
9  2363993      3114              10   856.0622 1371505 5173886 POINT (1372062 5173236)
10 2363994      2882              10  1204.0424 1372026 5174333 POINT (1372810 5173419)
Mapping datasets
plot(nz_lines$geom)
plot(nz_height$geom, col = "darkorange", add = TRUE)

Distances calculation between heighest points and region's limits

Add ID to each region
nz_lines <- nz_lines %>%
  rowid_to_column(.) # id column name: rowid
Create sp datasets for snapping points to nearest lines with snapPointsToLines function
# sp data
nz_height_sp <- as(object = nz_height, Class = "Spatial")
nz_lines_sp <- as(object = nz_lines, Class = "Spatial")

# snapping points to nearest lines
tibble_distances <- maptools::snapPointsToLines(points = nz_height_sp,
                            lines = nz_lines_sp,
                            maxDist = 40000, # here in meters
                            idField = "rowid") %>%
  as_tibble(.)
Dataset of snapped points
tibble_distances
# A tibble: 101 × 4
   nearest_line_id snap_dist        X        Y
             <int>     <dbl>    <dbl>    <dbl>
 1              13    15440. 1188706. 5050278.
 2              10    14415. 1233682. 5062686.
 3              10    14066. 1234896. 5062775.
 4              10     2761. 1261779. 5074752.
 5              10    13102. 1365475. 5161736.
 6              10    11830. 1389413. 5180579.
 7              10    11138. 1389413. 5180579.
 8              10     1433. 1371424. 5173816.
 9              10      856. 1371505. 5173886.
10              10     1204. 1372026. 5174333.
# … with 91 more rows
Order of the dataset is the same of the initial point pattern. It is then possible to bind nz_height with this result.
nz_height <- nz_height %>%
  bind_cols(tibble_distances)


nz_height

# sf object
# where X and Y are the coordinates of Pi snap points
# and geometry the coordinates of Pi
Simple feature collection with 101 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 10 features:
   t50_fid elevation nearest_line_id  snap_dist       X       Y                geometry
1  2353944      2723              13 15439.8492 1188706 5050278 POINT (1204143 5049971)
2  2354404      2820              10 14414.9990 1233682 5062686 POINT (1234725 5048309)
3  2354405      2830              10 14066.4000 1234896 5062775 POINT (1235915 5048745)
4  2369113      3033              10  2760.7402 1261779 5074752 POINT (1259702 5076570)
5  2362630      2749              10 13102.4487 1365475 5161736 POINT (1378170 5158491)
6  2362814      2822              10 11830.1136 1389413 5180579 POINT (1389460 5168749)
7  2362817      2778              10 11138.4179 1389413 5180579 POINT (1390166 5169466)
8  2363991      3004              10  1433.0696 1371424 5173816 POINT (1372357 5172729)
9  2363993      3114              10   856.0622 1371505 5173886 POINT (1372062 5173236)
10 2363994      2882              10  1204.0424 1372026 5174333 POINT (1372810 5173419)
We can then plot the distribution of the distances between the heighest points of New Zeland and the borders of the regions.
nz_height %>%
  ggplot(mapping = aes(x = snap_dist)) +
  geom_density()

Distances calculation between random point pattern and region's limits

In a second step we can simulate N random point patterns, then calculate the distances between the laters and the borders of the regions. N always is chosen by the user.
First, we create of an empty data table that will be filled with the simulation results:
compilrandom <- tibble(nearest_line_id = numeric(),
                       snap_dist = numeric(),
                       X = numeric(),
                       Y = numeric(),
                       sim = numeric())
Then, we create a loop to simulate random point patterns in New Zeland and calculate the distances.
# zone in which random points are created
new_zeland <- nz %>%
  st_union(.)

plot(new_zeland)

Loop
for (i in 1:10) { # N simulation = 10
  random_points <- st_sample(x = new_zeland, # random points IN New Zeland zone
                             size = nrow(x = nz_height)) # size of the sample is the same as the init. point pattern
  
  snapping <- maptools::snapPointsToLines(points = as(object = random_points, Class = "Spatial"),
                                          lines = as(object = nz_lines, Class = "Spatial"),
                                          maxDist = 40000, # in meters
                                          idField = "rowid") %>%
    as_tibble(x = .) %>%
    mutate(sim = i) # add a column containing the simulation number
  
  print(i) # to know where we are in processing
  
  compilrandom <- compilrandom %>%
    bind_rows(snapping)
}
Dataset result
compilrandom

# A tibble: 941 × 5
   nearest_line_id snap_dist        X        Y   sim
             <dbl>     <dbl>    <dbl>    <dbl> <dbl>
 1              11    39383. 1332211. 5071253.     1
 2              12    28305. 1292366. 4977682.     1
 3              12    13868. 1302700. 4856972.     1
 4               3     5636. 1856544. 5892553.     1
 5              11    18025. 1395149. 5004766.     1
 6              12    11757. 1223251. 5058076.     1
 7               8    36948. 1775204. 5571056.     1
 8               1     7827. 1667504. 6122581.     1
 9               8    22649. 1868865. 5492924.     1
10               6    35363. 1864128. 5568156.     1
# … with 931 more rows
We can plot the distances results as before for observed points
compilrandom %>%
  ggplot(mapping = aes(x = snap_dist)) +
  geom_density()

Comparison between observed and simulated distances

Construction of a dataset containing both distances datasets.
distances_tibble <- compilrandom %>%
  bind_rows(tibble_distances %>%
              mutate(sim = NA) # creating a new column where observed distances = NA simulation
            ) %>%
  mutate(type_of_distance = if_else(
    condition = is.na(sim), 
    true = "observed", 
    false = "simulated" # new column to distinguished simulated and observed distances
  ))
Two types of representation for comparison
# first representation
distances_tibble %>%
  ggplot(mapping = aes(x = snap_dist)) +
  geom_density(mapping = aes(color = as.character(sim))) +
  scale_x_continuous(name = "distances (in m.)", trans = "log10")

# second representation
distances_tibble %>%
  ggplot(mapping = aes(x = snap_dist)) +
  geom_density(mapping = aes(color = type_of_distance)) +
  scale_x_continuous(name = "distances (in m.)", trans = "log10")


Answer to the initial fake question

Observed and simulated distances between points and the borders of the regions of New Zeland are very different. Indeed, simulated distances are on average 14 kilometers long, while the observed ones are 3 kms long.
The highest points are elements of the landscape that have influenced the determination of the boundaries of New Zealand's regions, knowing of course that they are not the only factors in the construction of boundaries.