
Isolation by Resistance Sampling

Landscape genetics has a ladder of null hypotheses regarding the genetic structure of species. The first null, panmixia, is that populations show no differentiation, as could be observed in a species with few, highly connected populations; it is not treated as a sampling scenario by this package. The second null is Isolation by Distance (IBD), the main idea behind most of the functions in this package, where populations become more differentiated as a function of geographic distance - culminating in the function ‘IBDBasedSample’. The final null hypothesis is Isolation by Resistance (IBR), where the differentiation of populations is driven by environmental distances between populations decreasing gene flow.

While IBD is easily calculated from geographic distances, IBR requires a parameterized cost surface, in raster format, that depicts the costs to an organism of moving maternal and paternal genetic material across the landscape. safeHavens supports a simple workflow to parameterize a cost surface that can be used to support basic IBR-type sampling.

Example

We’ll use the species occurrence data from dismo again here. A few spatial data sets that should be suitable for nearly all users’ IBR applications are included in safeHavens for this example, but the full data sets are not shipped with the package. They include:

- lakes, Lehner et al. 2025, GLWD
- oceans from Natural Earth, https://www.naturalearthdata.com/downloads/10m-physical-vectors/
- rivers from FAO
- tri (terrain ruggedness index), Amatulli et al. 2018, https://www.earthenv.org/topography

We’ll load the species occurrence data and make it explicitly spatial.

# packages used throughout this example
library(sf)      # vector data handling
library(terra)   # raster data handling
library(dplyr)   # filter(), select(), mutate()
library(ggplot2) # plotting

x <- read.csv(file.path(system.file(package="dismo"), 'ex', 'bradypus.csv'))
x <- x[,c('lon', 'lat')]
x <- sf::st_as_sf(x, coords = c('lon', 'lat'), crs = 4326)

planar_proj <- 3857 # Web Mercator for planar distance calcs
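
For orientation, the raw ingredient of IBD-style sampling is just a matrix of pairwise geographic distances, which sf computes in one call; the snippet below is only a minimal sketch of that contrast.

# pairwise great-circle distances between occurrences (meters);
# IBR will instead use least-cost distances over a resistance surface.
geo_dists <- sf::st_distance(x)
dim(geo_dists) # one row and one column per occurrence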

Data prep

Each of the environmental variable data sources needs some pre-processing before being fed into the IBR workflow; most of them just need to be converted from vector to raster format - a process we call ‘rasterization’. We will also rescale the tri raster to put it on a numeric range similar to the other data sets.

tri <- terra::rast(file.path(system.file(package = "safeHavens"), "extdata", "tri.tif"))
names(tri) <- 'tri'

rescale_rast <- function(r, new_min = 0, new_max = 1) {
  # linearly rescale a raster onto the range [new_min, new_max]
  r_min <- terra::global(r, "min", na.rm = TRUE)[[1]]
  r_max <- terra::global(r, "max", na.rm = TRUE)[[1]]
  ((r - r_min) / (r_max - r_min)) * (new_max - new_min) + new_min
}

tri <- rescale_rast(tri, 0, 100)

lakes_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"), "extdata", "lakes.gpkg"),
  quiet = TRUE) |>
  filter(TYPE == 'Lake') |> # remove reservoirs for our purposes (time scale)
  select(geometry = geom) |>
  mutate(Value = 1) |> # this will be rasterized.
  terra::vect()

ocean_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"), "extdata", "oceans.gpkg"),
  quiet = TRUE) |>
  select(geometry = geom) |>
  mutate(Value = 1) |>
  terra::vect()

rivers_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"), "extdata", "rivers.gpkg"),
  quiet = TRUE) |>
  select(geometry = geom) |>
  mutate(Value = 1) |>
  terra::vect()

We’ll convert each of the above SpatVector objects to SpatRasters, and clip them to an area of analysis around our species records.

x_buff <- sf::st_transform(x, planar_proj) |>
  # huge buffer for the bbox. 
  st_buffer(200000) |> 
  st_transform(crs(lakes_v)) |>
  st_as_sfc() |> 
  st_union() |>
  vect() |>
  ext()

lakes_v <- crop(lakes_v, x_buff)
rivers_v <- crop(rivers_v, x_buff)
ocean_v <- crop(ocean_v, x_buff)
tri <- crop(tri, x_buff)

The vector data need to be converted to raster format using a function like rasterize, from terra.

ocean_r <- rasterize(ocean_v, tri, field = 'Value', background = 0.1)
lakes_r <- rasterize(lakes_v, tri, field = 'Value',  background = 0.1)
rivers_r <- rasterize(rivers_v, tri, field = 'Value',  background = 0.1)

par(mfrow=c(2, 2))
plot(rivers_r)
plot(lakes_r)
plot(ocean_r)
plot(tri)

dev.off()
#> null device 
#>           1

rm(ocean_v, lakes_v, rivers_v)
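
Before building the resistance surface, it is worth confirming that the four rasters share the same geometry; terra::compareGeom errors if extents, resolutions, or coordinate reference systems disagree.

# all layers were rasterized against tri, so this should pass.
compareGeom(tri, ocean_r, lakes_r, rivers_r)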

Actual workflow

The first of the three functions in the Isolation by Resistance sampling workflow creates the resistance surface. buildResistanceSurface requires a template raster, the rasters we configured above, and user-specified weights to assign to each raster. These weights convey how difficult it is for a species to move across the landscape. For plants, oceans generally get very high weights, while lakes and rivers - still obstacles - receive lower weights. Terrain ruggedness weights are much more likely to vary based on both the ecology of the species and the landscape being sampled.

When considering weights, account for the influences of bird dispersal, land animal dispersal, and the natural growth/expansion of populations. This is certainly the most difficult part of the workflow to tune, and it often relies on subjective expert judgement. The values below are the function’s defaults - to my eyes they work OK for the landscape at hand, although an ecologist with experience in this region may beg to differ, and I would not argue with them.

res_surface <- buildResistanceSurface(
  base_raster = rast(ocean_r), # template

  oceans = ocean_r, # rasters
  lakes = lakes_r,
  rivers = rivers_r,
  tri = tri,

  w_ocean = 120, # weights to multiply input raster values by --
  w_lakes = 50,  # lake cells become 1 * 50
  w_rivers = 20, # river cells become 1 * 20
  w_tri = 4      # tri was rescaled to 0-100 above, so its contribution varies by cell
)

plot(res_surface)
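
Because the weights are subjective, a cheap sensitivity check is to rebuild the surface under an alternative weighting and compare the two side by side. The alternative values below are purely illustrative, not a recommendation.

res_surface_alt <- buildResistanceSurface(
  base_raster = rast(ocean_r),

  oceans = ocean_r,
  lakes = lakes_r,
  rivers = rivers_r,
  tri = tri,

  w_ocean = 120, # unchanged
  w_lakes = 50,  # unchanged
  w_rivers = 80, # rivers made much harder to cross
  w_tri = 4      # unchanged
)

par(mfrow = c(1, 2))
plot(res_surface, main = 'w_rivers = 20')
plot(res_surface_alt, main = 'w_rivers = 80')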

Note that the above object will be stored in memory. I advocate for the use of a relatively coarse-grained raster surface (e.g. 4 km x 4 km cells or larger), because we are doing fairly generalized work and it will speed up processing.
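
If your inputs arrive at a finer grain than you need, the surface can be coarsened after the fact; a minimal sketch with terra::aggregate, where the factor of 4 is illustrative.

# average resistance into 4 x 4 blocks of cells,
# e.g. turning 1 km cells into 4 km cells.
res_coarse <- terra::aggregate(res_surface, fact = 4, fun = 'mean', na.rm = TRUE)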

Now that we have our resistance surface, the next step is calculating the distances between areas across the species range. As with other functions in safeHavens, we will not use occurrence record data directly, because we know that spatial biases affect occurrence record distributions. Rather, we will buffer (buffer_dist) the occurrence records, using a relevant planar_proj, and create a raster surface of their locations. From this surface we will sample n points (using terra::spatSample, with method = ‘spread’), and identify neighboring points using a graph_method: either Delaunay triangulation, which is useful when a very high n is required, or a complete distance matrix. In my testing I greatly prefer the results from ‘complete’ calculations; they work with NbClust without having to be ordinated down into 2-dimensional space, and they give much better results.

This function returns a raster representing the buffered populations (the species range in the domain), the points sampled by terra, the graph information, and a distance matrix with the least-cost distance values from the resistance surface.

pop_res_graphs <- populationResistance(
  base_raster = rast(ocean_r),
  populations_sf = x,
  n = 150,
  planar_proj = 3857,
  buffer_dist = 125000,
  resistance_surface = res_surface,
  graph_method = 'complete'
)

names(pop_res_graphs)
#> [1] "pop_raster"     "sampled_points" "spatial_graph"  "edge_list"     
#> [5] "ibr_matrix"
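
A quick sanity check on the output is to peek at the least-cost distance matrix; it should be square, with one row and one column per sampled point.

dim(pop_res_graphs$ibr_matrix)
pop_res_graphs$ibr_matrix[1:5, 1:5] # distances are least-cost, not geographic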

The buffered locations look like this - currently the IDs are meaningless.

plot(pop_res_graphs$pop_raster)

This function does not perform well with a user pre-specified n; however, the min.nc parameter, passed to NbClust, can be applied to generally arrive at the analyst’s specified value.

ibr_groups <- IBRSurface(
  base_raster = rast(ocean_r),
  resistance_surface = res_surface, 
  pop_raster = pop_res_graphs$pop_raster,
  pts_sf = pop_res_graphs$sampled_points, 
  ibr_matrix = pop_res_graphs$ibr_matrix, 
  fixedClusters = TRUE,
  n = 20,
  planar_proj = 3857
)
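
If you would rather let NbClust pick the number of clusters, something like the sketch below may work; note that fixedClusters = FALSE, and min.nc being forwarded to NbClust, are assumptions based on the description above rather than verified behavior.

# sketch only: let NbClust choose the cluster count,
# nudging it upward with min.nc (assumed to reach NbClust).
ibr_groups_auto <- IBRSurface(
  base_raster = rast(ocean_r),
  resistance_surface = res_surface,
  pop_raster = pop_res_graphs$pop_raster,
  pts_sf = pop_res_graphs$sampled_points,
  ibr_matrix = pop_res_graphs$ibr_matrix,
  fixedClusters = FALSE, # assumption: lets NbClust choose
  min.nc = 15,           # assumption: passed through to NbClust
  planar_proj = 3857
)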

The results of the clustering process are visualized below.

classified_pts <- ibr_groups$points
plot(res_surface)
points(
  classified_pts, 
  pch = as.numeric(as.factor(classified_pts$ID)),  # different symbol per ID
  col = rainbow(length(unique(classified_pts$ID)))[as.factor(classified_pts$ID)],
  cex = 2,
  lwd = 3
  )

The classified occupied area clusters are visualized below.

ggplot() +
  geom_sf(data = ibr_groups$geometry, aes(fill = factor(ID))) + 
  theme_minimal()
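
It can also help to check how much area each cluster covers; this assumes, as in the plot above, that ibr_groups$geometry holds one polygon per cluster ID.

# area of each cluster, converted to square kilometers.
cluster_areas <- sf::st_area(ibr_groups$geometry)
units::set_units(cluster_areas, km^2)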

Finally, the surface can be sampled using PolygonBasedSample. Simply union the geometries to form the species range, supplying that to the x argument and the groups as zones, in lieu of an ecoregion or provisional seed transfer zone (PSTZ) type surface.

out <- PolygonBasedSample(
  x = st_union(ibr_groups$geometry),
  zones = ibr_groups$geometry, 
  zone_key = 'ID', 
  n  = 30
  )

ggplot(data = out) + 
  geom_sf(aes(fill = factor(allocation))) + 
  theme_minimal() + 
  labs(fill = 'Ct. collections\nper cluster') + 
  theme(legend.position = 'bottom')

Each of the nine clusters has a different number of the 30 desired collections assigned to it.
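
These allocations can be tabulated directly from the returned object; this assumes out carries the allocation column plotted above.

table(out$allocation) # distribution of allocation sizes across clusters
sum(out$allocation)   # should equal the n requested from PolygonBasedSample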

Bonus: adding an SDM as a resistance layer, and non-linear scaling

A default argument to buildResistanceSurface is habitat, the probability surface predicted by an SDM. Here we will add it to the resistance surface.

We will also show another way of adding the terrain ruggedness index, one that ‘bypasses’ the linear relationship assumed by the weighting scheme.

sdm <- terra::rast(
  file.path(system.file(package="safeHavens"), 'extdata',  'SDM_thresholds.tif')
  )
plot(sdm['Predictions'])

Note that we will have to invert the predictions from this layer: currently, higher values mean more suitable habitat. We want higher values going into the resistance surface to mean more resistance to movement.

inverted_sdm <- 1 - sdm['Predictions']
plot(inverted_sdm)

Indeed, we may want to simply remove resistance in suitable habitat, or set it to an arbitrarily low value.

inverted_sdm <- terra::ifel(inverted_sdm < 0.5, 0.01, inverted_sdm)
plot(inverted_sdm)

We will also convert it to the same integer scale as the rest of the predictors.

inverted_sdm <- round(inverted_sdm * 100, 0)
plot(inverted_sdm)

Note that we will have to make sure our SDM aligns with the existing layers.

inverted_sdm <- crop(inverted_sdm, ocean_r)     # match extents
inverted_sdm <- resample(inverted_sdm, ocean_r) # match resolution and origin
inverted_sdm[is.na(inverted_sdm)] <- 99         # treat unmodelled cells as high resistance

plot(inverted_sdm)

Non-linear weight

Not all environmental variables are expected to have linear effects on movement. Indeed, something like terrain ruggedness may have only a small effect until certain thresholds are reached.

v <- 1:100 # dummy values; note we avoid overwriting our occurrence data ‘x’
rs <- function(x, to = c(0, 100)) {
  # rescale x onto the range given by ‘to’
  rng <- range(x, na.rm = TRUE)
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

# display some transformations.
par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
plot(v, rs(sqrt(v)), type = "l", main = "sqrt(x)")
plot(v, rs(log1p(v)), type = "l", main = "log1p(x)")
plot(v, rs(asinh(v)), type = "l", main = "asinh(x)")
plot(v, rs(v^2), type = "l", main = "x^2: polynomial")
plot(v, rs(v^3), type = "l", main = "x^3: polynomial")
plot(v, rs(v^4), type = "l", main = "x^4: polynomial")

rm(v)

We will apply a squared (second-order polynomial) transformation to the input tri data to accommodate a non-linear effect.

tri_rscl <- app(tri, function(x) { rs(x^2, to = c(1, 80)) })

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(tri, main = 'Original')
plot(tri_rscl, main = 'Rescaled')

And we can create a new resistance surface.

res_surface <- buildResistanceSurface(

  base_raster = rast(ocean_r), # template

  oceans = ocean_r, # rasters
  lakes = lakes_r,
  rivers = rivers_r,
  # tri = tri, # not used here; we supply our own pre-scaled version via addtl_r below
  habitat = inverted_sdm,
  addtl_r = tri_rscl,

  w_ocean = 120, # weights to multiply input raster values by --
  w_lakes = 70,  # lake cells become 1 * 70
  w_rivers = 40, # river cells become 1 * 40
  w_habitat = 0.5,
  # w_tri = 4 # not used here; use addtl_w for the layer passed to addtl_r
  addtl_w = 1
)
dev.off()
#> null device 
#>           1
plot(res_surface)
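
As a last check, the spread of resistance values in the updated surface can be summarized the same way we did when rescaling tri.

global(res_surface, 'min', na.rm = TRUE)
global(res_surface, 'max', na.rm = TRUE)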