Skip to contents

Isolation by Resistance Sampling

Landscape genetics has laddered null hypotheses regarding the genetic structure of species. The first null, Panmixia, is that populations show no differentiation, as could be observed in a species with few, highly connected populations; this package does not treat it as a sampling scenario. The second null is Isolation by Distance, the main idea behind most of the functions in this package, in which populations become more differentiated with increasing geographic distance, culminating in the function ‘IBDBasedSample’.The next hypothesis is Isolation by Resistance, where the differentiation of populations is driven by a break down in gene flow due to barriers.

While IBD is easily calculated from geographic distances, IBR requires a parameterised cost surface, in raster format, that depicts the costs to an organism’s movement of maternal and paternal genetic material. safeHavens supports a simple workflow for parameterising a cost surface that can be used for basic IBR-type sampling.

Example

We’ll use the species occurrence data from dismo again here. A few spatial data sets suitable for nearly all users’ IBR applications are included in safeHavens for this example, but the full data sets are not shipped with the package.

They include: - lakes, Lehner et al. 2025, GLWD
- oceans from Natural Earth, https://www.naturalearthdata.com/downloads/10m-physical-vectors/
- rivers from FAO, - tri (terrain ruggedness index), Amatulli et al. 2018, https://www.earthenv.org/topography

library(safeHavens)
library(sf) ## vector operations
library(terra) ## raster operations
library(dplyr) ## general data handling
library(ggplot2) ## plotting 

We’ll load the species occurrence data and make it explicitly spatial.

x <- read.csv(file.path(system.file(package="dismo"), 'ex', 'bradypus.csv'))
x <- x[,c('lon', 'lat')]
x <- sf::st_as_sf(x, coords = c('lon', 'lat'), crs = 4326)

planar_proj <- 3857 # Web Mercator for planar distance calcs

data prep

Each of the environmental variable data sources needs some pre-processing before being fed into the IBR workflow; most of them just need to be converted from vector to raster formats - a process we call ‘rasterisation’. We will also rescale the tri rasters to put them on a similar numeric scale to the other datasets.

tri <- terra::rast(file.path(system.file(package = "safeHavens"), "extdata", "tri.tif"))
names(tri) <- 'tri'

rescale_rast <- function(r, new_min = 0, new_max = 1) {
  r_min <- global(r, "min", na.rm = TRUE)[[1]]
  r_max <- global(r, "max", na.rm = TRUE)[[1]]
  ((r - r_min) / (r_max - r_min)) * (new_max - new_min) + new_min
}

tri <- rescale_rast(tri, 0, 100)

lakes_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"),  "extdata", "lakes.gpkg"),
    quiet = T) |>
  filter(TYPE == 'Lake') |> # remove reservoirs for our purposes (time scale)
  select(geometry = geom) |> 
  mutate(Value = 1) |> # this will be rasterized. 
  terra::vect()

ocean_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"), "extdata", "oceans.gpkg"),
   quiet = T) |>
  select(geometry = geom) |>
  mutate(Value = 1) |>
  terra::vect()

rivers_v <- sf::st_read(
  file.path(system.file(package = "safeHavens"), "extdata", "rivers.gpkg"),
   quiet = T) |>
  select(geometry = geom) |>
  mutate(Value = 1) |>
  terra::vect()

We’ll convert each of the above spatVect objects to spatRasters and clip them to an analysis area around our species records.

x_buff <- sf::st_transform(x, planar_proj) |>
  # huge buffer for the bbox. 
  st_buffer(200000) |> 
  st_transform(crs(lakes_v)) |>
  st_as_sfc() |> 
  st_union() |>
  vect() |>
  ext()

lakes_v <- crop(lakes_v, x_buff)
rivers_v <- crop(rivers_v, x_buff)
ocean_v <- crop(ocean_v, x_buff)
tri <- crop(tri, x_buff)

Convert the vector data to raster format using terra’s rasterise function.

ocean_r <- rasterize(ocean_v, tri, field = 'Value', background = 0.1)
lakes_r <- rasterize(lakes_v, tri, field = 'Value',  background = 0.1)
rivers_r <- rasterize(rivers_v, tri, field = 'Value',  background = 0.1)

par(mfrow=c(2, 2))
plot(rivers_r)
plot(lakes_r)
plot(ocean_r)
plot(tri)

par(mfrow=c(1,1))

rm(ocean_v, lakes_v, rivers_v)

actual workflow.

The first of three functions in the Isolation by Resistance sampling workflow is used to create the resistance surface. It requires a template raster and the other rasters, which will be incorporated into the product. buildResistanceSurface essentially just requires the rasters we configured above and user-specified weights for each raster. These weights convey how difficult it is for a species to move across the landscape. For plants, oceans generally get very high weights, while lakes and rivers - still obstacles - receive lower weights. Terrain ruggedness values are much more likely to vary with both the species’ ecology and the landscape being sampled.

When considering weights, consider the influences of bird dispersal, land animal dispersal, and natural population growth/expansion. Tuning this part is challenging and often subjective. The values below are the functions’ defaults - to my eyes, they work OK for the landscape at hand; although an ecologist with experience in this region may beg to differ, and I would not argue with them.

res_surface <- buildResistanceSurface(
  base_raster = rast(ocean_r), # template

  oceans = ocean_r, # rasters
  lakes = lakes_r,
  rivers = rivers_r,
  tri = tri,

  w_ocean = 120, # weights to multiple  input raster values by -- 
  w_lakes = 50, # is 1 * 50
  w_rivers = 20, # is 1 *20
  w_tri = 4 # ranges from 1~30, so from (1-30)*4 up to a weight of ~120. 
)

plot(res_surface)

Note that the above object is stored in memory. I recommend using coarse raster surfaces (e.g., 4 km x 4 km cells or larger) at this level of generalisation and to speed up processing.

Now that we have our resistance surface, the next step is calculating the distances between areas across the species range using the function populationResistance. Similar to other functions in safeHavens we will not use population occurrence record data directly, as we know that spatial biases affect occurrence record distributions. Rather we will buffer (buffer_dist) the occurrence records, using a relevant planar_proj, and create a raster surface of their locations. From this surface we will sample n points (using terra::spatSample, with method = ‘spread’), and identify neighboring points using a graph_method, either delauney triangulation, which is useful when a very high n is required, or through a complete distance matrix. In my testing I greatly prefer results from ‘complete’ calculations, and these work with NbClust without having to be ordinated down into 2-dimensional space and give much better results.

This function returns a raster representing the buffered populations (species range in the domain), the points sampled by terra, the graph information, and a distance matrix containing the least-cost distances from the resistance surface.

pop_res_graphs <- populationResistance(
  base_raster = rast(ocean_r),
  populations_sf = x,
  n = 150,
  planar_proj = 3857,
  buffer_dist = 125000,
  resistance_surface = res_surface,
  graph_method = 'complete'
)

names(pop_res_graphs)
[1] "pop_raster"     "sampled_points" "spatial_graph"  "edge_list"     
[5] "ibr_matrix"    

Buffered locations appear as shown; IDs are meaningless.

plot(pop_res_graphs$pop_raster)

This function underperforms with user-prespecified n; passing min.nc to NbClust generally yields the analyst-specified value.

ibr_groups <- IBRSurface(
  base_raster = rast(ocean_r),
  resistance_surface = res_surface, 
  pop_raster = pop_res_graphs$pop_raster,
  pts_sf = pop_res_graphs$sampled_points, 
  ibr_matrix = pop_res_graphs$ibr_matrix, 
  fixedClusters = TRUE,
  n = 20,
  planar_proj = 3857
)

Clustering results are shown below.

classified_pts = ibr_groups$points
plot(res_surface)
points(
  classified_pts, 
  pch = as.numeric(as.factor(classified_pts$ID)),  # different symbol per ID
  col = rainbow(length(unique(classified_pts$ID)))[as.factor(classified_pts$ID)],
  cex = 2,
  lwd = 3
  )

Occupied area clusters are visualised below.

ggplot() +
  geom_sf(data = ibr_groups$geometry, aes(fill = factor(ID))) + 
  theme_minimal()

Finally, the surface can be sampled using PolygonBasedSample. Simply union the geometries to be the species range, and supply the groups to the x argument, in lieu of an ecoregion or pSTZ-type surface.

out <- PolygonBasedSample(
  x = st_union(ibr_groups$geometry),
  zones = ibr_groups$geometry, 
  zone_key = 'ID', 
  n  = 30
  )

ggplot(data = out) + 
  geom_sf(aes(fill = factor(allocation))) + 
  theme_minimal() + 
  labs(fill = 'Ct. collections\nper cluster') + 
  theme(legend.position = 'bottom')

Each cluster has a different number of the 20 assigned collections.

bonus: adding sdm as a resistance layer, and non-linear scaling.

A default argument to cluster IBR is the probability surface predictions from an SDM. Here we will add it to the surface.

We will also show another way to add the topographic ruggedness index that bypasses the assumed linear relationship in the weighting scheme.

sdm <- terra::rast(
  file.path(system.file(package="safeHavens"), 'extdata',  'SDM_thresholds.tif')
  )
plot(sdm['Predictions'])

We will have to reverse the predictions from this layer; currently, higher values indicate a more suitable habitat. We want higher values in the probability surface function to indicate greater resistance to movement.

inverted_sdm <- 1 - sdm['Predictions']
plot(inverted_sdm)

And indeed, we may actually just want to remove weights in a suitable habitat, or set them to an arbitrarily low value.

inverted_sdm <- terra::ifel(inverted_sdm < 0.5, 0.01, inverted_sdm)
plot(inverted_sdm)

And we will convert it to the same integer scale as the other predictors.

inverted_sdm = round(inverted_sdm * 100, 0)
plot(inverted_sdm)

Note that we will have to make sure our SDM aligns with the existing layers

inverted_sdm <- crop(inverted_sdm, ocean_r)
inverted_sdm <- resample(inverted_sdm, ocean_r)
inverted_sdm[is.na(inverted_sdm)] <- 99

plot(inverted_sdm)

non-linear weight

Not all environmental variables are expected to have linear effects on movement. Indeed, something like terrain ruggedness may have only a small effect until reaching an asymptote.

x <- 1:100
rs <- function(x, to = c(0, 100)) {
  rng <- range(x, na.rm = TRUE)
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

# display some transformations. 
par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
plot(x, rs(sqrt(x)), type = "l", main = "sqrt(x)")
plot(x, rs(log1p(x)), type = "l", main = "log1p(x)")
plot(x, rs(asinh(x)), type = "l", main = "asinh(x)")
plot(x, rs(x^2), type = "l", main = "x^2: polynomial")
plot(x, rs(x^3), type = "l", main = "x^3: polynomial")
plot(x, rs(x^4), type = "l", main = "x^4 polynomial")

We will apply a fourth-order polynomial to the input tri data to accommodate a non-linear effect.

tri_rscl <- app(tri, function(x) { rs(x^2, to = c(1, 80)) })

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(tri, main = 'Original')
plot(tri_rscl, main = 'Rescaled')

And we can create a new resistance surface

res_surface <- buildResistanceSurface(

  base_raster = rast(ocean_r), # template

  oceans = ocean_r, # rasters
  lakes = lakes_r,
  rivers = rivers_r,
  #tri = tri, # dont use me! i rescale. 
  habitat = inverted_sdm,
  addtl_r = tri_rscl,

  w_ocean = 120, # weights to multiple  input raster values by -- 
  w_lakes = 70, # is 1 * 50
  w_rivers = 40, # is 1 *20
  w_habitat = 0.5,
  #w_tri = 4 # don't use me! use addtl_wt
  addtl_w = 1
)
par(mfrow=c(1,1))
plot(res_surface)