
Cluster environmental space using posterior beta draws
Source: R/PosteriorCluster.R
Implements option D of the Bayesian clustering workflow: draws n_draws sets of beta coefficients from the posterior, produces a weighted environmental raster for each draw, clusters a fixed set of sample points on each raster, and records how often each pair of sample points is assigned to the same cluster. The resulting co-occurrence probability matrix is then used to derive a consensus clustering.
Cells that co-cluster with probability near 1 across all draws are robustly
similar in environmental space. Cells near boundaries that shift across draws
— where the model is uncertain which variables matter most — will have
intermediate co-occurrence probability. This boundary-instability surface is
returned separately as StabilityRaster and is directly meaningful for
targeting collection effort.
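The co-occurrence idea can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy matrix draw_clusterings and the helper co_occurrence() are hypothetical names introduced here, assuming one column of cluster labels per draw for a fixed set of points.

```r
# Hypothetical toy input: 6 points (rows) x 4 draws (columns) of cluster labels.
# Labels need not match across draws; only within-draw equality matters.
draw_clusterings <- cbind(
  c(1, 1, 1, 2, 2, 2),
  c(2, 2, 2, 1, 1, 1),  # same partition as draw 1, labels swapped
  c(1, 1, 2, 2, 2, 2),  # point 3 joins the other cluster
  c(1, 1, 1, 1, 2, 2)   # point 4 joins the other cluster
)

# Hypothetical helper: proportion of draws in which each pair shares a cluster
co_occurrence <- function(assignments) {
  n_pts <- nrow(assignments)
  co <- matrix(0, n_pts, n_pts)
  for (d in seq_len(ncol(assignments))) {
    # outer() with "==" is TRUE where points i and j share a label in draw d
    co <- co + outer(assignments[, d], assignments[, d], "==")
  }
  co / ncol(assignments)
}

co_mat <- co_occurrence(draw_clusterings)
```

In this toy case points 1 and 2 always co-cluster, while points 3 and 4 sit on the shifting boundary and receive intermediate values.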
Usage
PosteriorCluster(
model,
predictors,
f_rasts,
pred_mat,
training_data,
n_draws = 100,
n = 10,
n_pts = 500,
lyr = "occurrence_prob_mean",
planar_proj,
coord_wt = 2.5,
consensus_method = c("hierarchical", "pam"),
beta_draws = NULL,
seed = 42
)
Arguments
- model
A brmsfit object from bayesianSDM().
- predictors
A SpatRaster stack ($Predictors from bayesianSDM()).
- f_rasts
The rasters output from bayesianSDM(), used to define the spatial mask and domain. Specifically uses $RasterPredictions.
- pred_mat
Data frame or matrix ($PredictMatrix from bayesianSDM()).
- training_data
sf object ($TrainData from bayesianSDM()).
- n_draws
Integer. Number of posterior beta draws to cluster over. Defaults to 100. Values between 50 and 200 give stable co-occurrence matrices for typical SDM datasets (< 2000 obs, < 20 vars).
- n
Integer. Number of clusters per draw. Defaults to 10. Passed to EnvironmentalBasedSample() internals.
- n_pts
Integer. Number of points sampled for clustering. Defaults to 500. The same points are used across all draws (fixed spatial frame) so co-occurrence is directly comparable.
- lyr
Character. Layer name in f_rasts to use as the spatial domain mask. Defaults to "occurrence_prob_mean" (the PostProcessSDM() output is named "Threshold").
- planar_proj
Numeric EPSG or proj4 string for planar projection.
- coord_wt
Numeric. Coordinate weight, as in EnvironmentalBasedSample(). Defaults to 2.5.
- consensus_method
Character. How to derive the final clustering from the co-occurrence matrix. One of:
"hierarchical": Average-linkage hierarchical clustering on (1 - co_occurrence) as a dissimilarity. Stable and interpretable. Default.
"pam": Partitioning Around Medoids on the dissimilarity matrix. More robust to outliers than hierarchical.
- beta_draws
Optional matrix of pre-drawn posterior samples (rows = draws, cols = parameters, as from brms::as_draws_matrix()). If NULL, draws are taken internally. Supply this to reuse draws across multiple calls.
- seed
Integer. Random seed. Defaults to 42.
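The "hierarchical" option of consensus_method can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual internals: the matrix co_mat holds made-up values, and k = 2 stands in for the n argument.

```r
# Hypothetical co-occurrence matrix for 6 points (illustrative values only)
co_mat <- matrix(0.1, 6, 6)
co_mat[1:3, 1:3] <- 0.9
co_mat[4:6, 4:6] <- 0.9
diag(co_mat) <- 1

# Dissimilarity: pairs that rarely co-cluster are far apart
diss <- as.dist(1 - co_mat)

# Average-linkage tree, cut into the requested number of clusters
tree <- hclust(diss, method = "average")
consensus <- cutree(tree, k = 2)
```

For the "pam" option, a similar sketch could pass the same dist object to cluster::pam() with the desired k, assuming the cluster package is installed.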
Value
A list:
- Geometry
sf polygon layer of consensus clusters (compatible with EnvironmentalBasedSample()$Geometry output).
- CoOccurrenceMatrix
Numeric matrix (n_pts × n_pts). Each entry is the proportion of draws in which points i and j were assigned to the same cluster.
- StabilityRaster
SpatRaster of per-cell cluster-assignment stability, predicted via KNN regression from the sample point stability scores. Values in (0,1); low values flag environmentally ambiguous boundary regions.
- ConsensusRaster
SpatRaster of final consensus (rank1) cluster IDs, geographically reordered.
- Rank2Raster
SpatRaster of second-most-frequent cluster assignments across draws. Where a point was always assigned to one cluster, rank2 = rank1.
- Rank3Raster
SpatRaster of third-most-frequent cluster assignments. Where fewer than 3 unique clusters were assigned, rank3 = rank1.
- Top3Lookup
Data frame with columns: point_id, x, y, rank1_cluster, rank1_pct, rank2_cluster, rank2_pct, rank3_cluster, rank3_pct, uncertainty (= 100 - rank1_pct). Tabular summary of cluster assignment frequencies at sample points.
- DrawClusterings
Integer matrix (n_pts × n_draws) of per-draw cluster assignments. Retained for diagnostics.
- SamplePoints
sf object of the fixed sample points used.
- KNN_Cluster
Trained KNN model for rank1 cluster assignment. Can be reused to classify new points.
- KNN_Rank2
Trained KNN model for rank2 clusters.
- KNN_Rank3
Trained KNN model for rank3 clusters.
- KNN_Stability
Trained KNN regression model for stability scores. Can be reused to predict stability at new locations.
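The rank and uncertainty columns of Top3Lookup can be reproduced in spirit from a matrix shaped like DrawClusterings. The sketch below is a hedged illustration only: it assumes cluster labels have already been aligned across draws, rank_point() is a hypothetical helper, and the rank2 and rank3 percentage columns are left out because the padding rule for them is not documented here.

```r
# Hypothetical aligned labels: 3 points (rows) x 10 draws (columns)
draw_clusterings <- rbind(
  c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),  # always cluster 1
  c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3),  # 60% / 30% / 10%
  c(2, 2, 2, 2, 2, 2, 2, 3, 3, 3)   # 70% / 30%, only two clusters seen
)

# Hypothetical helper: rank clusters by assignment frequency for one point
rank_point <- function(labels) {
  freq <- sort(table(labels), decreasing = TRUE)
  ids  <- as.integer(names(freq))
  pct  <- 100 * as.numeric(freq) / length(labels)
  # Missing ranks fall back to rank 1, as described for Rank2Raster/Rank3Raster
  pick <- function(r) if (r <= length(ids)) ids[r] else ids[1]
  data.frame(
    rank1_cluster = ids[1],
    rank1_pct     = pct[1],
    rank2_cluster = pick(2),
    rank3_cluster = pick(3),
    uncertainty   = 100 - pct[1]
  )
}

rows <- lapply(seq_len(nrow(draw_clusterings)),
               function(i) rank_point(draw_clusterings[i, ]))
top3 <- do.call(rbind, rows)
```

Points with high uncertainty in such a table are the ones whose cluster membership changes most across draws.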