Help for package glossa

Type:

Package

Title:

User-Friendly 'shiny' App for Bayesian Species Distribution Models

Version:

1.2.2

Description:

A user-friendly 'shiny' application for Bayesian machine learning analysis of marine species distributions. GLOSSA (Global Ocean Species Spatio-temporal Analysis) uses Bayesian Additive Regression Trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) to model species distributions with intuitive workflows for data upload, processing, model fitting, and result visualization. It supports presence-absence and presence-only data (with pseudo-absence generation), spatial thinning, cross-validation, and scenario-based projections. GLOSSA is designed to facilitate ecological research by providing easy-to-use tools for analyzing and visualizing marine species distributions across different spatial and temporal scales.

License:

GPL-3

URL:

https://github.com/iMARES-group/glossa, https://iMARES-group.github.io/glossa/

BugReports:

https://github.com/iMARES-group/glossa/issues

Depends:

bs4Dash, R (≥ 4.1.0), shiny

Imports:

automap, blockCV, dbarts, dplyr, DT, GeoThinneR, ggplot2, htmltools, leaflet, markdown, mcp, pROC, sf, shinyWidgets, sparkline, svglite, terra, tidyterra, waiter, zip

Suggests:

jsonlite, knitr, matrixStats, rmarkdown, testthat (≥ 3.0.0), tidyr, tidyverse

Config/testthat/edition:

Encoding:

UTF-8

Language:

en-US

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-07-15 10:57:36 UTC; jorge

Author:

Jorge Mestre-Tomás [aut, cre], Alba Fuster-Alonso [aut]

Maintainer:

Jorge Mestre-Tomás <jorge.mestre.tomas@csic.es>

Repository:

CRAN

Date/Publication:

2025-07-15 16:00:02 UTC

Enlarge/Buffer a Polygon

Description

This function enlarges a polygon by applying a buffer.

Usage

buffer_polygon(polygon, buffer_distance)

Arguments

polygon

An sf object representing the polygon to be buffered.

buffer_distance

Numeric. The buffer distance in decimal degrees (arc degrees).

Value

An sf object representing the buffered polygon.

Clean Coordinates of Presence/Absence Data

Description

This function cleans the coordinates of presence/absence data. It removes points with NA coordinates, optionally applies spatial thinning (e.g., rounding coordinates to a given decimal precision), removes duplicated points, and removes points outside the boundaries of the specified study-area polygon.

Usage

clean_coordinates(
  df,
  study_area,
  overlapping = FALSE,
  thinning_method = NULL,
  thinning_value = NULL,
  coords = c("decimalLongitude", "decimalLatitude"),
  by_timestamp = TRUE,
  seed = NULL
)

Arguments

df

A data frame in which each row is a point. Coordinates must be in the WGS84 (EPSG:4326) coordinate reference system.

study_area

A spatial polygon in WGS84 (EPSG:4326) representing the boundaries within which coordinates should be kept.

overlapping

Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE).

thinning_method

Character; spatial thinning method to apply to occurrence data. Options are 'c("None", "Distance", "Grid", "Precision")'. See 'GeoThinneR' package for details.

thinning_value

Numeric; value used for thinning depending on the selected method: distance in meters ('Distance'), grid resolution in degrees ('Grid'), or decimal precision ('Precision').

coords

Character vector specifying the column names for longitude and latitude.

by_timestamp

Logical. If TRUE, coordinates are cleaned separately for each time period defined in the 'timestamp' column.

seed

Optional; an integer seed for reproducibility of results.

Details

This function takes a data frame of presence/absence data with longitude and latitude coordinates, a spatial polygon defining the boundaries within which points are kept, and parameters controlling spatial thinning and the handling of duplicated points. It returns a cleaned data frame containing only valid coordinates within the specified boundaries.

Value

A cleaned data frame containing presence/absence data with valid coordinates.
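The cleaning steps described above can be illustrated with a minimal base-R sketch. This is not the package implementation. The helper name clean_coords_sketch() is hypothetical, the study area is simplified to a longitude/latitude bounding box instead of an sf polygon, and spatial thinning is omitted.

```r
# Hypothetical sketch of the cleaning steps (not glossa's implementation).
# Assumptions: the study area is a bounding box c(xmin, xmax, ymin, ymax)
# instead of an sf polygon, and no spatial thinning is applied.
clean_coords_sketch <- function(df, bbox,
                                coords = c("decimalLongitude", "decimalLatitude"),
                                by_timestamp = TRUE) {
  lon <- coords[1]
  lat <- coords[2]
  # 1. Drop rows with missing coordinates
  df <- df[!is.na(df[[lon]]) & !is.na(df[[lat]]), , drop = FALSE]
  # 2. Drop duplicated points (within each timestamp if requested)
  key_cols <- if (by_timestamp && "timestamp" %in% names(df)) c(coords, "timestamp") else coords
  df <- df[!duplicated(df[, key_cols, drop = FALSE]), , drop = FALSE]
  # 3. Keep only points inside the study-area bounding box
  inside <- df[[lon]] >= bbox[1] & df[[lon]] <= bbox[2] &
            df[[lat]] >= bbox[3] & df[[lat]] <= bbox[4]
  df <- df[inside, , drop = FALSE]
  rownames(df) <- NULL
  df
}

occ <- data.frame(
  decimalLongitude = c(1, 1, 2, NA, 50),
  decimalLatitude  = c(40, 40, 41, 42, 10),
  timestamp        = c(1, 1, 2, 1, 1),
  pa               = c(1, 1, 0, 1, 1)
)
cleaned <- clean_coords_sketch(occ, bbox = c(-10, 10, 30, 50))
```

Here the duplicated second row, the row with an NA longitude, and the point at longitude 50 are removed, which leaves two points.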

Continuous Boyce Index (CBI) with weighting

Description

This function is a copy of the 'contBoyce()' function from the 'enmSdm' R package. It calculates the continuous Boyce index (CBI), a measure of model accuracy for presence-only test data. This version uses multiple, overlapping windows, in contrast to contBoyce2x(), which covers each point by at most two windows.

Usage

contBoyce(
  pres,
  contrast,
  presWeight = rep(1, length(pres)),
  contrastWeight = rep(1, length(contrast)),
  numBins = 101,
  binWidth = 0.1,
  autoWindow = TRUE,
  method = "spearman",
  dropZeros = TRUE,
  na.rm = FALSE,
  ...
)

Arguments

pres

Numeric vector. Predicted values at presence sites.

contrast

Numeric vector. Predicted values at background sites.

presWeight

Numeric vector same length as pres. Relative weights of presence sites. The default is to assign each presence a weight of 1.

contrastWeight

Numeric vector of the same length as contrast. Relative weights of background sites. The default is to assign each background site a weight of 1.

numBins

Positive integer. Number of (overlapping) bins into which to divide predictions.

binWidth

Positive numeric value < 1. Size of a bin. Each bin will be binWidth * (max - min). If autoWindow is FALSE, then min is 0 and max is 1. If autoWindow is TRUE (the default), then min and max are the minimum and maximum values of all predictions in the background and presence sets (i.e., not necessarily 0 and 1).

autoWindow

Logical. If FALSE calculate bin boundaries starting at 0 and ending at 1 + epsilon (where epsilon is a very small number to assure inclusion of cases that equal 1 exactly). If TRUE (default) then calculate bin boundaries starting at minimum predicted value and ending at maximum predicted value.

method

Character. Type of correlation to calculate. The default is 'spearman', the Spearman rank correlation coefficient used by Boyce et al. (2002) and Hirzel et al. (2006), which gives the "traditional" CBI. Alternatively, 'pearson' or 'kendall' can be used. See cor() for details.

dropZeros

Logical. If TRUE then drop all bins in which the frequency of presences is 0.

na.rm

Logical. If TRUE, presence predictions that are NA are removed together with their weights, and background predictions that are NA are removed together with their weights.

...

Other arguments (not used).

Details

CBI is the Spearman rank correlation coefficient between the proportion of sites in each prediction class and the expected proportion of predictions in each prediction class based on the proportion of the landscape that is in that class. The index ranges from -1 to 1. Values >0 indicate the model's output is positively correlated with the true probability of presence. Values <0 indicate it is negatively correlated with the true probability of presence.

Value

Numeric value.

Note

This function is directly copied from the 'enmSdm' package.

References

Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modelling 157:281-300. doi:10.1016/S0304-3800(02)00200-4

Hirzel, A.H., Le Lay, G., Helfer, V., Randon, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling 199:142-152. doi:10.1016/j.ecolmodel.2006.05.017
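The idea behind the windowed CBI can be shown with a simplified, unweighted base-R sketch. It is not the enmSdm code. The helper cbi_sketch() is hypothetical, it ignores weights, and it treats both window edges as inclusive. For each overlapping window it compares the share of presences (observed) with the share of background sites (expected). It then correlates that ratio with the window midpoint. Windows with no presences are dropped, mirroring dropZeros = TRUE.

```r
# Simplified, unweighted continuous Boyce index (sketch, not enmSdm's code).
cbi_sketch <- function(pres, contrast, numBins = 101, binWidth = 0.1,
                       method = "spearman") {
  # autoWindow = TRUE behaviour: windows span the range of all predictions
  lo <- min(pres, contrast)
  hi <- max(pres, contrast)
  width <- binWidth * (hi - lo)
  starts <- seq(lo, hi - width, length.out = numBins)
  ratio <- vapply(starts, function(s) {
    e <- s + width
    p <- mean(pres >= s & pres <= e)          # observed share of presences
    b <- mean(contrast >= s & contrast <= e)  # expected share (background)
    if (b == 0) NA_real_ else p / b
  }, numeric(1))
  mids <- starts + width / 2
  keep <- !is.na(ratio) & ratio > 0           # drop empty windows
  cor(mids[keep], ratio[keep], method = method)
}

contrast <- seq(0, 1, length.out = 1001)
pres <- sqrt(seq(0, 1, length.out = 500))  # presences concentrated at high values
cbi <- cbi_sketch(pres, contrast)
```

Because presences become more frequent than the background at higher predicted values, the index is close to 1 here.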

Create Geographic Coordinate Layers

Description

Generates raster layers for longitude and latitude from given raster data, applies optional scaling, and restricts the output to a specified spatial mask.

Usage

create_coords_layer(layers, study_area = NULL, scale_layers = FALSE)

Arguments

layers

Raster layer or stack of raster layers from which the geographic extent and resolution are derived.

study_area

Spatial object for masking output layers.

scale_layers

Logical indicating if scaling is applied. Default is FALSE.

Value

Raster stack with layers lon and lat.
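The underlying idea can be sketched in base R: each cell gets the longitude or latitude of its centre, derived from the grid's extent and resolution. This is only an illustration. The helper coords_grid_sketch() is hypothetical and returns plain matrices instead of a SpatRaster. Masking by study_area is omitted. The optional scaling is assumed to be a z-score, which the package may implement differently.

```r
# Hypothetical sketch: longitude/latitude "layers" as matrices of cell centres.
coords_grid_sketch <- function(xmin, xmax, ymin, ymax, res, scale_layers = FALSE) {
  lon_centres <- seq(xmin + res / 2, xmax - res / 2, by = res)
  lat_centres <- seq(ymax - res / 2, ymin + res / 2, by = -res)  # top row first, like a raster
  nr <- length(lat_centres)
  nc <- length(lon_centres)
  lon <- matrix(lon_centres, nrow = nr, ncol = nc, byrow = TRUE)  # constant along columns
  lat <- matrix(lat_centres, nrow = nr, ncol = nc)                # constant along rows
  if (scale_layers) {
    # Assumed z-score scaling
    lon <- (lon - mean(lon)) / sd(lon)
    lat <- (lat - mean(lat)) / sd(lat)
  }
  list(lon = lon, lat = lat)
}

g <- coords_grid_sketch(-10, 10, 30, 40, res = 5)
```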

Cross-validation for BART model

Description

This function performs cross-validation for a Bayesian Additive Regression Trees (BART) model using presence-absence data and environmental covariate layers. It calculates various performance metrics for model evaluation.

Usage

cross_validate_model(data, folds, predictor_cols = NULL, seed = NULL)

Arguments

data

Data frame with a column (named 'pa') indicating presence (1) or absence (0) and columns for the predictor variables.

folds

A vector of fold assignments, one per row of 'data'.

predictor_cols

Optional; a character vector of column names to be used as predictors. If NULL, all columns except 'pa' will be used.

seed

Optional; random seed.

Value

A list with:

metrics: A data frame containing the true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), and various performance metrics including precision (PREC), sensitivity (SEN), specificity (SPC), false discovery rate (FDR), negative predictive value (NPV), false negative rate (FNR), false positive rate (FPR), F-score, accuracy (ACC), balanced accuracy (BA), and true skill statistic (TSS) for each fold.
predictions: Data frame with observed, predicted, probability, and fold assignment per test instance.
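The per-fold metrics listed above follow from the four confusion-matrix counts. A minimal base-R sketch of those standard definitions is shown below. The helper confusion_metrics_sketch() is hypothetical, and "F-score" is assumed to be the F1 score (harmonic mean of precision and sensitivity).

```r
# Sketch of standard confusion-matrix metrics from observed and predicted 0/1 labels.
confusion_metrics_sketch <- function(observed, predicted) {
  TP <- sum(observed == 1 & predicted == 1)
  FP <- sum(observed == 0 & predicted == 1)
  FN <- sum(observed == 1 & predicted == 0)
  TN <- sum(observed == 0 & predicted == 0)
  PREC <- TP / (TP + FP)
  SEN  <- TP / (TP + FN)   # sensitivity (true positive rate)
  SPC  <- TN / (TN + FP)   # specificity (true negative rate)
  data.frame(
    TP = TP, FP = FP, FN = FN, TN = TN,
    PREC = PREC, SEN = SEN, SPC = SPC,
    FDR = FP / (FP + TP),
    NPV = TN / (TN + FN),
    FNR = FN / (TP + FN),
    FPR = FP / (FP + TN),
    F_score = 2 * PREC * SEN / (PREC + SEN),  # assumed F1 score
    ACC = (TP + TN) / (TP + FP + FN + TN),
    BA  = (SEN + SPC) / 2,
    TSS = SEN + SPC - 1
  )
}

m <- confusion_metrics_sketch(observed  = c(1, 1, 1, 0, 0, 0, 0, 1),
                              predicted = c(1, 1, 0, 0, 0, 1, 0, 1))
```

In this example there are 3 TP, 1 FP, 1 FN, and 3 TN, so sensitivity and specificity are both 0.75 and TSS is 0.5.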

Create a Download Action Button

Description

This function generates a download action button that triggers the download of a file when clicked.

Usage

downloadActionButton(
  outputId,
  label = "Download",
  icon = NULL,
  width = NULL,
  status = NULL,
  outline = FALSE,
  ...
)

Arguments

outputId

The output ID for the button.

label

The label text displayed on the button. Default is "Download".

icon

The icon to be displayed on the button. Default is NULL.

width

The width of the button. Default is NULL.

status

The status of the button. Default is NULL.

outline

Logical indicating whether to use outline style for the button. Default is FALSE.

...

Additional parameters to be passed to the actionButton function.

Value

Returns a download action button with the specified parameters.

Evaluation metrics for model predictions

Description

Computes a set of performance metrics (e.g., AUC, TSS, CBI) based on observed and predicted values.

Usage

evaluation_metrics(df, na.rm = TRUE, method = "spearman")

Arguments

df

A data.frame with columns: 'observed' (0/1), 'predicted' (0/1), 'probability' (numeric).

na.rm

Logical. Whether to remove rows with NA values.

method

Correlation method for CBI ("spearman", "pearson", or "kendall").

Value

A named list or data.frame with evaluation metrics.
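AUC, one of the metrics computed here, can be obtained from the 'observed' and 'probability' columns with the rank-based (Mann-Whitney) formula. The sketch below is only an illustration with a hypothetical helper, auc_sketch(). glossa imports pROC, which may be what the package actually uses.

```r
# Rank-based (Mann-Whitney) AUC: probability that a random presence
# scores higher than a random absence (ties count one half).
auc_sketch <- function(observed, probability) {
  r <- rank(probability)   # average ranks for ties
  n_pos <- sum(observed == 1)
  n_neg <- sum(observed == 0)
  (sum(r[observed == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

df <- data.frame(observed    = c(0, 0, 1, 1, 0, 1),
                 probability = c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9))
auc <- auc_sketch(df$observed, df$probability)
```

Eight of the nine presence/absence pairs are ranked correctly, so the AUC is 8/9.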

Server Logic for Export Plot Functionality

Description

Sets up server-side functionality for exporting plots, including creating a modal dialog for user input on export preferences (height, width, format) and processing the download.

Usage

export_plot_server(id, exported_plot)

Value

No return value; this function is called for its side effects within a Shiny app.

Create UI for Export Plot Button

Description

This function generates a UI element (action button) for exporting plots. The button is styled to be minimalistic, featuring only a download icon.

Usage

export_plot_ui(id)

Value

Returns an actionButton for use in a Shiny UI that opens the plot export modal when clicked.

Extract Non-NA Covariate Values

Description

This function extracts covariate values for species occurrences, excluding NA values.

Usage

extract_noNA_cov_values(data, covariate_layers, predictor_variables)

Arguments

data

A data frame containing species occurrence data with columns x/long (first column) and y/lat (second column).

covariate_layers

A list of raster layers representing covariates.

predictor_variables

Variables to select from all the layers.

Details

This function extracts covariate values for each species occurrence location from the provided covariate layers. It returns a data frame containing species occurrence data with covariate values, excluding any NA values.

Value

A data frame containing species occurrence data with covariate values, excluding NA values.

Server-side Logic for Custom File Input

Description

Processes the file input from the UI component created by file_input_area_ui and provides access to the uploaded file data.

Usage

file_input_area_server(id)

Value

A reactive expression that returns a data frame containing information about the uploaded files, or NULL if no files have been uploaded.

Custom File Input UI

Description

Creates a customized file input area in a Shiny application. The file input is designed to be visually distinct and supports features such as multiple file selection and file type restrictions.

Usage

file_input_area_ui(
  id,
  label = "Input text: ",
  multiple = FALSE,
  accept = NULL,
  width = NULL,
  button_label = "Browse...",
  icon_name = NULL
)

Value

A Shiny UI object that can be added to a Shiny application.

Fit a BART Model Using Environmental Covariate Layers

Description

This function fits a Bayesian Additive Regression Trees (BART) model using presence/absence data and environmental covariate layers.

Usage

fit_bart_model(y, x, seed = NULL, ...)

Arguments

y

A numeric vector indicating presence (1) or absence (0).

x

A data frame with the same number of rows as the length of the vector 'y', containing the covariate values.

seed

An optional integer value for setting the random seed for reproducibility.

...

Additional arguments passed to 'dbarts::bart()'.

Value

A BART model object.

Generate cross-validation folds

Description

Creates cross-validation fold assignments for presence-absence or presence-only data. Three strategies are supported: k-fold, spatial blocks (via the blockCV R package), and temporal blocks.

Usage

generate_cv_folds(
  data,
  method = "k-fold",
  block_method = "predictors_autocorrelation",
  block_size = NULL,
  k = 10,
  predictor_raster = NULL,
  model_residuals = NULL,
  coords = c("decimalLongitude", "decimalLatitude")
)

Arguments

data

A 'data.frame' with at least presence-absence data ('pa'), coordinates, and optionally a 'timestamp'.

method

The cross-validation strategy. One of: '"k-fold"', '"spatial_blocks"', '"temporal_blocks"'.

block_method

For spatial blocks, how to determine block size. One of: '"residuals_autocorrelation"', '"predictors_autocorrelation"', '"manual"'.

block_size

Numeric. Manual block size in meters (used if 'block_method = "manual"').

k

Integer. Number of folds to generate.

predictor_raster

A 'terra::SpatRaster' used for estimating spatial autocorrelation (only needed if 'block_method = "predictors_autocorrelation"').

model_residuals

A 'data.frame' with residuals and coordinates (only needed if 'block_method = "residuals_autocorrelation"').

coords

A character vector of length 2 indicating the longitude and latitude column names.

Value

A list with the following elements:

folds: A vector of fold assignments (one per row in 'data').
method: The CV method used.
block_method: The spatial block size method (if applicable).
block_size: The estimated or manual block size (in meters), if spatial blocks were used.
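For the "k-fold" strategy, a plain random assignment with balanced fold sizes can be sketched in base R as follows. This is an assumption-based illustration with a hypothetical helper, kfold_sketch(). It does not reproduce glossa's exact assignment, any stratification by 'pa', or the spatial and temporal block strategies.

```r
# Random, balanced k-fold assignment (one fold id per row).
kfold_sketch <- function(n, k = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(k), length.out = n))  # fold sizes differ by at most 1
}

folds <- kfold_sketch(n = 23, k = 5, seed = 1)
table(folds)
```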

Generate Pseudo-Absences Using Buffer-Out Strategy

Description

This function generates pseudo-absences outside a buffer around presence points but within the convex hull of those points. This prevents spatial overlap while preserving geographic realism.

Usage

generate_pa_buffer_out(
  presences,
  raster_stack,
  predictor_variables,
  coords = c("decimalLongitude", "decimalLatitude"),
  pa_buffer_distance = 0.5,
  ratio = 1,
  attempts = 100,
  seed = NULL
)

Arguments

presences

Data frame containing presence points.

raster_stack

'SpatRaster' object containing covariate data.

predictor_variables

Character vector of the predictor variables selected for this species.

coords

Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'.

pa_buffer_distance

Numeric; buffer radius in degrees around each presence. Default is 0.5.

ratio

Ratio of pseudo-absences to presences (default 1 = balanced).

attempts

Integer specifying the maximum number of attempts to generate the exact number of pseudo-absences. Defaults to 100.

seed

Optional seed for reproducibility.

Value

A data frame of pseudo-absences with coordinates, timestamp, 'pa = 0', and covariate values.
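The buffer-out rule keeps candidates that lie inside the convex hull of the presences and farther than the buffer distance from every presence. A base-R sketch of that rule follows. It is illustrative only: buffer_out_sketch() is hypothetical, and candidates are given as a data frame with hypothetical x/y columns instead of raster cells. Distances are planar Euclidean distances in degrees (an assumption), and the hull comes from grDevices::chull().

```r
# Hypothetical buffer-out sketch: candidates inside the presences' convex hull
# and more than `buffer` (planar degrees) away from every presence.
buffer_out_sketch <- function(presences, candidates, buffer = 0.5, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  hull <- presences[grDevices::chull(presences$x, presences$y), ]
  hx <- hull$x
  hy <- hull$y
  nh <- nrow(hull)
  # Cross product of each hull edge with the vector to each candidate;
  # a point is inside a convex polygon if all signs agree (or are zero).
  cross <- sapply(seq_len(nh), function(i) {
    j <- if (i == nh) 1 else i + 1
    (hx[j] - hx[i]) * (candidates$y - hy[i]) - (hy[j] - hy[i]) * (candidates$x - hx[i])
  })
  cross <- matrix(cross, nrow = nrow(candidates))
  in_hull <- apply(cross <= 0, 1, all) | apply(cross >= 0, 1, all)
  # Distance from each candidate to its nearest presence
  min_dist <- sapply(seq_len(nrow(candidates)), function(i) {
    min(sqrt((presences$x - candidates$x[i])^2 + (presences$y - candidates$y[i])^2))
  })
  pool <- candidates[in_hull & min_dist > buffer, , drop = FALSE]
  pool[sample.int(nrow(pool), min(n, nrow(pool))), , drop = FALSE]
}

presences  <- data.frame(x = c(0, 10, 10, 0, 5), y = c(0, 0, 10, 10, 5))
candidates <- expand.grid(x = seq(-2, 12, by = 0.5), y = seq(-2, 12, by = 0.5))
pa <- buffer_out_sketch(presences, candidates, buffer = 1, n = 5, seed = 42)
```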

Generate Random Pseudo-Absences

Description

This function generates pseudo-absence points randomly across the study area (random background), optionally applying spatial thinning to match the filtering strategy applied to the presences.

Usage

generate_pa_random(
  presences,
  study_area,
  raster_stack,
  predictor_variables,
  coords = c("decimalLongitude", "decimalLatitude"),
  ratio = 1,
  attempts = 100,
  seed = NULL
)

Arguments

presences

Data frame containing presence points.

study_area

Spatial polygon defining the study area ('sf' object).

raster_stack

'SpatRaster' object containing covariate data.

predictor_variables

Character vector of the predictor variables selected for this species.

coords

Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'.

ratio

Ratio of pseudo-absences to presences (default 1 = balanced).

attempts

Integer specifying the maximum number of attempts to generate the exact number of pseudo-absences. Defaults to 100.

seed

Optional random seed.

Value

Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.
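Random background sampling at a given ratio can be sketched in base R. The example samples ratio × (number of presences) candidate cells that do not coincide with a presence and labels them pa = 0. This is a simplified illustration with a hypothetical helper, random_background_sketch(). Candidate cells are a plain data frame with hypothetical x/y columns, and thinning, covariate extraction, and retry attempts are omitted.

```r
# Hypothetical random-background sketch: sample ratio * n_presences cells
# that are not occupied by a presence.
random_background_sketch <- function(presences, candidates, ratio = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cand_key <- paste(candidates$x, candidates$y)
  pres_key <- paste(presences$x, presences$y)
  pool <- candidates[!cand_key %in% pres_key, , drop = FALSE]
  n <- min(round(ratio * nrow(presences)), nrow(pool))
  out <- pool[sample.int(nrow(pool), n), , drop = FALSE]
  out$pa <- 0
  rownames(out) <- NULL
  out
}

cells <- expand.grid(x = 1:10, y = 1:10)
pres  <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))
bg <- random_background_sketch(pres, cells, ratio = 2, seed = 7)
```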

Generate Pseudo-Absences Using Target-Group Background

Description

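This function generates pseudo-absence points by sampling from the target-group points (i.e., the locations where sampling took place) within the study area.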

Usage

generate_pa_target_group(
  presences,
  target_group_points,
  study_area,
  raster_stack,
  predictor_variables,
  coords = c("decimalLongitude", "decimalLatitude"),
  ratio = 1,
  attempts = 100,
  seed = NULL
)

Arguments

presences

Data frame containing presence points.

target_group_points

Data frame of all sampling locations (target group).

study_area

Spatial polygon defining the study area ('sf' object).

raster_stack

'SpatRaster' object containing covariate data.

predictor_variables

Character vector of the predictor variables selected for this species.

coords

Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'.

ratio

Ratio of pseudo-absences to presences (default 1 = balanced).

attempts

Integer specifying the maximum number of attempts to generate the exact number of pseudo-absences. Defaults to 100.

seed

Optional random seed.

Value

Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.

Generate Prediction Plot

Description

This function generates a plot based on prediction raster layers and presence/absence points.

Usage

generate_prediction_plot(
  prediction_layer,
  pa_points,
  legend_label,
  non_study_area_mask,
  coords
)

Arguments

prediction_layer

Raster prediction layer.

pa_points

Presence/absence points.

legend_label

Label for the legend.

non_study_area_mask

Spatial polygon representing the areas outside the study area.

Value

Returns a ggplot object representing the world prediction plot.

Generate Pseudo-Absence Points Using Different Methods Based on Presence Points, Covariates, and Study Area Polygon

Description

Wrapper function for the pseudo-absence generation methods: random background points, target-group background, and buffer-out.

Usage

generate_pseudo_absences(
  method = c("random", "target_group", "buffer_out"),
  presences,
  raster_stack,
  predictor_variables,
  study_area = NULL,
  target_group_points = NULL,
  coords = c("decimalLongitude", "decimalLatitude"),
  pa_buffer_distance = 0.5,
  ratio = 1,
  attempts = 100,
  seed = NULL
)

Arguments

method

Character; one of "random", "target_group", or "buffer_out".

presences

Data frame of presence points with coordinates and timestamp.

raster_stack

SpatRaster of covariates.

predictor_variables

Character vector of selected predictors.

study_area

Optional sf polygon (used for clipping).

target_group_points

Optional data frame of sampling points (for target-group).

coords

Vector of coordinate column names.

pa_buffer_distance

Numeric; buffer radius in degrees around each presence. Default is 0.5.

ratio

Ratio of pseudo-absences to presences.

attempts

Maximum number of attempts to reach the requested number of pseudo-absences.

seed

Optional seed for reproducibility.

Value

A data frame of pseudo-absence points (pa = 0) with covariates.

Compute specificity and sensitivity

Description

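This function computes the false positive rate (1 - specificity) and the true positive rate (sensitivity) for a given probability threshold.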

Usage

getFprTpr(actuals, predictedScores, threshold = 0.5)

Arguments

actuals

The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'.

predictedScores

The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's.

threshold

If predicted value is above the threshold, it will be considered as an event (1), else it will be a non-event (0). Defaults to 0.5.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

A list with two elements: fpr (false positive rate) and tpr (true positive rate).
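A base-R sketch of the same quantities is shown below. The helper fpr_tpr_sketch() is hypothetical. It assumes that scores equal to the threshold count as events; the text above only says "above", and the example contains no ties.

```r
# Sketch: false and true positive rates at a probability threshold.
fpr_tpr_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores >= threshold)  # assumed tie handling
  tpr <- sum(predicted == 1 & actuals == 1) / sum(actuals == 1)  # sensitivity
  fpr <- sum(predicted == 1 & actuals == 0) / sum(actuals == 0)  # 1 - specificity
  list(fpr = fpr, tpr = tpr)
}

r <- fpr_tpr_sketch(actuals = c(1, 1, 0, 0, 1),
                    predictedScores = c(0.9, 0.4, 0.6, 0.1, 0.7))
```

Two of the three events are detected (TPR = 2/3), and one of the two non-events is flagged (FPR = 0.5).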

Get Covariate Names

Description

This function extracts the names of covariates from a ZIP file containing covariate layers.

Usage

get_covariate_names(file_path)

Arguments

file_path

Path to the ZIP file containing covariate layers.

Value

A character vector containing the names of covariates.

Main Analysis Function for GLOSSA Package

Description

This function wraps the complete GLOSSA analysis workflow. It processes presence-absence data and environmental covariates, fits species distribution models, and produces projections under past and future scenarios.

Usage

glossa_analysis(
  pa_data = NULL,
  fit_layers = NULL,
  proj_files = NULL,
  study_area_poly = NULL,
  predictor_variables = NULL,
  thinning_method = NULL,
  thinning_value = NULL,
  scale_layers = FALSE,
  buffer = NULL,
  native_range = NULL,
  suitable_habitat = NULL,
  other_analysis = NULL,
  model_args = list(),
  cv_methods = NULL,
  cv_folds = 5,
  cv_block_source = "residuals_autocorrelation",
  cv_block_size = NULL,
  pseudoabsence_method = "random",
  pa_ratio = 1,
  target_group_points = NULL,
  pa_buffer_distance = NULL,
  seed = NA,
  waiter = NULL
)

Arguments

pa_data

A list of data frames containing presence-absence data including 'decimalLongitude', 'decimalLatitude', 'timestamp', and 'pa' columns.

fit_layers

A ZIP file with the raster files containing model fitting environmental layers formatted as explained in the website documentation.

proj_files

A list of ZIP file paths containing environmental layers for projection scenarios.

study_area_poly

A spatial polygon defining the study area.

predictor_variables

A list of the predictor variables to be used in the analysis for each occurrence dataset.

thinning_method

A character specifying the spatial thinning method to apply to occurrence data. Options are 'c("none", "distance", "grid", "precision")'. See 'GeoThinneR' package for details.

thinning_value

A numeric value used for thinning depending on the selected method: distance in meters ('distance'), grid resolution in degrees ('grid'), or decimal precision ('precision').

scale_layers

Logical; if 'TRUE', covariate layers are standardized (z-scores) using the statistics of the fit layers.

buffer

Buffer distance in decimal degrees (arc degrees) used to buffer the study area polygon.

native_range

A vector of scenarios ('c("fit_layers", "projections")') where native range modeling should be performed.

suitable_habitat

A vector of scenarios ('c("fit_layers", "projections")') where habitat suitability modeling should be performed.

other_analysis

A vector of additional analyses to perform (e.g., 'variable_importance', 'functional_responses', 'cross_validation').

model_args

A named list of additional arguments passed to the modeling function (e.g., 'dbarts::bart'). This allows users to fine-tune model parameters such as 'ntree' or 'k'. These are passed internally via '...' and must match the arguments of the selected model function.

cv_methods

A vector of the cross-validation strategies to perform. One or multiple of '"k-fold"', '"spatial_blocks"', '"temporal_blocks"'.

cv_folds

Integer indicating the number of folds to generate.

cv_block_source

For spatial blocks, how to determine block size. One of: '"residuals_autocorrelation"', '"predictors_autocorrelation"', '"manual"'.

cv_block_size

Numeric block size in meters (used if 'cv_block_source = "manual"').

pseudoabsence_method

Method for generating pseudo-absences. One of "random", "target_group", or "buffer_out".

pa_ratio

Ratio of pseudo-absences to presences (pseudo-absence:presences).

target_group_points

Optional data frame of sampling points for the target-group method.

pa_buffer_distance

Numeric buffer radius in degrees around each presence. Default is NULL.

seed

Optional; an integer seed for reproducibility of results.

waiter

Optional; a waiter instance to update progress in a Shiny application.

Value

A list containing structured outputs from each major section of the analysis, including model data, projections, variable importance scores, and habitat suitability assessments.
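The 'scale_layers' option standardizes covariates using the statistics of the fit layers. Applying those same statistics to the projection layers keeps both on a common scale. A base-R sketch of that idea on plain matrices (one column per covariate) is shown below. The helper scale_by_fit_sketch() and the covariate names 'sst' and 'sal' are hypothetical, and the package operates on SpatRaster layers.

```r
# Sketch: z-score both fit and projection values using fit-layer mean and sd.
scale_by_fit_sketch <- function(fit_values, proj_values) {
  mu  <- colMeans(fit_values, na.rm = TRUE)
  sdv <- apply(fit_values, 2, sd, na.rm = TRUE)
  list(
    fit  = sweep(sweep(fit_values, 2, mu, "-"), 2, sdv, "/"),
    proj = sweep(sweep(proj_values, 2, mu, "-"), 2, sdv, "/")
  )
}

fit  <- cbind(sst = c(10, 12, 14, 16), sal = c(35, 36, 37, 38))
proj <- cbind(sst = c(18, 20), sal = c(36, 37))
s <- scale_by_fit_sketch(fit, proj)
```

The fit columns end up with mean 0. Warmer projection values map to z-scores above the fit range rather than being rescaled on their own.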

Export GLOSSA Model Results

Description

This function exports various types of GLOSSA model results, including native range predictions, suitable habitat predictions, model data, variable importance, functional response results, and presence/absence probability cutoffs. It writes raster files for prediction results and TSV files for model data, variable importance, and functional response results. Additionally, it creates a TSV file with the presence/absence probability cutoffs if these are provided.

Usage

glossa_export(
  species = NULL,
  models = NULL,
  layer_results = NULL,
  fields = NULL,
  model_data = FALSE,
  model_summary = FALSE,
  fr = FALSE,
  prob_cut = FALSE,
  varimp = FALSE,
  cross_val = FALSE,
  layer_format = "tif",
  projections_results = NULL,
  presence_absence_list = NULL,
  other_results = NULL,
  pa_cutoff = NULL,
  config_snapshot = NULL
)

Arguments

species

A character vector specifying the species names.

models

A character vector specifying the types of models to export results for.

layer_results

A list containing layer results for native range and suitable habitat predictions.

fields

A character vector specifying the fields to include in the exported results.

model_data

Logical, indicating whether to export model data.

fr

Logical, indicating whether to export functional response results.

prob_cut

Logical, indicating whether to export presence/absence probability cutoffs.

varimp

Logical, indicating whether to export variable importance.

cross_val

Logical, indicating whether to export cross-validation metrics.

layer_format

A character string specifying the format of the exported raster files (e.g., "tif").

projections_results

A list containing projections results.

presence_absence_list

A list containing presence/absence lists.

other_results

A list containing other types of results (e.g., variable importance, functional responses, cross-validation).

pa_cutoff

A list containing presence/absence probability cutoffs.

Value

A character vector of file paths for the exported files or directories.

Invert a Polygon

Description

This function inverts a polygon by calculating the difference between the bounding box and the polygon.

Usage

invert_polygon(polygon, bbox = NULL)

Arguments

polygon

An sf object representing the polygon to be inverted.

bbox

Optional. An sf or bbox object representing the bounding box. If NULL, the bounding box of the input polygon is used.

Value

An sf object representing the inverted polygon.

Apply Polygon Mask to Raster Layers

Description

This function crops and extends raster layers to the study area extent (bounding box) defined by longitude and latitude, then applies a mask based on the provided spatial polygon to remove areas outside the polygon.

Usage

layer_mask(layers, study_area)

Arguments

layers

A stack of raster layers ('SpatRaster' object) to be processed.

study_area

A spatial polygon ('sf' object) used to mask the raster layers.

Value

A 'SpatRaster' object representing the masked raster layers.

Misclassification Error

Description

Misclassification Error

Usage

misClassError(actuals, predictedScores, threshold = 0.5)

Arguments

actuals

The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'.

predictedScores

The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's.

threshold

If predicted value is above the threshold, it will be considered as an event (1), else it will be a non-event (0). Defaults to 0.5.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

The misclassification error, i.e., the proportion of observations whose predicted class (0 or 1) does not match the actual class.
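As a minimal base-R illustration, the error is the share of mismatches between thresholded predictions and actuals. The helper misclass_sketch() is hypothetical and assumes that scores equal to the threshold count as events.

```r
# Sketch: misclassification error at a probability threshold.
misclass_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores >= threshold)  # assumed tie handling
  mean(predicted != actuals)
}

err <- misclass_sketch(actuals = c(1, 1, 0, 0, 1),
                       predictedScores = c(0.9, 0.4, 0.6, 0.1, 0.7))
```

Two of the five observations are misclassified, so the error is 0.4.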

Compute the optimal probability cutoff score

Description

Compute the optimal probability cutoff score

Usage

optimalCutoff(
  actuals,
  predictedScores,
  optimiseFor = "misclasserror",
  returnDiagnostics = FALSE
)

Arguments

actuals

The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'.

predictedScores

The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's.

optimiseFor

The criterion for which the probability cutoff score is optimised. One of "Ones", "Zeros", "Both", or "misclasserror" (default). If "Ones" is used, 'optimalCutoff' is chosen to maximise detection of ones. If "Both" is specified, the probability cutoff that gives the maximum Youden's index is chosen. If "misclasserror" is specified, the probability cutoff that gives the minimum misclassification error is chosen.

returnDiagnostics

If TRUE, additional diagnostics for the chosen cutoff are returned, such as 'sensitivityTable', 'misclassificationError', 'TPR', 'FPR', and 'specificity'.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

The optimal probability score cutoff that maximises the given criterion. If 'returnDiagnostics' is TRUE, a list is returned containing the optimal cutoff together with the diagnostics described under 'returnDiagnostics'.
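The "Both" criterion (maximum Youden's index, sensitivity + specificity - 1) can be sketched as a grid search. The helper youden_cutoff_sketch() and its cutoff grid of 0.01 steps are assumptions, not the InformationValue code. Ties are resolved by taking the first cutoff that reaches the maximum.

```r
# Sketch: choose the cutoff that maximises Youden's index on a fixed grid.
youden_cutoff_sketch <- function(actuals, predictedScores,
                                 cutoffs = seq(0.01, 0.99, by = 0.01)) {
  youden <- vapply(cutoffs, function(ct) {
    predicted <- predictedScores >= ct
    sens <- sum(predicted & actuals == 1) / sum(actuals == 1)
    spec <- sum(!predicted & actuals == 0) / sum(actuals == 0)
    sens + spec - 1
  }, numeric(1))
  cutoffs[which.max(youden)]  # first cutoff reaching the maximum
}

cut <- youden_cutoff_sketch(actuals = c(0, 0, 0, 1, 1, 1),
                            predictedScores = c(0.1, 0.2, 0.3, 0.6, 0.7, 0.8))
```

Any cutoff between 0.3 and 0.6 separates the two classes perfectly here, and the sketch returns the smallest such grid value.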

Optimal Cutoff for Presence-Absence Prediction

Description

This function calculates the optimal cutoff for presence-absence prediction using a BART model.

Usage

pa_optimal_cutoff(y, x, model, seed = NULL)

Arguments

y

Vector indicating presence (1) or absence (0).

x

Data frame with the same number of rows as the length of 'y', containing the covariate values.

model

A BART model object.

seed

Random seed for reproducibility.

Value

The optimal cutoff value for presence-absence prediction.

Plot cross-validation fold assignments

Description

Plot cross-validation fold assignments

Usage

plot_cv_folds_points(data, polygon = NULL)

Arguments

data

Data frame with columns 'decimalLongitude', 'decimalLatitude', 'pa', and 'fold'.

polygon

An sf object representing the inverted study area.

Value

A ggplot object showing points color-coded by cross-validation fold and shaped by presence/absence.

Plot cross-validation metrics

Description

This function generates a cross-validation radial plot based on evaluation metrics.

Usage

plot_cv_metrics(data)

Arguments

data

Dataframe containing cross-validation results.

Value

Returns a ggplot object representing the cross-validation plot.

Make Predictions Using a BART Model

Description

This function makes predictions using a Bayesian Additive Regression Trees (BART) model on a stack of environmental covariates ('SpatRaster').

Usage

predict_bart(bart_model, layers, cutoff = NULL)

Arguments

bart_model

A BART model object obtained from fitting BART using the 'dbarts' package.

layers

A SpatRaster object containing environmental covariates for prediction.

cutoff

An optional numeric cutoff value for determining potential presences. If NULL, potential presences and absences will not be computed.

Value

A SpatRaster containing the mean, median, standard deviation, and quantiles of the posterior predictive distribution, as well as a potential presences layer if cutoff is provided.
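The per-cell summaries can be sketched in base R from a matrix of posterior draws with one row per draw and one column per raster cell (an assumed layout). The 2.5% and 97.5% quantiles, the rule that a cell is a potential presence when its posterior mean exceeds 'cutoff', and the helper name `summarise_draws` are assumptions for illustration; the simulated draws stand in for real model output.

```r
# Summarise posterior draws for each raster cell.
# draws: matrix with one row per posterior draw and one column per cell.
summarise_draws <- function(draws, cutoff = NULL) {
  out <- data.frame(
    mean   = colMeans(draws),
    median = apply(draws, 2, median),
    sd     = apply(draws, 2, sd),
    q0.025 = apply(draws, 2, quantile, probs = 0.025, names = FALSE),
    q0.975 = apply(draws, 2, quantile, probs = 0.975, names = FALSE)
  )
  # Assumption: a cell is a potential presence if its posterior mean exceeds the cutoff
  if (!is.null(cutoff)) out$potential_presences <- as.integer(out$mean > cutoff)
  out
}

set.seed(1)
# Simulated probabilities for 3 cells and 500 draws (placeholder for model output)
draws <- cbind(rbeta(500, 2, 8), rbeta(500, 5, 5), rbeta(500, 8, 2))
summarise_draws(draws, cutoff = 0.5)
```

In a raster workflow each row of the result would become one cell value in the corresponding output layer.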

Read and Validate Extent Polygon

Description

This function reads and validates a polygon file containing the extent. It checks if the file has the correct format and extracts the geometry.

Usage

read_extent_polygon(file_path, show_modal = FALSE)

Arguments

file_path

Path to the polygon file containing the extent.

show_modal

Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE.

Value

A spatial object representing the extent if the file is valid, NULL otherwise.

Load Covariate Layers from ZIP Files

Description

This function loads covariate layers from a ZIP file, verifies their spatial characteristics, and returns them as a list of raster layers.

Usage

read_layers_zip(
  file_path,
  extend = TRUE,
  first_layer = FALSE,
  show_modal = FALSE
)

Arguments

file_path

Path to the ZIP file containing covariate layers.

extend

Logical. If TRUE, the largest extent among the layers is used; if FALSE, the smallest.

first_layer

Logical. If TRUE, only the layers from the first timestamp are returned.

show_modal

Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE.

Value

A list containing raster layers for each covariate.
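The effect of 'extend' can be pictured with plain numeric extents in terra's (xmin, xmax, ymin, ymax) order. In this sketch, which uses a hypothetical helper `combine_extents` rather than the package's code, the largest extent is the bounding box covering every layer and the smallest is the area all layers share.

```r
# Hypothetical helper: combine extents given as c(xmin, xmax, ymin, ymax).
# largest = TRUE returns the bounding box covering all extents;
# largest = FALSE returns the region shared by all extents.
combine_extents <- function(extents, largest = TRUE) {
  m <- do.call(rbind, extents)
  if (largest) {
    c(xmin = min(m[, 1]), xmax = max(m[, 2]), ymin = min(m[, 3]), ymax = max(m[, 4]))
  } else {
    ext <- c(xmin = max(m[, 1]), xmax = min(m[, 2]), ymin = max(m[, 3]), ymax = min(m[, 4]))
    if (ext["xmin"] >= ext["xmax"] || ext["ymin"] >= ext["ymax"]) stop("Extents do not overlap")
    ext
  }
}

extents <- list(c(-10, 10, 30, 50), c(-5, 15, 35, 45))
combine_extents(extents, largest = TRUE)   # xmin -10, xmax 15, ymin 30, ymax 50
combine_extents(extents, largest = FALSE)  # xmin -5, xmax 10, ymin 35, ymax 45
```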

Read and validate presences/absences CSV file

Description

This function reads and validates a CSV file containing presence/absence data for species occurrences. It checks whether the file has the expected columns and formats.

Usage

read_presences_absences_csv(
  file_path,
  file_name = NULL,
  show_modal = FALSE,
  timestamp_mapping = NULL,
  coords = c("decimalLongitude", "decimalLatitude"),
  sep = "\t",
  dec = "."
)

Arguments

file_path

The file path to the CSV file.

file_name

Optional. The name of the file. If not provided, the base name of the file path is used.

show_modal

Optional. Logical. Whether to show a modal notification for warnings (use in Shiny). Default is FALSE.

timestamp_mapping

Optional. A vector with the timestamp mapping of the environmental layers.

coords

Optional. Character vector of length 2 specifying the names of the columns containing the longitude and latitude coordinates. Default is c("decimalLongitude", "decimalLatitude").

sep

Optional. The field separator character. Default is tab-separated.

dec

Optional. The decimal point character. Default is ".".

Value

A data frame with the validated data if the file has the expected columns and formats, NULL otherwise.
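The column checks can be sketched with base R. The required columns ('timestamp' and 'pa' besides the coordinates) are an assumption drawn from other entries in this manual, and the helper `read_pa_sketch` is hypothetical: it returns NULL with a warning where the package would show a modal notification.

```r
# Hypothetical sketch: read a tab-separated occurrence file and check its columns.
# Assumed required columns: the two coordinate columns plus 'timestamp' and 'pa'.
read_pa_sketch <- function(file_path, coords = c("decimalLongitude", "decimalLatitude"),
                           sep = "\t", dec = ".") {
  df <- utils::read.table(file_path, header = TRUE, sep = sep, dec = dec)
  required <- c(coords, "timestamp", "pa")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    warning("Missing columns: ", paste(missing_cols, collapse = ", "))
    return(NULL)
  }
  # Presence/absence must be coded as 1/0
  if (!all(df$pa %in% c(0, 1))) {
    warning("Column 'pa' must contain only 0 and 1")
    return(NULL)
  }
  df
}

tmp <- tempfile(fileext = ".csv")
writeLines(c("decimalLongitude\tdecimalLatitude\ttimestamp\tpa",
             "-3.5\t40.1\t1\t1",
             "2.0\t41.3\t1\t0"), tmp)
read_pa_sketch(tmp)
```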

Remove Duplicated Points from a Dataframe

Description

This function removes duplicated points from a dataframe based on specified coordinate columns.

Usage

remove_duplicate_points(df, coords = c("decimalLongitude", "decimalLatitude"))

Arguments

df

A dataframe object with each row representing one point.

coords

A character vector specifying the names of the coordinate columns used for identifying duplicate points. Default is c("decimalLongitude", "decimalLatitude").

Value

A dataframe without duplicated points.
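A minimal base R sketch of the idea, assuming the first row of each set of duplicates is the one kept; `remove_duplicates_sketch` is a hypothetical name. Only the coordinate columns are compared, so rows that share coordinates but differ in other columns (such as 'pa') are still treated as duplicates.

```r
# Keep the first occurrence of each coordinate pair (sketch, not the package code)
remove_duplicates_sketch <- function(df, coords = c("decimalLongitude", "decimalLatitude")) {
  df[!duplicated(df[, coords]), , drop = FALSE]
}

df <- data.frame(
  decimalLongitude = c(-3.5, -3.5, 2.0),
  decimalLatitude  = c(40.1, 40.1, 41.3),
  pa = c(1, 0, 1)
)
remove_duplicates_sketch(df)  # keeps rows 1 and 3
```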

Remove Points Inside or Outside a Polygon

Description

This function removes points from a dataframe based on their location relative to a specified polygon.

Usage

remove_points_polygon(
  df,
  polygon,
  overlapping = FALSE,
  coords = c("decimalLongitude", "decimalLatitude")
)

Arguments

df

A dataframe object with rows representing points.

polygon

An sf polygon object defining the region for point removal.

overlapping

Logical. If TRUE, points that fall inside the polygon are removed; if FALSE, those points are kept and points outside the polygon are removed.

coords

Character vector specifying the column names for longitude and latitude. Default is c("decimalLongitude", "decimalLatitude").

Value

A dataframe containing the filtered points.
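The sketch below illustrates the filtering logic without the sf dependency. It swaps sf's spatial predicates for a hand-written ray-casting point-in-polygon test on plain vertex vectors, and it assumes 'overlapping = TRUE' drops points inside the polygon while 'overlapping = FALSE' keeps only those inside. Holes, multipolygons and points lying exactly on an edge are not handled; `point_in_polygon` and `remove_points_sketch` are hypothetical names.

```r
# Ray-casting point-in-polygon test (dependency-free stand-in for sf predicates).
# poly_x, poly_y: polygon vertex coordinates; the ring does not need to be closed.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- logical(length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i cross the horizontal ray to the right of each point?
    crosses <- ((poly_y[i] > py) != (poly_y[j] > py)) &
      (px < (poly_x[j] - poly_x[i]) * (py - poly_y[i]) / (poly_y[j] - poly_y[i]) + poly_x[i])
    inside <- xor(inside, crosses)  # an odd number of crossings means "inside"
    j <- i
  }
  inside
}

remove_points_sketch <- function(df, poly_x, poly_y, overlapping = FALSE,
                                 coords = c("decimalLongitude", "decimalLatitude")) {
  inside <- point_in_polygon(df[[coords[1]]], df[[coords[2]]], poly_x, poly_y)
  # overlapping = TRUE drops points inside the polygon; FALSE keeps only those inside
  if (overlapping) df[!inside, , drop = FALSE] else df[inside, , drop = FALSE]
}

square_x <- c(0, 10, 10, 0)
square_y <- c(0, 0, 10, 10)
pts <- data.frame(decimalLongitude = c(5, 15, 2), decimalLatitude = c(5, 5, 8))
remove_points_sketch(pts, square_x, square_y, overlapping = TRUE)   # keeps the point at (15, 5)
remove_points_sketch(pts, square_x, square_y, overlapping = FALSE)  # keeps (5, 5) and (2, 8)
```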

Calculate Response Curve Using BART Model

Description

This function calculates the response curve (functional responses) using a Bayesian Additive Regression Trees (BART) model.

Usage

response_curve_bart(bart_model, data, predictor_names)

Arguments

bart_model

A BART model object obtained from fitting BART ('dbarts::bart').

data

A data frame containing the predictor variables (the design matrix) used in the BART model.

predictor_names

A character vector containing the names of the predictor variables.

Value

A list containing a data frame for each independent variable with mean, 2.5th percentile, 97.5th percentile, and corresponding values of the variables.
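A generic partial-dependence sketch shows how such a curve can be built from posterior draws. One predictor is fixed at each grid value for every row, predictions are averaged over rows within each draw, and the resulting per-draw averages are summarised by their mean and 2.5th/97.5th percentiles, matching the columns described above. The function `partial_dependence`, the `predict_draws` argument and the toy model are hypothetical, and this is not necessarily how the package computes its curves.

```r
# Partial dependence for one predictor from posterior draws.
# predict_draws(newdata) must return a matrix: rows = posterior draws, columns = rows of newdata.
partial_dependence <- function(predict_draws, data, var, n_points = 20) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_points)
  summaries <- t(vapply(grid, function(value) {
    newdata <- data
    newdata[[var]] <- value                         # fix the predictor at one grid value
    draw_means <- rowMeans(predict_draws(newdata))  # average over observations, per draw
    c(mean  = mean(draw_means),
      q2.5  = unname(quantile(draw_means, 0.025)),
      q97.5 = unname(quantile(draw_means, 0.975)))
  }, numeric(3)))
  data.frame(value = grid, summaries)
}

# Toy stand-in for a fitted model: 200 draws of a logistic curve with an uncertain slope
set.seed(42)
slopes <- rnorm(200, mean = 1.5, sd = 0.2)
toy_predict <- function(newdata) 1 / (1 + exp(-outer(slopes, newdata$temp)))
env <- data.frame(temp = runif(50, -3, 3), depth = runif(50, 0, 200))
curve <- partial_dependence(toy_predict, env, "temp", n_points = 10)
head(curve)
```

The other predictors (here 'depth') keep their observed values, so the curve reflects the average effect of 'temp' over the data.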

Run GLOSSA Shiny App

Description

This function launches the GLOSSA Shiny web application.

Usage

run_glossa(
  request_size_mb = 2000,
  launch.browser = TRUE,
  port = getOption("shiny.port"),
  clear_global_env = FALSE
)

Arguments

request_size_mb

Maximum request size for file uploads, in megabytes. Default is 2000 MB.

launch.browser

Logical indicating whether to launch the app in the browser (default is TRUE).

port

Port number for the Shiny app. Uses the port specified by 'getOption("shiny.port")' by default.

clear_global_env

Logical. If TRUE, clears the global environment after the app exits.

Details

The GLOSSA Shiny app provides an interactive interface for users to access GLOSSA functionalities.

Value

No return value, called to launch the GLOSSA app.

Note

Use 'clear_global_env = TRUE' cautiously, as it removes all objects from your R environment after the app exits.

Examples

if (interactive()) {
  run_glossa()
  run_glossa(clear_global_env = TRUE)  # clears all global objects
}

Calculate the sensitivity for a given logit model

Description

Calculates the sensitivity (true positive rate) of binary predictions obtained by thresholding predicted scores.

Usage

sensitivity(actuals, predictedScores, threshold = 0.5)

Arguments

actuals

A numeric vector of actual binary outcomes, each either 1 or 0, where 1 represents 'Good' or 'Events' and 0 represents 'Bad' or 'Non-Events'.

predictedScores

The predicted probability scores for each observation. If your classification model returns 1/0 predictions, convert them to a numeric vector of 1s and 0s.

threshold

Scores above the threshold are classified as events (1); all other scores are classified as non-events (0). Defaults to 0.5.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

The sensitivity for the given actuals and predicted scores, that is, the number of observations that have the event and are predicted to have it, divided by the number of observations that have the event.
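The definition above translates directly into a few lines of base R. This is a sketch under the threshold rule stated for 'threshold' (scores above it count as events); `sensitivity_sketch` is a hypothetical name chosen to avoid clashing with the package function.

```r
# Sensitivity (true positive rate) at a threshold: TP / (TP + FN).
sensitivity_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)  # above threshold -> event (1)
  sum(actuals == 1 & predicted == 1) / sum(actuals == 1)
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.3, 0.7, 0.2)
sensitivity_sketch(actuals, scores)  # 2 of 3 events detected -> 0.6666667
```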

Create a Sparkline Value Box

Description

This function creates a custom value box with a sparkline plot embedded in it.

Usage

sparkvalueBox(
  title,
  sparkline_data,
  description,
  type = "line",
  box_color = "white",
  width = 4,
  elevation = 0,
  ...
)

Arguments

title

The title or heading of the value box.

sparkline_data

The data used to generate the sparkline plot.

description

A short description or additional information displayed below the value box.

type

The type of sparkline plot to generate. Default is "line".

box_color

The background color of the value box.

width

The width of the value box. Default is 4.

elevation

The elevation of the value box. Default is 0.

...

Additional parameters to be passed to the sparkline function.

Value

Returns a custom value box with the specified parameters.

Calculate the specificity for a given logit model

Description

Calculates the specificity (true negative rate) of binary predictions obtained by thresholding predicted scores.

Usage

specificity(actuals, predictedScores, threshold = 0.5)

Arguments

actuals

A numeric vector of actual binary outcomes, each either 1 or 0, where 1 represents 'Good' or 'Events' and 0 represents 'Bad' or 'Non-Events'.

predictedScores

The predicted probability scores for each observation. If your classification model returns 1/0 predictions, convert them to a numeric vector of 1s and 0s.

threshold

Scores above the threshold are classified as events (1); all other scores are classified as non-events (0). Defaults to 0.5.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

The specificity for the given actuals and predicted scores, that is, the number of observations without the event that are predicted not to have it, divided by the number of observations without the event.
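A minimal base R sketch of this definition, again assuming scores above the threshold count as events; `specificity_sketch` is a hypothetical name.

```r
# Specificity (true negative rate) at a threshold: TN / (TN + FP).
specificity_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)  # above threshold -> event (1)
  sum(actuals == 0 & predicted == 0) / sum(actuals == 0)
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.3, 0.7, 0.2)
specificity_sketch(actuals, scores)  # 1 of 2 non-events correctly rejected -> 0.5
```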

Validate Fit and Projection Layers

Description

This function validates fit and projection layers by checking their covariates.

Usage

validate_fit_projection_layers(
  fit_layers_path,
  proj_layers_path,
  show_modal = FALSE
)

Arguments

fit_layers_path

Path to the ZIP file containing fit layers.

proj_layers_path

Path to the ZIP file containing projection layers.

show_modal

Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE.

Value

TRUE if the layers pass validation criteria, FALSE otherwise.

Validate Layers Zip

Description

This function validates a ZIP file containing environmental layers. It checks if the layers have the same number of files, CRS (Coordinate Reference System), and resolution.

Usage

validate_layers_zip(file_path, timestamp_mapping = NULL, show_modal = FALSE)

Arguments

file_path

Path to the ZIP file containing environmental layers.

timestamp_mapping

Optional. A vector with the timestamp mapping of the environmental layers.

show_modal

Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE.

Value

TRUE if the layers pass validation criteria, FALSE otherwise.
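The three checks can be sketched on layer metadata alone. Here each covariate is a list of layers and each layer is a list with a 'crs' string and an x/y 'res' vector; this structure and the helper `check_layer_metadata` are hypothetical (with terra, such values could be read with terra::crs() and terra::res()). "Same number of files" is interpreted as every covariate having the same number of layers.

```r
# Hypothetical sketch of the checks listed above, applied to layer metadata.
# layers: named list (one entry per covariate); each entry is a list of layers,
# and each layer is a list with 'crs' (character) and 'res' (numeric x/y resolution).
check_layer_metadata <- function(layers) {
  n_files <- vapply(layers, length, integer(1))
  all_layers <- unlist(layers, recursive = FALSE)  # flatten to one list of layers
  crs <- vapply(all_layers, function(l) l$crs, character(1))
  res <- t(vapply(all_layers, function(l) l$res, numeric(2)))
  same_files <- length(unique(n_files)) == 1
  same_crs   <- length(unique(crs)) == 1
  same_res   <- nrow(unique(res)) == 1
  same_files && same_crs && same_res
}

lyr <- function(crs = "EPSG:4326", res = c(0.5, 0.5)) list(crs = crs, res = res)
ok  <- list(sst = list(lyr(), lyr()), chl = list(lyr(), lyr()))
bad <- list(sst = list(lyr(), lyr()), chl = list(lyr(res = c(1, 1))))
check_layer_metadata(ok)   # TRUE
check_layer_metadata(bad)  # FALSE: different layer counts and resolutions
```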

Validate Match Between Presence/Absence Files and Fit Layers

Description

This function validates whether the time periods of the presence/absence data match the environmental layers.

Usage

validate_pa_fit_time(pa_data, fit_layers_path, show_modal = FALSE)

Arguments

pa_data

Data frame containing the presence/absence data with a 'timestamp' column.

fit_layers_path

Path to the ZIP file containing fit layers.

show_modal

Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE.

Value

TRUE if the timestamps match the fit layers, FALSE otherwise.

Variable Importance in BART Model

Description

This function computes variable importance scores for a fitted BART (Bayesian Additive Regression Trees) model using a permutation-based approach. It measures the impact of each predictor on model performance by permuting that predictor's values and recording the resulting change in performance, using the F-score as the performance metric.

Usage

variable_importance(bart_model, y, x, cutoff = 0, n_repeats = 10, seed = NULL)

Arguments

bart_model

A BART model object.

y

Vector indicating presence (1) or absence (0).

x

Data frame of covariate values, with the same number of rows as the length of 'y'.

cutoff

A numeric threshold for converting predicted probabilities into presence-absence.

n_repeats

An integer indicating the number of times to repeat the permutation for each variable.

seed

An optional seed for random number generation.

Value

A data frame with one column per predictor variable and one row per permutation repeat, containing the variable importance scores.
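The permutation approach can be sketched in base R with a generic prediction function. The sketch assumes "F-score" means F1 (the harmonic mean of precision and recall) and that importance is the baseline F1 minus the F1 after permuting a predictor; `f_score`, `permutation_importance` and the `predict_prob` argument are hypothetical names, and the toy model replaces a fitted BART model.

```r
# F1 score of 0/1 predictions against 0/1 actuals.
f_score <- function(actual, predicted) {
  tp <- sum(actual == 1 & predicted == 1)
  fp <- sum(actual == 0 & predicted == 1)
  fn <- sum(actual == 1 & predicted == 0)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Permutation importance: drop in F1 when one predictor is shuffled.
# predict_prob(x) is a hypothetical function returning one probability per row of x.
permutation_importance <- function(predict_prob, y, x, cutoff, n_repeats = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  baseline <- f_score(y, as.integer(predict_prob(x) > cutoff))
  scores <- vapply(names(x), function(v) {
    replicate(n_repeats, {
      x_perm <- x
      x_perm[[v]] <- sample(x_perm[[v]])  # break the link between v and y
      baseline - f_score(y, as.integer(predict_prob(x_perm) > cutoff))
    })
  }, numeric(n_repeats))
  # rows = permutation repeats, columns = predictors
  as.data.frame(matrix(scores, nrow = n_repeats, dimnames = list(NULL, names(x))))
}

set.seed(1)
x <- data.frame(temp = rnorm(300), noise = rnorm(300))
y <- rbinom(300, 1, plogis(3 * x$temp))
toy_prob <- function(newx) plogis(3 * newx$temp)  # toy model that only uses 'temp'
imp <- permutation_importance(toy_prob, y, x, cutoff = 0.5, n_repeats = 5, seed = 2)
colMeans(imp)
```

Because the toy model ignores 'noise', permuting it leaves the predictions unchanged and its importance is exactly zero, while shuffling 'temp' lowers the F1 score.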

Calculate Youden's index

Description

Calculates Youden's index for binary predictions obtained by thresholding predicted scores.

Usage

youdensIndex(actuals, predictedScores, threshold = 0.5)

Arguments

actuals

A numeric vector of actual binary outcomes, each either 1 or 0, where 1 represents 'Good' or 'Events' and 0 represents 'Bad' or 'Non-Events'.

predictedScores

The predicted probability scores for each observation. If your classification model returns 1/0 predictions, convert them to a numeric vector of 1s and 0s.

threshold

Scores above the threshold are classified as events (1); all other scores are classified as non-events (0). Defaults to 0.5.

Details

This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).

Value

Youden's index for the given actuals and predicted scores, calculated as Sensitivity + Specificity - 1.
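A minimal base R sketch of the formula, assuming scores above the threshold count as events; `youden_sketch` is a hypothetical name. Scanning a few candidate thresholds and keeping the one with the largest index illustrates the idea behind the "Both" criterion of 'optimalCutoff'; the 0.1 grid is an arbitrary choice for the example.

```r
# Youden's index J = sensitivity + specificity - 1 at a given threshold.
youden_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)  # above threshold -> event (1)
  sens <- sum(actuals == 1 & predicted == 1) / sum(actuals == 1)
  spec <- sum(actuals == 0 & predicted == 0) / sum(actuals == 0)
  sens + spec - 1
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.3, 0.7, 0.2)
youden_sketch(actuals, scores)  # 2/3 + 1/2 - 1 = 0.1666667

# Keep the candidate threshold with the largest J
thresholds <- seq(0.1, 0.9, by = 0.1)
j <- vapply(thresholds, youden_sketch, numeric(1), actuals = actuals, predictedScores = scores)
thresholds[which.max(j)]
```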