Type: | Package |
Title: | User-Friendly 'shiny' App for Bayesian Species Distribution Models |
Version: | 1.2.2 |
Description: | A user-friendly 'shiny' application for Bayesian machine learning analysis of marine species distributions. GLOSSA (Global Ocean Species Spatio-temporal Analysis) uses Bayesian Additive Regression Trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) to model species distributions with intuitive workflows for data upload, processing, model fitting, and result visualization. It supports presence-absence and presence-only data (with pseudo-absence generation), spatial thinning, cross-validation, and scenario-based projections. GLOSSA is designed to facilitate ecological research by providing easy-to-use tools for analyzing and visualizing marine species distributions across different spatial and temporal scales. |
License: | GPL-3 |
URL: | https://github.com/iMARES-group/glossa, https://iMARES-group.github.io/glossa/ |
BugReports: | https://github.com/iMARES-group/glossa/issues |
Depends: | bs4Dash, R (≥ 4.1.0), shiny |
Imports: | automap, blockCV, dbarts, dplyr, DT, GeoThinneR, ggplot2, htmltools, leaflet, markdown, mcp, pROC, sf, shinyWidgets, sparkline, svglite, terra, tidyterra, waiter, zip |
Suggests: | jsonlite, knitr, matrixStats, rmarkdown, testthat (≥ 3.0.0), tidyr, tidyverse |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-07-15 10:57:36 UTC; jorge |
Author: | Jorge Mestre-Tomás |
Maintainer: | Jorge Mestre-Tomás <jorge.mestre.tomas@csic.es> |
Repository: | CRAN |
Date/Publication: | 2025-07-15 16:00:02 UTC |
Enlarge/Buffer a Polygon
Description
This function enlarges a polygon by applying a buffer.
Usage
buffer_polygon(polygon, buffer_distance)
Arguments
polygon |
An sf object representing the polygon to be buffered. |
buffer_distance |
Numeric. The buffer distance in decimal degrees (arc degrees). |
Value
An sf object representing the buffered polygon.
Clean Coordinates of Presence/Absence Data
Description
This function cleans coordinates of presence/absence data by removing NA coordinates, rounding coordinates if specified, removing duplicated points, and removing points outside specified spatial polygon boundaries.
Usage
clean_coordinates(
df,
study_area,
overlapping = FALSE,
thinning_method = NULL,
thinning_value = NULL,
coords = c("decimalLongitude", "decimalLatitude"),
by_timestamp = TRUE,
seed = NULL
)
Arguments
df |
A dataframe object with rows representing points. Coordinates are in WGS84 (EPSG:4326) coordinate system. |
study_area |
A spatial polygon in WGS84 (EPSG:4326) representing the boundaries within which coordinates should be kept. |
overlapping |
Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
thinning_method |
Character; spatial thinning method to apply to occurrence data. Options are 'c("None", "Distance", "Grid", "Precision")'. See 'GeoThinneR' package for details. |
thinning_value |
Numeric; value used for thinning depending on the selected method: distance in meters ('Distance'), grid resolution in degrees ('Grid'), or decimal precision ('Precision'). |
coords |
Character vector specifying the column names for longitude and latitude. |
by_timestamp |
If TRUE, clean coordinates taking into account different time periods defined in the column 'timestamp'. |
seed |
Optional; an integer seed for reproducibility of results. |
Details
This function takes a data frame containing presence/absence data with longitude and latitude coordinates, a spatial polygon representing boundaries within which to keep points, and parameters for rounding coordinates and handling duplicated points. It returns a cleaned data frame with valid coordinates within the specified boundaries.
Value
A cleaned data frame containing presence/absence data with valid coordinates.
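The cleaning steps described above can be illustrated with a minimal base R sketch. It is not the package implementation: 'clean_coords_sketch()' is a hypothetical helper, a rectangular bounding box stands in for the study-area polygon, and rounding stands in for the 'Precision' thinning method.

```r
# Hypothetical helper: a base R sketch of the cleaning steps, not the package code.
clean_coords_sketch <- function(df, bbox, digits = NULL,
                                coords = c("decimalLongitude", "decimalLatitude")) {
  lon <- coords[1]
  lat <- coords[2]
  # 1. Drop rows with missing coordinates.
  df <- df[!is.na(df[[lon]]) & !is.na(df[[lat]]), , drop = FALSE]
  # 2. Optionally round coordinates (simple stand-in for precision thinning).
  if (!is.null(digits)) {
    df[[lon]] <- round(df[[lon]], digits)
    df[[lat]] <- round(df[[lat]], digits)
  }
  # 3. Remove duplicated points.
  df <- df[!duplicated(df[, coords]), , drop = FALSE]
  # 4. Keep points inside the bounding box (stand-in for the study-area polygon).
  inside <- df[[lon]] >= bbox[["xmin"]] & df[[lon]] <= bbox[["xmax"]] &
    df[[lat]] >= bbox[["ymin"]] & df[[lat]] <= bbox[["ymax"]]
  df[inside, , drop = FALSE]
}

occ <- data.frame(
  decimalLongitude = c(1.234, 1.2341, NA, 50, 2.5),
  decimalLatitude  = c(40.111, 40.113, 41, 10, 41.5),
  pa = c(1, 1, 1, 0, 0)
)
bbox <- c(xmin = 0, xmax = 5, ymin = 39, ymax = 42)
cleaned <- clean_coords_sketch(occ, bbox, digits = 2)
print(cleaned)
```

In this toy input, one row has a missing longitude, two rows collapse to the same point after rounding, and one row lies outside the box.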
Continuous Boyce Index (CBI) with weighting
Description
This function is a copy of the 'contBoyce()' function from the 'enmSdm' R package.
This function calculates the continuous Boyce index (CBI), a measure of model accuracy for presence-only test data. This version uses multiple, overlapping windows, in contrast to 'contBoyce2x()', which covers each point by at most two windows.
Usage
contBoyce(
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
numBins = 101,
binWidth = 0.1,
autoWindow = TRUE,
method = "spearman",
dropZeros = TRUE,
na.rm = FALSE,
...
)
Arguments
pres |
Numeric vector. Predicted values at presence sites. |
contrast |
Numeric vector. Predicted values at background sites. |
presWeight |
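Numeric vector same length as 'pres', giving the weight of each presence site. |
contrastWeight |
Numeric vector same length as 'contrast', giving the weight of each background site. |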
numBins |
Positive integer. Number of (overlapping) bins into which to divide predictions. |
binWidth |
Positive numeric value < 1. Size of a bin. Each bin will be |
autoWindow |
Logical. If |
method |
Character. Type of correlation to calculate. The default is "spearman". |
dropZeros |
Logical. If |
na.rm |
Logical. If |
... |
Other arguments (not used). |
Details
CBI is the Spearman rank correlation coefficient between the proportion of sites in each prediction class and the expected proportion of predictions in each prediction class based on the proportion of the landscape that is in that class. The index ranges from -1 to 1. Values >0 indicate the model's output is positively correlated with the true probability of presence. Values <0 indicate it is negatively correlated with the true probability of presence.
Value
Numeric value.
Note
This function is directly copied from the 'enmSdm' package.
References
Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modelling 157:281-300. doi:10.1016/S0304-3800(02)00200-4
Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling 199:142-152. doi:10.1016/j.ecolmodel.2006.05.017
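The idea behind the index can be sketched in a few lines of base R. This is a simplified, unweighted illustration, not the 'contBoyce()' code: 'cbi_sketch()' is a hypothetical helper, and it correlates the predicted-to-expected ratio with the window midpoints, which is one common formulation and may differ in detail from the copied function.

```r
# Simplified, unweighted CBI sketch (hypothetical helper, not the package code).
cbi_sketch <- function(pres, contrast, numBins = 101, binWidth = 0.1) {
  lo <- min(c(pres, contrast))
  hi <- max(c(pres, contrast))
  width <- binWidth * (hi - lo)
  # Lower edges of overlapping windows spanning [lo, hi].
  starts <- seq(lo, hi - width, length.out = numBins)
  mids <- starts + width / 2
  # Proportion of presences (P) and of contrast sites (E) in each window.
  P <- sapply(starts, function(s) mean(pres >= s & pres <= s + width))
  E <- sapply(starts, function(s) mean(contrast >= s & contrast <= s + width))
  # Drop windows without presences or contrast sites.
  keep <- E > 0 & P > 0
  cor(mids[keep], P[keep] / E[keep], method = "spearman")
}

set.seed(1)
contrast <- runif(5000)      # background predictions, uniform on [0, 1]
pres <- rbeta(500, 3, 1)     # presences concentrated at high predictions
cbi <- cbi_sketch(pres, contrast)
cbi_reversed <- cbi_sketch(1 - pres, contrast)
print(c(cbi = cbi, cbi_reversed = cbi_reversed))
```

With presences concentrated at high predicted values the sketch returns a positive value; reversing the presence predictions yields a negative one, matching the interpretation given in Details.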
Create Geographic Coordinate Layers
Description
Generates raster layers for longitude and latitude from given raster data, applies optional scaling, and restricts the output to a specified spatial mask.
Usage
create_coords_layer(layers, study_area = NULL, scale_layers = FALSE)
Arguments
layers |
Raster or stack of raster layers to derive geographic extent and resolution. |
study_area |
Spatial object for masking output layers. |
scale_layers |
Logical indicating if scaling is applied. Default is FALSE. |
Value
Raster stack with layers lon and lat.
Cross-validation for BART model
Description
This function performs cross-validation for a Bayesian Additive Regression Trees (BART) model using presence-absence data and environmental covariate layers. It calculates various performance metrics for model evaluation.
Usage
cross_validate_model(data, folds, predictor_cols = NULL, seed = NULL)
Arguments
data |
Data frame with a column (named 'pa') indicating presence (1) or absence (0) and columns for the predictor variables. |
folds |
A vector of fold assignments (same length as 'data'). |
predictor_cols |
Optional; a character vector of column names to be used as predictors. If NULL, all columns except 'pa' will be used. |
seed |
Optional; random seed. |
Value
A list with:
- metrics
A data frame containing the true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), and various performance metrics including precision (PREC), sensitivity (SEN), specificity (SPC), false discovery rate (FDR), negative predictive value (NPV), false negative rate (FNR), false positive rate (FPR), F-score, accuracy (ACC), balanced accuracy (BA), and true skill statistic (TSS) for each fold.
- predictions
Data frame with observed, predicted, probability, and fold assignment per test instance.
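For reference, several of the listed metrics follow directly from the confusion matrix of each fold. The sketch below uses base R and a hypothetical helper, 'fold_metrics()'; it illustrates the standard formulas and is not the package code.

```r
# Hypothetical helper computing a subset of the listed metrics for one fold.
fold_metrics <- function(observed, predicted) {
  TP <- sum(observed == 1 & predicted == 1)
  FP <- sum(observed == 0 & predicted == 1)
  FN <- sum(observed == 1 & predicted == 0)
  TN <- sum(observed == 0 & predicted == 0)
  SEN <- TP / (TP + FN)   # sensitivity (true positive rate)
  SPC <- TN / (TN + FP)   # specificity (true negative rate)
  data.frame(TP, FP, FN, TN,
    PREC = TP / (TP + FP),
    SEN = SEN,
    SPC = SPC,
    ACC = (TP + TN) / (TP + FP + FN + TN),
    BA = (SEN + SPC) / 2,
    TSS = SEN + SPC - 1
  )
}

observed  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
folds     <- c(1, 1, 2, 2, 1, 1, 1, 2, 2, 2)

# One row of metrics per fold.
per_fold <- do.call(rbind, lapply(sort(unique(folds)), function(f) {
  idx <- folds == f
  cbind(fold = f, fold_metrics(observed[idx], predicted[idx]))
}))
print(per_fold)
```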
Create a Download Action Button
Description
This function generates a download action button that triggers the download of a file when clicked.
Usage
downloadActionButton(
outputId,
label = "Download",
icon = NULL,
width = NULL,
status = NULL,
outline = FALSE,
...
)
Arguments
outputId |
The output ID for the button. |
label |
The label text displayed on the button. Default is "Download". |
icon |
The icon to be displayed on the button. Default is NULL. |
width |
The width of the button. Default is NULL. |
status |
The status of the button. Default is NULL. |
outline |
Logical indicating whether to use outline style for the button. Default is FALSE. |
... |
Additional parameters to be passed to the actionButton function. |
Value
Returns a download action button with the specified parameters.
Evaluation metrics for model predictions
Description
Computes a set of performance metrics (e.g., AUC, TSS, CBI) based on observed and predicted values.
Usage
evaluation_metrics(df, na.rm = TRUE, method = "spearman")
Arguments
df |
A data.frame with columns: 'observed' (0/1), 'predicted' (0/1), 'probability' (numeric). |
na.rm |
Logical. Whether to remove rows with NA values. |
method |
Correlation method for CBI ("spearman", "pearson", or "kendall"). |
Value
A named list or data.frame with evaluation metrics.
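As an illustration of one of these metrics, AUC can be computed from the 'observed' and 'probability' columns with the rank-sum (Mann-Whitney) identity. The helper 'auc_rank()' below is hypothetical and written in base R; the package may compute AUC differently.

```r
# Hypothetical helper: AUC via the rank-sum identity.
auc_rank <- function(observed, probability) {
  r <- rank(probability)          # average ranks are used for ties
  n1 <- sum(observed == 1)
  n0 <- sum(observed == 0)
  (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

df <- data.frame(
  observed = c(1, 1, 1, 0, 0, 0),
  probability = c(0.9, 0.8, 0.4, 0.5, 0.3, 0.2)
)
df$predicted <- as.integer(df$probability >= 0.5)
auc <- auc_rank(df$observed, df$probability)
print(auc)
```

Here 8 of the 9 presence/absence pairs are ranked correctly, so the result is 8/9.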
Server Logic for Export Plot Functionality
Description
Sets up server-side functionality for exporting plots, including creating a modal dialog for user input on export preferences (height, width, format) and processing the download.
Usage
export_plot_server(id, exported_plot)
Value
No return value, this function is used for its side effects within a Shiny app.
Create UI for Export Plot Button
Description
This function generates a UI element (action button) for exporting plots. The button is styled to be minimalistic, featuring only a download icon.
Usage
export_plot_ui(id)
Value
Returns an actionButton for use in a Shiny UI that triggers plot export modal when clicked.
Extract Non-NA Covariate Values
Description
This function extracts covariate values for species occurrences, excluding NA values.
Usage
extract_noNA_cov_values(data, covariate_layers, predictor_variables)
Arguments
data |
A data frame containing species occurrence data with columns x/long (first column) and y/lat (second column). |
covariate_layers |
A list of raster layers representing covariates. |
predictor_variables |
Variables to select from all the layers. |
Details
This function extracts covariate values for each species occurrence location from the provided covariate layers. It returns a data frame containing species occurrence data with covariate values, excluding any NA values.
Value
A data frame containing species occurrence data with covariate values, excluding NA values.
Server-side Logic for Custom File Input
Description
Processes the file input from the UI component created by 'file_input_area_ui()' and provides access to the uploaded file data.
Usage
file_input_area_server(id)
Value
A reactive expression that returns a data frame containing information about the uploaded files, or NULL if no files have been uploaded.
Custom File Input UI
Description
Creates a customized file input area in a Shiny application. The file input is designed to be visually distinct and supports features such as multiple file selection and file type restrictions.
Usage
file_input_area_ui(
id,
label = "Input text: ",
multiple = FALSE,
accept = NULL,
width = NULL,
button_label = "Browse...",
icon_name = NULL
)
Value
A Shiny UI object that can be added to a Shiny application.
Fit a BART Model Using Environmental Covariate Layers
Description
This function fits a Bayesian Additive Regression Trees (BART) model using presence/absence data and environmental covariate layers.
Usage
fit_bart_model(y, x, seed = NULL, ...)
Arguments
y |
A numeric vector indicating presence (1) or absence (0). |
x |
A data frame with the same number of rows as the length of the vector 'y', containing the covariate values. |
seed |
An optional integer value for setting the random seed for reproducibility. |
... |
Additional arguments passed to 'dbarts::bart()'. |
Value
A BART model object.
Generate cross-validation folds
Description
Creates cross-validation fold assignments for presence-absence or presence-only data, supporting three types of strategies: k-fold, spatial blocks (through blockCV R package), and temporal blocks.
Usage
generate_cv_folds(
data,
method = "k-fold",
block_method = "predictors_autocorrelation",
block_size = NULL,
k = 10,
predictor_raster = NULL,
model_residuals = NULL,
coords = c("decimalLongitude", "decimalLatitude")
)
Arguments
data |
A 'data.frame' with at least presence-absence data ('pa'), coordinates, and optionally a 'timestamp'. |
method |
The cross-validation strategy. One of: '"k-fold"', '"spatial_blocks"', '"temporal_blocks"'. |
block_method |
For spatial blocks, how to determine block size. One of: '"residuals_autocorrelation"', '"predictors_autocorrelation"', '"manual"'. |
block_size |
Numeric. Manual block size in meters (used if 'block_method = "manual"'). |
k |
Integer. Number of folds to generate. |
predictor_raster |
A 'terra::SpatRaster' used for estimating spatial autocorrelation (only needed if 'block_method = "predictors_autocorrelation"'). |
model_residuals |
A 'data.frame' with residuals and coordinates (only needed if 'block_method = "residuals_autocorrelation"'). |
coords |
A character vector of length 2 indicating the longitude and latitude column names. |
Value
A list with the following elements:
- folds
A vector of fold assignments (one per row in 'data').
- method
The CV method used.
- block_method
The spatial block size method (if applicable).
- block_size
The estimated or manual block size (in meters), if spatial blocks were used.
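The fold vector can be pictured with a short base R sketch. It covers only the "k-fold" and "temporal_blocks" ideas and rests on an assumption made for illustration: that temporal blocks group consecutive timestamps into k contiguous blocks. The package may assign temporal folds differently, and spatial blocks (which rely on 'blockCV') are not shown.

```r
set.seed(42)
n <- 23
k <- 5

# k-fold: shuffle a balanced sequence of fold labels.
kfold <- sample(rep(seq_len(k), length.out = n))

# Temporal blocks (assumption of this sketch): consecutive timestamps are
# grouped into k contiguous blocks, so one timestamp never spans two folds.
timestamp <- sample(1:10, n, replace = TRUE)
ts_levels <- sort(unique(timestamp))
block_of_level <- cut(seq_along(ts_levels), breaks = k, labels = FALSE)
temporal <- block_of_level[match(timestamp, ts_levels)]

print(table(kfold))
print(table(timestamp, temporal))
```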
Generate Pseudo-Absences Using Buffer-Out Strategy
Description
This function generates pseudo-absences outside a buffer around presence points but within the convex hull of those points. This prevents spatial overlap while preserving geographic realism.
Usage
generate_pa_buffer_out(
presences,
raster_stack,
predictor_variables,
coords = c("decimalLongitude", "decimalLatitude"),
pa_buffer_distance = 0.5,
ratio = 1,
attempts = 100,
seed = NULL
)
Arguments
presences |
Data frame containing presence points. |
raster_stack |
'SpatRaster' object containing covariate data. |
predictor_variables |
Character vector of the predictor variables selected for this species. |
coords |
Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'. |
pa_buffer_distance |
Numeric; buffer radius in degrees around each presence. Default is 0.5. |
ratio |
Ratio of pseudo-absences to presences (default 1 = balanced). |
attempts |
Integer specifying the maximum number of sampling attempts made to reach the exact number of pseudo-absences requested. Defaults to 100. |
seed |
Optional seed for reproducibility. |
Value
A data frame of pseudo-absences with coordinates, timestamp, 'pa = 0', and covariate values.
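A simplified base R sketch of the buffer-out idea follows. It is not the package implementation: 'sample_buffer_out()' is a hypothetical helper, the bounding box of the presences replaces the convex hull, distances are plain Euclidean distances in degrees, and no covariates or timestamps are attached.

```r
# Hypothetical helper: rejection sampling outside a buffer around presences.
sample_buffer_out <- function(presences, pa_buffer_distance = 0.5, ratio = 1,
                              attempts = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  target <- ceiling(nrow(presences) * ratio)
  xr <- range(presences$decimalLongitude)
  yr <- range(presences$decimalLatitude)
  out <- presences[0, c("decimalLongitude", "decimalLatitude")]
  for (i in seq_len(attempts)) {
    if (nrow(out) >= target) break
    # Candidates drawn inside the bounding box (stand-in for the convex hull).
    cand <- data.frame(
      decimalLongitude = runif(target, xr[1], xr[2]),
      decimalLatitude = runif(target, yr[1], yr[2])
    )
    # Distance (in degrees) from each candidate to its nearest presence.
    d_min <- sapply(seq_len(nrow(cand)), function(j) {
      min(sqrt((cand$decimalLongitude[j] - presences$decimalLongitude)^2 +
               (cand$decimalLatitude[j] - presences$decimalLatitude)^2))
    })
    # Keep only candidates outside every buffer.
    out <- rbind(out, cand[d_min > pa_buffer_distance, , drop = FALSE])
  }
  out <- head(out, target)
  out$pa <- rep(0, nrow(out))
  out
}

presences <- data.frame(
  decimalLongitude = c(0, 10, 0, 10, 5),
  decimalLatitude = c(0, 0, 10, 10, 5)
)
pa <- sample_buffer_out(presences, pa_buffer_distance = 0.5, seed = 1)
print(pa)
```

The 'attempts' loop mirrors the role of the 'attempts' argument: sampling is repeated until the requested number of points is reached or the attempts run out.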
Generate Random Pseudo-Absences
Description
This function generates pseudo-absence points randomly across the study area (random background), optionally applying spatial thinning to match presence filtering strategy.
Usage
generate_pa_random(
presences,
study_area,
raster_stack,
predictor_variables,
coords = c("decimalLongitude", "decimalLatitude"),
ratio = 1,
attempts = 100,
seed = NULL
)
Arguments
presences |
Data frame containing presence points. |
study_area |
Spatial polygon defining the study area ('sf' object). |
raster_stack |
'SpatRaster' object containing covariate data. |
predictor_variables |
Character vector of the predictor variables selected for this species. |
coords |
Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'. |
ratio |
Ratio of pseudo-absences to presences (default 1 = balanced). |
attempts |
Integer specifying the maximum number of sampling attempts made to reach the exact number of pseudo-absences requested. Defaults to 100. |
seed |
Optional random seed. |
Value
Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.
Generate Pseudo-Absences Using Target-Group Background
Description
Generate Pseudo-Absences Using Target-Group Background
Usage
generate_pa_target_group(
presences,
target_group_points,
study_area,
raster_stack,
predictor_variables,
coords = c("decimalLongitude", "decimalLatitude"),
ratio = 1,
attempts = 100,
seed = NULL
)
Arguments
presences |
Data frame containing presence points. |
target_group_points |
Data frame of all sampling locations (target group). |
study_area |
Spatial polygon defining the study area ('sf' object). |
raster_stack |
'SpatRaster' object containing covariate data. |
predictor_variables |
Character vector of the predictor variables selected for this species. |
coords |
Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'. |
ratio |
Ratio of pseudo-absences to presences (default 1 = balanced). |
attempts |
Integer specifying the maximum number of sampling attempts made to reach the exact number of pseudo-absences requested. Defaults to 100. |
seed |
Optional random seed. |
Value
Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.
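The target-group idea can be sketched in base R as sampling pseudo-absences from the target-group locations rather than from the whole study area, so that they share the sampling bias of the presences. The helper 'sample_target_group()' is hypothetical; it excludes locations that coincide with a presence and omits the study-area clipping and covariate extraction that the package function performs.

```r
# Hypothetical helper: sample pseudo-absences from target-group locations.
sample_target_group <- function(presences, target_group_points, ratio = 1,
                                seed = NULL,
                                coords = c("decimalLongitude", "decimalLatitude")) {
  if (!is.null(seed)) set.seed(seed)
  key <- function(d) paste(d[[coords[1]]], d[[coords[2]]])
  # Candidate pool: target-group locations that are not presences.
  is_presence <- key(target_group_points) %in% key(presences)
  pool <- target_group_points[!is_presence, , drop = FALSE]
  n <- min(nrow(pool), ceiling(nrow(presences) * ratio))
  pa <- pool[sample(seq_len(nrow(pool)), n), coords, drop = FALSE]
  pa$pa <- rep(0, n)
  pa
}

presences <- data.frame(
  decimalLongitude = c(1, 2),
  decimalLatitude = c(40, 41)
)
target_group_points <- data.frame(
  decimalLongitude = c(1, 2, 3, 4, 5, 6),
  decimalLatitude = c(40, 41, 42, 43, 44, 45)
)
tg <- sample_target_group(presences, target_group_points, ratio = 1, seed = 7)
print(tg)
```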
Generate Prediction Plot
Description
This function generates a plot based on prediction raster layers and presence/absence points.
Usage
generate_prediction_plot(
prediction_layer,
pa_points,
legend_label,
non_study_area_mask,
coords
)
Arguments
prediction_layer |
Raster prediction layer. |
pa_points |
Presence/absence points. |
legend_label |
Label for the legend. |
non_study_area_mask |
Spatial polygon representing the non study areas. |
Value
Returns a ggplot object representing the world prediction plot.
Generate Pseudo-Absence Points Using Different Methods Based on Presence Points, Covariates, and Study Area Polygon
Description
Wrapper function for pseudo-absence generation methods, such as background random points, target-group, and using buffer area.
Usage
generate_pseudo_absences(
method = c("random", "target_group", "buffer_out"),
presences,
raster_stack,
predictor_variables,
study_area = NULL,
target_group_points = NULL,
coords = c("decimalLongitude", "decimalLatitude"),
pa_buffer_distance = 0.5,
ratio = 1,
attempts = 100,
seed = NULL
)
Arguments
method |
Character; one of "random", "target_group", or "buffer_out". |
presences |
Data frame of presence points with coordinates and timestamp. |
raster_stack |
SpatRaster of covariates. |
predictor_variables |
Character vector of selected predictors. |
study_area |
Optional sf polygon (used for clipping). |
target_group_points |
Optional data frame of sampling points (for target-group). |
coords |
Vector of coordinate column names. |
pa_buffer_distance |
Numeric; buffer radius in degrees around each presence. Default is 0.5. |
ratio |
Ratio of pseudo-absences to presences. |
attempts |
Max attempts to fulfill sample size. |
seed |
Optional seed for reproducibility. |
Value
A data frame of pseudo-absence points (pa = 0) with covariates.
Compute specificity and sensitivity
Description
Compute specificity and sensitivity
Usage
getFprTpr(actuals, predictedScores, threshold = 0.5)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's. |
threshold |
If predicted value is above the threshold, it will be considered as an event (1), else it will be a non-event (0). Defaults to 0.5. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
A list with two elements: fpr (false positive rate) and tpr (true positive rate).
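The two rates can be reproduced by hand from the thresholded scores. The base R sketch below uses a hypothetical helper, 'fpr_tpr()', that follows the rule stated for 'threshold' (scores above the threshold count as events); it is an illustration, not the copied 'InformationValue' code.

```r
# Hypothetical helper: false and true positive rates at a given threshold.
fpr_tpr <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)
  list(
    fpr = sum(predicted == 1 & actuals == 0) / sum(actuals == 0),
    tpr = sum(predicted == 1 & actuals == 1) / sum(actuals == 1)
  )
}

actuals <- c(1, 1, 1, 0, 0, 0, 0)
scores <- c(0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)
res <- fpr_tpr(actuals, scores)
print(res)
```

In this example one of the four non-events is scored above 0.5 (fpr = 0.25) and two of the three events are (tpr = 2/3).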
Get Covariate Names
Description
This function extracts the names of covariates from a ZIP file containing covariate layers.
Usage
get_covariate_names(file_path)
Arguments
file_path |
Path to the ZIP file containing covariate layers. |
Details
This function extracts the names of covariates from a ZIP file containing covariate layers.
Value
A character vector containing the names of covariates.
Main Analysis Function for GLOSSA Package
Description
This function wraps all the analysis that the GLOSSA package performs. It processes presence-absence data, environmental covariates, and performs species distribution modeling and projections under past and future scenarios.
Usage
glossa_analysis(
pa_data = NULL,
fit_layers = NULL,
proj_files = NULL,
study_area_poly = NULL,
predictor_variables = NULL,
thinning_method = NULL,
thinning_value = NULL,
scale_layers = FALSE,
buffer = NULL,
native_range = NULL,
suitable_habitat = NULL,
other_analysis = NULL,
model_args = list(),
cv_methods = NULL,
cv_folds = 5,
cv_block_source = "residuals_autocorrelation",
cv_block_size = NULL,
pseudoabsence_method = "random",
pa_ratio = 1,
target_group_points = NULL,
pa_buffer_distance = NULL,
seed = NA,
waiter = NULL
)
Arguments
pa_data |
A list of data frames containing presence-absence data including 'decimalLongitude', 'decimalLatitude', 'timestamp', and 'pa' columns. |
fit_layers |
A ZIP file with the raster files containing model fitting environmental layers formatted as explained in the website documentation. |
proj_files |
A list of ZIP file paths containing environmental layers for projection scenarios. |
study_area_poly |
A spatial polygon defining the study area. |
predictor_variables |
A list of the predictor variables to be used in the analysis for each occurrence dataset. |
thinning_method |
A character specifying the spatial thinning method to apply to occurrence data. Options are 'c("none", "distance", "grid", "precision")'. See 'GeoThinneR' package for details. |
thinning_value |
A numeric value used for thinning depending on the selected method: distance in meters ('distance'), grid resolution in degrees ('grid'), or decimal precision ('precision'). |
scale_layers |
Logical; if 'TRUE', covariate layers will be standardized (z-score) based on fit layers. |
buffer |
Buffer value or distance in decimal degrees (arc degrees) for buffering the study area polygon. |
native_range |
A vector of scenarios 'c("fit_layers", "projections")' where native range modeling should be performed. |
suitable_habitat |
A vector of scenarios 'c("fit_layers", "projections")' where habitat suitability modeling should be performed. |
other_analysis |
A vector of additional analyses to perform (e.g., '"variable_importance"', '"functional_responses"', '"cross_validation"'). |
model_args |
A named list of additional arguments passed to the modeling function (e.g., 'dbarts::bart'). This allows users to fine-tune model parameters such as 'ntree' or 'k'. These are passed internally via '...' and must match the arguments of the selected model function. |
cv_methods |
A vector of the cross-validation strategies to perform. One or multiple of '"k-fold"', '"spatial_blocks"', '"temporal_blocks"'. |
cv_folds |
Integer indicating the number of folds to generate. |
cv_block_source |
For spatial blocks, how to determine block size. One of: '"residuals_autocorrelation"', '"predictors_autocorrelation"', '"manual"'. |
cv_block_size |
Numeric block size in meters (used if 'cv_block_source = "manual"'). |
pseudoabsence_method |
Method for generating pseudo-absences. One of "random", "target_group", or "buffer_out". |
pa_ratio |
Ratio of pseudo-absences to presences (pseudo-absence:presences). |
target_group_points |
Optional data frame for sampling points for target-group method. |
pa_buffer_distance |
Numeric buffer radius in degrees around each presence. Default is NULL. |
seed |
Optional; an integer seed for reproducibility of results. |
waiter |
Optional; a waiter instance to update progress in a Shiny application. |
Value
A list containing structured outputs from each major section of the analysis, including model data, projections, variable importance scores, and habitat suitability assessments.
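The effect of 'scale_layers = TRUE' can be pictured with plain vectors. The sketch assumes that "based on fit layers" means the mean and standard deviation are computed from the fitting values and then reused for the projection values; this reading is an assumption, and the vectors below are made-up stand-ins for raster cell values.

```r
# Made-up cell values of one covariate (stand-ins for raster layers).
fit_values <- c(10, 12, 14, 16, 18)
proj_values <- c(11, 20)

# Statistics come from the fit layer only (assumption of this sketch).
mu <- mean(fit_values)
sigma <- sd(fit_values)

fit_scaled <- (fit_values - mu) / sigma
proj_scaled <- (proj_values - mu) / sigma   # reuses the fit-layer mean and sd
print(fit_scaled)
print(proj_scaled)
```

Reusing the fit-layer statistics keeps projected covariates on the same scale as the data the model was trained on.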
Export GLOSSA Model Results
Description
This function exports GLOSSA model results, including native range predictions, suitable habitat predictions, model data, variable importance, functional response results, cross-validation metrics, and presence/absence probability cutoffs. Prediction results are written as raster files; model data, variable importance, and functional response results are written as TSV files. Presence/absence probability cutoffs are also written to a TSV file if provided.
Usage
glossa_export(
species = NULL,
models = NULL,
layer_results = NULL,
fields = NULL,
model_data = FALSE,
model_summary = FALSE,
fr = FALSE,
prob_cut = FALSE,
varimp = FALSE,
cross_val = FALSE,
layer_format = "tif",
projections_results = NULL,
presence_absence_list = NULL,
other_results = NULL,
pa_cutoff = NULL,
config_snapshot = NULL
)
Arguments
species |
A character vector specifying the species names. |
models |
A character vector specifying the types of models to export results for. |
layer_results |
A list containing layer results for native range and suitable habitat predictions. |
fields |
A character vector specifying the fields to include in the exported results. |
model_data |
Logical, indicating whether to export model data. |
fr |
Logical, indicating whether to export functional response results. |
prob_cut |
Logical, indicating whether to export presence/absence probability cutoffs. |
varimp |
Logical, indicating whether to export variable importance. |
cross_val |
Logical, indicating whether to export cross-validation metrics. |
layer_format |
A character vector specifying the format of the exported raster files. |
projections_results |
A list containing projections results. |
presence_absence_list |
A list containing presence/absence lists. |
other_results |
A list containing other types of results (e.g., variable importance, functional responses, cross-validation). |
pa_cutoff |
A list containing presence/absence probability cutoffs. |
Value
A character vector of file paths for the exported files or directories.
Invert a Polygon
Description
This function inverts a polygon by calculating the difference between the bounding box and the polygon.
Usage
invert_polygon(polygon, bbox = NULL)
Arguments
polygon |
An sf object representing the polygon to be inverted. |
bbox |
Optional. An sf or bbox object representing the bounding box. If NULL, the bounding box of the input polygon is used. |
Value
An sf object representing the inverted polygon.
Apply Polygon Mask to Raster Layers
Description
This function crops and extends raster layers to a study area extent (bbox) defined by longitude and latitude, then applies a mask based on a provided spatial polygon to remove areas outside the polygon.
Usage
layer_mask(layers, study_area)
Arguments
layers |
A stack of raster layers ('SpatRaster' object) to be processed. |
study_area |
A spatial polygon ('sf' object) used to mask the raster layers. |
Value
A 'SpatRaster' object representing the masked raster layers.
Misclassification Error
Description
Misclassification Error
Usage
misClassError(actuals, predictedScores, threshold = 0.5)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's. |
threshold |
If predicted value is above the threshold, it will be considered as an event (1), else it will be a non-event (0). Defaults to 0.5. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
The misclassification error, that is, the proportion of predictions that did not match the actuals.
Compute the optimal probability cutoff score
Description
Compute the optimal probability cutoff score
Usage
optimalCutoff(
actuals,
predictedScores,
optimiseFor = "misclasserror",
returnDiagnostics = FALSE
)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The prediction probability scores for each observation. If your classification model gives the 1/0 predictions, convert it to a numeric vector of 1's and 0's. |
optimiseFor |
The maximization criterion for which the probability cutoff score is optimised. Can take any of the following values: "Ones", "Zeros", "Both", or "misclasserror" (default). If "Ones" is used, 'optimalCutoff' will be chosen to maximise detection of ones. If "Both" is specified, the probability cutoff that gives the maximum Youden's index is chosen. If "misclasserror" is specified, the probability cutoff that gives the minimum misclassification error is chosen. |
returnDiagnostics |
If TRUE, also returns additional diagnostics such as 'sensitivityTable', 'misclassificationError', 'TPR', 'FPR' and 'specificity' for the chosen cutoff. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
The optimal probability score cutoff that maximises a given criterion. If 'returnDiagnostics' is TRUE, a list is returned that also contains the diagnostics described under 'returnDiagnostics'.
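As an illustration of the "Both" and "misclasserror" criteria, the base R sketch below scans a grid of candidate cutoffs and picks the one with the highest Youden's index or the lowest misclassification error. 'cutoff_sketch()' is a hypothetical helper; the cutoff grid and the tie-breaking rule (first best value) are assumptions and may differ from the copied 'InformationValue' code.

```r
# Hypothetical helper: grid search for a probability cutoff.
cutoff_sketch <- function(actuals, predictedScores, optimiseFor = "misclasserror") {
  cutoffs <- seq(0.01, 0.99, by = 0.01)
  stats <- sapply(cutoffs, function(th) {
    pred <- as.integer(predictedScores > th)
    tpr <- sum(pred == 1 & actuals == 1) / sum(actuals == 1)
    spc <- sum(pred == 0 & actuals == 0) / sum(actuals == 0)
    c(youden = tpr + spc - 1, misclass = mean(pred != actuals))
  })
  if (optimiseFor == "Both") {
    cutoffs[which.max(stats["youden", ])]    # first cutoff with maximum Youden's index
  } else {
    cutoffs[which.min(stats["misclass", ])]  # first cutoff with minimum error
  }
}

actuals <- c(1, 1, 1, 1, 0, 0, 0, 0)
scores <- c(0.95, 0.85, 0.75, 0.65, 0.45, 0.35, 0.25, 0.15)
best_youden <- cutoff_sketch(actuals, scores, optimiseFor = "Both")
best_error <- cutoff_sketch(actuals, scores, optimiseFor = "misclasserror")
print(c(best_youden = best_youden, best_error = best_error))
```

Because the example scores separate the two classes perfectly, both criteria return a cutoff that classifies every observation correctly.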
Optimal Cutoff for Presence-Absence Prediction
Description
This function calculates the optimal cutoff for presence-absence prediction using a BART model.
Usage
pa_optimal_cutoff(y, x, model, seed = NULL)
Arguments
y |
Vector indicating presence (1) or absence (0). |
x |
Dataframe with same number of rows as the length of the vector 'y' with the covariate values. |
model |
A BART model object. |
seed |
Random seed for reproducibility. |
Value
The optimal cutoff value for presence-absence prediction.
Plot cross-validation fold assignments
Description
Plot cross-validation fold assignments
Usage
plot_cv_folds_points(data, polygon = NULL)
Arguments
data |
Dataframe with columns: 'decimalLongitude', 'decimalLatitude', 'pa' and 'fold'. |
polygon |
An sf object representing the inverted study area. |
Value
A ggplot object showing points color-coded by CV fold and shaped by presence/absence.
Plot cross-validation metrics
Description
This function generates a cross-validation radial plot based on evaluation metrics.
Usage
plot_cv_metrics(data)
Arguments
data |
Dataframe containing cross-validation results. |
Value
Returns a ggplot object representing the cross-validation plot.
Make Predictions Using a BART Model
Description
This function makes predictions using a Bayesian Additive Regression Trees (BART) model on a stack of environmental covariates ('SpatRaster').
Usage
predict_bart(bart_model, layers, cutoff = NULL)
Arguments
bart_model |
A BART model object obtained from fitting BART using the 'dbarts' package. |
layers |
A SpatRaster object containing environmental covariates for prediction. |
cutoff |
An optional numeric cutoff value for determining potential presences. If NULL, potential presences and absences will not be computed. |
Value
A SpatRaster containing the mean, median, standard deviation, and quantiles of the posterior predictive distribution, as well as a potential presences layer if cutoff is provided.
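The summary layers can be understood as per-cell statistics over posterior draws. The base R sketch below simulates draws for four cells instead of using a fitted model; the 2.5% and 97.5% quantiles and the choice of thresholding the posterior mean against 'cutoff' are assumptions made for illustration, not details confirmed by this documentation.

```r
set.seed(123)
n_draws <- 1000

# Simulated posterior draws of presence probability
# (rows = draws, columns = raster cells).
draws <- sapply(c(0.1, 0.4, 0.6, 0.9), function(p) {
  plogis(rnorm(n_draws, mean = qlogis(p), sd = 0.3))
})

# Per-cell summaries of the posterior draws.
summary_tbl <- data.frame(
  mean = colMeans(draws),
  median = apply(draws, 2, median),
  sd = apply(draws, 2, sd),
  q0.025 = apply(draws, 2, quantile, probs = 0.025),
  q0.975 = apply(draws, 2, quantile, probs = 0.975)
)

# Potential presences: thresholding the mean is an assumption of this sketch.
cutoff <- 0.5
summary_tbl$potential_presence <- as.integer(summary_tbl$mean >= cutoff)
print(summary_tbl)
```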
Read and Validate Extent Polygon
Description
This function reads and validates a polygon file containing the extent. It checks if the file has the correct format and extracts the geometry.
Usage
read_extent_polygon(file_path, show_modal = FALSE)
Arguments
file_path |
Path to the polygon file containing the extent. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE. |
Value
A spatial object representing the extent if the file is valid, NULL otherwise.
Load Covariate Layers from ZIP Files
Description
This function loads covariate layers from a ZIP file, verifies their spatial characteristics, and returns them as a list of raster layers.
Usage
read_layers_zip(
file_path,
extend = TRUE,
first_layer = FALSE,
show_modal = FALSE
)
Arguments
file_path |
Path to the ZIP file containing covariate layers. |
extend |
If TRUE, the largest extent among the layers is used; if FALSE, the smallest. |
first_layer |
If TRUE, only the layers from the first timestamp are returned. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE. |
Value
A list containing raster layers for each covariate.
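One way to read the extend argument is as a choice between the union and the intersection of the layers' bounding boxes. The sketch below shows that reading in base R; it is an assumption about the behaviour, not the package's implementation, and the helper combine_extents and the example extents are hypothetical.

```r
# Each extent is a named numeric vector: xmin, xmax, ymin, ymax.
combine_extents <- function(extents, extend = TRUE) {
  m <- do.call(rbind, extents)  # one row per layer, columns named as above
  if (extend) {
    # Union: the smallest box that contains every layer.
    c(xmin = min(m[, "xmin"]), xmax = max(m[, "xmax"]),
      ymin = min(m[, "ymin"]), ymax = max(m[, "ymax"]))
  } else {
    # Intersection: the area shared by every layer.
    c(xmin = max(m[, "xmin"]), xmax = min(m[, "xmax"]),
      ymin = max(m[, "ymin"]), ymax = min(m[, "ymax"]))
  }
}

extents <- list(
  c(xmin = -10, xmax = 5,  ymin = 30, ymax = 45),
  c(xmin = -5,  xmax = 10, ymin = 35, ymax = 50)
)
largest  <- combine_extents(extents, extend = TRUE)   # -10 10 30 50
smallest <- combine_extents(extents, extend = FALSE)  # -5 5 35 45
```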
Read and validate presences/absences CSV file
Description
This function reads and validates a CSV file containing presences and absences data for species occurrences. It checks if the file has the expected columns and formats.
Usage
read_presences_absences_csv(
file_path,
file_name = NULL,
show_modal = FALSE,
timestamp_mapping = NULL,
coords = c("decimalLongitude", "decimalLatitude"),
sep = "\t",
dec = "."
)
Arguments
file_path |
The file path to the CSV file. |
file_name |
Optional. The name of the file. If not provided, the base name of the file path is used. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings (use in Shiny). Default is FALSE. |
timestamp_mapping |
Optional. A vector with the timestamp mapping of the environmental layers. |
coords |
Optional. Character vector of length 2 specifying the names of the columns containing the longitude and latitude coordinates. Default is c("decimalLongitude", "decimalLatitude"). |
sep |
Optional. The field separator character. Default is "\t" (tab). |
dec |
Optional. The decimal point character. Default is ".". |
Value
A data frame with the validated data if the file has the expected columns and formats, NULL otherwise.
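The read-then-validate pattern described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper read_pa_sketch is hypothetical, the required 'pa' column and the specific checks are assumed, and the real function may validate more (for example timestamps) or report problems differently.

```r
read_pa_sketch <- function(file_path,
                           coords = c("decimalLongitude", "decimalLatitude"),
                           sep = "\t", dec = ".") {
  df <- utils::read.csv(file_path, sep = sep, dec = dec, header = TRUE)
  # Assumed requirement: both coordinate columns plus a 0/1 'pa' column.
  required <- c(coords, "pa")
  if (!all(required %in% names(df))) return(NULL)
  if (!all(vapply(df[coords], is.numeric, logical(1)))) return(NULL)
  if (!all(df$pa %in% c(0, 1))) return(NULL)
  df
}

# Write a small tab-separated file to a temporary location and read it back.
tmp <- tempfile(fileext = ".csv")
writeLines(c("decimalLongitude\tdecimalLatitude\tpa",
             "-0.35\t39.47\t1",
             "2.17\t41.38\t0"), tmp)
valid <- read_pa_sketch(tmp)
```

Returning NULL on failure mirrors the documented return value, so a caller can test the result with is.null() before continuing.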
Remove Duplicated Points from a Dataframe
Description
This function removes duplicated points from a dataframe based on specified coordinate columns.
Usage
remove_duplicate_points(df, coords = c("decimalLongitude", "decimalLatitude"))
Arguments
df |
A dataframe object with each row representing one point. |
coords |
A character vector specifying the names of the coordinate columns used for identifying duplicate points. Default is c("decimalLongitude", "decimalLatitude"). |
Value
A dataframe without duplicated points.
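Coordinate-based de-duplication of this kind can be written in one line of base R. The sketch below is an assumed equivalent, not the package source; the helper name and the example data are hypothetical.

```r
remove_duplicates_sketch <- function(df,
                                     coords = c("decimalLongitude", "decimalLatitude")) {
  # Keep the first row for each unique coordinate pair; other columns are ignored.
  df[!duplicated(df[, coords, drop = FALSE]), , drop = FALSE]
}

pts <- data.frame(
  decimalLongitude = c(1, 1, 2),
  decimalLatitude  = c(10, 10, 20),
  pa               = c(1, 0, 1)
)
unique_pts <- remove_duplicates_sketch(pts)  # rows 1 and 3 remain
```

Because only the coordinate columns are compared, two records at the same location are treated as duplicates even when their other columns differ, as rows 1 and 2 do here.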
Remove Points Inside or Outside a Polygon
Description
This function removes points from a dataframe based on their location relative to a specified polygon.
Usage
remove_points_polygon(
df,
polygon,
overlapping = FALSE,
coords = c("decimalLongitude", "decimalLatitude")
)
Arguments
df |
A dataframe object with rows representing points. |
polygon |
An sf polygon object defining the region for point removal. |
overlapping |
Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
coords |
Character vector specifying the column names for longitude and latitude. Default is c("decimalLongitude", "decimalLatitude"). |
Value
A dataframe containing the filtered points.
Calculate Response Curve Using BART Model
Description
This function calculates the response curve (functional responses) using a Bayesian Additive Regression Trees (BART) model.
Usage
response_curve_bart(bart_model, data, predictor_names)
Arguments
bart_model |
A BART model object obtained from fitting BART ('dbarts::bart'). |
data |
A data frame containing the predictor variables (the design matrix) used in the BART model. |
predictor_names |
A character vector containing the names of the predictor variables. |
Value
A list with one data frame per predictor variable, each containing the mean, 2.5th percentile, and 97.5th percentile of the response together with the corresponding values of the variable.
Run GLOSSA Shiny App
Description
This function launches the GLOSSA Shiny web application.
Usage
run_glossa(
request_size_mb = 2000,
launch.browser = TRUE,
port = getOption("shiny.port"),
clear_global_env = FALSE
)
Arguments
request_size_mb |
Maximum request size for file uploads, in megabytes. Default is 2000 MB. |
launch.browser |
Logical indicating whether to launch the app in the browser (default is TRUE). |
port |
Port number for the Shiny app. Uses the port specified by 'getOption("shiny.port")' by default. |
clear_global_env |
Logical. If TRUE, clears the global environment after the app exits. |
Details
The GLOSSA Shiny app provides an interactive interface for users to access GLOSSA functionalities.
Value
No return value, called to launch the GLOSSA app.
Note
Use 'clear_global_env = TRUE' cautiously, as it removes all objects from your R environment after the app exits.
Examples
if(interactive()) {
run_glossa()
run_glossa(clear_global_env = TRUE) # clears all global objects
}
Calculate the sensitivity for a given logit model
Description
Calculate the sensitivity for a given logit model
Usage
sensitivity(actuals, predictedScores, threshold = 0.5)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The predicted probability scores for each observation. If your classification model outputs 1/0 predictions, convert them to a numeric vector of 1s and 0s. |
threshold |
If the predicted value is above the threshold, the observation is classified as an event (1); otherwise it is classified as a non-event (0). Defaults to 0.5. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
The sensitivity for the given binary actuals and predicted probability scores, that is, the number of observations with the event that are also predicted to have the event, divided by the total number of observations with the event.
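The definition above translates directly into base R. The following sketch is for illustration only; the helper sensitivity_sketch and the example vectors are hypothetical, and it follows the documented rule that a score strictly above the threshold counts as an event.

```r
sensitivity_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)
  # True positives divided by all observed events.
  sum(predicted == 1 & actuals == 1) / sum(actuals == 1)
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.2, 0.7, 0.1)
sens <- sensitivity_sketch(actuals, scores)  # 2 of the 3 events score above 0.5
```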
Create a Sparkline Value Box
Description
This function creates a custom value box with a sparkline plot embedded in it.
Usage
sparkvalueBox(
title,
sparkline_data,
description,
type = "line",
box_color = "white",
width = 4,
elevation = 0,
...
)
Arguments
title |
The title or heading of the value box. |
sparkline_data |
The data used to generate the sparkline plot. |
description |
A short description or additional information displayed below the value box. |
type |
The type of sparkline plot to generate. Default is "line". |
box_color |
The background color of the value box. |
width |
The width of the value box. Default is 4. |
elevation |
The elevation of the value box. Default is 0. |
... |
Additional parameters to be passed to the sparkline function. |
Value
Returns a custom value box with the specified parameters.
Calculate the specificity for a given logit model
Description
Calculate the specificity for a given logit model
Usage
specificity(actuals, predictedScores, threshold = 0.5)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The predicted probability scores for each observation. If your classification model outputs 1/0 predictions, convert them to a numeric vector of 1s and 0s. |
threshold |
If the predicted value is above the threshold, the observation is classified as an event (1); otherwise it is classified as a non-event (0). Defaults to 0.5. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
The specificity for the given binary actuals and predicted probability scores, that is, the number of observations without the event that are also predicted not to have the event, divided by the total number of observations without the event.
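As a minimal base R illustration of this definition (the helper specificity_sketch and the example vectors are hypothetical, not taken from the package):

```r
specificity_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)
  # True negatives divided by all observed non-events.
  sum(predicted == 0 & actuals == 0) / sum(actuals == 0)
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.2, 0.7, 0.1)
spec <- specificity_sketch(actuals, scores)  # 1 of the 2 non-events scores at or below 0.5
```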
Validate Fit and Projection Layers
Description
This function validates fit and projection layers by checking their covariates.
Usage
validate_fit_projection_layers(
fit_layers_path,
proj_layers_path,
show_modal = FALSE
)
Arguments
fit_layers_path |
Path to the ZIP file containing fit layers. |
proj_layers_path |
Path to the ZIP file containing projection layers. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE. |
Value
TRUE if the layers pass validation criteria, FALSE otherwise.
Validate Layers Zip
Description
This function validates a ZIP file containing environmental layers. It checks if the layers have the same number of files, CRS (Coordinate Reference System), and resolution.
Usage
validate_layers_zip(file_path, timestamp_mapping = NULL, show_modal = FALSE)
Arguments
file_path |
Path to the ZIP file containing environmental layers. |
timestamp_mapping |
Optional. A vector with the timestamp mapping of the environmental layers. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE. |
Value
TRUE if the layers pass validation criteria, FALSE otherwise.
Validate Match Between Presence/Absence Files and Fit Layers
Description
This function validates whether the time periods of the presence/absence data match the environmental layers.
Usage
validate_pa_fit_time(pa_data, fit_layers_path, show_modal = FALSE)
Arguments
pa_data |
Data frame containing the presence/absence data with a 'timestamp' column. |
fit_layers_path |
Path to the ZIP file containing fit layers. |
show_modal |
Optional. Logical. Whether to show a modal notification for warnings. Default is FALSE. |
Value
TRUE if the timestamps match the fit layers, FALSE otherwise.
Variable Importance in BART Model
Description
This function computes variable importance scores for a fitted BART (Bayesian Additive Regression Trees) model using a permutation-based approach. It measures the impact of each predictor variable on the model's performance by permuting the values of that variable and evaluating the resulting change in performance. The performance metric is the F-score.
Usage
variable_importance(bart_model, y, x, cutoff = 0, n_repeats = 10, seed = NULL)
Arguments
bart_model |
A BART model object. |
y |
Vector indicating presence (1) or absence (0). |
x |
A data frame of covariate values with one row per element of the vector 'y'. |
cutoff |
A numeric threshold for converting predicted probabilities into presence-absence. |
n_repeats |
An integer indicating the number of times to repeat the permutation for each variable. |
seed |
An optional seed for random number generation. |
Value
A data frame where each column corresponds to a predictor variable, and each row contains the variable importance scores across permutations.
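The permutation idea can be sketched without BART. The example below is a hedged illustration in base R that uses a logistic regression (glm) as a stand-in model and a hypothetical helper, permutation_importance. Scoring importance as the baseline F-score minus the permuted F-score, the 0.5 cutoff, and the simulated data are all assumptions; the package may define the score differently.

```r
set.seed(42)
n <- 200
x <- data.frame(temp = rnorm(n), noise = rnorm(n))
y <- rbinom(n, 1, plogis(2 * x$temp))  # only 'temp' drives the response
model <- glm(y ~ temp + noise, data = cbind(x, y = y), family = binomial())

f_score <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

permutation_importance <- function(predict_fun, y, x, cutoff = 0.5,
                                   n_repeats = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  baseline <- f_score(y, as.integer(predict_fun(x) >= cutoff))
  # One column per predictor, one row per repeat (assumes n_repeats > 1).
  scores <- sapply(names(x), function(v) {
    replicate(n_repeats, {
      x_perm <- x
      x_perm[[v]] <- sample(x_perm[[v]])  # break the link between v and y
      baseline - f_score(y, as.integer(predict_fun(x_perm) >= cutoff))
    })
  })
  as.data.frame(scores)
}

predict_fun <- function(d) predict(model, newdata = d, type = "response")
imp <- permutation_importance(predict_fun, y, x, seed = 1)
colMeans(imp)
```

In this simulation the drop in F-score should be clearly larger for 'temp' than for 'noise', since only 'temp' carries signal.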
Calculate Youden's index
Description
Calculate Youden's index
Usage
youdensIndex(actuals, predictedScores, threshold = 0.5)
Arguments
actuals |
The actual binary flags for the response variable. It can take a numeric vector containing values of either 1 or 0, where 1 represents the 'Good' or 'Events' while 0 represents 'Bad' or 'Non-Events'. |
predictedScores |
The predicted probability scores for each observation. If your classification model outputs 1/0 predictions, convert them to a numeric vector of 1s and 0s. |
threshold |
If the predicted value is above the threshold, the observation is classified as an event (1); otherwise it is classified as a non-event (0). Defaults to 0.5. |
Details
This function was obtained from the InformationValue R package (https://github.com/selva86/InformationValue).
Value
Youden's index for the given binary actuals and predicted probability scores, calculated as Sensitivity + Specificity - 1.
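A short base R sketch of this formula follows. The helper youden_sketch and the example vectors are hypothetical. The threshold scan at the end shows one common use of the index, picking the threshold that maximizes it, and is not necessarily how the package selects a cutoff.

```r
youden_sketch <- function(actuals, predictedScores, threshold = 0.5) {
  predicted <- as.integer(predictedScores > threshold)
  sens <- sum(predicted == 1 & actuals == 1) / sum(actuals == 1)
  spec <- sum(predicted == 0 & actuals == 0) / sum(actuals == 0)
  sens + spec - 1
}

actuals <- c(1, 1, 1, 0, 0)
scores  <- c(0.9, 0.6, 0.2, 0.7, 0.1)
j_default <- youden_sketch(actuals, scores)  # 2/3 + 1/2 - 1 = 1/6

# Scan a grid of thresholds and keep the one with the highest index.
thresholds <- seq(0.05, 0.95, by = 0.05)
j <- vapply(thresholds, function(t) youden_sketch(actuals, scores, t), numeric(1))
best_threshold <- thresholds[which.max(j)]
```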