Type: | Package |
Title: | Tools Developed by the Long Term Ecological Research Community |
Version: | 2.0.0 |
Date: | 2025-03-26 |
Maintainer: | Nicholas Lyon <lyon@nceas.ucsb.edu> |
Description: | Set of the data science tools created by various members of the Long Term Ecological Research (LTER) community. These functions were initially written largely as standalone operations and have later been aggregated into this package. |
License: | BSD_3_clause + file LICENSE |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
URL: | https://lter.github.io/ltertools/ |
BugReports: | https://github.com/lter/ltertools/issues |
RoxygenNote: | 7.3.1 |
Depends: | R (≥ 3.5) |
Imports: | dplyr, generics, ggplot2, magrittr, purrr, readxl, stats, stringr, supportR, tidyr, utils |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-03-26 18:14:42 UTC; lyon |
Author: | Nicholas Lyon [aut, cre] (https://njlyon0.github.io/), Angel Chen [aut] (https://angelchen7.github.io), Miguel C. Leon [ctb] (https://luquillo.lter.network/), National Science Foundation [fnd] (NSF 1929393, 09/01/2019 - 08/31/2024), University of California, Santa Barbara [cph] |
Repository: | CRAN |
Date/Publication: | 2025-03-26 18:30:02 UTC |
ltertools: Tools Developed by the Long Term Ecological Research Community
Description
Set of the data science tools created by various members of the Long Term Ecological Research (LTER) community. These functions were initially written largely as standalone operations and have later been aggregated into this package.
Author(s)
Maintainer: Nicholas Lyon lyon@nceas.ucsb.edu (https://njlyon0.github.io/)
Authors:
Angel Chen anchen@nceas.ucsb.edu (https://angelchen7.github.io)
Other contributors:
Miguel C. Leon (https://luquillo.lter.network/) [contributor]
National Science Foundation (NSF 1929393, 09/01/2019 - 08/31/2024) [funder]
University of California, Santa Barbara [copyright holder]
See Also
Useful links:
https://lter.github.io/ltertools/
Report bugs at https://github.com/lter/ltertools/issues
Generate the Skeleton of a Column Key
Description
Creates the start of a 'column key' for harmonizing data. A column key includes a column for the names of the files to be harmonized into a single data object as well as a column for the column names in those files. Finally, it includes a column indicating the tidied name that corresponds with each raw column name. ltertools::harmonize can accept this key object and use it to rename all raw column names, in a reproducible way, so that they are standardized across datasets. Currently supports raw files of the following formats: CSV, TXT, XLS, and XLSX.
Usage
begin_key(
raw_folder = NULL,
data_format = c("csv", "txt", "xls", "xlsx"),
guess_tidy = FALSE
)
Arguments
raw_folder |
(character) folder / folder path containing data files to include in key |
data_format |
(character) file extensions to identify within the raw_folder |
guess_tidy |
(logical) whether to attempt to "guess" what the tidy name equivalent should be for each raw column name. This is accomplished via coercion to lowercase and removal of special character/repeated characters. If |
Value
(dataframe) skeleton of column key
Examples
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
"unwanted" = c("not", "needed", "column"),
"yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
"NUMBERS" = c(4:7),
"BONUS" = c("plantae", "animalia", "fungi", "protista"))
# Generate a local folder for exporting
temp_folder <- tempdir()
# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)
# Generate a column key with "guesses" at tidy column names
ltertools::begin_key(raw_folder = temp_folder, data_format = "csv", guess_tidy = TRUE)
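The kind of "guess" described for guess_tidy can be illustrated with base R alone. The sketch below is not the package's implementation: guess_tidy_name is a hypothetical helper, and it assumes that "removal of special/repeated characters" means replacing each run of non-alphanumeric characters with a single underscore.

```r
# Hypothetical helper (an illustration only, not part of ltertools)
guess_tidy_name <- function(raw_names) {
  tidy <- tolower(raw_names)             # coerce to lowercase
  tidy <- gsub("[^a-z0-9]+", "_", tidy)  # each run of special characters becomes one underscore
  tidy <- gsub("^_+|_+$", "", tidy)      # trim leading / trailing underscores
  tidy
}

guess_tidy_name(c("LETTERS", "Site  Name!!", "Temp..C."))
# "letters" "site_name" "temp_c"
```

Guesses like these are only a starting point; the "tidy_name" column of the key should still be reviewed by hand.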
Check and Prepare a Column Key Object
Description
Accepts a column key dataframe and checks to make sure it has the needed structure for ltertools::harmonize. Also removes unnecessary columns and rows that lack a "tidy_name". Function invoked 'under the hood' by ltertools::harmonize.
Usage
check_key(key = NULL)
Arguments
key |
(dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
Value
(dataframe) key object with only "source", "raw_name" and "tidy_name" columns and only retains rows where a "tidy_name" is specified.
Examples
# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3),
rep("df2.csv", 3)),
"raw_name" = c("xx", "unwanted", "yy",
"LETTERS", "NUMBERS", "BONUS"),
"tidy_name" = c("numbers", NA, "letters",
"letters", "numbers", "kingdom"))
# Check it
ltertools::check_key(key = key_obj)
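The checks described above can be approximated in base R. This is a minimal sketch under stated assumptions, not the package's code: check_key_sketch is a hypothetical name, and the exact error message and the handling of empty strings are assumptions.

```r
# Hypothetical stand-in for the checks described above (not ltertools code)
check_key_sketch <- function(key) {
  needed <- c("source", "raw_name", "tidy_name")
  # Stop early if any required column is absent
  missing_cols <- setdiff(needed, names(key))
  if (length(missing_cols) > 0) {
    stop("Key is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the three required columns
  key <- key[, needed, drop = FALSE]
  # Drop rows where no tidy name was supplied (NA or empty string)
  key <- key[!is.na(key$tidy_name) & nzchar(key$tidy_name), , drop = FALSE]
  rownames(key) <- NULL
  key
}

key_demo <- data.frame(source = c("df1.csv", "df1.csv"),
                       raw_name = c("xx", "unwanted"),
                       tidy_name = c("numbers", NA),
                       notes = c("keep", "drop"),
                       stringsAsFactors = FALSE)
checked <- check_key_sketch(key = key_demo)
checked
```

Only the "xx" row survives, and the extra "notes" column is dropped.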
Convert Temperature Values
Description
Converts a given set of temperature values from one unit to another
Usage
convert_temp(value = NULL, from = NULL, to = NULL)
Arguments
value |
(numeric) temperature values to convert |
from |
(character) starting units of the value, not case sensitive. |
to |
(character) units to which to convert, not case sensitive. |
Value
(numeric) converted temperature values
Examples
# Convert from Fahrenheit to Celsius
convert_temp(value = 32, from = "Fahrenheit", to = "c")
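For reference, the arithmetic behind a Fahrenheit/Celsius conversion is shown below in base R. The helpers f_to_c and c_to_f are hypothetical names used only for illustration; they are not part of ltertools, and this section does not state which other units convert_temp accepts.

```r
# Hypothetical helpers showing the standard conversion formulas
f_to_c <- function(f) (f - 32) * 5 / 9   # Fahrenheit to Celsius
c_to_f <- function(c) c * 9 / 5 + 32     # Celsius to Fahrenheit

f_to_c(c(32, 212))  # 0 100
c_to_f(100)         # 212
```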
Calculate Coefficient of Variation
Description
Computes the coefficient of variation (CV) by dividing the standard deviation (SD) by the arithmetic mean of a set of numbers. If na_rm is TRUE then missing values are removed before the calculation is completed.
Usage
cv(x, na_rm = TRUE)
Arguments
x |
(numeric) vector of numbers for which to calculate CV |
na_rm |
(logical) whether to remove missing values from both average and SD calculation |
Value
(numeric) coefficient of variation
Examples
# Calculate the coefficient of variation of a numeric vector
cv(x = c(4, 5, 6, 4, 5, 5), na_rm = TRUE)
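The definition above (SD divided by the arithmetic mean) can be written directly with base R. cv_sketch is a hypothetical name for illustration only. Note that stats::sd uses the sample (n - 1) denominator; whether the package makes the same choice is an assumption here.

```r
# Hypothetical base-R equivalent of the definition above
cv_sketch <- function(x, na_rm = TRUE) {
  stats::sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

cv_sketch(x = c(2, 4, 6))                      # SD of 2 over mean of 4 gives 0.5
cv_sketch(x = c(2, 4, 6, NA), na_rm = TRUE)    # missing value dropped first, still 0.5
cv_sketch(x = c(2, 4, 6, NA), na_rm = FALSE)   # NA, because the missing value propagates
```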
Generate the Skeleton of a Column Key for Only New Data Files
Description
Data discovery, like harmonization, is an iterative process. For those already depending upon a column key and the harmonize function, it can be cumbersome to add rows to an existing column key. This function formats rows for an existing column key for only datasets that are not already (A) in the column key or (B) in the harmonized data table.
Usage
expand_key(
key = NULL,
raw_folder = NULL,
harmonized_df = NULL,
data_format = c("csv", "txt", "xls", "xlsx"),
guess_tidy = FALSE
)
Arguments
key |
(dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
raw_folder |
(character) folder / folder path containing data files to include in key |
harmonized_df |
(dataframe) harmonized data table produced with the current version of the column key. Must include a "source" column but other columns are ignored. |
data_format |
(character) file extensions to identify within the raw_folder |
guess_tidy |
(logical) whether to attempt to "guess" what the tidy name equivalent should be for each raw column name. This is accomplished via coercion to lowercase and removal of special character/repeated characters. If |
Value
(dataframe) skeleton of rows to add to column key for data sources not already in harmonized data table
Examples
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
"unwanted" = c("not", "needed", "column"),
"yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
"NUMBERS" = c(4:7),
"BONUS" = c("plantae", "animalia", "fungi", "protista"))
# Generate a local folder for exporting
temp_folder <- tempdir()
# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)
# Generate a column key with "guesses" at tidy column names
key1 <- ltertools::begin_key(raw_folder = temp_folder, data_format = "csv", guess_tidy = TRUE)
# Harmonize the data
harmony <- ltertools::harmonize(key = key1, raw_folder = temp_folder)
# Make a new data file
df3 <- data.frame("xx" = c(10:15),
"letters" = letters[10:15])
# Export this locally to the temp folder too
utils::write.csv(x = df3, file = file.path(temp_folder, "df3.csv"), row.names = FALSE)
# Identify what needs to be added to the existing column key
ltertools::expand_key(key = key1, raw_folder = temp_folder, harmonized_df = harmony,
data_format = "csv", guess_tidy = TRUE)
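The core idea, finding files that are in the folder but not yet in the key or in the harmonized table, amounts to a set difference. The sketch below uses plain character vectors with made-up object names; it illustrates the logic described above rather than the function's actual internals.

```r
# File names found in the raw folder (as in the example above)
files_in_folder <- c("df1.csv", "df2.csv", "df3.csv")
# Values of the "source" column in the existing key and in the harmonized data
files_in_key <- c("df1.csv", "df2.csv")
files_in_harmonized <- c("df1.csv", "df2.csv")

# Keep only files that appear in neither the key nor the harmonized data
new_files <- setdiff(files_in_folder, union(files_in_key, files_in_harmonized))
new_files  # "df3.csv"
```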
Harmonize Data via a Column Key
Description
A "column key" is meant to streamline harmonization of disparate datasets. This key must include three columns containing: (1) the name of each raw data file to be harmonized, (2) the name of all of the columns in each of those files, and (3) the "tidy name" that corresponds to each raw column name. This function accepts that key and the path to a folder containing all raw data files included in the key. Each dataset is then read in and the original column names are replaced with their respective "tidy_name" indicated in the key. Once this has been done to all files, a single dataframe is returned with only the columns indicated in the column key. Currently the following file formats are supported for the raw data: CSV, TXT, XLS, and XLSX.
Note that raw column names without an associated tidy name in the key are removed. We recommend using the begin_key function in this package to generate the skeleton of the key to make achieving the required structure simpler.
Usage
harmonize(
key = NULL,
raw_folder = NULL,
data_format = c("csv", "txt", "xls", "xlsx"),
quiet = TRUE
)
Arguments
key |
(dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
raw_folder |
(character) folder / folder path containing data files to include in key |
data_format |
(character) file extensions to identify within the raw_folder |
quiet |
(logical) whether to suppress certain non-warning messages. Defaults to TRUE |
Value
(dataframe) harmonized dataframe including all columns defined in the "tidy_name" column of the key object
Examples
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
"unwanted" = c("not", "needed", "column"),
"yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
"NUMBERS" = c(4:7),
"BONUS" = c("plantae", "animalia", "fungi", "protista"))
# Generate a local folder for exporting
temp_folder <- tempdir()
# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)
# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3),
rep("df2.csv", 3)),
"raw_name" = c("xx", "unwanted", "yy",
"LETTERS", "NUMBERS", "BONUS"),
"tidy_name" = c("numbers", NA, "letters",
"letters", "numbers", "kingdom"))
# Use that to harmonize the 'raw' files we just created
ltertools::harmonize(key = key_obj, raw_folder = temp_folder, data_format = "csv")
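Once each file's columns carry their tidy names, the final step is to stack tables whose column sets differ. The following base-R sketch shows one way to do that; bind_with_fill is a hypothetical helper, not the package's implementation, and it omits details such as a "source" column identifying each row's origin.

```r
# Hypothetical helper: stack dataframes, filling columns a table lacks with NA
bind_with_fill <- function(df_list) {
  all_cols <- unique(unlist(lapply(df_list, names)))
  filled <- lapply(df_list, function(df) {
    # Add any column this table lacks
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    # Put columns in a shared order so rows line up
    df[, all_cols, drop = FALSE]
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

# Two tables that already use tidy column names
tidy1 <- data.frame(numbers = 1:3, letters = letters[1:3])
tidy2 <- data.frame(letters = letters[4:7], numbers = 4:7,
                    kingdom = c("plantae", "animalia", "fungi", "protista"))
combined <- bind_with_fill(list(tidy1, tidy2))
combined
```

The result has seven rows and three columns, with "kingdom" missing for the three rows that came from the first table.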
Long Term Ecological Research Site Information
Description
There are currently 28 field sites involved with the Long Term Ecological Research (LTER) network. These sites occupy a range of habitats and were started / are renewed on site-specific timelines. To make this information more readily available to interested parties, this data object summarizes the key components of each site in an easy-to-use data format.
Usage
lter_sites
Format
Dataframe with 8 columns and 32 rows
- name
Full name of the LTER site
- code
Abbreviation (typically three letters) of the site name
- habitat
Simplified habitat designation of the site (or "mixed" for more complex habitat contexts)
- start_year
Year of initial funding by NSF as an official LTER site
- end_year
End of current funding cycle grant
- latitude
Degrees latitude of site
- longitude
Degrees longitude of site
- site_url
Website URL for the site
Source
Long Term Ecological Research Network Office. https://lternet.edu/site/
Read Data from Folder
Description
Reads in all data files of specified types found in the designated folder. Returns a list with one element for each data file. Currently supports CSV, TXT, XLS, and XLSX
Usage
read(raw_folder = NULL, data_format = c("csv", "txt", "xls", "xlsx"))
Arguments
raw_folder |
(character) folder / folder path containing data files to read |
data_format |
(character) file extensions to identify within the raw_folder |
Value
(list) data found in specified folder of specified file format(s)
Examples
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
"unwanted" = c("not", "needed", "column"),
"yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
"NUMBERS" = c(4:7),
"BONUS" = c("plantae", "animalia", "fungi", "protista"))
# Generate a local folder for exporting
temp_folder <- tempdir()
# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)
# Read in all CSV files in that folder
read(raw_folder = temp_folder, data_format = "csv")
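For CSV files only, a comparable named list can be built with base R. This is a rough sketch of the idea rather than the function's internals; the folder and file names below are made up for the illustration.

```r
# Create a dedicated demo folder with two small CSV files
demo_folder <- file.path(tempdir(), "read_sketch")
dir.create(demo_folder, showWarnings = FALSE)
utils::write.csv(data.frame(a = 1:2), file.path(demo_folder, "one.csv"), row.names = FALSE)
utils::write.csv(data.frame(b = 3:5), file.path(demo_folder, "two.csv"), row.names = FALSE)

# Find every file ending in ".csv" and read each into a list element
csv_names <- list.files(path = demo_folder, pattern = "\\.csv$")
csv_list <- lapply(file.path(demo_folder, csv_names), utils::read.csv)

# Name each element after the file it came from
names(csv_list) <- csv_names
names(csv_list)  # "one.csv" "two.csv"
```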
Subsets the LTER Site Information Table by Site Codes and Habitats
Description
Subsets the information on Long Term Ecological Research (LTER) sites based on user-specified site codes (i.e., three letter abbreviations) and/or desired habitats. See lter_sites for the full set of site information.
Usage
site_subset(sites = NULL, habitats = NULL)
Arguments
sites |
(character) three letter site code(s) identifying site(s) of interest |
habitats |
(character) habitat(s) of interest. See lter_sites |
Value
(dataframe) complete site information (8 columns) for all sites that meet the provided site code and/or habitat criteria
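The filtering logic can be sketched on a toy table in base R. Everything here is hypothetical: the site codes are placeholders rather than real LTER sites, subset_sketch is not a package function, and the sketch assumes that when both criteria are supplied a site must satisfy both, which this section does not confirm.

```r
# Toy stand-in for the site table (placeholder codes, not real sites)
toy_sites <- data.frame(code = c("AAA", "BBB", "CCC"),
                        habitat = c("forest", "grassland", "forest"))

# Hypothetical helper: apply whichever criteria were supplied
subset_sketch <- function(df, sites = NULL, habitats = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(sites)) {
    keep <- keep & toupper(df$code) %in% toupper(sites)
  }
  if (!is.null(habitats)) {
    keep <- keep & tolower(df$habitat) %in% tolower(habitats)
  }
  df[keep, , drop = FALSE]
}

subset_sketch(toy_sites, habitats = "forest")  # rows for AAA and CCC
```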
Create a Timeline of Site(s) that Meet Criteria
Description
Creates a ggplot2 plot of all sites that meet the user-specified site code (i.e., three letter abbreviation) and/or habitat criteria. See lter_sites for the full set of site information including accepted site codes and habitat designations (unrecognized entries will trigger a warning and be ignored). Lines are grouped and colored by habitat to better emphasize possible similarities among sites.
Usage
site_timeline(sites = NULL, habitats = NULL, colors = NULL)
Arguments
sites |
(character) three letter site code(s) identifying site(s) of interest |
habitats |
(character) habitat(s) of interest. See lter_sites |
colors |
(character) colors to assign to the timelines, expressed as hexadecimal codes (e.g., #00FF00). Note there must be as many colors as habitats included in the graph |
Value
(ggplot2) plot object of timeline of site(s) that meet user-specified criteria
Examples
# Make the full timeline of all sites with default colors by supplying no arguments
site_timeline()
# Or make a timeline of only sites that meet certain criteria
site_timeline(habitats = c("grassland", "forest"))
Identify Solar Day Information
Description
For all days between the specified start and end date, identify the time of sunrise, sunset, and solar noon (in UTC) as well as the day length. The idea for this function was contributed by Miguel C. Leon and a Python equivalent lives in the Luquillo site's LUQ-general-utils GitHub repository.
Usage
solar_day_info(
lat = NULL,
lon = NULL,
start_date = NULL,
end_date = NULL,
quiet = FALSE
)
Arguments
lat |
(numeric) latitude coordinate for which to find day length |
lon |
(numeric) longitude coordinate for which to find day length |
start_date |
(character) starting date in 'YYYY-MM-DD' format |
end_date |
(character) ending date in 'YYYY-MM-DD' format |
quiet |
(logical) whether to suppress certain non-warning messages. Defaults to FALSE |
Value
(dataframe) table of 6 columns and a number of rows equal to the number of days between the specified start and end dates (inclusive). Columns contain: (1) date, (2) sunrise time, (3) sunset time, (4) solar noon, (5) day length, and (6) time zone of columns 2 to 4.
Examples
## Not run:
# Identify day information in Santa Barbara (California) for six days
solar_day_info(lat = 34.416857, lon = -119.712777,
start_date = "2022-02-07", end_date = "2022-02-12",
quiet = TRUE)
## End(Not run)
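Because the returned table has one row per day between the start and end dates (inclusive), the expected row count can be checked ahead of time with base R. This small sketch only counts dates; it does not compute any solar information.

```r
# Sequence of days from the example above, inclusive of both endpoints
days <- seq(from = as.Date("2022-02-07"), to = as.Date("2022-02-12"), by = "day")
length(days)  # 6
```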
Standardize a Single Dataset via a Column Key
Description
A "column key" is meant to streamline harmonization of disparate datasets. This key must include three columns containing: (1) the name of each raw data file to be harmonized, (2) the name of all of the columns in each of those files, and (3) the "tidy name" that corresponds to each raw column name. This function accepts that key and a list of datasets that can be standardized with that key. The function standardizes the specified dataset out of any number of datasets in the key or list. While usable on its own, this function is intended to streamline internal operations of ltertools::harmonize, which is the recommended tool for key-based harmonization.
Usage
standardize(focal_file = NULL, key = NULL, df_list = NULL)
Arguments
focal_file |
(character) filename corresponding to one value of "source" column of "key" data and to one name in "df_list". |
key |
(dataframe) key object including a "source", "raw_name" and "tidy_name" column. Additional columns are allowed but ignored |
df_list |
(list) named list of dataframe-like objects where each name is the filename initially containing that data |
Value
(dataframe) single standardized dataframe including all columns defined in the "tidy_name" column of the key object
Examples
# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
"unwanted" = c("not", "needed", "column"),
"yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
"NUMBERS" = c(4:7),
"BONUS" = c("plantae", "animalia", "fungi", "protista"))
# Generate a local folder for exporting
temp_folder <- tempdir()
# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)
# Read in list of these data files
data_list <- ltertools::read(raw_folder = temp_folder, data_format = "csv")
# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3),
rep("df2.csv", 3)),
"raw_name" = c("xx", "unwanted", "yy",
"LETTERS", "NUMBERS", "BONUS"),
"tidy_name" = c("numbers", NA, "letters",
"letters", "numbers", "kingdom"))
# Standardize one dataset
ltertools::standardize(focal_file = "df1.csv", key = key_obj, df_list = data_list)
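The renaming step for a single dataset can be sketched in base R as follows. standardize_sketch is a hypothetical helper written for illustration; it is not the package's implementation and assumes every raw name listed in the key is present in the data.

```r
# Hypothetical helper: rename one dataframe using the rows of the key for its file
standardize_sketch <- function(df, focal_file, key) {
  # Rows of the key for this file that have a tidy name
  sub_key <- key[key$source == focal_file & !is.na(key$tidy_name), ]
  # Drop raw columns without a tidy name, then rename the rest
  df <- df[, sub_key$raw_name, drop = FALSE]
  names(df) <- sub_key$tidy_name
  df
}

raw_df <- data.frame(xx = 1:3,
                     unwanted = c("not", "needed", "column"),
                     yy = letters[1:3])
key_demo <- data.frame(source = rep("df1.csv", 3),
                       raw_name = c("xx", "unwanted", "yy"),
                       tidy_name = c("numbers", NA, "letters"),
                       stringsAsFactors = FALSE)
tidy_df <- standardize_sketch(df = raw_df, focal_file = "df1.csv", key = key_demo)
names(tidy_df)  # "numbers" "letters"
```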