Title: | K's "Don't Repeat Yourself"-Collection |
Version: | 0.0.2 |
Description: | A collection of personal helper functions to avoid redundancy in the spirit of the "Don't repeat yourself" principle of software development (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). |
License: | GPL (≥ 3) |
URL: | https://github.com/kapsner/kdry |
BugReports: | https://github.com/kapsner/kdry/issues |
Depends: | R (≥ 2.10) |
Imports: | data.table, doParallel, foreach, Hmisc, magrittr, parallel, stats, utils |
Suggests: | ggplot2, lintr, survival, testthat (≥ 3.0.1) |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | false |
Date/Publication: | 2024-03-08 13:30:02 UTC |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | no |
Packaged: | 2024-03-08 12:38:22 UTC; user |
Author: | Lorenz A. Kapsner |
Maintainer: | Lorenz A. Kapsner <lorenz.kapsner@gmail.com> |
Repository: | CRAN |
dtr_matrix2df
Description
Data transformation: Converts a matrix to data.table and encodes categorical variables as factor.
Usage
dtr_matrix2df(matrix, cat_vars = NULL)
Arguments
matrix |
An R matrix. |
cat_vars |
A character vector with colnames that should be converted to factor. |
Value
A data.table is returned.
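The conversion described above can be approximated in base R. This is only a sketch of the idea, not the package's implementation: a plain data.frame stands in for the data.table, and the variable names are chosen for illustration.

```r
# Sketch only: base R stand-in for the matrix-to-table conversion.
mat <- data.matrix(iris)          # all columns numeric, Species coded 1..3
df <- as.data.frame(mat)          # a data.frame stands in for a data.table here
cat_vars <- "Species"             # columns to encode as factor
df[cat_vars] <- lapply(df[cat_vars], factor)
str(df)
```

Columns not listed in cat_vars stay numeric in this sketch.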
Examples
data("iris")
mat <- data.matrix(iris)
dataset <- dtr_matrix2df(mat)
str(dataset)
dataset <- dtr_matrix2df(mat, cat_vars = "Species")
str(dataset)
icolnames
Description
Return colnames in a table with index numbers.
Usage
icolnames(df)
Arguments
df |
A data.frame object. |
Value
A data.table with the two columns index and name is returned.
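A rough base R equivalent of such an index table might look as follows. The helper name index_colnames is hypothetical and a data.frame is used in place of a data.table.

```r
# Hypothetical helper for illustration; not the package's code.
index_colnames <- function(df) {
  stopifnot(is.data.frame(df))
  data.frame(
    index = seq_along(colnames(df)),  # position of each column
    name = colnames(df),              # the column name at that position
    stringsAsFactors = FALSE
  )
}

res_index <- index_colnames(iris)
print(res_index)
```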
Examples
data("iris")
icolnames(iris)
list.append
Description
Helper function to append an R list.
Usage
list.append(main_list, append_list, ...)
Arguments
main_list |
A list, to which another should be appended. |
append_list |
A list to append to main_list. |
... |
Further arguments passed to utils::modifyList(). |
Details
This function is a safe wrapper around utils::modifyList() to combine lists: it checks the input types and only appends the new list if its length is greater than 0.
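The two checks named in the details (input types and a non-empty list to append) can be sketched around utils::modifyList() as follows. The function name append_if_nonempty is hypothetical, and the package's own checks may differ.

```r
# Hypothetical helper, for illustration only.
append_if_nonempty <- function(main_list, append_list, ...) {
  # Type check: both inputs must be lists.
  stopifnot(is.list(main_list), is.list(append_list))
  # Nothing to append: return the main list unchanged.
  if (length(append_list) == 0) {
    return(main_list)
  }
  utils::modifyList(main_list, append_list, ...)
}

l1 <- list(a = 1, b = 2)
res_append <- append_if_nonempty(l1, list(c = 3, d = 4))
res_empty <- append_if_nonempty(l1, list())
```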
Value
A list
is returned.
See Also
Examples
l1 <- list("a" = 1, "b" = 2)
l2 <- list("c" = 3, "d" = 4)
list.append(l1, l2)
list.update
Description
Helper function to update items in an R list.
Usage
list.update(main_list, new_list, ...)
Arguments
main_list |
A list whose items should be updated. |
new_list |
A list with new values of items from main_list. |
... |
Further arguments passed to utils::modifyList(). |
Details
This function is a safe wrapper around utils::modifyList() to update items in R lists: it checks the input types and only accepts named lists.
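A named-list check in front of utils::modifyList() could be sketched like this. The names update_named and is_named_list are hypothetical, and the exact validation performed by the package is an assumption.

```r
# Hypothetical helpers, for illustration only.
is_named_list <- function(l) {
  is.list(l) &&
    (length(l) == 0 || (!is.null(names(l)) && all(names(l) != "")))
}

update_named <- function(main_list, new_list, ...) {
  # Reject anything that is not a fully named list.
  stopifnot(is_named_list(main_list), is_named_list(new_list))
  utils::modifyList(main_list, new_list, ...)
}

res_update <- update_named(list(a = 1, b = 2), list(a = 3, b = 4))
```

An unnamed input such as list(2) fails the check instead of being merged silently.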
Value
A list
is returned.
See Also
Examples
l1 <- list("a" = 1, "b" = 2)
l2 <- list("a" = 3, "b" = 4)
list.update(l1, l2)
misc_argument_catcher
Description
Miscellaneous helper function to catch arguments passed with R's ellipsis ("...") in a type-safe manner.
Usage
misc_argument_catcher(...)
Arguments
... |
Named arguments passed to a function. |
Details
This function catches arguments that have been passed to an R function via R's ellipsis ("..."). Its purpose is to catch these arguments even if a list of arguments was provided to the ellipsis.
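One way to handle both calling styles is to splice unnamed list arguments into the result. The following is a sketch under that assumption; catch_args is a hypothetical name and the package may resolve such cases differently.

```r
# Hypothetical helper, for illustration only.
catch_args <- function(...) {
  args <- list(...)
  out <- list()
  for (i in seq_along(args)) {
    nm <- names(args)[i]
    if (is.list(args[[i]]) && (is.null(nm) || nm == "")) {
      # An unnamed list is treated as a bundle of arguments and spliced in.
      out <- c(out, args[[i]])
    } else {
      # Everything else is kept as a single (named) argument.
      out <- c(out, args[i])
    }
  }
  out
}

res_args <- catch_args(list(a = 1, b = 2), f = 9)
```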
Value
A list
is returned.
Examples
misc_argument_catcher(a = 1)
misc_argument_catcher(a = 1, b = 2, c = 3, d = "car")
misc_argument_catcher(list(a = 1, b = 2, c = 3, d = "car"))
misc_argument_catcher(list(a = 1, b = 2, c = 3, d = "car"), f = 9)
misc_duplicated_by_names
Description
Miscellaneous helper function to detect items in an object with duplicated names, e.g. in named vectors or named lists.
Usage
misc_duplicated_by_names(object, ...)
Arguments
object |
An R object that has names. |
... |
Named arguments passed on to |
Value
Returns a logical vector of length(object) with TRUE indicating the identified items with duplicated names.
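In base R, a logical vector of this kind can be obtained by applying duplicated() to the names of an object. This sketch shows base behavior only; whether the package flags the first occurrence as well is not stated here.

```r
# Base R illustration: duplicated() flags later occurrences of a name.
x <- list(a = 1, b = 2, a = 3)
dup <- duplicated(names(x))
print(dup)
```

Base duplicated() leaves the first occurrence unflagged; its fromLast argument reverses the direction of the scan.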
See Also
Examples
misc_duplicated_by_names(list(a = 1, a = 1))
misc_recursive_copy
Description
Recursively copies directories and subdirectories.
Usage
misc_recursive_copy(source_dir, target_dir, force = FALSE)
Arguments
source_dir |
A character string. The path to the directory to be copied. |
target_dir |
A character string. The target path. |
force |
A boolean. If |
Value
This function has no return value.
Examples
if (interactive()) {
  d1 <- file.path(tempdir(), "folder1")
  d2 <- file.path(d1, "folder2")
  d3 <- file.path(tempdir(), "new_folder")
  f1 <- file.path(d1, "file.one")
  dir.create(d2, recursive = TRUE)
  file.create(f1)
  misc_recursive_copy(d1, d3)
}
misc_subset_options
Description
Miscellaneous helper function to subset R options by a keyword.
Usage
misc_subset_options(keyword)
Arguments
keyword |
A character. The keyword to subset the R options. |
Details
This function subsets R's options() by a keyword. It returns a list of all available options that match the keyword. The keyword is evaluated as a regular expression.
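Subsetting options() by a regular expression can be sketched in base R as shown below. The function name subset_options_sketch and the option kdry_demo_option are hypothetical and exist only for this example.

```r
# Hypothetical helper, for illustration only.
subset_options_sketch <- function(keyword) {
  stopifnot(is.character(keyword), length(keyword) == 1)
  opts <- options()
  # Keep the options whose names match the regular expression.
  opts[grepl(keyword, names(opts))]
}

old <- options(kdry_demo_option = 42)  # set a throwaway option
res_opts <- subset_options_sketch("^kdry_demo")
options(old)                           # restore the previous state
```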
Value
A list is returned, containing the subset of R's options() that matches the keyword.
Examples
misc_subset_options("default")
mlh_outsample_row_indices
Description
Machine learning helper function to convert a vector of (in-sample) row indices of a fold into out-of-sample row indices.
Usage
mlh_outsample_row_indices(fold_list, dataset_nrows, type = NULL)
Arguments
fold_list |
A list of integer vectors that describe the row indices of cross-validation folds. The list must be named. |
dataset_nrows |
An integer. The number of rows in the dataset. This parameter is required in order to compute the out-of-sample row indices. |
type |
A character. To be used if the out-of-sample row indices need to be formatted in a special manner (default: NULL). |
Value
If type = NULL, returns a list of same length as fold_list with each item containing a vector of out-of-sample row indices. If type = "glmnet", a data.table is returned with two columns and each row representing one observation of the dataset that is assigned to a specific test fold. The column "fold_id" should be passed further on to the argument foldid of glmnet::cv.glmnet.
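The underlying computation is a set difference between all row indices and the in-sample indices of each fold. The sketch below uses base R and a small 10-row example; the column name row_id is an assumption (only "fold_id" is named above), and a data.frame stands in for the data.table.

```r
# Sketch only: out-of-sample indices as a set difference.
n_rows <- 10L
fold_list <- list(
  Fold1 = setdiff(seq_len(n_rows), 1:3),
  Fold2 = setdiff(seq_len(n_rows), 4:6),
  Fold3 = setdiff(seq_len(n_rows), 7:10)
)

# One vector of out-of-sample row indices per fold.
out_idx <- lapply(fold_list, function(in_idx) setdiff(seq_len(n_rows), in_idx))

# glmnet-style layout: one row per observation with its test fold number.
fold_id <- integer(n_rows)
for (k in seq_along(out_idx)) {
  fold_id[out_idx[[k]]] <- k
}
fold_table <- data.frame(row_id = seq_len(n_rows), fold_id = fold_id)
```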
Examples
fold_list <- list(
  "Fold1" = setdiff(seq_len(100), 1:33),
  "Fold2" = setdiff(seq_len(100), 66:100),
  "Fold3" = setdiff(seq_len(100), 34:65)
)
mlh_outsample_row_indices(fold_list, 100)
mlh_outsample_row_indices(fold_list, 100, "glmnet")
mlh_reshape
Description
Machine learning helper function to reshape a matrix of predicted probabilities to classes.
Usage
mlh_reshape(object)
Arguments
object |
A matrix with predicted probabilities for several classes. Each row must sum up to 1. |
Value
Returns a vector of type factor of the same length as rows in object, representing the class with the highest probability for each observation in object.
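Picking the class with the highest probability per row can be sketched with base R's max.col(). This illustrates the idea only; the tie handling chosen here ("first") is an assumption.

```r
# Sketch only: map each row of a probability matrix to its most likely class.
probs <- cbind(
  "0" = c(0.7, 0.2, 0.1),
  "1" = c(0.2, 0.5, 0.3),
  "2" = c(0.1, 0.3, 0.6)
)
winner <- max.col(probs, ties.method = "first")  # column index of the row maximum
classes <- factor(colnames(probs)[winner], levels = colnames(probs))
print(classes)
```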
Examples
set.seed(123)
class_0 <- rbeta(100, 2, 4)
class_1 <- (1 - class_0) * 0.4
class_2 <- (1 - class_0) * 0.6
dataset <- cbind("0" = class_0, "1" = class_1, "2" = class_2)
mlh_reshape(dataset)
mlh_subset
Description
Machine learning helper function to select a subset from a data matrix or a response vector.
Usage
mlh_subset(object, ids)
Arguments
object |
A vector or a data matrix. Also supports subsetting of "Surv" objects. |
ids |
An integer vector specifying the indices that should be selected from the object. |
Value
Returns the specified subset of the object.
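The distinction between vectors and two-dimensional objects can be sketched as follows. The name subset_sketch is hypothetical, and the handling of "Surv" objects is not covered by this example.

```r
# Hypothetical helper, for illustration only.
subset_sketch <- function(object, ids) {
  if (is.null(dim(object))) {
    object[ids]                  # plain vector or factor
  } else {
    object[ids, , drop = FALSE]  # matrix or data.frame: select rows
  }
}

rows <- subset_sketch(iris, 1:30)
labels <- subset_sketch(iris[, 5], 1:30)
```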
Examples
data("iris")
mlh_subset(iris, c(1:30))
mlh_subset(iris[, 5], c(1:30))
pch_check_available_cores
Description
Parallel computing helper function to check for the available cores.
Usage
pch_check_available_cores(ncores = -1L)
Arguments
ncores |
An integer. A number of cores requested for parallel computing (default: -1L). |
Value
The function returns an integer that indicates the number of cores available. If ncores <= parallel::detectCores() the function returns ncores. If ncores > parallel::detectCores(), the function returns parallel::detectCores() - 1L.
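The two documented cases can be written down directly. In this sketch the name check_cores_sketch and the extra argument available are hypothetical (the latter makes the logic testable), and the behavior for the default ncores = -1L is not modelled.

```r
# Hypothetical helper, for illustration only.
check_cores_sketch <- function(ncores, available = parallel::detectCores()) {
  if (ncores > available) {
    # More cores requested than present: leave one core free.
    return(as.integer(available) - 1L)
  }
  as.integer(ncores)
}

check_cores_sketch(2, available = 8)
check_cores_sketch(16, available = 8)
```

Note that parallel::detectCores() can return NA on some platforms, which this sketch does not handle.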
Examples
pch_check_available_cores(2)
pch_clean_up
Description
Parallel computing helper function to clean up the parallel backend.
Usage
pch_clean_up(cl)
Arguments
cl |
A cluster object of class |
Value
The function returns nothing. Internally, it calls parallel::stopCluster() and foreach::registerDoSEQ().
See Also
parallel::stopCluster(), foreach::registerDoSEQ()
Examples
if (require("doParallel") && require("foreach")) {
  cl <- pch_register_parallel(pch_check_available_cores(2))
  pch_clean_up(cl)
}
pch_register_parallel
Description
Parallel computing helper function to register a parallel backend.
Usage
pch_register_parallel(ncores)
Arguments
ncores |
An integer. A number of cores requested for parallel computing
(default: |
Value
The function returns an object of class c("SOCKcluster", "cluster"), created with parallel::makePSOCKcluster().
See Also
parallel::makePSOCKcluster(), doParallel::registerDoParallel()
Examples
if (require("doParallel") && require("foreach")) {
  cl <- pch_register_parallel(pch_check_available_cores(2))
  pch_clean_up(cl)
}
plt_parallel_coordinates
Description
Parallel coordinates plot
Usage
plt_parallel_coordinates(
  data,
  cols = NULL,
  color_variable = NULL,
  color_args = list(alpha = 0.6, begin = 0.1, end = 0.9, option = "inferno",
    direction = 1),
  line_jitter = list(w = 0.04, h = 0.04),
  text_label_size = 3.5
)
Arguments
data |
A data.table object with the columns containing the parameters to be plotted with the parallel coordinates plot. |
cols |
A character vector with column names to subset |
color_variable |
A character. The name of the column to be used to colorize the lines of the plot (default: NULL). |
color_args |
A list with parameters for the color gradient (see details). |
line_jitter |
A list with the elements |
text_label_size |
A numeric value to define the size of the text annotations (default: 3.5). |
Details
The color gradient of the plotted lines can be defined with a list provided to the argument color_args. Its default values are alpha = 0.6, begin = 0.1, end = 0.9, option = "inferno", and direction = 1; they are passed on to ggplot2::scale_color_viridis_c().
The implementation to display categorical variables is still experimental.
Value
Returns a parallel coordinates plot as ggplot2 object.
See Also
ggplot2::scale_color_viridis_c()
Examples
if (require("ggplot2")) {
  data("iris")
  plt_parallel_coordinates(
    data = data.table::as.data.table(iris[, -5]),
    cols = colnames(iris)[c(-1, -5)],
    color_variable = "Sepal.Length"
  )
}
rep_frac_pct
Description
Reporting helper function: computes and formats the relative percentage of a fraction.
Usage
rep_frac_pct(
  count,
  count_reference,
  digits = 2,
  na.rm = TRUE,
  brackets = c("round", "square"),
  suffix = TRUE
)
Arguments
count |
A numeric. The numerator. |
count_reference |
A numeric. The denominator. |
digits |
An integer indicating the number of decimal places. |
na.rm |
A logical indicating if missings should be removed from |
brackets |
A character. Either |
suffix |
A logical (default: TRUE) indicating whether the suffix is added to the formatted output. |
Value
A character with the formatted output.
See Also
stats::median, stats::quantile, Hmisc::wtd.quantile()
Examples
rep_frac_pct(count = 40, count_reference = 200)
rep_frac_pct(count = 40, count_reference = 200, brackets = "square")
rep_frac_pct(40, 200, brackets = "square", suffix = FALSE)
rep_mean_sd
Description
Reporting helper function: computes and formats mean and standard deviation from a numeric vector.
Usage
rep_mean_sd(
  x,
  digits = 2,
  na.rm = TRUE,
  sd_brackets = c("round", "square"),
  sd_prefix = TRUE,
  weighted = FALSE,
  weights = NA
)
Arguments
x |
A numeric vector. |
digits |
An integer indicating the number of decimal places. |
na.rm |
A logical indicating if missings should be removed from |
sd_brackets |
A character. Either |
sd_prefix |
A logical. If |
weighted |
A logical. If |
weights |
A vector with the weights (if |
Value
A character with the formatted output.
See Also
mean(), stats::sd(), stats::weighted.mean(), Hmisc::wtd.var()
Examples
set.seed(123)
x <- rnorm(1000)
rep_mean_sd(x)
rep_mean_sd(rep(1, 10))
rep_mean_sd(x, sd_brackets = "square")
rep_mean_sd(x, sd_brackets = "square", sd_prefix = FALSE)
rep_median_ci
Description
Reporting helper function: computes and formats median and confidence interval from a numeric vector.
Usage
rep_median_ci(
  x,
  conf_int,
  digits = 2,
  na.rm = TRUE,
  collapse = "to",
  iqr_brackets = c("round", "square"),
  iqr_prefix = TRUE,
  weighted = FALSE,
  weights = NA
)
Arguments
x |
A numeric vector. |
conf_int |
A numeric between 0 and 100 to indicate the confidence interval that should be computed. |
digits |
An integer indicating the number of decimal places. |
na.rm |
A logical indicating if missings should be removed from |
collapse |
A character which is placed between the lower and the upper confidence bound in the formatted output. |
iqr_brackets |
A character. Either |
iqr_prefix |
A logical. If |
weighted |
A logical. If |
weights |
A numeric vector of weights passed further on to
|
Value
A character with the formatted output.
See Also
stats::median, stats::quantile, Hmisc::wtd.quantile()
Examples
set.seed(123)
x <- rnorm(1000)
rep_median_ci(x, conf_int = 95)
rep_median_ci(rep(1, 10), conf_int = 95)
rep_median_ci(x, conf_int = 95, collapse = "-")
rep_median_ci(x, iqr_brackets = "square", conf_int = 50)
rep_median_iqr
Description
Reporting helper function: computes and formats median and interquartile range from a numeric vector.
Usage
rep_median_iqr(
  x,
  digits = 2,
  na.rm = TRUE,
  collapse = "to",
  iqr_brackets = c("round", "square"),
  iqr_prefix = TRUE
)
Arguments
x |
A numeric vector. |
digits |
An integer indicating the number of decimal places. |
na.rm |
A logical indicating if missings should be removed from |
collapse |
A character which is placed between the lower and the upper confidence bound in the formatted output. |
iqr_brackets |
A character. Either |
iqr_prefix |
A logical. If |
Details
This is just a special case of rep_median_ci() with the parameter conf_int set to 50.
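The relation between conf_int = 50 and the interquartile range can be sketched in base R, assuming conf_int describes a central interval of quantiles. The helper names and the output layout "median (lower to upper)" are assumptions; the package's actual formatting (prefix, brackets) may differ.

```r
# Hypothetical helpers, for illustration only.
central_interval_probs <- function(conf_int) {
  tail_prob <- (100 - conf_int) / 200  # probability mass cut off on each side
  c(tail_prob, 1 - tail_prob)
}

median_interval_sketch <- function(x, conf_int = 50, digits = 2, collapse = "to") {
  probs <- c(central_interval_probs(conf_int)[1], 0.5,
             central_interval_probs(conf_int)[2])
  q <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  q <- formatC(q, format = "f", digits = digits)
  paste0(q[2], " (", q[1], " ", collapse, " ", q[3], ")")
}

res_iqr <- median_interval_sketch(1:9)
```

With conf_int = 50 the two bounds are the 25% and 75% quantiles, which is the interquartile range.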
Value
A character with the formatted output.
See Also
Examples
set.seed(123)
x <- rnorm(1000)
rep_median_iqr(x)
rep_median_iqr(rep(1, 10))
rep_median_iqr(x, collapse = "-")
rep_median_iqr(x, iqr_brackets = "square")
rep_median_iqr(x, iqr_brackets = "square", iqr_prefix = FALSE)
rep_median_iqr(x, collapse = ";", iqr_prefix = FALSE)
rep_pval
Description
Reporting helper function: formats p-value.
Usage
rep_pval(p, threshold = 0.001, digits = 3L)
Arguments
p |
The p-value that should be formatted. |
threshold |
A threshold to indicate that only "< threshold" is printed as output (default: 0.001). |
digits |
The number of digits of the formatted p-value (default: 3). |
Details
If the p-value is lower than the threshold, the output of the function is "< threshold". Otherwise, the p-value is formatted to the number of digits.
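The thresholding rule can be sketched in a few lines of base R. The function name format_pval_sketch is hypothetical and the exact output strings of the package may differ from the ones produced here.

```r
# Hypothetical helper, for illustration only.
format_pval_sketch <- function(p, threshold = 0.001, digits = 3L) {
  if (p < threshold) {
    # Below the threshold only "< threshold" is reported.
    return(paste("<", format(threshold, scientific = FALSE)))
  }
  # Otherwise print the p-value with a fixed number of decimal places.
  formatC(p, format = "f", digits = digits)
}

format_pval_sketch(0.032)
format_pval_sketch(0.00032)
```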
Value
A character with the formatted p-value.
Examples
rep_pval(0.032)
rep_pval(0.00032)
rep_sum_pct
Description
Reporting helper function: computes and formats the relative percentage of a count.
Usage
rep_sum_pct(
  count,
  count_reference,
  digits = 2,
  na.rm = TRUE,
  brackets = c("round", "square"),
  suffix = TRUE
)
Arguments
count |
A numeric. The numerator. |
count_reference |
A numeric. The denominator. |
digits |
An integer indicating the number of decimal places. |
na.rm |
A logical indicating if missings should be removed from |
brackets |
A character. Either |
suffix |
A logical (default: TRUE) indicating whether the suffix is added to the formatted output. |
Value
A character with the formatted output.
See Also
stats::median, stats::quantile, Hmisc::wtd.quantile()
Examples
rep_sum_pct(count = 40, count_reference = 200)
rep_sum_pct(count = 40, count_reference = 200, brackets = "square")
rep_sum_pct(40, 200, brackets = "square", suffix = FALSE)
sts_normalize
Description
Statistics helper function to normalize a continuous variable between zero and one.
Usage
sts_normalize(x, na.rm = FALSE)
Arguments
x |
A vector of type |
na.rm |
A logical to indicate whether missings should be removed. |
Value
Returns a vector of same length as x with values normalized between zero and one. If x contains missings and na.rm = TRUE, the missings are removed before normalization; otherwise, a vector of NA is returned.
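Min-max scaling with this kind of missing-value handling can be sketched as follows. The name normalize_sketch is hypothetical; keeping NA entries in place when na.rm = TRUE is an assumption made so that the result has the same length as x.

```r
# Hypothetical helper, for illustration only.
normalize_sketch <- function(x, na.rm = FALSE) {
  # With na.rm = FALSE and missings present, range() yields NA and so does
  # every element of the result.
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

normalize_sketch(c(2, 4, 6))
normalize_sketch(c(1, NA, 3))
normalize_sketch(c(1, NA, 3), na.rm = TRUE)
```

A constant vector has a zero range and yields NaN in this sketch.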
Examples
sts_normalize(1:100)