Title: Data Quality in Epidemiological Research
Version: 2.5.1
Description: Data quality assessments guided by a 'data quality framework introduced by Schmidt and colleagues, 2021' <doi:10.1186/s12874-021-01252-7> target the data quality dimensions integrity, completeness, consistency, and accuracy. The scope of applicable functions rests on the availability of extensive metadata which can be provided in spreadsheet tables. Either standardized (e.g. as 'html5' reports) or individually tailored reports can be generated. For an introduction to the specification of corresponding metadata, please refer to the 'package website' https://dataquality.qihs.uni-greifswald.de/VIN_Annotation_of_Metadata.html.
License: BSD_2_clause + file LICENSE
URL: https://dataquality.qihs.uni-greifswald.de/
BugReports: https://gitlab.com/libreumg/dataquier/-/issues
Depends: R (≥ 3.6.0)
Imports: dplyr (≥ 1.0.2), emmeans, ggplot2 (≥ 3.5.0), lme4, lubridate, MASS, MultinomialCI, parallelMap, patchwork (≥ 1.3.0), R.devices, rlang, robustbase, qmrparser, utils, rio, readr, scales, withr, lifecycle, units, methods
Suggests: openxlsx2, GGally, grDevices, jsonlite, cli, whoami, anytime, cowplot (≥ 0.9.4), digest, DT (≥ 0.23), flexdashboard, flexsiteboard, htmltools, knitr, markdown, parallel, parallelly, rJava, rmarkdown, rstudioapi, testthat (≥ 3.1.9), tibble, vdiffr, pkgload, Rdpack, callr, colorspace, plotly, ggvenn, htmlwidgets, future, processx, R6, shiny, xml2, mgcv, rvest, textutils, dbx, ggpubr, grImport2, rsvg, stringdist, rankICC, nnet, ordinal, storr, reticulate
VignetteBuilder: knitr
Encoding: UTF-8
KeepSource: FALSE
Language: en-US
RoxygenNote: 7.3.2
Config/testthat/parallel: true
Config/testthat/edition: 3
Config/testthat/start-first: dq_report_by_sm, dq_report2, dq_report_by_arguments, dq_report_by_s, int_encoding_errors, dq_report_by_pipesymbol_list, dq_report_by_m, plots, acc_loess, com_item_missingness, dq_report_by_na, dq_report_by_directories, con_limit_deviations, con_contradictions_redcap, com_segment_missingness, util_correct_variable_use
BuildManual: TRUE
NeedsCompilation: no
Packaged: 2025-03-05 17:44:09 UTC; struckmanns
Author: University Medicine Greifswald [cph], Elisa Kasbohm
Maintainer: Stephan Struckmann <stephan.struckmann@uni-greifswald.de>
Repository: CRAN
Date/Publication: 2025-03-05 18:10:02 UTC
The dataquieR package about Data Quality in Epidemiological Research
Description
For a quick start, please read dq_report2 and maybe the vignettes or the package's website.
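A minimal quick-start sketch (the file names are assumptions; see the package website for real example data):

```r
library(dataquieR)

# Assumption: "study_data.xlsx" holds the measurements and
# "meta_data_v2.xlsx" is a workbook-like metadata file with
# item_level, segment_level, ... sheets as described in the vignettes.
study_data <- rio::import("study_data.xlsx")
report <- dq_report2(
  study_data   = study_data,
  meta_data_v2 = "meta_data_v2.xlsx"
)
print(report)  # render the standardized report
```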
Options
This package features the following options():
Author(s)
Maintainer: Stephan Struckmann stephan.struckmann@uni-greifswald.de (ORCID)
Authors:
Elisa Kasbohm elisa.kasbohm@uni-greifswald.de (ORCID)
Elena Salogni elena.salogni@uni-greifswald.de (ORCID)
Joany Marino joany.marino@uni-greifswald.de (ORCID)
Adrian Richter richtera@uni-greifswald.de (ORCID)
Carsten Oliver Schmidt carsten.schmidt@uni-greifswald.de (ORCID)
Other contributors:
University Medicine Greifswald [copyright holder]
German Research Foundation (DFG SCHM 2744/3-1, SCHM 2744/9-1, SCHM 2744/3-4) [funder]
National Research Data Infrastructure for Personal Health Data: (NFDI 13/1) [funder]
European Union’s Horizon 2020 programme (euCanSHare, grant agreement No. 825903) [funder]
References
See Also
Useful links:
Report bugs at https://gitlab.com/libreumg/dataquier/-/issues
Other options: dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
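These options are set with base R's options() before running dataquieR functions; for example (the values shown are illustrative, consult each option's documentation for its actual default and allowed values):

```r
# Illustrative values only -- the defaults may differ.
options(
  dataquieR.MAX_LABEL_LEN = 30,
  dataquieR.CONDITIONS_WITH_STACKTRACE = TRUE
)
getOption("dataquieR.MAX_LABEL_LEN")
```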
Write single results from a dataquieR_resultset2 report
Description
Write single results from a dataquieR_resultset2 report
Usage
## S3 replacement method for class 'dataquieR_resultset2'
x$el <- value
Arguments
x |
the report |
el |
the index |
value |
the single result |
Value
the dataquieR
result object
Extract elements of a dataquieR
Result Object
Description
Extract elements of a dataquieR
Result Object
Usage
## S3 method for class 'dataquieR_result'
x$...
Arguments
x |
the |
... |
arguments passed to the implementation for lists. |
Value
the element of the dataquieR
result object with all messages
still attached
See Also
Access single results from a dataquieR_resultset2 report
Description
Access single results from a dataquieR_resultset2 report
Usage
## S3 method for class 'dataquieR_resultset2'
x$el
Arguments
x |
the report |
el |
the index |
Value
the dataquieR
result object
Holds Indicator / Descriptor assignments from the manual at run-time
Description
Holds Indicator / Descriptor assignments from the manual at run-time
Usage
..indicator_or_descriptor
Format
An object of class environment
of length 0.
Holds parts of the manual at run-time
Description
Holds parts of the manual at run-time
Usage
..manual
Format
An object of class environment
of length 0.
Access elements from a dataquieR_resultset2
Description
does so, but similar to [ for lists.
Usage
.access_dq_rs2(x, els)
Arguments
x |
the |
els |
the selector (character, number or logical) |
Value
the sub-list of x
Write elements from a dataquieR_resultset2
Description
does so, but similar to [ for lists.
Usage
.access_dq_rs2(x, els) <- value
Arguments
x |
the |
els |
the selector (character, number or logical) |
value |
|
Value
the modified x
Get Access to Utility Functions
Description
Usage
.get_internal_api(fkt, version = API_VERSION, or_newer = TRUE)
Arguments
fkt |
function name |
version |
version number to get |
Value
an API object
Roxygen-Template for indicator functions
Description
Roxygen-Template for indicator functions
Usage
.template_function_indicator(
resp_vars,
study_data,
label_col,
item_level,
meta_data,
meta_data_v2,
meta_data_dataframe,
meta_data_segment,
dataframe_level,
segment_level
)
Arguments
resp_vars |
variable the names of the measurement variables, if
missing or |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
meta_data_segment |
data.frame – optional: Segment level metadata |
dataframe_level |
data.frame alias for |
segment_level |
data.frame alias for |
Value
invisible(NULL)
Make normalizations of v2.0 item_level metadata.
Description
Requires the referred missing-tables to be available via prep_get_data_frame.
Usage
.util_internal_normalize_meta_data(
meta_data = "item_level",
label_col = LABEL,
verbose = TRUE
)
Arguments
meta_data |
data.frame old name for |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
verbose |
logical display all estimated decisions, defaults to |
Variable-argument roles
Description
A variable-argument role is the intended use of an argument of an indicator function – an argument that refers to variables.
In general, for the table .variable_arg_roles, the suffix _var means that one variable is allowed, while _vars means that more than one is allowed. The default sets of arguments for util_correct_variable_use/util_correct_variable_use2 are defined from the point of usage, e.g., if NAs could occur in the list of variable names, the function should be able to remove the respective response variables from the output rather than disallow them by setting allow_na to FALSE.
Usage
.variable_arg_roles
Format
An object of class tbl_df
(inherits from tbl
, data.frame
) with 14 rows and 9 columns.
See Also
Version of the API
Description
Version of the API
Usage
API_VERSION
Format
An object of class package_version
(inherits from numeric_version
) of length 1.
See Also
Cross-item level metadata attribute name
Description
The allowable direction of an association. The input is a string that can be either "positive" or "negative".
Usage
ASSOCIATION_DIRECTION
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
The allowable form of association. The string specifies the form based on a selected list.
Usage
ASSOCIATION_FORM
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
The metric underlying the association in ASSOCIATION_RANGE. The input is a string that specifies the analysis algorithm to be used.
Usage
ASSOCIATION_METRIC
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
Specifies the allowable range of an association. The inclusion of the endpoints follows standard mathematical notation using round brackets for open intervals and square brackets for closed intervals. Values must be separated by a semicolon.
Usage
ASSOCIATION_RANGE
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
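The interval notation of ASSOCIATION_RANGE can be sketched as a cross-item level metadata row (all column values below are illustrative):

```r
# Hypothetical cross-item level metadata row: square brackets include
# the endpoint, round brackets exclude it, values separated by ";".
cross_item_level <- data.frame(
  VARIABLE_LIST         = "SBP_1 | SBP_2",  # illustrative variable group
  ASSOCIATION_METRIC    = "pearson",        # illustrative metric name
  ASSOCIATION_RANGE     = "[0.5; 1]",       # closed interval from 0.5 to 1
  ASSOCIATION_DIRECTION = "positive"
)
```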
Cross-item level metadata attribute name
Description
Specifies the unique IDs for cross-item level metadata records
Usage
CHECK_ID
Format
An object of class character
of length 1.
Details
if missing, dataquieR
will create such IDs
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
Specifies the unique labels for cross-item level metadata records
Usage
CHECK_LABEL
Format
An object of class character
of length 1.
Details
if missing, dataquieR
will create such labels
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
types of value codes
Description
types of value codes
Usage
CODE_CLASSES
Format
An object of class list
of length 3.
Default Name of the Table featuring Code Lists
Description
Default Name of the Table featuring Code Lists
Metadata sheet name containing VALUE_LABEL_TABLES. This metadata sheet can contain value labels of several VALUE_LABEL_TABLE entries as well as missing and jump tables.
Usage
CODE_LIST_TABLE
Format
An object of class character
of length 1.
Only existence is checked, order not yet used
Description
Only existence is checked, order not yet used
Usage
CODE_ORDER
Format
An object of class character
of length 1.
Cross-item level metadata attribute name
Description
Note: in some prep_-functions, this field is named RULE
Usage
CONTRADICTION_TERM
Format
An object of class character
of length 1.
Details
Specifies a contradiction rule. Use REDCap-like syntax; see the online vignette.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
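A REDCap-like contradiction rule might look like the following cross-item level metadata row (variable labels and value labels are illustrative):

```r
# Hypothetical contradiction rule in REDCap-like syntax:
# flag records that report a pregnancy for male participants.
cross_item_level <- data.frame(
  CHECK_LABEL        = "pregnant male",
  VARIABLE_LIST      = "SEX_0 | PREGNANT_0",
  CONTRADICTION_TERM = '[SEX_0] = "males" and [PREGNANT_0] = "yes"',
  CONTRADICTION_TYPE = "logical"
)
```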
Cross-item level metadata attribute name
Description
Specifies the type of a contradiction. According to the data quality concept, there are logical and empirical contradictions, see online vignette
Usage
CONTRADICTION_TYPE
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
For contradiction rules, the required pre-processing steps can be given. Note: MISSING_LABEL will not work for non-factor variables.
Usage
DATA_PREPARATION
Format
An object of class character
of length 1.
Details
Allowed preparation steps: LABEL, LIMITS, MISSING_NA, MISSING_LABEL, MISSING_INTERPRET
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Data Types
Description
Data Types of Study Data
In the metadata, the following entries are allowed for the variable attribute DATA_TYPE:
Usage
DATA_TYPES
Format
An object of class list
of length 4.
Details
- integer for integer numbers
- string for text/string/character data
- float for decimal/floating point numbers
- datetime for timepoints
Data Types of Function Arguments
As function arguments, dataquieR uses additional type specifications:
- numeric is a numerical value (float or integer), but it is not an allowed DATA_TYPE in the metadata. However, some functions may accept float or integer for specific function arguments. This is where we use the term numeric.
- enum allows one element out of a set of allowed options, similar to match.arg.
- set allows a subset out of a set of allowed options, similar to match.arg with several.ok = TRUE.
- variable Function arguments of this type expect a character scalar that specifies one variable using the variable identifier given in the metadata attribute VAR_NAMES or, if label_col is set, using the metadata attribute given in that argument. Labels can easily be translated using prep_map_labels.
- variable list Function arguments of this type expect a character vector that specifies variables using the variable identifiers given in the metadata attribute VAR_NAMES or, if label_col is set, using the metadata attribute given in that argument. Labels can easily be translated using prep_map_labels.
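In item-level metadata, DATA_TYPE entries could look like this (all rows are illustrative):

```r
# Illustrative item-level metadata excerpt: one DATA_TYPE entry per
# variable, drawn from the allowed values in DATA_TYPES.
item_level <- data.frame(
  VAR_NAMES = c("v00001", "v00002", "v00003", "v00004"),
  LABEL     = c("AGE_0", "SEX_0", "SBP_0", "EXAM_DT_0"),
  DATA_TYPE = c("integer", "string", "float", "datetime")
)
```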
See Also
All available data types, mapped from their respective R types
Description
All available data types, mapped from their respective R types
Usage
DATA_TYPES_OF_R_TYPE
Format
An object of class list
of length 14.
See Also
Data frame level metadata attribute name
Description
Name of the data frame
Usage
DF_CODE
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
Number of expected data elements in a data frame (numeric). The check is only conducted if a number is entered.
Usage
DF_ELEMENT_COUNT
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
The name of the data frame containing the reference IDs to be compared with the IDs in the study data set.
Usage
DF_ID_REF_TABLE
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
All variables that are to be used as one single ID variable (combined key) in a data frame.
Usage
DF_ID_VARS
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
Name of the data frame
Usage
DF_NAME
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
The type of check to be conducted when comparing the reference ID table with the IDs delivered in the study data files.
Usage
DF_RECORD_CHECK
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
Number of expected data records in a data frame (numeric). The check is only conducted if a number is entered.
Usage
DF_RECORD_COUNT
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
Defines expectancies on the uniqueness of the IDs across the rows of a data frame, or the number of times some ID can be repeated.
Usage
DF_UNIQUE_ID
Format
An object of class character
of length 1.
See Also
Data frame level metadata attribute name
Description
Specifies whether identical data is permitted across rows in a data frame (excluding ID variables)
Usage
DF_UNIQUE_ROWS
Format
An object of class character
of length 1.
See Also
All available probability distributions for acc_shape_or_scale
Description
- uniform for a uniform distribution
- normal for a Gaussian distribution
- gamma for a gamma distribution
Usage
DISTRIBUTIONS
Format
An object of class list
of length 3.
Descriptor Function
Description
A function that returns some figure or table to assess data quality, but it does not return a value correlating with the magnitude of a data quality problem. It's the opposite of an Indicator.
The object Descriptor
only contains the name used internally to tag
such functions.
Usage
Descriptor
Format
An object of class character
of length 1.
See Also
Cross-item level metadata attribute name
Description
Defines the measurement variable to be used as a known gold standard. Only one variable can be defined as the gold standard.
Usage
GOLDSTANDARD
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Indicator Function
Description
A function that returns some value that correlates with the magnitude of a certain class of data quality problems. Typically, in dataquieR, such functions return a SummaryTable that features columns whose names start with a short abbreviation describing the specific semantics of the value (e.g., PCT for a percentage or COR for a correlation) followed by the public name of the indicator according to the data quality concept DQ_OBS, e.g., com_qum_nonresp for the item-non-response rate. A name could therefore be PCT_com_qum_nonresp.
The object Indicator only contains the name used internally to tag such functions.
Usage
Indicator
Format
An object of class character
of length 1.
See Also
An exception class assigned for exceptions caused by long variable labels
Description
An exception class assigned for exceptions caused by long variable labels
Usage
LONG_LABEL_EXCEPTION
Format
An object of class character
of length 1.
Cross-item level metadata attribute name
Description
Select whether to compute acc_multivariate_outlier.
Usage
MULTIVARIATE_OUTLIER_CHECK
Format
An object of class character
of length 1.
Details
You can leave the cell empty; then the behavior depends on the setting of the option dataquieR.MULTIVARIATE_OUTLIER_CHECK. If this column is missing, this is the same as having all cells empty and dataquieR.MULTIVARIATE_OUTLIER_CHECK set to "auto".
See also MULTIVARIATE_OUTLIER_CHECKTYPE.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
Select which outlier criteria to compute; see acc_multivariate_outlier.
Usage
MULTIVARIATE_OUTLIER_CHECKTYPE
Format
An object of class character
of length 1.
Details
You can leave the cell empty; then all checks will apply. If you enter a set of methods, the maximum for N_RULES changes. See also UNIVARIATE_OUTLIER_CHECKTYPE.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, N_RULES, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item and item level metadata attribute name
Description
Select how many violated outlier criteria make an observation an outlier; see acc_multivariate_outlier.
Usage
N_RULES
Format
An object of class character
of length 1.
Details
You can leave the cell empty; then all applied checks must deem an observation an outlier for it to be flagged. See UNIVARIATE_OUTLIER_CHECKTYPE and MULTIVARIATE_OUTLIER_CHECKTYPE for the selected outlier criteria.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, REL_VAL, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Cross-item level metadata attribute name
Description
Specifies the type of reliability or validity analysis. The string specifies the analysis algorithm to be used, and can be either "inter-class" or "intra-class".
Usage
REL_VAL
Format
An object of class character
of length 1.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, VARIABLE_LIST, meta_data_cross, util_normalize_cross_item()
Scale Levels
Description
Scale Levels of Study Data according to Stevens's Typology
In the metadata, the following entries are allowed for the variable attribute SCALE_LEVEL:
Usage
SCALE_LEVELS
Format
An object of class list
of length 5.
Details
- nominal for categorical variables
- ordinal for ordinal variables (i.e., comparison of values is possible)
- interval for interval scales, i.e., distances are meaningful
- ratio for ratio scales, i.e., ratios are meaningful
- na for variables that contain, e.g., unstructured texts, json, xml, ..., to distinguish them from variables that still need to have the SCALE_LEVEL estimated by prep_scalelevel_from_data_and_metadata()
Examples
- sex, eye color – nominal
- income group, education level – ordinal
- temperature in degrees Celsius – interval
- body weight, temperature in Kelvin – ratio
See Also
Segment level metadata attribute name
Description
The name of the data frame containing the reference IDs to be compared with the IDs in the targeted segment.
Usage
SEGMENT_ID_REF_TABLE
Format
An object of class character
of length 1.
See Also
Deprecated segment level metadata attribute name
Description
The name of the data frame containing the reference IDs to be compared with the IDs in the targeted segment.
Usage
SEGMENT_ID_TABLE
Format
An object of class character
of length 1.
Details
Please use SEGMENT_ID_REF_TABLE
Segment level metadata attribute name
Description
All variables that are to be used as one single ID variable (combined key) in a segment.
Usage
SEGMENT_ID_VARS
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
true or false to suppress crude segment missingness output (Completeness/Misg. Segments in the report). By default, the output is computed if more than one segment is available in the item-level metadata.
Usage
SEGMENT_MISS
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
The name of the segment participation status variable
Usage
SEGMENT_PART_VARS
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
The type of check to be conducted when comparing the reference ID table with the IDs in a segment.
Usage
SEGMENT_RECORD_CHECK
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
Number of expected data records in each segment (numeric). The check is only conducted if a number is entered.
Usage
SEGMENT_RECORD_COUNT
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
Segment level metadata attribute name
Usage
SEGMENT_UNIQUE_ID
Format
An object of class character
of length 1.
See Also
Segment level metadata attribute name
Description
Specifies whether identical data is permitted across rows in a segment (excluding ID variables)
Usage
SEGMENT_UNIQUE_ROWS
Format
An object of class character
of length 1.
See Also
Character used by default as a separator in metadata such as missing codes
Description
According to our metadata concept, this single character is "|".
Usage
SPLIT_CHAR
Format
An object of class character
of length 1.
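For example, several missing codes in one item-level metadata cell are separated by this character (the variable name and codes below are illustrative):

```r
# Illustrative: two missing codes for one variable, separated by "|"
# (SPLIT_CHAR); MISSING_LIST is a well-known item-level metadata column.
item_level <- data.frame(
  VAR_NAMES    = "v00042",
  MISSING_LIST = "99980 | 99983"
)
```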
Valid unit symbols according to units::valid_udunits()
Description
like m, g, N, ...
See Also
Other UNITS: UNIT_IS_COUNT, UNIT_PREFIXES, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES
Is a unit a count according to units::valid_udunits()
Description
see column def therein
Details
like %, ppt, ppm
See Also
Other UNITS: UNITS, UNIT_PREFIXES, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES
Valid unit prefixes according to units::valid_udunits_prefixes()
Description
like k, m, M, c, ...
See Also
Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_SOURCES, WELL_KNOWN_META_VARIABLE_NAMES
Maturity stage of a unit according to units::valid_udunits()
Description
see column source_xml therein, i.e., base, derived, accepted, or common
See Also
Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_PREFIXES, WELL_KNOWN_META_VARIABLE_NAMES
Item level metadata attribute name
Description
Select which outlier criteria to compute; see acc_univariate_outlier.
Usage
UNIVARIATE_OUTLIER_CHECKTYPE
Format
An object of class character
of length 1.
Details
You can leave the cell empty; then all checks will apply. If you enter a set of methods, the maximum for N_RULES changes. See also MULTIVARIATE_OUTLIER_CHECKTYPE.
See Also
WELL_KNOWN_META_VARIABLE_NAMES
Requirement levels of certain metadata columns
Description
These levels are cumulatively used by the function prep_create_meta and
related in the argument level
therein.
Usage
VARATT_REQUIRE_LEVELS
Format
An object of class list
of length 5.
Details
currently available:
'COMPATIBILITY' = "compatibility"
'REQUIRED' = "required"
'RECOMMENDED' = "recommended"
'OPTIONAL' = "optional"
'TECHNICAL' = "technical"
Cross-item level metadata attribute name
Description
Specifies a group of variables for multivariate analyses. Separated by |, please use variable names from VAR_NAMES or a label as specified in label_col, usually LABEL or LONG_LABEL.
Usage
VARIABLE_LIST
Format
An object of class character
of length 1.
Details
if missing, dataquieR
will create such IDs from CONTRADICTION_TERM,
if specified.
See Also
Other meta_data_cross: ASSOCIATION_DIRECTION, ASSOCIATION_FORM, ASSOCIATION_METRIC, ASSOCIATION_RANGE, CHECK_ID, CHECK_LABEL, CONTRADICTION_TERM, CONTRADICTION_TYPE, DATA_PREPARATION, GOLDSTANDARD, MULTIVARIATE_OUTLIER_CHECK, MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, REL_VAL, meta_data_cross, util_normalize_cross_item()
Variable roles can be one of the following:
Description
- intro a variable holding consent-data
- primary a primary outcome variable
- secondary a secondary outcome variable
- process a variable describing the measurement process
- suppress a variable added on the fly when computing sub-reports, e.g., by dq_report_by, to have all referred variables available even if they are not part of the currently processed segment. They will only be fully assessed in their real segment's report.
Usage
VARIABLE_ROLES
Format
An object of class list
of length 5.
Well-known metadata column names, names of metadata columns
Description
Names of the variable attributes in the metadata frame holding:
- the names of the respective observers and devices
- lower and upper limits for plausible values
- lower and upper limits for allowed values
- the variable name (column name, e.g., v0020349) used in the study data
- the variable name used for processing (readable name, e.g., RR_DIAST_1) and in parameters of the QA functions
- the variable label, long label, and short label
- the variable data type (see also DATA_TYPES)
- re-codes for the definition of lists of event categories, missing lists and jump lists as CSV strings
For valid units see UNITS.
Usage
WELL_KNOWN_META_VARIABLE_NAMES
Format
An object of class list
of length 58.
Details
all entries of this list will be mapped to the package's exported NAMESPACE environment directly, i.e. they are available directly by their names too:
See Also
meta_data_segment for STUDY_SEGMENT
Other UNITS: UNITS, UNIT_IS_COUNT, UNIT_PREFIXES, UNIT_SOURCES
Examples
print(WELL_KNOWN_META_VARIABLE_NAMES$VAR_NAMES)
# print(VAR_NAMES) # should usually also work
Write to a report
Description
Overwriting of elements is only supported list-wise.
Usage
## S3 replacement method for class 'dataquieR_resultset2'
x[...] <- value
Arguments
x |
a 'dataquieR_resultset2 |
... |
if this contains only one entry and this entry is not named
or its name is |
value |
new value to write |
Value
nothing, stops
Extract Parts of a dataquieR
Result Object
Description
Extract Parts of a dataquieR
Result Object
Usage
## S3 method for class 'dataquieR_result'
x[...]
Arguments
x |
the |
... |
arguments passed to the implementation for lists. |
Value
the sub-list of the dataquieR
result object with all messages
still attached
See Also
Get a subset of a dataquieR
dq_report2
report
Description
Get a subset of a dataquieR
dq_report2
report
Usage
## S3 method for class 'dataquieR_resultset2'
x[row, col, res, drop = FALSE, els = row]
Arguments
x |
the report |
row |
the variable names, must be unique |
col |
the function-call-names, must be unique |
res |
the result slot, must be unique |
drop |
drop, if length is 1 |
els |
used, if in list-mode with named argument |
Value
a list with results, depending on drop
and the number of results,
the list may contain all requested results in sub-lists. The order
of the results follows the order of the row/column/result-names given
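For illustration, subsetting a report might look as follows (a hedged sketch; the report object, the variable labels and the indicator-function name are placeholders for objects from your own dq_report2() run):

```r
## Not run:
report <- dq_report2(study_data, meta_data_v2 = "meta_data_v2.xlsx")  # hypothetical inputs
report["SBP_0", "acc_margins"]   # one variable, one indicator function
report["SBP_0", ]                # all results for one variable
report[, "acc_margins"]          # one indicator function for all variables
## End(Not run)
```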
Set a single result from a dataquieR 2
report
Description
Set a single result from a dataquieR 2
report
Usage
## S3 replacement method for class 'dataquieR_resultset2'
x[[el]] <- value
Arguments
x |
the report |
el |
the index |
value |
the single result |
Value
the dataquieR
result object
Extract Elements of a dataquieR
Result Object
Description
Extract Elements of a dataquieR
Result Object
Usage
## S3 method for class 'dataquieR_result'
x[[...]]
Arguments
x |
the dataquieR_result object |
... |
arguments passed to the implementation for lists. |
Value
the element of the dataquieR
result object with all messages
still attached
See Also
Get a single result from a dataquieR 2
report
Description
Get a single result from a dataquieR 2
report
Usage
## S3 method for class 'dataquieR_resultset2'
x[[el]]
Arguments
x |
the report |
el |
the index |
Value
the dataquieR
result object
Plots and checks for distributions for categorical variables
Description
To complete
Usage
acc_cat_distributions(
resp_vars = NULL,
group_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
To complete
Value
A list with:
- SummaryPlot: ggplot2::ggplot for the response variable in resp_vars.
See Also
Plots and checks for distributions
Description
Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.
Usage
acc_distributions(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
check_param = c("any", "location", "proportion"),
plot_ranges = TRUE,
flip_mode = "noflip",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the names of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
check_param |
enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available. |
plot_ranges |
logical Should the plot show ranges and results from the data quality checks? (default: TRUE) |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped, or
auto-flipped? Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
A list with:
- SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
- SummaryData: a data.frame containing data quality checks for "Unexpected location" and/or "Unexpected proportion" for a report.
- SummaryPlotList: list of ggplot2::ggplots for each response variable in resp_vars.
Algorithm of this implementation:
1. If no response variable is defined, select all variables of type float or integer in the study data.
2. Remove missing codes from the study data (if defined in the metadata).
3. Remove measurements deviating from (hard) limits defined in the metadata (if defined).
4. Exclude variables containing only NA or only one unique value (excluding NAs).
5. Perform the check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and a LOCATION_RANGE (range of expected values for the mean or median, respectively)).
6. Perform the check for "Unexpected proportion" if defined in the metadata (needs a PROPORTION_RANGE (range of expected values for the proportions of the categories)).
7. Plot histogram(s).
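A minimal call running the steps above might look like this (hedged sketch; the study data frame and the variable labels are placeholders for your own study data and item-level metadata):

```r
## Not run:
res <- acc_distributions(
  resp_vars  = c("SBP_0", "DBP_0"),  # hypothetical variable labels
  study_data = study_data,
  item_level = item_level,
  label_col  = "LABEL"
)
res$SummaryPlotList  # one histogram per response variable
## End(Not run)
```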
See Also
ECDF plots for distribution checks
Description
Data quality indicator checks "Unexpected location" and "Unexpected proportion" if a grouping variable is included: Plots of empirical cumulative distributions for the subgroups.
Usage
acc_distributions_ecdf(
resp_vars = NULL,
group_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
dataquieR.max_group_var_levels_in_plot_default),
n_obs_per_group_min = getOption("dataquieR.min_obs_per_group_var_in_plot",
dataquieR.min_obs_per_group_var_in_plot_default)
)
Arguments
resp_vars |
variable list the names of the measurement variables |
group_vars |
variable list the name of the observer, device or reader variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
n_group_max |
maximum number of categories to be displayed individually
for the grouping variable ( |
n_obs_per_group_min |
minimum number of data points per group to create
a graph for an individual category of the |
Value
A list with:
- SummaryPlotList: list of ggplot2::ggplots for each response variable in resp_vars.
See Also
Plots and checks for distributions – Location
Description
Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.
Usage
acc_distributions_loc(
resp_vars = NULL,
study_data,
label_col = VAR_NAMES,
item_level = "item_level",
check_param = "location",
plot_ranges = TRUE,
flip_mode = "noflip",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the names of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
check_param |
enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available. |
plot_ranges |
logical Should the plot show ranges and results from the data quality checks? (default: TRUE) |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped, or
auto-flipped? Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
A list with:
- SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
- SummaryData: a data.frame containing data quality checks for "Unexpected location" and/or "Unexpected proportion" for a report.
- SummaryPlotList: list of ggplot2::ggplots for each response variable in resp_vars.
Algorithm of this implementation:
1. If no response variable is defined, select all variables of type float or integer in the study data.
2. Remove missing codes from the study data (if defined in the metadata).
3. Remove measurements deviating from (hard) limits defined in the metadata (if defined).
4. Exclude variables containing only NA or only one unique value (excluding NAs).
5. Perform the check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and a LOCATION_RANGE (range of expected values for the mean or median, respectively)).
6. Perform the check for "Unexpected proportion" if defined in the metadata (needs a PROPORTION_RANGE (range of expected values for the proportions of the categories)).
7. Plot histogram(s).
See Also
Plots and checks for distributions – only
Description
Usage
acc_distributions_only(
resp_vars = NULL,
study_data,
label_col = VAR_NAMES,
item_level = "item_level",
flip_mode = "noflip",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the names of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped, or
auto-flipped? Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
A list with:
- SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
- SummaryData: a data.frame containing data quality checks for "Unexpected location" and/or "Unexpected proportion" for a report.
- SummaryPlotList: list of ggplot2::ggplots for each response variable in resp_vars.
Algorithm of this implementation:
1. If no response variable is defined, select all variables of type float or integer in the study data.
2. Remove missing codes from the study data (if defined in the metadata).
3. Remove measurements deviating from (hard) limits defined in the metadata (if defined).
4. Exclude variables containing only NA or only one unique value (excluding NAs).
5. Perform the check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and a LOCATION_RANGE (range of expected values for the mean or median, respectively)).
6. Perform the check for "Unexpected proportion" if defined in the metadata (needs a PROPORTION_RANGE (range of expected values for the proportions of the categories)).
7. Plot histogram(s).
See Also
Plots and checks for distributions – Proportion
Description
Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.
Usage
acc_distributions_prop(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
check_param = "proportion",
plot_ranges = TRUE,
flip_mode = "noflip",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the names of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
check_param |
enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available. |
plot_ranges |
logical Should the plot show ranges and results from the data quality checks? (default: TRUE) |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped, or
auto-flipped? Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
A list with:
- SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
- SummaryData: a data.frame containing data quality checks for "Unexpected location" and/or "Unexpected proportion" for a report.
- SummaryPlotList: list of ggplot2::ggplots for each response variable in resp_vars.
Algorithm of this implementation:
1. If no response variable is defined, select all variables of type float or integer in the study data.
2. Remove missing codes from the study data (if defined in the metadata).
3. Remove measurements deviating from (hard) limits defined in the metadata (if defined).
4. Exclude variables containing only NA or only one unique value (excluding NAs).
5. Perform the check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and a LOCATION_RANGE (range of expected values for the mean or median, respectively)).
6. Perform the check for "Unexpected proportion" if defined in the metadata (needs a PROPORTION_RANGE (range of expected values for the proportions of the categories)).
7. Plot histogram(s).
See Also
Extension of acc_shape_or_scale to examine uniform distributions of end digits
Description
This implementation contrasts the empirical distribution of a measurement variable against assumed distributions. The approach is adapted from the idea of rootograms (Tukey (1977)), which is also applicable for count data (Kleiber and Zeileis (2016)).
Usage
acc_end_digits(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable the names of the measurement variables, mandatory |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
a list with:
- SummaryTable: data.frame with the columns Variables and FLG_acc_ud_shape
- SummaryPlot: ggplot2 distribution plot comparing expected with observed distribution
ALGORITHM OF THIS IMPLEMENTATION:
1. This implementation is restricted to data of type float or integer.
2. Missing codes are removed from resp_vars (if defined in the metadata).
3. The user must specify the column of the metadata containing the probability distribution (currently only: normal, uniform, gamma).
4. Parameters of each distribution can be estimated from the data or are specified by the user.
5. A histogram-like plot contrasts the empirical vs. the technical distribution.
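A minimal call might look as follows (hedged sketch; the study data frame and the variable label are placeholders for your own study data and item-level metadata):

```r
## Not run:
res <- acc_end_digits(
  resp_vars  = "SBP_0",              # hypothetical variable label
  study_data = study_data,
  item_level = item_level,
  label_col  = "LABEL"
)
res$SummaryPlot
## End(Not run)
```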
See Also
Smoothes and plots adjusted longitudinal measurements and longitudinal trends from logistic regression models
Description
The following R implementation executes calculations for the quality indicator "Unexpected location". Local regression (LOESS) is a versatile statistical method to explore an averaged course of time series measurements (Cleveland, Devlin, and Grosse 1988). In the context of epidemiological data, repeated measurements using the same measurement device or by the same examiner can be considered a time series. LOESS makes it possible to explore changes in these measurements over time.
Usage
acc_loess(
resp_vars,
group_vars = NULL,
time_vars,
co_vars = NULL,
study_data,
label_col = VAR_NAMES,
item_level = "item_level",
min_obs_in_subgroup = 30,
resolution = 80,
comparison_lines = list(type = c("mean/sd", "quartiles"), color = "grey30", linetype =
2, sd_factor = 0.5),
mark_time_points = getOption("dataquieR.acc_loess.mark_time_points",
dataquieR.acc_loess.mark_time_points_default),
plot_observations = getOption("dataquieR.acc_loess.plot_observations",
dataquieR.acc_loess.plot_observations_default),
plot_format = getOption("dataquieR.acc_loess.plot_format",
dataquieR.acc_loess.plot_format_default),
meta_data = item_level,
meta_data_v2,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
dataquieR.max_group_var_levels_in_plot_default),
enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
exclude_constant_subgroups =
getOption("dataquieR.acc_loess.exclude_constant_subgroups",
dataquieR.acc_loess.exclude_constant_subgroups.default),
min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
dataquieR.acc_loess.min_bw.default),
min_proportion = getOption("dataquieR.acc_loess.min_proportion",
dataquieR.acc_loess.min_proportion.default)
)
Arguments
resp_vars |
variable the name of the continuous measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
time_vars |
variable the name of the variable giving the time of measurement |
co_vars |
variable list a vector of covariables for adjustment, for example age and sex. Can be NULL (default) for no adjustment. |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
min_obs_in_subgroup |
integer (optional argument) If |
resolution |
numeric the maximum number of time points used for plotting the trend lines |
comparison_lines |
list type and style of lines with which trend
lines are to be compared. Can be mean +/- 0.5
standard deviation (the factor can be specified
differently in |
mark_time_points |
logical mark time points with observations (caution, there may be many marks) |
plot_observations |
logical show observations as scatter plot in the
background. If there are |
plot_format |
enum AUTO | COMBINED | FACETS | BOTH. Return the plot
as one combined plot for all groups or as
facet plots (one figure per group). |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
n_group_max |
integer maximum number of categories to be displayed
individually for the grouping variable ( |
enable_GAM |
logical Can LOESS computations be replaced by generalized additive models to reduce memory consumption for large datasets? |
exclude_constant_subgroups |
logical Should subgroups with constant values be excluded? |
min_bandwidth |
numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line. |
min_proportion |
numeric lower limit for the proportion of the smaller group (cases or controls) for creating a LOESS figure, should be greater than 0 and less than 0.4. |
Details
If mark_time_points
or plot_observations
is selected, but would result in
plotting more than 400 points, only a sample of the data will be displayed.
Limitations
The application of LOESS requires model fitting, i.e. the smoothness
of a model is subject to a smoothing parameter (span).
Particularly in the presence of interval-based missing data, high
variability of measurements combined with a low number of
observations in one level of the group_vars
may distort the fit.
Since our approach handles data without knowledge
of such underlying characteristics, finding the best fit is complicated if
computational costs are to be kept minimal. The default of
LOESS in R uses a span of 0.75, which provides reasonable fits in most cases.
The function acc_loess
adapts the span for each level of the group_vars
(with at least as many observations as specified in min_obs_in_subgroup
and with at least three time points) based on the respective
number of observations.
LOESS consumes a lot of memory for larger datasets. That is why acc_loess
switches to a generalized additive model with integrated smoothness
estimation (gam
by mgcv
) if there are 1000 observations or more for
at least one level of the group_vars
(similar to geom_smooth
from ggplot2
).
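A minimal call might look as follows (hedged sketch; the study data frame and all variable labels are placeholders for your own study data and item-level metadata):

```r
## Not run:
res <- acc_loess(
  resp_vars  = "SBP_0",              # hypothetical variable labels
  group_vars = "OBSERVER_0",
  time_vars  = "EXAM_DT_0",
  co_vars    = c("AGE_0", "SEX_0"),
  study_data = study_data,
  item_level = item_level,
  label_col  = "LABEL"
)
res$SummaryPlotList
## End(Not run)
```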
Value
a list with:
- SummaryPlotList: list with two plots if plot_format = "BOTH", otherwise one of the two figures described below:
  - Loess_fits_facets: The plot contains LOESS-smoothed curves for each level of the group_vars in a separate panel. Added trend lines represent mean and standard deviation or quartiles (specified in comparison_lines) for moving windows over the whole data.
  - Loess_fits_combined: This plot combines all curves into one panel. Given a low number of levels in the group_vars, this plot eases comparisons. However, if the number of levels increases, this plot may be too crowded and unclear.
See Also
Estimate marginal means, see emmeans::emmeans
Description
This function examines the impact of so-called process variables on a measurement variable. This implementation combines a descriptive and a model-based approach. Process variables that can be considered in this implementation must be categorical. It is currently not possible to consider more than one process variable within one function call. The measurement variable can be adjusted for (multiple) covariables, such as age or sex, for example.
The marginal means approach rests on model-based results, i.e., a significantly different marginal mean depends on sample size. Particularly in large studies, small and irrelevant differences may become significant. The contrary holds if the sample size is low.
Usage
acc_margins(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
study_data,
label_col,
item_level = "item_level",
threshold_type = "empirical",
threshold_value,
min_obs_in_subgroup = 5,
min_obs_in_cat = 5,
dichotomize_categorical_resp = TRUE,
cut_off_linear_model_for_ord = 10,
meta_data = item_level,
meta_data_v2,
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default),
include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
dataquieR.acc_margins_num_default),
n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
dataquieR.max_group_var_levels_with_violins_default)
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable list len=1-1. the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
threshold_type |
enum empirical | user | none. In case |
threshold_value |
numeric a multiplier or absolute value (see
|
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
min_obs_in_cat |
integer This optional argument specifies the minimum
number of observations that is required to include
a category (level) of the outcome ( |
dichotomize_categorical_resp |
logical Should nominal response variables always be transformed to binary variables? |
cut_off_linear_model_for_ord |
integer from=0. This optional argument
specifies the minimum number of observations for
individual levels of an ordinal outcome ( |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations? Note that ordinal grouping variables will not be reordered. |
include_numbers_in_figures |
logical Should the figure report the number of observations for each level of the grouping variable? |
n_violin_max |
integer from=0. This optional argument specifies
the maximum number of levels of the |
Details
Limitations
Selecting the appropriate distribution is complex. Dozens of continuous,
discrete or mixed distributions are conceivable in the context of
epidemiological data. Their exact exploration is beyond the scope of this
data quality approach. The present function uses the help function
util_dist_selection, the assigned SCALE_LEVEL
and the DATA_TYPE
to discriminate the following cases:
continuous data
binary data
count data with <= 20 distinct values
count data with > 20 distinct values (treated as continuous)
nominal data
ordinal data
Continuous data and count data with more than 20 distinct values are analyzed
by linear models. Count data with up to 20 distinct values are modeled by a
Poisson regression. For binary data, the implementation uses logistic
regression.
Nominal response variables will either be transformed to binary variables or
analyzed by multinomial logistic regression models. The latter option is only
available if the argument dichotomize_categorical_resp
is set to FALSE
and if the package nnet
is installed. The transformation to a binary
variable can be user-specified using the metadata columns RECODE_CASES
and/or RECODE_CONTROL
. Otherwise, the most frequent category will be
assigned to cases and the remaining categories to controls.
For ordinal response variables, the argument cut_off_linear_model_for_ord
controls whether the data is analyzed in the same way as continuous data:
If every level of the variable has at least as many observations as specified
in the argument, the data will be analyzed by a linear model. Otherwise,
the data will be modeled by an ordinal regression, if the package ordinal
is installed.
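A minimal call might look as follows (hedged sketch; the study data frame and all variable labels are placeholders for your own study data and item-level metadata):

```r
## Not run:
res <- acc_margins(
  resp_vars  = "SBP_0",              # hypothetical variable labels
  group_vars = "OBSERVER_0",
  co_vars    = c("AGE_0", "SEX_0"),
  study_data = study_data,
  item_level = item_level,
  label_col  = "LABEL"
)
res$SummaryPlot
## End(Not run)
```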
Value
a list with:
- SummaryTable: data.frame underlying the plot
- ResultData: data.frame
- SummaryPlot: ggplot2::ggplot() margins plot
See Also
Calculate and plot Mahalanobis distances
Description
A standard tool to detect multivariate outliers is the Mahalanobis distance. This approach is very helpful for interpreting the plausibility of a measurement given the value of another. Here, the Mahalanobis distance itself is used as a univariate measure, and the same rules are applied for the identification of outliers as for univariate outliers:
- the classical approach from Tukey: 1.5 * IQR from the 1st (Q25) or 3rd (Q75) quartile.
- the 3SD approach, i.e., any measurement of the Mahalanobis distance not in the interval mean(x) +/- 3 * sigma is considered an outlier.
- the approach from Hubert for skewed distributions, which is embedded in the R package robustbase.
- a completely heuristic approach named sigma-gap.

For further details, please see the vignette for univariate outliers.
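The underlying quantity can be illustrated with base R alone (a sketch of the distance itself on simulated data, not of dataquieR's implementation):

```r
set.seed(1)
x <- cbind(rnorm(100), rnorm(100))           # two simulated continuous measurements
md2 <- mahalanobis(x, colMeans(x), cov(x))   # squared Mahalanobis distances
# e.g., apply the 3SD rule to the distances:
which(abs(md2 - mean(md2)) > 3 * sd(md2))
```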
Usage
acc_multivariate_outlier(
variable_group = NULL,
id_vars = NULL,
label_col = VAR_NAMES,
study_data,
item_level = "item_level",
n_rules = 4,
max_non_outliers_plot = 10000,
criteria = c("tukey", "3sd", "hubert", "sigmagap"),
meta_data = item_level,
meta_data_v2,
scale = getOption("dataquieR.acc_multivariate_outlier.scale",
dataquieR.acc_multivariate_outlier.scale_default),
multivariate_outlier_check = TRUE
)
Arguments
variable_group |
variable list the names of the continuous measurement variables building a group, for that multivariate outliers make sense. |
id_vars |
variable optional, an ID variable of the study data. If not specified row numbers are used. |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
n_rules |
numeric from=1 to=4. the number of rules that must be violated to classify as outlier |
max_non_outliers_plot |
integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic. |
criteria |
set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see
|
scale |
logical Should min-max-scaling be applied per variable? |
multivariate_outlier_check |
logical really check, pipeline use, only. |
Value
a list with:
- SummaryTable: data.frame underlying the plot
- SummaryPlot: ggplot2::ggplot outlier plot
- FlaggedStudyData: data.frame containing the original data frame with the additional columns tukey, 3SD, hubert, and sigmagap. Every observation is coded 0 if no outlier was detected in the respective column and 1 if an outlier was detected. This can be used to exclude observations with outliers.
ALGORITHM OF THIS IMPLEMENTATION:
1. Implementation is restricted to variables of type float.
2. Remove missing codes from the study data (if defined in the metadata).
3. The covariance matrix is estimated for all variables from variable_group.
4. The Mahalanobis distance of each observation is calculated: MD^2_i = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)
5. The four rules mentioned above are applied on this distance for each observation in the study data.
6. An output data frame is generated that flags each outlier.
7. A parallel coordinate plot indicates respective outliers.
See Also
Identify univariate outliers by four different approaches
Description
A classical but still popular approach to detect univariate outlier is the
boxplot method introduced by Tukey 1977. The boxplot is a simple graphical
tool to display information about continuous univariate data (e.g., median,
lower and upper quartile). Outliers are defined as values deviating more
than 1.5 \times IQR
from the 1st (Q25) or 3rd (Q75) quartile. The
strength of Tukey's method is that it makes no distributional assumptions
and thus is also applicable to skewed or non-mound-shaped data
(Marsh and Seo, 2006). Nevertheless, this method tends to identify frequent
measurements which are falsely interpreted as true outliers.
A somewhat more conservative approach in terms of symmetric and/or normal
distributions is the 3SD approach, i.e. any measurement not in
the interval of mean(x) +/- 3 * \sigma
is considered an outlier.
Both methods mentioned above are not ideally suited to skewed distributions.
As many biomarkers, such as laboratory measurements, follow skewed
distributions, the methods above may be insufficient. The approach of Hubert
and Vandervieren 2008 adjusts the boxplot for the skewness of the
distribution. This approach is implemented in several R packages such as
robustbase::mc
which is used in this implementation of dataquieR
.
Another completely heuristic approach is also included to identify outliers. It is based on the assumption that the distances between measurements of the same underlying distribution should be homogeneous. To comprehend this approach:
- consider an ordered sequence of all measurements;
- between these measurements, all distances are calculated;
- the occurrence of larger distances between two neighboring measurements may then indicate a distortion of the data. For the heuristic definition of a large distance, 1 * sigma has been chosen.
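The Tukey rule from the criteria above can be reproduced in a few lines of base R (illustrative only, not the package's internal code):

```r
set.seed(2)
x <- c(rnorm(100), 12)                       # simulated data with one artificial outlier
q <- quantile(x, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])                 # 1.5 * IQR
which(x < q[1] - fence | x > q[2] + fence)   # indices of flagged values
```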
Note that the plots are not deterministic, because they use ggplot2::geom_jitter.
Usage
acc_robust_univariate_outlier(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
exclude_roles,
n_rules = length(unique(criteria)),
max_non_outliers_plot = 10000,
criteria = c("tukey", "3sd", "hubert", "sigmagap"),
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the continuous measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
exclude_roles |
variable roles a character (vector) of variable roles not included |
n_rules |
integer from=1 to=4. the number of rules that must be violated to flag a variable as containing outliers. The default is 4, i.e. all. |
max_non_outliers_plot |
integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic. |
criteria |
set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Details
Hint: The function is designed for unimodal data only.
Value
a list with:
- SummaryTable: data.frame with the columns Variables, Mean, SD, Median, Skewness, Tukey (N), 3SD (N), Hubert (N), Sigma-gap (N), NUM_acc_ud_outlu, Outliers, low (N), Outliers, high (N), Grading
- SummaryData: data.frame with the columns Variables, Mean, SD, Median, Skewness, Tukey (N), 3SD (N), Hubert (N), Sigma-gap (N), Outliers (N), Outliers, low (N), Outliers, high (N)
- SummaryPlotList: ggplot2::ggplot univariate outlier plots
ALGORITHM OF THIS IMPLEMENTATION:
Select all variables of type float in the study data
Remove missing codes from the study data (if defined in the metadata)
Remove measurements deviating from limits defined in the metadata
Identify outliers according to the approaches of Tukey (Tukey 1977), 3SD (Saleem et al. 2021), Hubert (Hubert and Vandervieren 2008), and SigmaGap (heuristic)
An output data frame is generated which indicates the number of possible outliers and the direction of deviations (Outliers, low; Outliers, high) for all methods, and a summary score which sums up the deviations of the different rules
A scatter plot is generated for all examined variables, flagging observations according to the number of violated rules (step 5).
See Also
Compare observed versus expected distributions
Description
This implementation contrasts the empirical distribution of a measurement variable against assumed distributions. The approach is adapted from the idea of rootograms (Tukey 1977), which is also applicable to count data (Kleiber and Zeileis 2016).
Usage
acc_shape_or_scale(
resp_vars,
study_data,
label_col,
item_level = "item_level",
dist_col,
guess,
par1,
par2,
end_digits,
flip_mode = "noflip",
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable the name of the continuous measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
dist_col |
variable attribute the name of the variable attribute in meta_data that provides the expected distribution of a study variable |
guess |
logical estimate parameters |
par1 |
numeric first parameter of the distribution if applicable |
par2 |
numeric second parameter of the distribution if applicable |
end_digits |
logical internal use. check for end digits preferences |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped or
auto-flipped. Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Value
a list with:
- ResultData: data.frame underlying the plot
- SummaryPlot: ggplot2::ggplot probability distribution plot
- SummaryTable: data.frame with the columns Variables and FLG_acc_ud_shape
ALGORITHM OF THIS IMPLEMENTATION:
This implementation is restricted to data of type float or integer.
Missing codes are removed from resp_vars (if defined in the metadata)
The user must specify the column of the metadata containing the expected probability distribution (currently only: normal, uniform, gamma)
Parameters of each distribution can be estimated from the data or are specified by the user
A histogram-like plot contrasts the empirical vs. the technical distribution
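The contrast of empirical versus expected counts can be sketched in a few lines of base R. This is an illustrative sketch only, not the internal dataquieR implementation; the data are simulated and the parameter estimation mimics the `guess` option:

```r
# simulated measurement variable (for illustration only)
set.seed(3)
x <- rnorm(200, mean = 5, sd = 2)

# estimate the distribution parameters from the data (analogue of guess = TRUE)
par1 <- mean(x)
par2 <- sd(x)

# observed counts per histogram bin vs. counts expected under N(par1, par2)
h <- hist(x, plot = FALSE)
observed <- h$counts
expected <- diff(pnorm(h$breaks, mean = par1, sd = par2)) * length(x)
```

Large discrepancies between `observed` and `expected` per bin would hint at a distributional mismatch, which is what the rootogram-style plot visualizes.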
See Also
Identify univariate outliers by four different approaches
Description
A classical but still popular approach to detect univariate outliers is the
boxplot method introduced by Tukey (1977). The boxplot is a simple graphical
tool to display information about continuous univariate data (e.g., median,
lower and upper quartile). Outliers are defined as values deviating more
than 1.5 \times IQR
from the 1st (Q25) or 3rd (Q75) quartile. The
strength of Tukey's method is that it makes no distributional assumptions
and thus is also applicable to skewed or non mound-shaped data
(Marsh and Seo, 2006). Nevertheless, this method tends to flag frequently
occurring measurements, which are then falsely interpreted as true outliers.
A somewhat more conservative approach in terms of symmetric and/or normal
distributions is the 3SD approach, i.e. any measurement not in
the interval of mean(x) +/- 3 * \sigma
is considered an outlier.
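The two rules above can be sketched in a few lines of base R (an illustrative sketch on simulated data, not the internal dataquieR implementation):

```r
# simulated example: 100 draws from a standard normal plus one extreme value
set.seed(7)
x <- c(rnorm(100), 8)

# Tukey's rule: outside [Q25 - 1.5 * IQR, Q75 + 1.5 * IQR]
q <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]
tukey_out <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr

# 3SD rule: outside mean(x) +/- 3 * sd(x)
sd_out <- abs(x - mean(x)) > 3 * sd(x)
```

The injected value 8 is flagged by both rules here; on skewed data the two rules typically disagree, which motivates the skewness-adjusted boxplot discussed next.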
Both methods mentioned above are not ideally suited to skewed distributions.
Because many biomarkers, such as laboratory measurements, follow skewed
distributions, the methods above may be insufficient. The approach of Hubert
and Vandervieren (2008) adjusts the boxplot for the skewness of the
distribution. This approach is implemented in several R packages such as
robustbase::mc
which is used in this implementation of dataquieR
.
Another, completely heuristic approach is also included to identify outliers. This approach is based on the assumption that the distances between measurements of the same underlying distribution should be homogeneous. To comprehend this approach:
consider an ordered sequence of all measurements.
between these measurements all distances are calculated.
the occurrence of larger distances between two neighboring measurements may then indicate a distortion of the data. For the heuristic definition of a large distance,
1 * \sigma
has been chosen.
Note that the plots are not deterministic, because they use ggplot2::geom_jitter.
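The sigma-gap heuristic described above can be sketched as follows (an illustrative base-R sketch on simulated data, not the internal dataquieR implementation):

```r
# simulated example: 50 draws from a standard normal plus one far-off value
set.seed(42)
x <- sort(c(rnorm(50), 10))

# distances between neighboring ordered measurements
gaps <- diff(x)

# a gap exceeding 1 * sigma is considered "large"
suspect <- which(gaps > 1 * sd(x))

# values beyond the last large gap are outlier candidates
candidates <- x[x > x[max(suspect)]]
```

Here the gap between the largest regular value and 10 far exceeds one standard deviation, so 10 becomes the only outlier candidate.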
Usage
acc_univariate_outlier(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
exclude_roles,
n_rules = length(unique(criteria)),
max_non_outliers_plot = 10000,
criteria = c("tukey", "3sd", "hubert", "sigmagap"),
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the continuous measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
exclude_roles |
variable roles a character (vector) of variable roles not included |
n_rules |
integer from=1 to=4. the number of rules that must be violated to flag a variable as containing outliers. The default is 4, i.e. all. |
max_non_outliers_plot |
integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample will be plotted. Note that sampling is not deterministic. |
criteria |
set tukey | 3SD | hubert | sigmagap. a vector with methods to be used for detecting outliers. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Details
Hint: The function is designed for unimodal data only.
Value
a list with:
- SummaryTable: data.frame with the columns Variables, Mean, SD, Median, Skewness, Tukey (N), 3SD (N), Hubert (N), Sigma-gap (N), NUM_acc_ud_outlu, Outliers, low (N), Outliers, high (N), Grading
- SummaryData: data.frame with the columns Variables, Mean, SD, Median, Skewness, Tukey (N), 3SD (N), Hubert (N), Sigma-gap (N), Outliers (N), Outliers, low (N), Outliers, high (N)
- SummaryPlotList: ggplot2::ggplot univariate outlier plots
ALGORITHM OF THIS IMPLEMENTATION:
Select all variables of type float in the study data
Remove missing codes from the study data (if defined in the metadata)
Remove measurements deviating from limits defined in the metadata
Identify outliers according to the approaches of Tukey (Tukey 1977), 3SD (Saleem et al. 2021), Hubert (Hubert and Vandervieren 2008), and SigmaGap (heuristic)
An output data frame is generated which indicates the number of possible outliers and the direction of deviations (Outliers, low; Outliers, high) for all methods, and a summary score which sums up the deviations of the different rules
A scatter plot is generated for all examined variables, flagging observations according to the number of violated rules (step 5).
See Also
Utility function to compute model-based ICC depending on the (statistical) data type
Description
This function is still under construction. It is designed to run for any statistical data type as follows:
Variables with only two distinct values will be modeled by mixed effects logistic regression.
Nominal variables will be transformed to binary variables. This can be user-specified using the metadata columns RECODE_CASES and/or RECODE_CONTROL. Otherwise, the most frequent category will be assigned to cases and the remaining categories to control. As for other binary variables, the ICC will be computed using a mixed effects logistic regression.
Ordinal variables will be analyzed by linear mixed effects models, if every level of the variable has at least as many observations as specified in the argument cut_off_linear_model_for_ord. Otherwise, the data will be modeled by a mixed effects ordered regression, if the package ordinal is available.
Metric variables with integer values are analyzed by linear mixed effects models.
For variables with data type float, the existing implementation acc_varcomp is called, which also uses linear mixed effects models.
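For the metric case, the idea behind a model-based ICC can be illustrated with a one-way ANOVA approximation in base R. This is a sketch under simplifying assumptions (balanced groups, simulated data, made-up names); the function itself fits mixed effects models:

```r
# simulated example: 5 examiners, 20 measurements each, with examiner effects
set.seed(1)
k <- 5
n <- 20
examiner <- factor(rep(seq_len(k), each = n))
y <- rnorm(k * n) + rep(rnorm(k, sd = 0.5), each = n)

# one-way ANOVA mean squares: between examiners and residual
ms <- anova(lm(y ~ examiner))[["Mean Sq"]]
var_between <- max((ms[1] - ms[2]) / n, 0)

# ICC: share of total variance attributable to the grouping (examiner)
icc <- var_between / (var_between + ms[2])
```

A large ICC indicates that a substantial part of the variation is attributable to examiners, devices, or readers rather than to the participants themselves.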
Usage
acc_varcomp(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
study_data,
label_col,
item_level = "item_level",
min_obs_in_subgroup = 10,
min_subgroups = 5,
cut_off_linear_model_for_ord = 10,
threshold_value = lifecycle::deprecated(),
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable the name of the examiner, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex, for adjustment |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is
required to include a subgroup (level) of the
|
min_subgroups |
integer from=0. This optional argument specifies
the minimum number of subgroups (level) of the
|
cut_off_linear_model_for_ord |
integer from=0. This optional argument
specifies the minimum number of observations for
individual levels of an ordinal outcome
( |
threshold_value |
Deprecated. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Details
Not yet described
Value
The function returns two data frames, 'SummaryTable' and 'SummaryData', that differ only in the names of the columns.
Convert a full dataquieR
report to a data.frame
Description
Deprecated
Usage
## S3 method for class 'dataquieR_resultset'
as.data.frame(x, ...)
Arguments
x |
Deprecated |
... |
Deprecated |
Value
Deprecated
Convert a full dataquieR
report to a list
Description
Deprecated
Usage
## S3 method for class 'dataquieR_resultset'
as.list(x, ...)
Arguments
x |
Deprecated |
... |
Deprecated |
Value
Deprecated
Inefficient way to convert a report to a list; try prep_set_backend()
Description
Inefficient way to convert a report to a list; try prep_set_backend()
Usage
## S3 method for class 'dataquieR_resultset2'
as.list(x, ...)
Arguments
x |
|
... |
not used |
Value
Data frame with contradiction rules
Description
Two versions exist: the newer one is used by con_contradictions_redcap, the older one by con_contradictions. Both are described here.
See Also
Summarize missingness columnwise (in variable)
Description
Item-Missingness (also referred to as item nonresponse (De Leeuw et al. 2003)) describes the missingness of single values, e.g., blanks or empty data cells in a data set. Item-Missingness occurs, for example, if a respondent does not provide information for a certain question, a question is overlooked by accident, a programming failure occurs, or a provided answer was missed while entering the data.
Usage
com_item_missingness(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
show_causes = TRUE,
cause_label_df,
include_sysmiss = TRUE,
threshold_value,
suppressWarnings = FALSE,
assume_consistent_codes = TRUE,
expand_codes = assume_consistent_codes,
drop_levels = FALSE,
expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
pretty_print = lifecycle::deprecated(),
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
show_causes |
logical if TRUE, then the distribution of missing codes is shown |
cause_label_df |
data.frame missing code table. If missing codes have labels the respective data frame can be specified here or in the metadata as assignments, see cause_label_df |
include_sysmiss |
logical Optional, if TRUE system missingness (NAs) is evaluated in the summary plot |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100 |
suppressWarnings |
logical warn about consistency issues with missing and jump lists |
assume_consistent_codes |
logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code are treated as being the same for all variables. |
expand_codes |
logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere |
drop_levels |
logical if TRUE, do not display unused missing codes in the figure legend. |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. If ALL, all
observations are expected to comprise
all study segments. If SEGMENT, the
|
pretty_print |
logical deprecated. If you want to have a human
readable output, use |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Value
a list with:
- SummaryTable: data frame about item missingness per response variable
- SummaryData: data frame about item missingness per response variable, formatted for users
- SummaryPlot: ggplot2 heatmap plot, if show_causes was TRUE
- ReportSummaryTable: data frame underlying SummaryPlot
ALGORITHM OF THIS IMPLEMENTATION:
Lists of missing codes and, if applicable, jump codes are selected from the metadata
The number of system missings (NA) in each variable is calculated
The number of used missing codes is calculated for each variable
The number of used jump codes is calculated for each variable
Two result dataframes (1: on the level of observations, 2: a summary for each variable) are generated
- OPTIONAL: if show_causes is selected, one summary plot for all resp_vars is provided
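The counting steps of this algorithm can be sketched in base R. This is a toy sketch; the codes 99998/99999 are hypothetical stand-ins for missing codes that would normally be defined in the metadata:

```r
# toy study data with system missings (NA) and coded missings
study <- data.frame(v1 = c(1, NA, 99998, 3),
                    v2 = c(NA, NA, 2, 99999))
missing_codes <- c(99998, 99999)  # hypothetical missing-code list

# number of system missings (NA) per variable
sysmiss <- vapply(study, function(x) sum(is.na(x)), integer(1))

# number of used missing codes per variable
codemiss <- vapply(study, function(x) sum(x %in% missing_codes), integer(1))
```

Both counts together give the denominator corrections needed to report item missingness rates per variable.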
See Also
Compute Indicators for Qualified Item Missingness
Description
Usage
com_qualified_item_missingness(
resp_vars,
study_data,
label_col = NULL,
item_level = "item_level",
expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. Report the
number of observations expected using
the old |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Value
A list with:
- SummaryTable: data.frame containing data quality checks for "Non-response rate" (PCT_com_qum_nonresp) and "Refusal rate" (PCT_com_qum_refusal) for each response variable in resp_vars.
- SummaryData: a data.frame containing data quality checks for "Non-response rate" and "Refusal rate" for a report
Examples
## Not run:
prep_load_workbook_like_file("inst/extdata/Metadata_example_v3-6.xlsx")
clean <- prep_get_data_frame("item_level")
clean <- subset(clean, `Metadata name` == "Example" &
!dataquieR:::util_empty(VAR_NAMES))
clean$`Metadata name` <- NULL
clean[, "MISSING_LIST_TABLE"] <- "missing_matchtable1"
prep_add_data_frames(item_level = clean)
clean <- prep_get_data_frame("missing_matchtable1")
clean <- clean[clean$`Metadata name` == "Example", , FALSE]
clean <-
clean[suppressWarnings(as.character(as.integer(clean$CODE_VALUE)) ==
as.character(clean$CODE_VALUE)), , FALSE]
clean$CODE_VALUE <- as.integer(clean$CODE_VALUE)
clean <- clean[!is.na(clean$`Metadata name`), , FALSE]
clean$`Metadata name` <- NULL
prep_add_data_frames(missing_matchtable1 = clean)
ship <- prep_get_data_frame("ship")
number_of_mis <- ceiling(nrow(ship) / 20)
resp_vars <- sample(colnames(ship), ceiling(ncol(ship) / 20), FALSE)
mistab <- prep_get_data_frame("missing_matchtable1")
valid_replacement_codes <-
  mistab[mistab$CODE_INTERPRET != "I", CODE_VALUE, drop = TRUE]
# sample only replacement codes on item level; "I" uses the actual values
for (rv in resp_vars) {
values <- sample(as.numeric(valid_replacement_codes), number_of_mis,
replace = TRUE)
if (inherits(ship[[rv]], "POSIXct")) {
values <- as.POSIXct(values, origin = min(as.POSIXct(Sys.Date()), 0))
}
ship[sample(seq_len(nrow(ship)), number_of_mis, replace = FALSE), rv] <-
values
}
com_qualified_item_missingness(resp_vars = NULL, ship, "item_level", LABEL)
com_qualified_item_missingness(resp_vars = "Diabetes Age onset", ship,
"item_level", LABEL)
com_qualified_item_missingness(resp_vars = NULL, "study_data", "meta_data",
LABEL)
study_data <- ship
meta_data <- prep_get_data_frame("item_level")
label <- LABEL
## End(Not run)
Compute Indicators for Qualified Segment Missingness
Description
Usage
com_qualified_segment_missingness(
label_col = NULL,
study_data,
item_level = "item_level",
expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
meta_data = item_level,
meta_data_v2,
meta_data_segment,
segment_level
)
Arguments
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. Report the
number of observations expected using
the old |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
meta_data_segment |
data.frame Segment level metadata |
segment_level |
data.frame alias for meta_data_segment |
Value
A list with:
- SegmentTable: data.frame containing data quality checks for "Non-response rate" (PCT_com_qum_nonresp) and "Refusal rate" (PCT_com_qum_refusal) for each segment.
- SegmentData: a data.frame containing data quality checks for "Unexpected location" and "Unexpected proportion" per segment for a report
Summarizes missingness for individuals in specific segments
Description
This implementation can be applied in two use cases:
participation in study segments is not recorded by respective variables, e.g. a participant's refusal to attend a specific examination is not recorded.
participation in study segments is recorded by respective variables.
Use case (1) will be common in smaller studies. For the calculation of segment missingness it is assumed that study variables are nested in respective segments. This structure must be specified in the static metadata. The R-function identifies all variables within each segment and returns TRUE if all variables within a segment are missing, otherwise FALSE.
Use case (2) assumes a more complex structure of study data and metadata.
The study data comprise so-called intro-variables (either TRUE/FALSE or codes
for non-participation). The column PART_VAR
in the metadata is
filled by variable-IDs indicating for each variable the respective
intro-variable. This structure has the benefit that subsequent calculation of
item missingness obtains correct denominators for the calculation of
missingness rates.
Usage
com_segment_missingness(
study_data,
item_level = "item_level",
strata_vars = NULL,
group_vars = NULL,
label_col,
threshold_value,
direction,
color_gradient_direction,
expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
exclude_roles = c(VARIABLE_ROLES$PROCESS),
meta_data = item_level,
meta_data_v2,
segment_level,
meta_data_segment
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
strata_vars |
variable the name of a variable used for stratification, defaults to NULL for not grouping output |
group_vars |
variable the name of a variable used for grouping, defaults to NULL for not grouping output |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100 |
direction |
enum low | high. "high" or "low", i.e. are deviations above/below the threshold critical. This argument is deprecated and replaced by color_gradient_direction. |
color_gradient_direction |
enum above | below. "above" or "below", i.e. are deviations above or below the threshold critical? (default: above) |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. If ALL, all
observations are expected to comprise
all study segments. If SEGMENT, the
|
exclude_roles |
variable roles a character (vector) of variable roles not included |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
segment_level |
data.frame alias for meta_data_segment |
meta_data_segment |
data.frame Segment level metadata. Optional. |
Details
Implementation and use of thresholds
This implementation uses one threshold to discriminate critical from non-critical values. If direction is above, then all values below the threshold_value are normal (displayed in dark blue in the plot and flagged with GRADING = 0 in the data frame). All values above the threshold_value are considered critical. The more they deviate from the threshold, the more the displayed color shifts to dark red. All critical values are highlighted with GRADING = 1 in the summary data frame. By default, the highest values are always shown in dark red, irrespective of the absolute deviation.
If direction is below, then all values above the threshold_value are normal (displayed in dark blue, GRADING = 0).
Hint
This function does not support a resp_vars argument but exclude_roles to specify variables not relevant for detecting a missing segment.
List function.
Value
a list with:
- ResultData: data frame about segment missingness
- SummaryPlot: ggplot2 heatmap plot: a heatmap-like graphic that highlights critical values depending on the respective threshold_value and direction.
- ReportSummaryTable: data frame underlying SummaryPlot
See Also
Counts all individuals with no measurements at all
Description
This implementation examines a crude version of unit missingness or unit-nonresponse (Kalton and Kasprzyk 1986), i.e. if all measurement variables in the study data are missing for an observation it has unit missingness.
The function can be applied on stratified data. In this case strata_vars must be specified.
Usage
com_unit_missingness(
id_vars = NULL,
strata_vars = NULL,
label_col,
study_data,
item_level = "item_level",
meta_data = item_level,
meta_data_v2
)
Arguments
id_vars |
variable list optional, a (vectorized) call of ID-variables that should not be considered in the calculation of unit- missingness |
strata_vars |
variable optional, a string or integer variable used for stratification |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Details
This implementation calculates a crude rate of unit-missingness. This type of missingness may have several causes and is an important research outcome. For example, unit-nonresponse may be selective regarding the targeted study population, or technical reasons such as record-linkage may cause unit-missingness.
It has to be discriminated from segment and item missingness, since different causes and mechanisms may be the reason for unit-missingness.
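The crude rate can be sketched in base R. The column names and the id-variable handling below are hypothetical and only illustrate the idea:

```r
# toy study data: row 2 carries an ID but no measurements at all
study <- data.frame(id = 1:4,
                    bp = c(120, NA, NA, 130),
                    hr = c(60, NA, 70, NA))
id_vars <- "id"  # hypothetical ID column, excluded from the check

# a unit is missing if all measurement variables (non-ID columns) are NA
meas <- setdiff(names(study), id_vars)
unit_missing <- rowSums(!is.na(study[meas])) == 0
rate <- 100 * mean(unit_missing)
```

Excluding the ID columns matters: an ID value alone must not make a unit count as observed, which is the inverse logic of id_vars described in the Hint below.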
Hint
This function does not support a resp_vars argument but id_vars, which follow a roughly inverse logic: id_vars with values do not prevent a row from being considered missing, because an ID is the only hint for a unit that otherwise would not occur in the data at all.
List function.
Value
A list with:
- FlaggedStudyData: data.frame with id-only rows flagged in a column Unit_missing
- SummaryData: data.frame with numbers and percentages of unit missingness
See Also
Checks user-defined contradictions in study data
Description
This approach considers it a contradiction if impossible combinations of data are observed for one participant. For example, if the age of a participant is recorded repeatedly, the value of age is (unfortunately) not able to decline. Most cases of contradictions rest on the comparison of two variables.
Important to note: each value that is used for comparison may represent a possible characteristic on its own, but the combination of these two values is considered to be impossible. The approach does not consider implausible or inadmissible values.
Usage
con_contradictions(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
threshold_value,
check_table,
summarize_categories = FALSE,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100 |
check_table |
data.frame contradiction rules table. Table defining contradictions. See details for its required structure. |
summarize_categories |
logical Needs a column 'tag' in the
|
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to a workbook-like metadata file |
Details
Algorithm of this implementation:
Select all variables in the data with defined contradiction rules (static metadata column CONTRADICTIONS)
Remove missing codes from the study data (if defined in the metadata)
Remove measurements deviating from limits defined in the metadata
Assign label to levels of categorical variables (if applicable)
Apply contradiction checks on predefined sets of variables
Measurements fulfilling contradiction rules are identified. To this end, two output data frames are generated:
on the level of observation to flag each contradictory value combination, and
a summary table for each contradiction check.
A summary plot illustrating the number of contradictions is generated.
List function.
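A minimal sketch of such a contradiction rule in base R follows. The variables and the rule are hypothetical; dataquieR reads the actual rules from the metadata instead:

```r
# toy data: age recorded at two visits; it must not decline
study <- data.frame(id = 1:3,
                    age_v1 = c(50, 61, 45),
                    age_v2 = c(51, 60, 46))

# observation-level flags for the hypothetical rule "age_v2 < age_v1"
flag <- study$age_v2 < study$age_v1
FlaggedStudyData <- cbind(study, flag_age_decline = flag)

# one summary row per contradiction check
SummaryTable <- data.frame(check = "age_v2 < age_v1",
                           flagged = sum(flag),
                           percent = 100 * mean(flag))
```

This mirrors the two outputs described in the Value section: per-observation flags and a per-check summary.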
Value
If summarize_categories
is FALSE
:
A list with:
- FlaggedStudyData: The first output of the contradiction function is a data frame of similar dimension regarding the number of observations in the study data. In addition, for each applied check on the variables an additional column is added which flags observations with a contradiction given the applied check.
- SummaryTable: The second output summarizes this information into one data frame. This output can be used to provide an executive overview on the amount of contradictions. This output is meant for automatic digestion within pipelines.
- SummaryData: The third output is the same as SummaryTable but for human readers.
- SummaryPlot: The fourth output visualizes summarized information of SummaryData.
If summarize_categories is TRUE, other objects are returned: one per category, named by that category (e.g. "Empirical"), containing a result for contradictions within that category only. Additionally, the slot all_checks contains a result as it would have been returned with summarize_categories set to FALSE. Finally, a slot SummaryData is returned containing sums per category and an according ggplot2::ggplot in SummaryPlot.
See Also
Checks user-defined contradictions in study data
Description
This approach considers it a contradiction if impossible combinations of data are observed for one participant. For example, if the age of a participant is recorded repeatedly, the value of age is (unfortunately) not able to decline. Most cases of contradictions rest on the comparison of two variables.
Important to note: each value that is used for comparison may represent a possible characteristic on its own, but the combination of these two values is considered to be impossible. The approach does not consider implausible or inadmissible values.
Usage
con_contradictions_redcap(
study_data,
item_level = "item_level",
label_col,
threshold_value,
meta_data_cross_item = "cross-item_level",
use_value_labels,
summarize_categories = FALSE,
meta_data = item_level,
cross_item_level,
`cross-item_level`,
meta_data_v2
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100 |
meta_data_cross_item |
data.frame contradiction rules table. Table defining contradictions. See online documentation for its required structure. |
use_value_labels |
logical Deprecated in favor of DATA_PREPARATION.
If set to |
summarize_categories |
logical Needs a column |
meta_data |
data.frame old name for item_level |
cross_item_level |
data.frame alias for meta_data_cross_item |
meta_data_v2 |
character path to a workbook-like metadata file |
`cross-item_level` |
data.frame alias for meta_data_cross_item |
Details
Algorithm of this implementation:
Remove missing codes from the study data (if defined in the metadata)
Remove measurements deviating from limits defined in the metadata
Assign label to levels of categorical variables (if applicable)
Apply contradiction checks (given as REDCap-like rules in a separate metadata table)
Measurements fulfilling contradiction rules are identified. To this end, two output data frames are generated:
on the level of observation to flag each contradictory value combination, and
a summary table for each contradiction check.
A summary plot illustrating the number of contradictions is generated.
List function.
Value
If summarize_categories
is FALSE
:
A list with:
- FlaggedStudyData: The first output of the contradiction function is a data frame of similar dimension regarding the number of observations in the study data. In addition, for each applied check on the variables an additional column is added which flags observations with a contradiction given the applied check.
- VariableGroupData: The second output summarizes this information into one data frame. This output can be used to provide an executive overview on the amount of contradictions.
- VariableGroupTable: A subset of VariableGroupData used within the pipeline.
- SummaryPlot: The third output visualizes summarized information of SummaryData.
If summarize_categories
is TRUE
, other objects are returned:
A list with one element Other
, a list with the following entries:
One per category named by that category (e.g. "Empirical") containing a
result for contradiction checks within that category only. Additionally, in the
slot all_checks
, a result as it would have been returned with
summarize_categories
set to FALSE
. Finally, in
the top-level list, a slot SummaryData
is
returned containing sums per Category and an according ggplot2::ggplot in
SummaryPlot
.
See Also
Online Documentation for the function, meta_data_cross, Online Documentation for the required cross-item-level metadata
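As a quick orientation, a hypothetical call might look like the sketch below; the variable names, the rule-table columns, and the `[VAR]`-style REDCap rule syntax are illustrative assumptions, not data shipped with the package (see the online documentation for the actually required structure):

```r
# Illustrative study data: row 3 claims more smoking years than years of age
sdt <- data.frame(AGE = c(25, 30, 17), SMOKING_YEARS = c(5, 2, 20))
# Hypothetical cross-item-level rule table
rules <- data.frame(
  CHECK_LABEL = "smoked longer than alive",
  VARIABLE_LIST = "AGE | SMOKING_YEARS",
  CONTRADICTION_TERM = "[SMOKING_YEARS] > [AGE]"
)
md <- data.frame(VAR_NAMES = c("AGE", "SMOKING_YEARS"),
                 DATA_TYPE = "integer",
                 LABEL = c("Age", "Smoking years"))
res <- con_contradictions_redcap(study_data = sdt, item_level = md,
                                 meta_data_cross_item = rules,
                                 threshold_value = 0)
res$VariableGroupData  # one summary row per contradiction rule
```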
Detects variable levels not specified in metadata
Description
For each categorical variable, value lists should be defined in the metadata. This implementation examines whether all observed levels in the study data are valid.
Usage
con_inadmissible_categorical(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
threshold_value = 0,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see prep_load_workbook_like_file |
Details
Algorithm of this implementation:
- Remove missing codes from the study data (if defined in the metadata)
- Interpret variable-specific VALUE_LABELS as supplied in the metadata
- Identify measurements not corresponding to the expected categories. To this end, two output data frames are generated:
  - one on the level of observations, flagging each undefined category, and
  - a summary table for each variable.
- Values not corresponding to defined categories are removed in a data frame of modified study data
Value
a list with:
- SummaryData: data frame summarizing inadmissible categories with the columns:
  - Variables: variable name/label
  - OBSERVED_CATEGORIES: the categories observed in the study data
  - DEFINED_CATEGORIES: the categories defined in the metadata
  - NON_MATCHING: the categories observed but not defined
  - NON_MATCHING_N: the number of observations with categories not defined
  - NON_MATCHING_N_PER_CATEGORY: the number of observations for each of the unexpected categories
- SummaryTable: data frame for the dataquieR pipeline reporting the number and percentage of inadmissible categorical values
- ModifiedStudyData: study data having inadmissible categories removed
- FlaggedStudyData: study data having cases with inadmissible categories flagged
See Also
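A minimal illustrative call could look like the following sketch; the metadata columns and the `code = label` syntax in VALUE_LABELS are assumptions for illustration (see the online documentation for the exact metadata format):

```r
# Illustrative example: the level 9 is observed but not defined
sdt <- data.frame(MARRIED = c(0, 1, 1, 9))
md <- data.frame(VAR_NAMES = "MARRIED",
                 DATA_TYPE = "integer",
                 SCALE_LEVEL = "nominal",
                 VALUE_LABELS = "0 = no | 1 = yes",
                 LABEL = "Married")
res <- con_inadmissible_categorical(study_data = sdt, item_level = md,
                                    label_col = "LABEL")
res$SummaryData  # NON_MATCHING should report the undefined level 9
```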
Detects variable levels not specified in standardized vocabulary
Description
For each categorical variable, value lists should be defined in the metadata. This implementation examines whether all observed levels in the study data are valid.
Usage
con_inadmissible_vocabulary(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
threshold_value = 0,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
threshold_value |
numeric from=0 to=100. a numerical value ranging from 0-100. |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see prep_load_workbook_like_file |
Details
Algorithm of this implementation:
- Remove missing codes from the study data (if defined in the metadata)
- Interpret variable-specific VALUE_LABELS as supplied in the metadata
- Identify measurements not corresponding to the expected categories. To this end, two output data frames are generated:
  - one on the level of observations, flagging each undefined category, and
  - a summary table for each variable.
- Values not corresponding to defined categories are removed in a data frame of modified study data
Value
a list with:
- SummaryData: data frame summarizing inadmissible categories with the columns:
  - Variables: variable name/label
  - OBSERVED_CATEGORIES: the categories observed in the study data
  - DEFINED_CATEGORIES: the categories defined in the metadata
  - NON_MATCHING: the categories observed but not defined
  - NON_MATCHING_N: the number of observations with categories not defined
  - NON_MATCHING_N_PER_CATEGORY: the number of observations for each of the unexpected categories
  - GRADING: indicator TRUE/FALSE whether inadmissible categorical values were observed (more than indicated by the threshold_value)
- SummaryTable: data frame for the dataquieR pipeline reporting the number and percentage of inadmissible categorical values
- ModifiedStudyData: study data having inadmissible categories removed
- FlaggedStudyData: study data having cases with inadmissible categories flagged
See Also
Examples
## Not run:
sdt <- data.frame(DIAG = c("B050", "B051", "B052", "B999"),
MED0 = c("S01XA28", "N07XX18", "ABC", NA), stringsAsFactors = FALSE)
mdt <- tibble::tribble(
~ VAR_NAMES, ~ DATA_TYPE, ~ STANDARDIZED_VOCABULARY_TABLE, ~ SCALE_LEVEL, ~ LABEL,
"DIAG", "string", "<ICD10>", "nominal", "Diagnosis",
"MED0", "string", "<ATC>", "nominal", "Medication"
)
con_inadmissible_vocabulary(NULL, sdt, item_level = mdt, label_col = LABEL)
prep_load_workbook_like_file("meta_data_v2")
il <- prep_get_data_frame("item_level")
il$STANDARDIZED_VOCABULARY_TABLE[[11]] <- "<ICD10GM>"
il$DATA_TYPE[[11]] <- DATA_TYPES$INTEGER
il$SCALE_LEVEL[[11]] <- SCALE_LEVELS$NOMINAL
prep_add_data_frames(item_level = il)
r <- dq_report2("study_data", dimensions = "con")
r <- dq_report2("study_data", dimensions = "con",
advanced_options = list(dataquieR.non_disclosure = TRUE))
r
## End(Not run)
Detects variable values exceeding limits defined in metadata
Description
Inadmissible numerical values can be of type integer or float. This implementation requires the definition of intervals in the metadata to examine the admissibility of numerical study data.
This helps identify inadmissible measurements according to hard limits (for multiple variables).
Usage
con_limit_deviations(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
limits = NULL,
flip_mode = "noflip",
return_flagged_study_data = FALSE,
return_limit_categorical = TRUE,
meta_data = item_level,
meta_data_v2,
show_obs = TRUE
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
limits |
enum HARD_LIMITS | SOFT_LIMITS | DETECTION_LIMITS. what limits from metadata to check for |
flip_mode |
enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the dataquieR.flip_mode option. |
return_flagged_study_data |
logical return FlaggedStudyData |
return_limit_categorical |
logical if TRUE return limit deviations also for categorical variables |
meta_data |
data.frame old name for item_level |
meta_data_v2 |
character path to workbook like metadata file, see prep_load_workbook_like_file |
show_obs |
logical Should (selected) individual observations be marked in the figure for continuous variables? |
Details
Algorithm of this implementation:
- Remove missing codes from the study data (if defined in the metadata)
- Interpret variable-specific intervals as supplied in the metadata
- Identify measurements outside the defined limits. To this end, two output data frames are generated:
  - one on the level of observations, flagging each deviation, and
  - a summary table for each variable.
- A list of plots is generated, one for each variable examined for limit deviations. The histogram-like plots indicate the respective limits as well as the deviations.
- Values exceeding the limits are removed in a data frame of modified study data
Value
a list with:
- FlaggedStudyData: data.frame related to the study data by a 1:1 relationship, i.e., for each observation it is checked whether the value lies below or above the limits. Optional, see return_flagged_study_data.
- SummaryTable: data.frame summarizing limit deviations for each variable.
- SummaryData: data.frame summarizing limit deviations for each variable for a report.
- SummaryPlotList: list of ggplot2::ggplots. The plot for each variable is either a histogram (continuous) or a barplot (discrete).
- ReportSummaryTable: heatmap-like data frame about limit violations
See Also
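A hypothetical invocation might look as follows; the interval notation in HARD_LIMITS is an assumption for illustration (see the online documentation for the exact metadata format):

```r
# Illustrative example: 310 lies outside the hard limits [60; 270]
sdt <- data.frame(SBP = c(120, 135, 310, 95))
md <- data.frame(VAR_NAMES = "SBP",
                 DATA_TYPE = "float",
                 HARD_LIMITS = "[60; 270]",
                 LABEL = "Systolic BP")
res <- con_limit_deviations(resp_vars = "SBP", study_data = sdt,
                            item_level = md, label_col = "LABEL",
                            limits = "HARD_LIMITS")
res$SummaryTable  # reports the number of limit deviations per variable
```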
contradiction_functions
Description
Helper functions to detect abnormalities
Usage
contradiction_functions
Format
An object of class list of length 11.
Details
Contradiction checks involving 2 variables:
- A_not_equal_B, if A != B
- A_greater_equal_B, if A >= B
- A_greater_than_B, if A > B
- A_less_than_B, if A < B
- A_less_equal_B, if A <= B
- A_present_not_B, if A & is.na(B)
- A_present_and_B, if A & !(is.na(B))
- A_present_and_B_levels, if A & B %in% {set of levels}
- A_levels_and_B_gt_value, if A %in% {set of levels} & B > value
- A_levels_and_B_lt_value, if A %in% {set of levels} & B < value
- A_levels_and_B_levels, if A %in% {set of levels} & B %in% {set of levels}
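Each rule reduces to a vectorized logical comparison over two study-data columns; for instance, A_present_not_B flags observations where A is observed but B is missing. A base-R sketch of that logic (not the package's internal implementation):

```r
A <- c(1, NA, 3)
B <- c(NA, 2, 3)
# A present while B missing: contradiction in the first row only
flagged <- !is.na(A) & is.na(B)
flagged  # TRUE FALSE FALSE
```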
description of the contradiction functions
Description
description of the contradiction functions
Usage
contradiction_functions_descriptions
Format
An object of class list of length 11.
Log Level
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Add stack-trace in condition messages (to be deprecated)
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Metadata describes more than the current study data
Description
- none: no check will be performed on the match of variables and records available in the study data and described in the metadata
- exact: there must be a 1:1 match between the study data and the metadata regarding data frames, segments, variables, and records
- subset_u: study data are a subset of the metadata. All variables from the study data are expected to be present in the metadata, but one or more variables in the metadata may be absent from the study data. In this case, a variable present in the study data but not in the metadata would produce an issue.
- subset_m: metadata are a subset of the study data. All variables in the metadata are expected to be present in the study data, but one or more variables in the study data may be absent from the metadata.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
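For example, to tolerate metadata that describe variables not (yet) present in the study data, the check type can be relaxed before building a report (the option name is taken from this help topic; a sketch, not a complete workflow):

```r
# Study data may be a subset of the metadata
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u")
```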
Set caller for error conditions (to be deprecated)
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Enable to switch to a general additive model instead of LOESS
Description
If this option is set to TRUE, time course plots will use generalized additive models (GAM) instead of LOESS when the number of observations exceeds a specified threshold. LOESS computations for large datasets have a high memory consumption.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
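For example, to reduce memory consumption on large datasets, the GAM fallback can be enabled globally (the option name is taken from this help topic; a sketch, not a complete workflow):

```r
# Use generalized additive models instead of LOESS for large datasets
options(dataquieR.GAM_for_LOESS = TRUE)
```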
Maximum length for variable labels
Description
All variable labels will be shortened to fit this maximum length. Cannot be larger than 200 for technical reasons.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Maximum length for value labels
Description
Value labels are restricted to this length.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Set caller for message conditions (to be deprecated)
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Default availability of multivariate outlier checks in reports
Description
can be
- TRUE: for cross-item_level groups with MULTIVARIATE_OUTLIER_CHECK empty, do a multivariate outlier check
- FALSE: for cross-item_level groups with MULTIVARIATE_OUTLIER_CHECK empty, don't do a multivariate outlier check
- "auto": for cross-item_level groups with MULTIVARIATE_OUTLIER_CHECK empty, do multivariate outlier checks if there is no entry in the column CONTRADICTION_TERM
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Assume, all VALUE_LABELS are HTML escaped
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Set caller for warning conditions (to be deprecated)
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Exclude subgroups with constant values from LOESS figure
Description
If this option is set to TRUE
, time course plots will only show subgroups
with more than one distinct value. This might improve the readability of
the figure.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Display time-points in LOESS plots
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Lower limit for the LOESS bandwidth
Description
The value should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Lower limit for the proportion of cases or controls required to create a smoothed time-trend figure
Description
The value should be greater than 0 and less than 0.4. If the proportion of cases or controls is lower than the specified value, the LOESS figure will not be created for the specified binary outcome.
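A hedged example of setting this threshold (0.1 is an illustrative value): with the option below, the LOESS figure for a binary outcome would be skipped whenever the proportion of cases or controls falls below 10%:

```r
# Skip the smoothed time-trend figure for binary outcomes whose
# case/control proportion is below 0.1 (illustrative threshold)
options(dataquieR.acc_loess.min_proportion = 0.1)
```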
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Default plot format for acc_loess()
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Display observations in LOESS plots
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Include number of observations for each level of the grouping variable in the 'margins' figure
Description
If this option is set to FALSE, the figures created by acc_margins will not include the number of observations for each level of the grouping variable. This can be used to obtain clean static plots.
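One way to obtain such a clean static plot is to change the option only temporarily, e.g. with withr (an import of dataquieR); the acc_margins() call itself is left as a commented placeholder, since its arguments depend on your data and metadata:

```r
# Temporarily hide the per-level observation counts in 'margins' figures
withr::with_options(
  list(dataquieR.acc_margins_num = FALSE),
  {
    # acc_margins(...)  # produces the figure without observation counts
  }
)
```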
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Sort levels of the grouping variable in the 'margins' figures
Description
If this option is set to TRUE, the levels of the grouping variable in the figure are sorted in descending order according to the number of observations, so that levels with more observations are easier to identify. Otherwise, the original order of the levels is retained.
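For example, to sort grouping-variable levels by descending observation count in all subsequent 'margins' figures (a plain options() call; the default behavior is restored by setting it back):

```r
# Sort levels of the grouping variable by number of observations
options(dataquieR.acc_margins_sort = TRUE)
```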
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Apply min-max scaling in parallel coordinates figure to inspect multivariate outliers
Description
Logical: TRUE or FALSE.
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Color for empirical contradictions
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Color for logical contradictions
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Call browser() on errors
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Removal of hard limits from data before calculating descriptive statistics
Description
Can be one of:
- TRUE: values outside hard limits are removed from the data before descriptive statistics are calculated
- FALSE: values outside hard limits are not removed from the original data
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Disable automatic post-processing of dataquieR function results
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Try to avoid fallback to string columns when reading files
Description
If a file does not declare column data types, or declares data types per cell, the type that matches the majority of the sampled cells in a column is chosen as that column's data type.
Details
This may hide data type problems, but it can also fix them, so that prep_get_data_frame() works better.
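A minimal sketch of enabling this behavior before reading study data (the option is simply switched on globally):

```r
# Guess each column's type from the majority of sampled cells
# instead of falling back to string columns when types are unclear
options(dataquieR.fix_column_type_on_read = TRUE)
getOption("dataquieR.fix_column_type_on_read")
```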
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Flip mode to use for figures
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
When converting MISSING_LIST/JUMP_LIST to a MISSING_LIST_TABLE, create one missing-code list per item
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Control how the label_col argument is used
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Name of the data.frame that defines the display format for grading values
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Name of the data.frame that provides the GRADING_RULESET
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Control whether dataquieR tries to guess missing codes from the study data in the absence of metadata
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Language suffix for metadata label columns
Description
TODO
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Maximum number of levels of the grouping variable shown individually in figures
Description
If there are more examiners or devices than can be shown individually, they will be collapsed into "other".
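A minimal sketch of adjusting this cap (10 is an illustrative value, not a documented default):

```r
# Show at most 10 grouping-variable levels individually (illustrative);
# additional examiners/devices are collapsed into "other"
options(dataquieR.max_group_var_levels_in_plot = 10)
```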
See Also
Other options: dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Maximum number of levels of the grouping variable shown with individual histograms ('violins') in 'margins' figures
Description
If there are more examiners or devices, the figure will be reduced to box-plots to save space.
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Minimum number of observations per grouping variable that is required to include an individual level of the grouping variable in a figure
Description
Levels of the grouping variable with fewer observations than specified here will be excluded from the figure.
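The three plot-related options above work together and can be set like any base R option. A minimal sketch, assuming illustrative values (the numbers below are assumptions chosen for demonstration, not the package's documented defaults):

```r
# Illustrative settings for the grouping-variable plot options; the values
# are assumptions for demonstration, not package defaults.
options(
  dataquieR.max_group_var_levels_in_plot = 20,      # further levels collapse into "other"
  dataquieR.max_group_var_levels_with_violins = 10, # beyond this, fall back to box-plots
  dataquieR.min_obs_per_group_var_in_plot = 30      # sparser levels are excluded
)
# Inspect the currently active value
getOption("dataquieR.min_obs_per_group_var_in_plot")
```

Setting these before calling the report functions changes how grouping variables (e.g., examiners or devices) are rendered in the resulting figures.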
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Remove all observation-level real data from reports
Description
TODO
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Function to call on progress increase
Description
TODO
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Function to call on progress message update
Description
TODO
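Both progress options above take a function. A hedged sketch of how such callbacks might be registered; the callback signatures (a numeric progress value and a character message, respectively) are assumptions, as they are not documented on this page:

```r
# Assumed callback signatures, for illustration only.
options(
  dataquieR.progress_fkt = function(progress) {
    # called on progress increase; 'progress' is assumed to be numeric
    message("progress: ", progress)
  },
  dataquieR.progress_msg_fkt = function(msg) {
    # called on progress message update; 'msg' is assumed to be character
    message(msg)
  }
)
```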
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Number of levels to consider a variable ordinal in the absence of SCALE_LEVEL
Description
TODO
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_metriclevels, dataquieR.testdebug
Number of levels to consider a variable metric in the absence of SCALE_LEVEL
Description
TODO
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.testdebug
Disable all interactive, metadata-based provision of function arguments
Description
TODO
See Also
Other options:
dataquieR, dataquieR.CONDITIONS_LEVEL_TRHESHOLD, dataquieR.CONDITIONS_WITH_STACKTRACE, dataquieR.ELEMENT_MISSMATCH_CHECKTYPE, dataquieR.ERRORS_WITH_CALLER, dataquieR.GAM_for_LOESS, dataquieR.MAX_LABEL_LEN, dataquieR.MAX_VALUE_LABEL_LEN, dataquieR.MESSAGES_WITH_CALLER, dataquieR.MULTIVARIATE_OUTLIER_CHECK, dataquieR.VALUE_LABELS_htmlescaped, dataquieR.WARNINGS_WITH_CALLER, dataquieR.acc_loess.exclude_constant_subgroups, dataquieR.acc_loess.mark_time_points, dataquieR.acc_loess.min_bw, dataquieR.acc_loess.min_proportion, dataquieR.acc_loess.plot_format, dataquieR.acc_loess.plot_observations, dataquieR.acc_margins_num, dataquieR.acc_margins_sort, dataquieR.acc_multivariate_outlier.scale, dataquieR.col_con_con_empirical, dataquieR.col_con_con_logical, dataquieR.debug, dataquieR.des_summary_hard_lim_remove, dataquieR.dontwrapresults, dataquieR.fix_column_type_on_read, dataquieR.flip_mode, dataquieR.force_item_specific_missing_codes, dataquieR.force_label_col, dataquieR.grading_formats, dataquieR.grading_rulesets, dataquieR.guess_missing_codes, dataquieR.lang, dataquieR.max_group_var_levels_in_plot, dataquieR.max_group_var_levels_with_violins, dataquieR.min_obs_per_group_var_in_plot, dataquieR.non_disclosure, dataquieR.progress_fkt, dataquieR.progress_msg_fkt, dataquieR.scale_level_heuristics_control_binaryrecodelimit, dataquieR.scale_level_heuristics_control_metriclevels
Internal constructor for the internal class dataquieR_resultset.
Description
creates an object of the class dataquieR_resultset.
Usage
dataquieR_resultset(...)
Arguments
... |
properties stored in the object |
Details
The class features the following methods:
- as.data.frame.dataquieR_resultset
- as.list.dataquieR_resultset
- print.dataquieR_resultset
- summary.dataquieR_resultset
Value
an object of the class dataquieR_resultset.
See Also
Class dataquieR_resultset2.
Description
Class dataquieR_resultset2.
See Also
Verify an object of class dataquieR_resultset
Description
Deprecated
Usage
dataquieR_resultset_verify(...)
Arguments
... |
Deprecated |
Value
Deprecated
Compute Pairwise Correlations
Description
Works on variable groups (cross-item_level), which are expected to show a Pearson correlation.
Usage
des_scatterplot_matrix(
label_col,
study_data,
item_level = "item_level",
meta_data_cross_item = "cross-item_level",
meta_data = item_level,
meta_data_v2,
cross_item_level,
`cross-item_level`
)
Arguments
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data_cross_item |
|
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
cross_item_level |
data.frame alias for |
`cross-item_level` |
data.frame alias for |
Details
Descriptor # TODO: This can be an indicator
Value
a list with the slots:
- SummaryPlotList: for each variable group a ggplot2::ggplot object with pairwise correlation plots
- SummaryData: a table with the columns VARIABLE_LIST, cors, max_cor, and min_cor
- SummaryTable: like SummaryData, but machine-readable and with stable column names
Examples
## Not run:
devtools::load_all()
prep_load_workbook_like_file("meta_data_v2")
des_scatterplot_matrix("study_data")
## End(Not run)
Compute Descriptive Statistics
Description
generates a descriptive overview of the variables in resp_vars
.
Usage
des_summary(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
dataquieR.des_summary_hard_lim_remove_default),
...
)
Arguments
resp_vars |
variable the name of the measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
hard_limits_removal |
logical if TRUE values outside hard limits are removed from the data before calculating descriptive statistics. The default is FALSE |
... |
arguments to be passed to all called indicator functions if applicable. |
Details
TODO
Value
a list with:
- SummaryTable: data.frame
- SummaryData: data.frame
See Also
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary(study_data = "study_data", meta_data =
prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary(study_data = prep_get_data_frame("study_data"),
meta_data = prep_get_data_frame("item_level"))$SummaryData)
## End(Not run)
Compute Descriptive Statistics - categorical variables
Description
generates a descriptive overview of the categorical variables (nominal and
ordinal) in resp_vars
.
Usage
des_summary_categorical(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
dataquieR.des_summary_hard_lim_remove_default),
...
)
Arguments
resp_vars |
variable the name of the categorical measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
hard_limits_removal |
logical if TRUE values outside hard limits are removed from the data before calculating descriptive statistics. The default is FALSE |
... |
arguments to be passed to all called indicator functions if applicable. |
Details
TODO
Value
a list with:
- SummaryTable: data.frame
- SummaryData: data.frame
See Also
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary_categorical(study_data = "study_data", meta_data =
prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary_categorical(study_data = prep_get_data_frame("study_data"),
meta_data = prep_get_data_frame("item_level"))$SummaryData)
## End(Not run)
Compute Descriptive Statistics - continuous variables
Description
generates a descriptive overview of continuous variables (ratio and interval) in resp_vars
.
Usage
des_summary_continuous(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
hard_limits_removal = getOption("dataquieR.des_summary_hard_lim_remove",
dataquieR.des_summary_hard_lim_remove_default),
...
)
Arguments
resp_vars |
variable the name of the continuous measurement variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
hard_limits_removal |
logical if TRUE values outside hard limits are removed from the data before calculating descriptive statistics. The default is FALSE |
... |
arguments to be passed to all called indicator functions if applicable. |
Details
TODO
Value
a list with:
- SummaryTable: data.frame
- SummaryData: data.frame
See Also
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2")
xx <- des_summary_continuous(study_data = "study_data", meta_data =
prep_get_data_frame("item_level"))
util_html_table(xx$SummaryData)
util_html_table(des_summary_continuous(study_data = prep_get_data_frame("study_data"),
meta_data = prep_get_data_frame("item_level"))$SummaryData)
## End(Not run)
Get the dimensions of a dq_report2
result
Description
Get the dimensions of a dq_report2
result
Usage
## S3 method for class 'dataquieR_resultset2'
dim(x)
Arguments
x |
a |
Value
dimensions
Names of DQ dimensions
Description
a vector of data quality dimensions. The supported dimensions are Completeness, Consistency and Accuracy.
Usage
dimensions
Format
An object of class character
of length 3.
Value
Only a definition, not a function, so no return value
See Also
Names of a dataquieR
report object (v2.0)
Description
Names of a dataquieR
report object (v2.0)
Usage
## S3 method for class 'dataquieR_resultset2'
dimnames(x)
Arguments
x |
the result object |
Value
the names
Dimension Titles for Prefixes
Description
order does matter, because it defines the order in the dq_report2
.
Usage
dims
Format
An object of class character
of length 5.
See Also
Generate a full DQ report
Description
Deprecated
Usage
dq_report(...)
Arguments
... |
Deprecated |
Value
Deprecated
Generate a full DQ report, v2
Description
Generate a full DQ report, v2
Usage
dq_report2(
study_data,
item_level = "item_level",
label_col = LABEL,
meta_data_segment = "segment_level",
meta_data_dataframe = "dataframe_level",
meta_data_cross_item = "cross-item_level",
meta_data_item_computation = "item_computation_level",
meta_data = item_level,
meta_data_v2,
...,
dimensions = c("Completeness", "Consistency"),
cores = list(mode = "socket", logging = FALSE, cpus = util_detect_cores(),
load.balancing = TRUE),
specific_args = list(),
advanced_options = list(),
author = prep_get_user_name(),
title = "Data quality report",
subtitle = as.character(Sys.Date()),
user_info = NULL,
debug_parallel = FALSE,
resp_vars = character(0),
filter_indicator_functions = character(0),
filter_result_slots = c("^Summary", "^Segment", "^DataTypePlotList",
"^ReportSummaryTable", "^Dataframe", "^Result", "^VariableGroup"),
mode = c("default", "futures", "queue", "parallel"),
mode_args = list(),
notes_from_wrapper = list(),
storr_factory = NULL,
amend = FALSE,
cross_item_level,
`cross-item_level`,
segment_level,
dataframe_level,
item_computation_level,
.internal = rlang::env_inherits(rlang::caller_env(), parent.env(environment()))
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional: Data frame level metadata |
meta_data_cross_item |
data.frame – optional: Cross-item level metadata |
meta_data_item_computation |
data.frame optional. computation rules for computed variables. |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
... |
arguments to be passed to all called indicator functions if applicable. |
dimensions |
dimensions Vector of dimensions to address in the report. Allowed values in the vector are Completeness, Consistency, and Accuracy. The generated report will only cover the listed data quality dimensions. Accuracy is computationally expensive, so this dimension is not enabled by default. Completeness should be included if Consistency is included, and Consistency should be included if Accuracy is included, to avoid misleading detections (e.g., of missing codes as outliers); please refer to the data quality concept for more details. Integrity is always included. |
cores |
integer number of CPU cores to use, or a named list with arguments for parallelMap::parallelStart, or NULL if parallel processing has already been started by the caller. Can also be a cluster. |
specific_args |
list named list of arguments specifically for one of the called functions; the names of the list elements correspond to the indicator functions whose calls should be modified. The elements are lists of arguments. |
advanced_options |
list options to set during report computation,
see |
author |
character author for the report documents. |
title |
character optional argument to specify the title for the data quality report |
subtitle |
character optional argument to specify a subtitle for the data quality report |
user_info |
list additional info stored with the report, e.g., comments, title, ... |
debug_parallel |
logical print blocks currently evaluated in parallel |
resp_vars |
variable list the name of the measurement variables for the report. If missing, all variables will be used. Only item level indicator functions are filtered, so far. |
filter_indicator_functions |
character regular expressions, only if an indicator function's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed. |
filter_result_slots |
character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed. |
mode |
character work mode for parallel execution. default is
"default", the values mean:
- default: use |
mode_args |
list of arguments for the selected |
notes_from_wrapper |
list a list containing notes about changed labels
by |
storr_factory |
function |
amend |
logical if there is already data in. |
cross_item_level |
data.frame alias for |
segment_level |
data.frame alias for |
dataframe_level |
data.frame alias for |
item_computation_level |
data.frame alias for
|
.internal |
logical internal use, only. |
`cross-item_level` |
data.frame alias for |
Details
See dq_report_by for a way to generate stratified or split reports easily.
Value
a dataquieR_resultset2 that can be printed, creating an HTML report.
See Also
Examples
## Not run:
prep_load_workbook_like_file("inst/extdata/meta_data_v2.xlsx")
meta_data <- prep_get_data_frame("item_level")
meta_data_cross <- prep_get_data_frame("cross-item_level")
x <- dq_report2("study_data", dimensions = NULL, label_col = "LABEL")
xx <- pbapply::pblapply(x, util_eval_to_dataquieR_result, env = environment())
xx <- pbapply::pblapply(tail(x), util_eval_to_dataquieR_result, env = environment())
xx <- parallel
cat(vapply(x, deparse1, FUN.VALUE = character(1)), sep = "\n", file = "all_calls.txt")
rstudioapi::navigateToFile("all_calls.txt")
eval(x$`acc_multivariate_outlier.Blood pressure checks`)
prep_load_workbook_like_file("meta_data_v2")
rules <- tibble::tribble(
~resp_vars, ~RULE,
"BMI", '[BODY_WEIGHT_0]/(([BODY_HEIGHT_0]/100)^2)',
"R", '[WAIST_CIRC_0]/2/[pi]', # in m^3
"VOL_EST", '[pi]*([WAIST_CIRC_0]/2/[pi])^2*[BODY_HEIGHT_0] / 1000', # in l
)
prep_load_workbook_like_file("ship_meta_v2")
prep_add_data_frames(computed_items = rules)
r <- dq_report2("ship", dimensions = NULL, label_col = "LABEL")
## End(Not run)
Generate a stratified full DQ report
Description
Generate a stratified full DQ report
Usage
dq_report_by(
study_data,
item_level = "item_level",
meta_data_segment = "segment_level",
meta_data_dataframe = "dataframe_level",
meta_data_cross_item = "cross-item_level",
meta_data_item_computation = "item_computation_level",
missing_tables = NULL,
label_col,
meta_data_v2,
segment_column = NULL,
strata_column = NULL,
strata_select = NULL,
selection_type = NULL,
segment_select = NULL,
segment_exclude = NULL,
strata_exclude = NULL,
subgroup = NULL,
resp_vars = character(0),
id_vars = NULL,
advanced_options = list(),
storr_factory = NULL,
amend = FALSE,
...,
output_dir = NULL,
input_dir = NULL,
also_print = FALSE,
disable_plotly = FALSE,
view = TRUE,
meta_data = item_level,
cross_item_level,
`cross-item_level`,
segment_level,
dataframe_level,
item_computation_level
)
Arguments
study_data |
data.frame the data frame that contains the measurements:
it can be an R object (e.g., |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional if |
meta_data_cross_item |
data.frame – optional: Cross-item level metadata |
meta_data_item_computation |
data.frame – optional: Computed items metadata |
missing_tables |
character the name of the data frame containing the
missing codes, it can be a vector if more
than one table is provided. Example:
|
label_col |
variable attribute the name of the column in the metadata containing the labels of the variables |
meta_data_v2 |
character path or file name of the workbook like
metadata file, see
|
segment_column |
variable attribute name of a metadata attribute usable to split the report in sections of variables, e.g. all blood-pressure related variables. By default, reports are split by STUDY_SEGMENT if available and no segment_column nor strata_column or subgroup are defined. To create an un-split report please write explicitly the argument 'segment_column = NULL' |
strata_column |
variable name of a study variable to stratify the
report by, e.g. the study centers.
Both labels and |
strata_select |
character if given, the strata of strata_column are limited to the content of this vector. A character vector or a regular expression can be provided (e.g., "^a.*$"). This argument can not be used if no strata_column is provided |
selection_type |
character optional, can only be specified if a
|
segment_select |
character if given, the levels of segment_column are limited to the content of this vector. A character vector or a regular expression (e.g., ".*_EXAM$") can be provided. This argument can not be used if no segment_column is provided. |
segment_exclude |
character optional, can only be specified if a
|
strata_exclude |
character optional, can only be specified if a
|
subgroup |
character optional, to define subgroups of cases. Rules are
to be written as |
resp_vars |
variable the names of the measurement variables, if
missing or |
id_vars |
variable a vector containing the name/s of the variables
containing ids, to
be used to merge multiple data frames if provided
in |
advanced_options |
list options to set during report computation,
see |
storr_factory |
function |
amend |
logical if there is already data in. |
... |
arguments to be passed through to dq_report or dq_report2 |
output_dir |
character if given, the output is not returned but saved in this directory |
input_dir |
character if given, the study data files that have
no path and that are not URL are searched in
this directory. Also |
also_print |
logical if |
disable_plotly |
logical do not use |
view |
logical open the returned report |
meta_data |
data.frame old name for |
cross_item_level |
data.frame alias for |
segment_level |
data.frame alias for |
dataframe_level |
data.frame alias for |
item_computation_level |
data.frame alias for
|
`cross-item_level` |
data.frame alias for |
Value
invisibly, a named list of named lists of dq_report2 reports or, if output_dir has been specified, invisible(NULL)
See Also
Examples
## Not run: # really long-running example.
prep_load_workbook_like_file("meta_data_v2")
rep <- dq_report_by("study_data", label_col =
LABEL, strata_column = "CENTER_0")
rep <- dq_report_by("study_data",
label_col = LABEL, strata_column = "CENTER_0",
segment_column = NULL
)
unlink("/tmp/testRep/", force = TRUE, recursive = TRUE)
dq_report_by("study_data",
label_col = LABEL, strata_column = "CENTER_0",
segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep"
)
unlink("/tmp/testRep/", force = TRUE, recursive = TRUE)
dq_report_by("study_data",
label_col = LABEL, strata_column = "CENTER_0",
segment_column = NULL, output_dir = "/tmp/testRep"
)
dq_report_by("study_data",
label_col = LABEL,
segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep"
)
dq_report_by("study_data",
label_col = LABEL,
segment_column = STUDY_SEGMENT, output_dir = "/tmp/testRep",
also_print = TRUE
)
dq_report_by(study_data = "study_data", meta_data_v2 = "meta_data_v2",
advanced_options = list(dataquieR.study_data_cache_max = 0,
dataquieR.study_data_cache_metrics = TRUE,
dataquieR.study_data_cache_metrics_env = environment()),
cores = NULL, dimensions = "int")
dq_report_by(study_data = "study_data", meta_data_v2 = "meta_data_v2",
advanced_options = list(dataquieR.study_data_cache_max = 0),
cores = NULL, dimensions = "int")
## End(Not run)
HTML Dependency for report headers in clipboard
Description
HTML Dependency for report headers in clipboard
Usage
html_dependency_clipboard()
Value
the dependency
HTML Dependency for dataquieR
Description
generate all dependencies used in static dataquieR
reports
Usage
html_dependency_dataquieR(iframe = FALSE)
Arguments
iframe |
logical |
Value
the dependency
HTML Dependency for report headers in DT::datatable
Description
HTML Dependency for report headers in DT::datatable
Usage
html_dependency_report_dt()
Value
the dependency
HTML Dependency for tippy
Description
HTML Dependency for tippy
Usage
html_dependency_tippy()
Value
the dependency
HTML Dependency for vertical headers in DT::datatable
Description
HTML Dependency for vertical headers in DT::datatable
Usage
html_dependency_vert_dt()
Value
the dependency
Wrapper function to check for studies data structure
Description
This function tests for unexpected elements and records, as well as duplicated identifiers and content. The unexpected element record check can be conducted by providing the number of expected records or an additional table with the expected records. It is possible to conduct the checks by study segments or to consider only selected segments.
Usage
int_all_datastructure_dataframe(
meta_data_dataframe = "dataframe_level",
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
dataframe_level
)
Arguments
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
dataframe_level |
data.frame alias for |
Value
a list with
-
DataframeTable
: data frame with selected check results, used for the data quality report.
Examples
## Not run:
out_dataframe <- int_all_datastructure_dataframe(
meta_data_dataframe = "meta_data_dataframe",
meta_data = "ship_meta"
)
md0 <- prep_get_data_frame("ship_meta")
md0
md0$VAR_NAMES
md0$VAR_NAMES[[1]] <- "Id" # is this mismatch reported -- is the data frame
# also reported, if nothing is wrong with it?
out_dataframe <- int_all_datastructure_dataframe(
meta_data_dataframe = "meta_data_dataframe",
meta_data = md0
)
# This is the "normal" procedure inside the pipeline,
# but outside it, this function's check type is "exact" by default
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u")
lapply(setNames(nm = prep_get_data_frame("meta_data_dataframe")$DF_NAME),
int_sts_element_dataframe, meta_data = md0)
md0$VAR_NAMES[[1]] <-
"id" # is this mismatch reported -- is the data frame also reported,
# if nothing is wrong with it?
lapply(setNames(nm = prep_get_data_frame("meta_data_dataframe")$DF_NAME),
int_sts_element_dataframe, meta_data = md0)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
## End(Not run)
Wrapper function to check for segment data structure
Description
This function tests for unexpected elements and records, as well as duplicated identifiers and content. The unexpected element record check can be conducted by providing the number of expected records or an additional table with the expected records. It is possible to conduct the checks by study segments or to consider only selected segments.
Usage
int_all_datastructure_segment(
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
segment_level,
meta_data_segment = "segment_level"
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
segment_level |
data.frame alias for |
meta_data_segment |
data.frame the data frame that contains the metadata for the segment level, mandatory |
Value
a list with
-
SegmentTable
: data frame with selected check results, used for the data quality report.
Examples
## Not run:
out_segment <- int_all_datastructure_segment(
meta_data_segment = "meta_data_segment",
study_data = "ship",
meta_data = "ship_meta"
)
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
STUDY_SEGMENT = c("Intro", "Ex"))
out_segment <- int_all_datastructure_segment(
meta_data_segment = "meta_data_segment",
study_data = study_data,
meta_data = meta_data
)
## End(Not run)
Check declared data types of metadata in study data
Description
Checks data types of the study data and for the data type declared in the metadata
Usage
int_datatype_matrix(
resp_vars = NULL,
study_data,
label_col,
item_level = "item_level",
split_segments = FALSE,
max_vars_per_plot = 20,
threshold_value = 0,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable the names of the measurement variables, if
missing or |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
split_segments |
logical return one matrix per study segment |
max_vars_per_plot |
integer from=0. The maximum number of variables per single plot. |
threshold_value |
numeric from=0 to=100. percentage failing conversions allowed to still classify a study variable convertible. |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
This is a preparatory support function that compares study data with associated metadata. A prerequisite of this function is that the number of columns in the study data matches the number of rows in the metadata.
For each study variable, the function searches for its data type declared in static metadata and returns a heatmap like matrix indicating data type mismatches in the study data.
List function.
Value
a list with:
- SummaryTable: data frame containing the data quality check for "data type mismatch" (CLS_int_vfe_type, PCT_int_vfe_type). The following categories are possible: "Non-matching datatype", "Non-matching datatype, convertible", "Matching datatype"
- SummaryData: data frame containing the data quality check for "data type mismatch" for a report
- SummaryPlot: ggplot2::ggplot2 heatmap plot, graphical representation of SummaryTable
- DataTypePlotList: list of plots per (maybe artificial) segment
- ReportSummaryTable: data frame underlying SummaryPlot
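As this page has no example, here is a minimal sketch of a direct call; "study_data" and "meta_data_v2" are placeholder data frame names following the dq_report_by examples in this manual, not shipped data sets, and the threshold value is only illustrative:

```r
## Not run:
# Placeholder names; load your own metadata workbook and study data first.
prep_load_workbook_like_file("meta_data_v2")
res <- int_datatype_matrix(
  study_data = "study_data",
  label_col = LABEL,
  threshold_value = 5 # tolerate up to 5% failing conversions per variable
)
res$SummaryPlot # heatmap of data type mismatches per variable
## End(Not run)
```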
Check for duplicated content
Description
This function tests for duplicated entries in the data set. It is possible to check duplicated entries by study segments or to consider only selected segments.
Usage
int_duplicate_content(
level = c("dataframe", "segment"),
study_data,
item_level = "item_level",
label_col,
meta_data = item_level,
meta_data_v2,
...
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
... |
Depending on |
Value
a list. Depending on level
, see
util_int_duplicate_content_segment or
util_int_duplicate_content_dataframe for a description of the outputs.
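A sketched call at the segment level; "study_data" and "meta_data_v2" are placeholder names as in the dq_report_by examples, and further level-specific arguments can be passed via `...` (see the linked utility functions):

```r
## Not run:
# Placeholder names; check for duplicated rows within each study segment.
dup <- int_duplicate_content(
  level = "segment",
  study_data = "study_data",
  label_col = LABEL,
  meta_data_v2 = "meta_data_v2"
)
## End(Not run)
```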
Check for duplicated IDs
Description
This function tests for duplicated entries in identifiers. It is possible to check duplicated identifiers by study segments or to consider only selected segments.
Usage
int_duplicate_ids(
level = c("dataframe", "segment"),
study_data,
item_level = "item_level",
label_col,
meta_data = item_level,
meta_data_v2,
...
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
... |
Depending on |
Value
a list. Depending on level
, see
util_int_duplicate_ids_segment or
util_int_duplicate_ids_dataframe for a description of the outputs.
Encoding Errors
Description
Detects errors in the character encoding of string variables
Usage
int_encoding_errors(
resp_vars = NULL,
study_data,
label_col,
meta_data_dataframe = "dataframe_level",
item_level = "item_level",
ref_encs,
meta_data = item_level,
meta_data_v2,
dataframe_level
)
Arguments
resp_vars |
variable the names of the measurement variables, if
missing or |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
item_level |
data.frame the data frame that contains metadata attributes of study data |
ref_encs |
reference encodings (names are |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
dataframe_level |
data.frame alias for |
Details
Strings are stored based on code tables, nowadays typically as UTF-8. However, other code systems are still in use, so, sometimes, strings from different systems get mixed in the data. This indicator checks for such problems and returns the count of entries per variable that do not match the reference coding system, which is estimated from the study data (the addition of a metadata field is planned).
If not specified in the metadata (column ENCODING in item- or data-frame-level), the encoding is guessed from the data. Otherwise, it may be any supported encoding as returned by iconvlist().
Value
a list with:
- SummaryTable: data.frame with information on such problems
- SummaryData: data.frame, human-readable version of SummaryTable
- FlaggedStudyData: data.frame that tells for each entry in the study data whether its encoding is OK; has the same dimensions as study_data
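A sketched call; "study_data" and "meta_data_v2" are again placeholder data frame names, and no reference encodings are given, so they are estimated from the data:

```r
## Not run:
# Placeholder names; flag entries whose encoding deviates from the
# reference encoding estimated from (or declared for) each variable.
enc <- int_encoding_errors(
  study_data = "study_data",
  label_col = LABEL,
  meta_data_v2 = "meta_data_v2"
)
enc$SummaryData # human-readable per-variable counts
## End(Not run)
```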
Detect Expected Observations
Description
For each participant, check, if an observation was expected, given the
PART_VARS
from item-level metadata
Usage
int_part_vars_structure(
label_col,
study_data,
item_level = "item_level",
expected_observations = c("HIERARCHY", "SEGMENT"),
disclose_problem_paprt_var_data = FALSE,
meta_data = item_level,
meta_data_v2
)
Arguments
label_col |
character mapping attribute |
study_data |
study_data must have all relevant |
item_level |
meta_data must be complete to avoid false positives on
non-existing |
expected_observations |
enum HIERARCHY | SEGMENT. How should
|
disclose_problem_paprt_var_data |
logical show the problematic data
( |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
Value
empty list, so far – the function only warns.
Determine missing and/or superfluous data elements
Description
Depends on the dataquieR.ELEMENT_MISSMATCH_CHECKTYPE option, see there
Usage
int_sts_element_dataframe(
item_level = "item_level",
meta_data_dataframe = "dataframe_level",
meta_data = item_level,
meta_data_v2,
check_type = getOption("dataquieR.ELEMENT_MISSMATCH_CHECKTYPE",
dataquieR.ELEMENT_MISSMATCH_CHECKTYPE_default),
dataframe_level
)
Arguments
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
check_type |
enum none | exact | subset_u | subset_m. See dataquieR.ELEMENT_MISSMATCH_CHECKTYPE |
dataframe_level |
data.frame alias for |
Details
Value
list with named slots:
- DataframeData: data frame with the unexpected elements check results.
- DataframeTable: data.frame table with all errors, used for the data quality report:
  - PCT_int_sts_element: percentage of element mismatches
  - NUM_int_sts_element: number of element mismatches
  - resp_vars: affected element names
Examples
## Not run:
prep_load_workbook_like_file("~/tmp/df_level_test.xlsx")
meta_data_dataframe <- "dataframe_level"
meta_data <- "item_level"
## End(Not run)
Checks for element set
Description
Depends on the dataquieR.ELEMENT_MISSMATCH_CHECKTYPE option, see there.
Usage
int_sts_element_segment(
study_data,
item_level = "item_level",
label_col,
meta_data = item_level,
meta_data_v2
)
Arguments
study_data |
data.frame the data frame that contains the measurements, mandatory. |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
Value
a list with
- SegmentData: data frame with the unexpected elements check results; Segment: name of the corresponding segment, if applicable, ALL otherwise
- SegmentTable: data frame with the unexpected elements check results, used for the data quality report; Segment: name of the corresponding segment, if applicable, ALL otherwise
Examples
## Not run:
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
STUDY_SEGMENT = c("Intro", "Ex"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speedx", "distx"),
DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
STUDY_SEGMENT = c("Intro", "Intro"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)
study_data <- cars
meta_data <- dataquieR::prep_create_meta(VAR_NAMES = c("speed", "distx"),
DATA_TYPE = c("integer", "integer"), MISSING_LIST = "|", JUMP_LIST = "|",
STUDY_SEGMENT = c("Intro", "Intro"))
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "none")
int_sts_element_segment(study_data, meta_data)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "exact")
int_sts_element_segment(study_data, meta_data)
## End(Not run)
Check for unexpected data element count
Description
This function contrasts the expected element number in each study in the metadata with the actual element number in each study data frame.
Usage
int_unexp_elements(
identifier_name_list,
data_element_count,
meta_data_dataframe = "dataframe_level",
meta_data_v2,
dataframe_level
)
Arguments
identifier_name_list |
character a character vector indicating the name of each study data frame, mandatory. |
data_element_count |
integer an integer vector with the number of expected data elements, mandatory. |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
meta_data_v2 |
character path to workbook like metadata file, see
|
dataframe_level |
data.frame alias for |
Value
a list with
-
DataframeData
: data frame with the results of the quality check for unexpected data elements -
DataframeTable
: data frame with selected unexpected data elements check results, used for the data quality report.
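A sketched call; the data frame name and the expected column count are illustrative placeholders, and dataframe-level metadata is assumed to be available under the default name "dataframe_level":

```r
## Not run:
# Placeholder name and count: expect 25 columns in "study_data".
chk <- int_unexp_elements(
  identifier_name_list = c("study_data"),
  data_element_count = c(25)
)
chk$DataframeTable # per-data-frame element count check results
## End(Not run)
```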
Check for unexpected data record count at the data frame level
Description
This function contrasts the expected record number in each study in the metadata with the actual record number in each study data frame.
Usage
int_unexp_records_dataframe(
identifier_name_list,
data_record_count,
meta_data_dataframe = "dataframe_level",
meta_data_v2,
dataframe_level
)
Arguments
identifier_name_list |
character a character vector indicating the name of each study data frame, mandatory. |
data_record_count |
integer an integer vector with the number of expected data records per study data frame, mandatory. |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
meta_data_v2 |
character path to workbook like metadata file, see
|
dataframe_level |
data.frame alias for |
Value
a list with
-
DataframeData
: data frame with the results of the quality check for unexpected data records -
DataframeTable
: data frame with selected unexpected data records check results, used for the data quality report.
Check for unexpected data record count within segments
Description
This function contrasts the expected record number in each study segment in the metadata with the actual record number in each segment data frame.
Usage
int_unexp_records_segment(
study_segment,
study_data,
label_col,
item_level = "item_level",
data_record_count,
meta_data = item_level,
meta_data_segment = "segment_level",
meta_data_v2,
segment_level
)
Arguments
study_segment |
character a character vector indicating the name of each study data frame, mandatory. |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
data_record_count |
integer an integer vector with the number of expected data records, mandatory. |
meta_data |
data.frame old name for |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_v2 |
character path to workbook like metadata file, see
|
segment_level |
data.frame alias for |
Details
The current implementation does not take jump or missing codes into account; rather, the function checks whether NAs are present in the study data.
Value
a list with
-
SegmentData
: data frame with the results of the quality check for unexpected data records -
SegmentTable
: data frame with selected unexpected data records check results, used for the data quality report.
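A sketched call; the segment name "PART_STUDY", the record count, and the data frame names are illustrative placeholders to be replaced by your own metadata:

```r
## Not run:
# Placeholder names and counts: expect 1000 records in segment "PART_STUDY".
seg <- int_unexp_records_segment(
  study_segment = c("PART_STUDY"),
  study_data = "study_data",
  meta_data_v2 = "meta_data_v2",
  data_record_count = c(1000)
)
seg$SegmentTable # per-segment record count check results
## End(Not run)
```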
Check for unexpected data record set
Description
This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.
Usage
int_unexp_records_set(
level = c("dataframe", "segment"),
study_data,
item_level = "item_level",
label_col,
meta_data = item_level,
meta_data_v2,
...
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
... |
Depending on |
Value
a list. Depending on level
, see
util_int_unexp_records_set_segment or
util_int_unexp_records_set_dataframe for a description of the outputs.
.menu_env
– an environment for HTML menu creation
Description
used by the dq_report2-pipeline
Usage
.menu_env
Format
An object of class environment
of length 3.
Generate the menu for a report
Description
Generate the menu for a report
Arguments
pages |
encapsulated |
Value
the html-taglist
for the menu
Creates a drop-down menu
Description
Creates a drop-down menu
Arguments
title |
name of the entry in the main menu |
menu_description |
description, displayed, if the main menu entry itself is clicked |
... |
the sub-menu-entries |
id |
id for the entry, defaults to modified title |
Value
html div object
Create a single menu entry
Description
Create a single menu entry
Arguments
title |
of the entry |
id |
linked |
... |
additional arguments for the menu link |
Value
html-a-tag object
Data frame with metadata about the study data on variable level
Description
Variable level metadata.
See Also
further details on variable level metadata.
Well known columns on the meta_data_cross-item
sheet
Description
Metadata describing groups of variables, e.g., for their multivariate distribution or for defining contradiction rules.
See Also
Other meta_data_cross:
ASSOCIATION_DIRECTION
,
ASSOCIATION_FORM
,
ASSOCIATION_METRIC
,
ASSOCIATION_RANGE
,
CHECK_ID
,
CHECK_LABEL
,
CONTRADICTION_TERM
,
CONTRADICTION_TYPE
,
DATA_PREPARATION
,
GOLDSTANDARD
,
MULTIVARIATE_OUTLIER_CHECK
,
MULTIVARIATE_OUTLIER_CHECKTYPE
,
N_RULES
,
REL_VAL
,
VARIABLE_LIST
,
util_normalize_cross_item()
Well known columns on the meta_data_dataframe
sheet
Description
Metadata describing data delivered on one data frame/table sheet, e.g., a full questionnaire, not its items.
.meta_data_env
– an environment for easy metadata access
Description
used by the dq_report2-pipeline
Usage
.meta_data_env
Format
An object of class environment
of length 8.
See Also
meta_data_env_id_vars meta_data_env_co_vars meta_data_env_time_vars meta_data_env_group_vars
Extract co-variables for a given item
Description
Extract co-variables for a given item
Arguments
entity |
vector of item-identifiers |
Value
a vector with co-variables for each entity-entry, having the
explode
attribute set to FALSE
See Also
Extract MULTIVARIATE_OUTLIER_CHECK
for variable group
Description
Extract MULTIVARIATE_OUTLIER_CHECK
for variable group
Extract selected outlier criteria for a given item or variable group
Arguments
entity |
vector of item- or variable group identifiers |
Details
In the environment, target_meta_data
should be set either to
item_level
or to cross-item_level
.
Value
a vector with id-variables for each entity-entry, having the
explode
attribute set to FALSE
See Also
Extract group variables for a given item
Description
Extract group variables for a given item
Arguments
entity |
vector of item-identifiers |
Value
a vector with possible group-variables (can be more than one per
item) for each entity-entry, having the explode
attribute
set to TRUE
See Also
Extract id variables for a given item or variable group
Description
Extract id variables for a given item or variable group
Arguments
entity |
vector of item- or variable group identifiers |
Details
In the environment, target_meta_data
should be set either to
item_level
or to cross-item_level
.
Value
a vector with id-variables for each entity-entry, having the
explode
attribute set to FALSE
See Also
Extract outlier rules-number-threshold for a given item or variable group
Description
Extract outlier rules-number-threshold for a given item or variable group
Arguments
entity |
vector of item- or variable group identifiers |
Details
In the environment, target_meta_data
should be set either to
item_level
or to cross-item_level
.
Value
a vector with id-variables for each entity-entry, having the
explode
attribute set to FALSE
See Also
Extract measurement time variable for a given item
Description
Extract measurement time variable for a given item
Arguments
entity |
vector of item-identifiers |
Value
a vector with time-variables (usually one per item) for each
entity-entry, having the explode
attribute set to TRUE
See Also
Well known columns on the meta_data_segment
sheet
Description
Metadata describing study segments, e.g., a full questionnaire, not its items.
return the number of result slots in a report
Description
return the number of result slots in a report
Usage
nres(x)
Arguments
x |
the |
Value
the number of used result slots
Convert a pipeline result data frame to named encapsulated lists
Description
Deprecated
Usage
pipeline_recursive_result(...)
Arguments
... |
Deprecated |
Value
Deprecated
Call (nearly) one "Accuracy" function with many parameterizations at once automatically
Description
Deprecated
Usage
pipeline_vectorized(...)
Arguments
... |
Deprecated |
Value
Deprecated
Plot a dataquieR
summary
Description
Plot a dataquieR
summary
Usage
## S3 method for class 'dataquieR_summary'
plot(x, y, ..., filter, dont_plot = FALSE, stratify_by)
Arguments
x |
the |
y |
not yet used |
... |
not yet used |
filter |
if given, this filters the summary, e.g.,
|
dont_plot |
suppress the actual plotting, just return a printable
object derived from |
stratify_by |
column to stratify the summary, may be one string. |
Value
invisible html object
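A sketched usage, assuming a summary of a dq_report2 report yields the dataquieR summary object expected here; "study_data", "meta_data_v2", and the stratification column are placeholders:

```r
## Not run:
# Placeholder names; create a report, summarize it, and plot the summary
# stratified by a metadata column.
rep <- dq_report2("study_data", meta_data_v2 = "meta_data_v2")
s <- summary(rep)
plot(s, stratify_by = "STUDY_SEGMENT")
## End(Not run)
```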
Utility function to plot a combined figure for distribution checks
Description
Data quality indicator checks "Unexpected location" with histograms and plots of empirical cumulative distributions for the subgroups.
Usage
prep_acc_distributions_with_ecdf(
resp_vars = NULL,
group_vars = NULL,
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
dataquieR.max_group_var_levels_in_plot_default),
n_obs_per_group_min = getOption("dataquieR.min_obs_per_group_var_in_plot",
dataquieR.min_obs_per_group_var_in_plot_default)
)
Arguments
resp_vars |
variable list the name of the measurement variable |
group_vars |
variable list the name of the observer, device or reader variable |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
n_group_max |
maximum number of categories to be displayed individually
for the grouping variable ( |
n_obs_per_group_min |
minimum number of data points per group to create
a graph for an individual category of the |
Value
A SummaryPlot
.
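A sketched call; the response and grouping variable labels are hypothetical placeholders, as are the data frame names:

```r
## Not run:
# Placeholder variable names; plot histograms and empirical cumulative
# distributions of a measurement, split by an observer/device variable.
p <- prep_acc_distributions_with_ecdf(
  resp_vars = "SBP_0",
  group_vars = "OBSERVER_0",
  study_data = "study_data",
  label_col = LABEL,
  meta_data_v2 = "meta_data_v2"
)
p
## End(Not run)
```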
Convert missing codes in metadata format v1.0 and a missing-cause-table to v2.0 missing list / jump list assignments
Description
The function has two working modes. If replace_meta_data
is TRUE
(the default if cause_label_df
contains a column named resp_vars
), then the missing/jump codes in
meta_data[, c(MISSING_CODES, JUMP_CODES)]
will be overwritten; otherwise,
they will be labeled using the cause_label_df
.
Usage
prep_add_cause_label_df(
item_level = "item_level",
cause_label_df,
label_col = VAR_NAMES,
assume_consistent_codes = TRUE,
replace_meta_data = ("resp_vars" %in% colnames(cause_label_df)),
meta_data = item_level,
meta_data_v2
)
Arguments
item_level |
data.frame the data frame that contains metadata attributes of study data |
cause_label_df |
data.frame missing code table. If missing codes have labels the respective data frame can be specified here, see cause_label_df |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
assume_consistent_codes |
logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code will be the same for all variables. |
replace_meta_data |
logical if |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
If a column resp_vars
exists, then rows with a value in resp_vars
will
only be used for the corresponding variable.
Value
data.frame updated metadata including all the code labels in missing/jump lists
See Also
Insert missing codes for NA
s based on rules
Description
Insert missing codes for NA
s based on rules
Usage
prep_add_computed_variables(
study_data,
meta_data,
label_col,
rules,
use_value_labels
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
rules |
data.frame with the columns:
|
use_value_labels |
logical In rules for factors, use the value labels,
not the codes. Defaults to |
Value
a list
with the entry:
-
ModifiedStudyData
: Study data with the new variables
Examples
## Not run:
study_data <- prep_get_data_frame("ship")
prep_load_workbook_like_file("ship_meta_v2")
meta_data <- prep_get_data_frame("item_level")
rules <- tibble::tribble(
~VAR_NAMES, ~RULE,
"BMI", '[BODY_WEIGHT_0]/(([BODY_HEIGHT_0]/100)^2)',
"R", '[WAIST_CIRC_0]/2/[pi]', # in m^3
"VOL_EST", '[pi]*([WAIST_CIRC_0]/2/[pi])^2*[BODY_HEIGHT_0] / 1000', # in l
)
r <- prep_add_computed_variables(study_data, meta_data,
label_col = "LABEL", rules, use_value_labels = FALSE)
## End(Not run)
Add data frames to the pre-loaded / cache data frame environment
Description
These can then be referred to by their names: wherever dataquieR
expects a data.frame, just pass a character instead. If this character is not
found, dataquieR
will additionally look for files with that name and for
URLs
. You can also refer to a specific sheet of a workbook or a specific
object from an RData
file by appending a pipe symbol and its name. A second
pipe symbol allows extracting certain columns from such sheets (but
they will remain data frames).
Usage
prep_add_data_frames(..., data_frame_list = list())
Arguments
... |
data frames, if passed with names, these will be the names of these tables in the data frame environment. If not, then the names in the calling environment will be used. |
data_frame_list |
a named list with data frames. Also these will be
added and names will be handled as for the |
Value
data.frame invisible(the cache environment)
See Also
Other data-frame-cache:
prep_get_data_frame()
,
prep_list_dataframes()
,
prep_load_folder_with_metadata()
,
prep_load_workbook_like_file()
,
prep_purge_data_frame_cache()
,
prep_remove_from_cache()
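The cache mechanism can be sketched with a built-in data set; here, `cars` stands in for real study data:

```r
## Not run:
# Register a data frame in the cache under its own name, then refer to
# it by that name wherever a data.frame is expected.
prep_add_data_frames(cars)
head(prep_get_data_frame("cars"))

# A named list assigns explicit cache names.
prep_add_data_frames(data_frame_list = list(my_cars = cars))
prep_get_data_frame("my_cars")
## End(Not run)
```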
Insert missing codes for NA
s based on rules
Description
Insert missing codes for NA
s based on rules
Usage
prep_add_missing_codes(
resp_vars,
study_data,
meta_data_v2,
item_level = "item_level",
label_col,
rules,
use_value_labels,
overwrite = FALSE,
meta_data = item_level
)
Arguments
resp_vars |
variable list the name of the measurement variables to be
modified, all from |
study_data |
data.frame the data frame that contains the measurements |
meta_data_v2 |
character path to workbook like metadata file, see
|
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
rules |
data.frame with the columns:
|
use_value_labels |
logical In rules for factors, use the value labels,
not the codes. Defaults to |
overwrite |
logical Also insert missing codes, if the values are not
|
meta_data |
data.frame old name for |
Value
a list
with the entries:
- ModifiedStudyData: study data with NAs replaced by the CODE_VALUE
- ModifiedMetaData: metadata having the new codes amended in the columns JUMP_LIST or MISSING_LIST, respectively
Support function to augment metadata during data quality reporting
Description
adds an annotation to static metadata
Usage
prep_add_to_meta(
VAR_NAMES,
DATA_TYPE,
LABEL,
VALUE_LABELS,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
...
)
Arguments
VAR_NAMES |
character Names of the Variables to add |
DATA_TYPE |
character Data type for the added variables |
LABEL |
character Labels for these variables |
VALUE_LABELS |
character Value labels for the values of the variables
as usually pipe separated and assigned with
|
item_level |
data.frame the metadata to extend |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
... |
Further defined variable attributes, see prep_create_meta |
Details
Add metadata, e.g., of a transformed/new variable. This function is not yet considered stable, but we already export it, because it could help. Therefore, we still have some inconsistencies in the formals.
Value
a data frame with amended metadata.
Re-Code labels with their respective codes according to the meta_data
Description
Re-Code labels with their respective codes according to the meta_data
Usage
prep_apply_coding(
study_data,
meta_data_v2,
item_level = "item_level",
meta_data = item_level
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
meta_data_v2 |
character path to workbook like metadata file, see
|
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
Value
data.frame modified study data with labels replaced by the codes
Check for package updates
Description
Check for package updates
Usage
prep_check_for_dataquieR_updates(
beta = FALSE,
deps = TRUE,
ask = interactive()
)
Arguments
beta |
logical check for beta version too |
deps |
logical check for missing (optional) dependencies |
ask |
logical ask for updates |
Value
invisible(NULL)
Verify and normalize metadata on data frame level
Description
If possible, mismatching data types are converted ("true" becomes TRUE).
Usage
prep_check_meta_data_dataframe(
meta_data_dataframe = "dataframe_level",
meta_data_v2,
dataframe_level
)
Arguments
meta_data_dataframe |
data.frame data frame or path/url of a metadata sheet for the data frame level |
meta_data_v2 |
character path to workbook like metadata file, see
|
dataframe_level |
data.frame alias for |
Details
Missing columns are added, filled with NA, if this is valid, i.e., n.a. for DF_NAME as the key column.
Value
standardized metadata sheet as data frame
Examples
## Not run:
mds <- prep_check_meta_data_dataframe("ship_meta_dataframe|dataframe_level") # also converts
print(mds)
prep_check_meta_data_dataframe(mds)
mds1 <- mds
mds1$DF_RECORD_COUNT <- NULL
print(prep_check_meta_data_dataframe(mds1)) # fixes the missing column by NAs
mds1 <- mds
mds1$DF_UNIQUE_ROWS[[2]] <- "xxx" # not convertible
# print(prep_check_meta_data_dataframe(mds1)) # fail
mds1 <- mds
mds1$DF_UNIQUE_ID[[2]] <- 12
# print(prep_check_meta_data_dataframe(mds1)) # fail
## End(Not run)
Verify and normalize metadata on segment level
Description
If possible, mismatching data types are converted ("true" becomes TRUE).
Usage
prep_check_meta_data_segment(
meta_data_segment = "segment_level",
meta_data_v2,
segment_level
)
Arguments
meta_data_segment |
data.frame data frame or path/url of a metadata sheet for the segment level |
meta_data_v2 |
character path to workbook like metadata file, see
|
segment_level |
data.frame alias for |
Details
Missing columns are added, filled with NA, if this is valid, i.e., n.a. for STUDY_SEGMENT as the key column.
Value
standardized metadata sheet as data frame
Examples
## Not run:
mds <- prep_check_meta_data_segment("ship_meta_v2|segment_level") # also converts
print(mds)
prep_check_meta_data_segment(mds)
mds1 <- mds
mds1$SEGMENT_RECORD_COUNT <- NULL
print(prep_check_meta_data_segment(mds1)) # fixes the missing column by NAs
mds1 <- mds
mds1$SEGMENT_UNIQUE_ROWS[[2]] <- "xxx" # not convertible
# print(prep_check_meta_data_segment(mds1)) # fail
## End(Not run)
Checks the validity of metadata w.r.t. the provided column names
Description
This function verifies whether a data frame complies with metadata conventions and provides a given richness of meta information as specified by level.
Usage
prep_check_meta_names(
item_level = "item_level",
level,
character.only = FALSE,
meta_data = item_level,
meta_data_v2
)
Arguments
item_level |
data.frame the data frame that contains metadata attributes of study data |
level |
enum level of requirement (see also VARATT_REQUIRE_LEVELS).
set to |
character.only |
logical a logical indicating whether level can be assumed to be character strings. |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
Note that only the given level is checked, even though levels are somewhat hierarchical.
Value
a logical with:
invisible(TRUE). In case of problems with the metadata, a condition is raised (stop()).
Examples
## Not run:
prep_check_meta_names(data.frame(VAR_NAMES = 1, DATA_TYPE = 2,
MISSING_LIST = 3))
prep_check_meta_names(
data.frame(
VAR_NAMES = 1, DATA_TYPE = 2, MISSING_LIST = 3,
LABEL = "LABEL", VALUE_LABELS = "VALUE_LABELS",
JUMP_LIST = "JUMP_LIST", HARD_LIMITS = "HARD_LIMITS",
GROUP_VAR_OBSERVER = "GROUP_VAR_OBSERVER",
GROUP_VAR_DEVICE = "GROUP_VAR_DEVICE",
TIME_VAR = "TIME_VAR",
PART_VAR = "PART_VAR",
STUDY_SEGMENT = "STUDY_SEGMENT",
LOCATION_RANGE = "LOCATION_RANGE",
LOCATION_METRIC = "LOCATION_METRIC",
PROPORTION_RANGE = "PROPORTION_RANGE",
MISSING_LIST_TABLE = "MISSING_LIST_TABLE",
CO_VARS = "CO_VARS",
LONG_LABEL = "LONG_LABEL"
),
RECOMMENDED
)
prep_check_meta_names(
data.frame(
VAR_NAMES = 1, DATA_TYPE = 2, MISSING_LIST = 3,
LABEL = "LABEL", VALUE_LABELS = "VALUE_LABELS",
JUMP_LIST = "JUMP_LIST", HARD_LIMITS = "HARD_LIMITS",
GROUP_VAR_OBSERVER = "GROUP_VAR_OBSERVER",
GROUP_VAR_DEVICE = "GROUP_VAR_DEVICE",
TIME_VAR = "TIME_VAR",
PART_VAR = "PART_VAR",
STUDY_SEGMENT = "STUDY_SEGMENT",
LOCATION_RANGE = "LOCATION_RANGE",
LOCATION_METRIC = "LOCATION_METRIC",
PROPORTION_RANGE = "PROPORTION_RANGE",
DETECTION_LIMITS = "DETECTION_LIMITS", SOFT_LIMITS = "SOFT_LIMITS",
CONTRADICTIONS = "CONTRADICTIONS", DISTRIBUTION = "DISTRIBUTION",
DECIMALS = "DECIMALS", VARIABLE_ROLE = "VARIABLE_ROLE",
DATA_ENTRY_TYPE = "DATA_ENTRY_TYPE",
CO_VARS = "CO_VARS",
END_DIGIT_CHECK = "END_DIGIT_CHECK",
VARIABLE_ORDER = "VARIABLE_ORDER", LONG_LABEL =
"LONG_LABEL", recode = "recode",
MISSING_LIST_TABLE = "MISSING_LIST_TABLE"
),
OPTIONAL
)
# Next one will fail
try(
prep_check_meta_names(data.frame(VAR_NAMES = 1, DATA_TYPE = 2,
MISSING_LIST = 3), TECHNICAL)
)
## End(Not run)
Support function to scan variable labels for applicability
Description
Adjust labels in meta_data to be valid variable names in formulas for various R functions, such as glm or lme4::lmer.
Usage
prep_clean_labels(
label_col,
item_level = "item_level",
no_dups = FALSE,
meta_data = item_level,
meta_data_v2
)
Arguments
label_col |
character label attribute to adjust or character vector to
adjust, depending on |
item_level |
data.frame metadata data frame: If |
no_dups |
logical disallow duplicates in input or output vectors of
the function, then, prep_clean_labels would call
|
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Details
Hint: The following is still true, but the functions should be capable of doing potentially needed fixes on the fly automatically, so you will likely not need this function anymore.
Currently, labels as given by the label_col argument of most functions are directly used in formulas, so that they become a natural part of the outputs, but different models expect differently strict syntax for such formulas, especially for valid variable names. prep_clean_labels removes all potentially inadmissible characters from variable names (no guarantee that some exotic model still rejects the names, but the number of exotic characters is minimized). However, variable names are modified and may become unreadable or indistinguishable from other variable names. For the latter case, a stop call is possible, controlled by the no_dups argument.
A warning is emitted if modifications were necessary.
Value
a data.frame with:
- if meta_data is set, a list with the modified meta_data[, label_col] column
- if meta_data is not set, adjusted labels that were then directly given in label_col
Examples
## Not run:
meta_data1 <- data.frame(
LABEL =
c(
"syst. Blood pressure (mmHg) 1",
"1st heart frequency in MHz",
"body surface (\\u33A1)"
)
)
print(meta_data1)
print(prep_clean_labels(meta_data1$LABEL))
meta_data1 <- prep_clean_labels("LABEL", meta_data1)
print(meta_data1)
## End(Not run)
Combine two report summaries
Description
Combine two report summaries
Usage
prep_combine_report_summaries(..., summaries_list, amend_segment_names = FALSE)
Arguments
... |
objects returned by prep_extract_summary |
summaries_list |
if given, list of objects returned by prep_extract_summary |
amend_segment_names |
logical use names of the |
Value
combined summaries
See Also
Other summary_functions:
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Verify item-level metadata
Description
Are the provided item-level meta_data plausible given the study_data?
Usage
prep_compare_meta_with_study(
study_data,
label_col,
item_level = "item_level",
meta_data = item_level,
meta_data_v2
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
an invisible() list with the entries:
- pred: data.frame metadata predicted from study_data, reduced to such metadata also available in the provided metadata
- prov: data.frame provided metadata, reduced to such metadata also available in the provided study_data
- ml_error: character VAR_NAMES of variables with potentially wrong MISSING_LIST
- sl_error: character VAR_NAMES of variables with potentially wrong SCALE_LEVEL
- dt_error: character VAR_NAMES of variables with potentially wrong DATA_TYPE
Support function to create data.frames of metadata
Description
Create a metadata data frame and map names.
Generally, this function only creates a data.frame, but using this constructor instead of calling data.frame(..., stringsAsFactors = FALSE) makes it possible to adapt the metadata data.frame in later developments, e.g., if we decide to use classes for the metadata, or if certain standard names of variable attributes change. Also, a validity check could be implemented here.
Usage
prep_create_meta(..., stringsAsFactors = FALSE, level, character.only = FALSE)
Arguments
... |
named column vectors, names will be mapped using WELL_KNOWN_META_VARIABLE_NAMES, if included in WELL_KNOWN_META_VARIABLE_NAMES can also be a data frame, then its column names will be mapped using WELL_KNOWN_META_VARIABLE_NAMES |
stringsAsFactors |
logical if the argument is a list of vectors, a
data frame will be
created. In this case, |
level |
enum level of requirement (see also VARATT_REQUIRE_LEVELS)
set to |
character.only |
logical a logical indicating whether level can be assumed to be character strings. |
Details
For now, this calls data.frame, but it already renames variable attributes if they have a different name assigned in WELL_KNOWN_META_VARIABLE_NAMES, e.g., WELL_KNOWN_META_VARIABLE_NAMES$RECODE maps to recode in lower case.
NB: dataquieR exports all names from WELL_KNOWN_META_VARIABLE_NAMES as symbols, so RECODE also contains "recode".
Value
a data frame with metadata attribute names mapped and metadata checked using prep_check_meta_names, plus some further verification of conventions, such as checking for valid intervals in limits.
See Also
WELL_KNOWN_META_VARIABLE_NAMES
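A minimal sketch of creating a metadata frame with prep_create_meta; the call mirrors the one shown under prep_map_labels in this manual, and the column values are illustrative only.

```r
## Not run:
md <- prep_create_meta(
  VAR_NAMES = c("ID", "AGE"),            # variable names in the study data
  LABEL = c("Pseudo-ID", "Age"),         # human-readable labels
  DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER),
  MISSING_LIST = ""                      # no missing codes defined here
)
print(md)
## End(Not run)
```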
Instantiate a new metadata file
Description
Instantiate a new metadata file
Usage
prep_create_meta_data_file(
file_name,
study_data,
open = TRUE,
overwrite = FALSE
)
Arguments
file_name |
character file path to write to |
study_data |
data.frame optional, study data to guess metadata from |
open |
logical open the file after creation |
overwrite |
logical overwrite |
Value
invisible(NULL)
Create a factory function for storr objects for backing a dataquieR_resultset2
Description
Create a factory function for storr objects for backing a dataquieR_resultset2
Usage
prep_create_storr_factory(db_dir = tempfile(), namespace = "objects")
Arguments
db_dir |
character path to the directory for the back-end, if one is created on the fly. |
namespace |
character namespace for the report, so that one back-end can back several reports the returned function will try to create a |
Value
storr object or NULL, if package storr is not available
Get data types from data
Description
Get data types from data
Usage
prep_datatype_from_data(
resp_vars = colnames(study_data),
study_data,
.dont_cast_off_cols = FALSE
)
Arguments
resp_vars |
variable names of the variables to fetch the data type from the data |
study_data |
data.frame the data frame that contains the measurements Hint: Only data frames supported, no URL or file names. |
.dont_cast_off_cols |
logical internal use, only |
Value
vector of data types
Examples
## Not run:
dataquieR::prep_datatype_from_data(cars)
## End(Not run)
Convert two vectors from a code-value-table to a key-value list
Description
Convert two vectors from a code-value-table to a key-value list
Usage
prep_deparse_assignments(
codes,
labels = codes,
split_char = SPLIT_CHAR,
mode = c("numeric_codes", "string_codes")
)
Arguments
codes |
codes, numeric or dates (as default, but string codes can be enabled using the option 'mode', see below) |
labels |
character labels, same length as codes |
split_char |
character split character character to split code assignments |
mode |
character one of two options to insist on numeric or datetime codes (default) or to allow for string codes |
Value
a vector with assignment strings for each row of cbind(codes, labels)
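A short sketch of calling prep_deparse_assignments; the codes and labels are invented for illustration, and the exact output strings depend on SPLIT_CHAR and the chosen mode.

```r
## Not run:
# returns one assignment string per code/label pair
# (exact format depends on split_char and mode)
prep_deparse_assignments(
  codes = c(99980, 99981),
  labels = c("refused", "unknown")
)
## End(Not run)
```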
Get the dataquieR DATA_TYPE of x
Description
Get the dataquieR DATA_TYPE of x
Usage
prep_dq_data_type_of(x)
Arguments
x |
object to define the dataquieR data type of |
Value
the dataquieR data type as listed in DATA_TYPES
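An illustrative sketch of querying the dataquieR data type of a few base R objects; the exact strings returned are those defined in DATA_TYPES and are not asserted here.

```r
## Not run:
prep_dq_data_type_of(42L)        # an integer vector
prep_dq_data_type_of(3.14)       # a floating point vector
prep_dq_data_type_of(Sys.time()) # a datetime object
## End(Not run)
```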
Expand code labels across variables
Description
Code labels are copied from other variables, if the code is the same and the label is set only for some variables
Usage
prep_expand_codes(
item_level = "item_level",
suppressWarnings = FALSE,
mix_jumps_and_missings = FALSE,
meta_data_v2,
meta_data = item_level
)
Arguments
item_level |
data.frame the data frame that contains metadata attributes of study data |
suppressWarnings |
logical show warnings, if labels are expanded |
mix_jumps_and_missings |
logical ignore the class of the codes for label expansion, i.e., use missing code labels as jump code labels, if the values are the same. |
meta_data_v2 |
character path to workbook like metadata file, see
|
meta_data |
data.frame old name for |
Value
data.frame an updated metadata data frame.
Examples
## Not run:
meta_data <- prep_get_data_frame("meta_data")
meta_data$JUMP_LIST[meta_data$VAR_NAMES == "v00003"] <- "99980 = NOOP"
md <- prep_expand_codes(meta_data)
md$JUMP_LIST
md$MISSING_LIST
md <- prep_expand_codes(meta_data, mix_jumps_and_missings = TRUE)
md$JUMP_LIST
md$MISSING_LIST
meta_data <- prep_get_data_frame("meta_data")
meta_data$MISSING_LIST[meta_data$VAR_NAMES == "v00003"] <- "99980 = NOOP"
md <- prep_expand_codes(meta_data)
md$JUMP_LIST
md$MISSING_LIST
## End(Not run)
Extract all missing/jump codes from metadata and export a cause-label-data-frame
Description
Extract all missing/jump codes from metadata and export a cause-label-data-frame
Usage
prep_extract_cause_label_df(
item_level = "item_level",
label_col = VAR_NAMES,
meta_data_v2,
meta_data = item_level
)
Arguments
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data_v2 |
character path to workbook like metadata file, see
|
meta_data |
data.frame old name for |
Value
list with the entries:
- meta_data: data.frame a data frame that contains updated metadata – you still need to add a column MISSING_LIST_TABLE and add the cause_label_df as such to the metadata cache using prep_add_data_frames(), manually.
- cause_label_df: data.frame missing code table. If missing codes have labels, the respective data frame is specified here, see cause_label_df.
Extract old function based summary from data quality results
Description
Extract old function based summary from data quality results
Usage
prep_extract_classes_by_functions(r)
Arguments
r |
Value
data.frame long format, compatible with prep_summary_to_classes()
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Extract summary from data quality results
Description
Generic function, currently supports dq_report2 and dataquieR_result
Usage
prep_extract_summary(r, ...)
Arguments
r |
dq_report2 or dataquieR_result object |
... |
further arguments, maybe needed for some implementations |
Value
list with two slots Data and Table with data.frames featuring all metrics columns from the report or result in x, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, they are just as they are.
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Extract report summary from reports
Description
Extract report summary from reports
Usage
## S3 method for class 'dataquieR_result'
prep_extract_summary(r, ...)
Arguments
r |
dataquieR_result a result from a dq_report2 report |
... |
not used |
Value
list with two slots Data and Table with data.frames featuring all metrics columns from the report r, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, they are just as they are.
See Also
prep_combine_report_summaries()
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Extract report summary from reports
Description
Extract report summary from reports
Usage
## S3 method for class 'dataquieR_resultset2'
prep_extract_summary(r, ...)
Arguments
r |
dq_report2 a dq_report2 report |
... |
not used |
Value
list with two slots Data and Table with data.frames featuring all metrics columns from the report r, the STUDY_SEGMENT and the VAR_NAMES. In case of Data, the columns are formatted nicely but still with the standardized column names – use util_translate_indicator_metrics() to rename them nicely. In case of Table, they are just as they are.
See Also
prep_combine_report_summaries()
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Read data from files/URLs
Description
data_frame_name can be a file path or a URL. You can append a pipe and a sheet name for Excel files, or an object name, e.g., for RData files. Numbers may also work. All file formats supported by your rio installation will work.
Usage
prep_get_data_frame(
data_frame_name,
.data_frame_list = .dataframe_environment(),
keep_types = FALSE,
column_names_only = FALSE
)
Arguments
data_frame_name |
character name of the data frame to read, see details |
.data_frame_list |
environment cache for loaded data frames |
keep_types |
logical keep types as possibly defined in a file, if the
data frame is loaded from one. set |
column_names_only |
logical if TRUE imports only headers (column names) of the data frame and no content (an empty data frame) |
Details
The data frames will be cached automatically; you can define an alternative environment for this using the argument .data_frame_list, and you can purge the cache using prep_purge_data_frame_cache.
Use prep_add_data_frames to manually add data frames to the cache, e.g., if you have loaded them from more complex sources before.
Value
data.frame a data frame
See Also
Other data-frame-cache:
prep_add_data_frames()
,
prep_list_dataframes()
,
prep_load_folder_with_metadata()
,
prep_load_workbook_like_file()
,
prep_purge_data_frame_cache()
,
prep_remove_from_cache()
Examples
## Not run:
bl <- as.factor(prep_get_data_frame(
paste0("https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus",
"/Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=",
"publicationFile|COVID_Todesfälle_BL|Bundesland"))[[1]])
n <- as.numeric(prep_get_data_frame(paste0(
"https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/",
"Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=",
"publicationFile|COVID_Todesfälle_BL|Anzahl verstorbene",
" COVID-19 Fälle"))[[1]])
plot(bl, n)
# Working names would be to date (2022-10-21), e.g.:
#
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/ \
# Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/ \
# Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile|2
# https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/ \
# Projekte_RKI/COVID-19_Todesfaelle.xlsx?__blob=publicationFile|name
# study_data
# ship
# meta_data
# ship_meta
#
prep_get_data_frame("meta_data | meta_data")
## End(Not run)
Fetch a label for a variable based on its purpose
Description
Fetch a label for a variable based on its purpose
Usage
prep_get_labels(
resp_vars,
item_level = "item_level",
label_col,
max_len = MAX_LABEL_LEN,
label_class = c("SHORT", "LONG"),
label_lang = getOption("dataquieR.lang", ""),
resp_vars_are_var_names_only = FALSE,
resp_vars_match_label_col_only = FALSE,
meta_data = item_level,
meta_data_v2,
force_label_col = getOption("dataquieR.force_label_col",
dataquieR.force_label_col_default)
)
Arguments
resp_vars |
variable list the variable names to fetch for |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
max_len |
integer the maximum label length to return, if not possible w/o causing ambiguous labels, the labels may still be longer |
label_class |
enum SHORT | LONG. which sort of label according to the metadata model should be returned |
label_lang |
character optional language suffix, if available in
the metadata. Can be controlled by the option
|
resp_vars_are_var_names_only |
logical If |
resp_vars_match_label_col_only |
logical If |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
force_label_col |
enum auto | FALSE | TRUE. if |
Value
character suitable labels for each resp_vars
, names of this
vector are VAR_NAMES
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2")
prep_get_labels("SEX_0", label_class = "SHORT", max_len = 2)
## End(Not run)
Get data frame for a given segment
Description
Get data frame for a given segment
Usage
prep_get_study_data_segment(
segment,
study_data,
item_level = "item_level",
meta_data = item_level,
meta_data_v2,
segment_level,
meta_data_segment = "segment_level"
)
Arguments
segment |
character name of the segment to return data for |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
segment_level |
data.frame alias for |
meta_data_segment |
data.frame – optional: Segment level metadata |
Value
data.frame the data for the segment
Return the logged-in User's Full Name
Description
If whoami is not installed, the user name from Sys.info() is returned.
Usage
prep_get_user_name()
Details
Can be overridden by options or environment:
options(FULLNAME = "Stephan Struckmann")
Sys.setenv(FULLNAME = "Stephan Struckmann")
Value
character the user's name
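A brief sketch of overriding the detected user name via an option, as described under Details; the name shown is a placeholder.

```r
## Not run:
options(FULLNAME = "Jane Doe")  # placeholder name, overrides auto-detection
prep_get_user_name()
## End(Not run)
```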
Get machine variant for snapshot tests
Description
Get machine variant for snapshot tests
Usage
prep_get_variant()
Value
character the variant
Guess encoding of text or text files
Description
Guess encoding of text or text files
Usage
prep_guess_encoding(x, file)
Arguments
x |
character string to guess encoding for |
file |
character file to guess encoding for |
Value
encoding
Prepare a label as part of a link for RMD files
Description
Prepare a label as part of a link for RMD files
Usage
prep_link_escape(s, html = FALSE)
Arguments
s |
the label |
html |
prepare the label for direct |
Value
the escaped label
List Loaded Data Frames
Description
List Loaded Data Frames
Usage
prep_list_dataframes()
Value
names of all loaded data frames
See Also
Other data-frame-cache:
prep_add_data_frames()
,
prep_get_data_frame()
,
prep_load_folder_with_metadata()
,
prep_load_workbook_like_file()
,
prep_purge_data_frame_cache()
,
prep_remove_from_cache()
All valid voc: vocabularies
Description
All valid voc: vocabularies
Usage
prep_list_voc()
Value
character() all voc: suffixes allowed for prep_get_data_frame().
Examples
## Not run:
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<ICD10>")
my_voc <-
tibble::tribble(
~ voc, ~ url,
"test", "data:datasets|iris|Species+Sepal.Length")
prep_add_data_frames(`<>` = my_voc)
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<test>")
prep_get_data_frame("<ICD10>")
my_voc <-
tibble::tribble(
~ voc, ~ url,
"ICD10", "data:datasets|iris|Species+Sepal.Length")
prep_add_data_frames(`<>` = my_voc)
prep_list_dataframes()
prep_list_voc()
prep_get_data_frame("<ICD10>")
## End(Not run)
Pre-load a folder with named (usually more than) one table(s)
Description
These can thereafter be referred to by their names only. Such files are, e.g., spreadsheet workbooks or RData files.
Usage
prep_load_folder_with_metadata(folder, keep_types = FALSE, ...)
Arguments
folder |
the folder name to load. |
keep_types |
logical keep types as possibly defined in the file.
set |
... |
arguments passed to [] |
Details
Note that, in contrast to prep_get_data_frame, this function does not support selecting specific sheets/columns from a file.
Value
invisible(the cache environment)
See Also
Other data-frame-cache:
prep_add_data_frames()
,
prep_get_data_frame()
,
prep_list_dataframes()
,
prep_load_workbook_like_file()
,
prep_purge_data_frame_cache()
,
prep_remove_from_cache()
Load a dq_report2
Description
Load a dq_report2
Usage
prep_load_report(file)
Arguments
file |
character the file name to load from |
Value
dataquieR_resultset2 the report
Load a report from a back-end
Description
Load a report from a back-end
Usage
prep_load_report_from_backend(
namespace = "objects",
db_dir,
storr_factory = prep_create_storr_factory(namespace = namespace, db_dir = db_dir)
)
Arguments
namespace |
the namespace to read the report's results from |
db_dir |
character path to the directory for the back-end, if
a |
storr_factory |
a function returning a |
Value
dataquieR_resultset2 the report
Examples
## Not run:
r <- dataquieR::dq_report2("study_data", meta_data_v2 = "meta_data_v2",
dimensions = NULL)
storr_factory <- prep_create_storr_factory()
r_storr <- prep_set_backend(r, storr_factory)
r_restorr <- prep_set_backend(r_storr, NULL)
r_loaded <- prep_load_report_from_backend(storr_factory = storr_factory)
## End(Not run)
Pre-load a file with named (usually more than) one table(s)
Description
These can thereafter be referred to by their names only. Such files are, e.g., spreadsheet workbooks or RData files.
Usage
prep_load_workbook_like_file(file, keep_types = FALSE)
Arguments
file |
the file name to load. |
keep_types |
logical keep types as possibly defined in the file.
set |
Details
Note that, in contrast to prep_get_data_frame, this function does not support selecting specific sheets/columns from a file.
Value
invisible(the cache environment)
See Also
Other data-frame-cache:
prep_add_data_frames()
,
prep_get_data_frame()
,
prep_list_dataframes()
,
prep_load_folder_with_metadata()
,
prep_purge_data_frame_cache()
,
prep_remove_from_cache()
Support function to allocate labels to variables
Description
Map variables to certain attributes, e.g. by default their labels.
Usage
prep_map_labels(
x,
item_level = "item_level",
to = LABEL,
from = VAR_NAMES,
ifnotfound,
warn_ambiguous = FALSE,
meta_data_v2,
meta_data = item_level
)
Arguments
x |
character variable names, character vector, see parameter from |
item_level |
data.frame metadata data frame, if, as a |
to |
character variable attribute to map to |
from |
character variable identifier to map from |
ifnotfound |
list A list of values to be used if the item is not found: it will be coerced to a list if necessary. |
warn_ambiguous |
logical print a warning if mapping variables from
|
meta_data_v2 |
character path to workbook like metadata file, see
|
meta_data |
data.frame old name for |
Details
This function basically calls colnames(study_data) <- meta_data$LABEL, ensuring correct merging/joining of study data columns to the corresponding metadata rows, even if the orders differ. If a variable/study_data-column name is not found in meta_data[[from]] (default from = VAR_NAMES), either stop is called or, if ifnotfound has been assigned a value, that value is returned. See mget, which is used internally by this function.
The function not only maps to the LABEL column; to can be any metadata variable attribute, so the function can also be used to get, e.g., all HARD_LIMITS from the metadata.
Value
a character vector with:
mapped values
Examples
## Not run:
meta_data <- prep_create_meta(
VAR_NAMES = c("ID", "SEX", "AGE", "DOE"),
LABEL = c("Pseudo-ID", "Gender", "Age", "Examination Date"),
DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER, DATA_TYPES$INTEGER,
DATA_TYPES$DATETIME),
MISSING_LIST = ""
)
stopifnot(all(prep_map_labels(c("AGE", "DOE"), meta_data) == c("Age",
"Examination Date")))
## End(Not run)
Merge a list of study data frames to one (sparse) study data frame
Description
Merge a list of study data frames to one (sparse) study data frame
Usage
prep_merge_study_data(study_data_list)
Arguments
study_data_list |
list the list |
Value
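Conceptually, a sparse merge stacks segment-wise data frames into one wide table, with NA wherever a variable was not measured for an observation. A minimal sketch with invented toy data (this is base R, not the package's actual implementation):

```r
# Two study segments measured on partly disjoint participants/variables
seg1 <- data.frame(ID = 1:3, SBP = c(120, 130, 125))
seg2 <- data.frame(ID = 2:4, LAB = c(5.1, 4.8, 5.6))
# A full outer join keeps all rows and columns; unmeasured cells become NA,
# which is what makes the combined study data frame "sparse".
merged <- merge(seg1, seg2, by = "ID", all = TRUE)
merged
# With dataquieR, the analogous call would presumably be:
# prep_merge_study_data(list(seg1, seg2))
```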
Convert item-level metadata from v1.0 to v2.0
Description
This function is idempotent.
Usage
prep_meta_data_v1_to_item_level_meta_data(
item_level = "item_level",
verbose = TRUE,
label_col = LABEL,
cause_label_df,
meta_data = item_level
)
Arguments
item_level |
data.frame the old item-level-metadata |
verbose |
logical display all estimated decisions, defaults to |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
cause_label_df |
data.frame missing code table, see cause_label_df. Optional. If this argument is given, you can add missing code tables. |
meta_data |
data.frame old name for |
Details
The option options("dataquieR.force_item_specific_missing_codes") (default FALSE) tells the system to always fill in res_vars columns in the MISSING_LIST_TABLE, even if the column already exists but is empty.
Value
data.frame the updated metadata
Support function to identify the levels of a process variable with minimum number of observations
Description
utility function to subset data based on minimum number of observation per level
Usage
prep_min_obs_level(study_data, group_vars, min_obs_in_subgroup)
Arguments
study_data |
data.frame the data frame that contains the measurements |
group_vars |
variable list the name of the grouping variable |
min_obs_in_subgroup |
integer optional argument if a "group_var" is used. This argument specifies the minimum no. of observations that is required to include a subgroup (level) of the "group_var" in the analysis. Subgroups with less observations are excluded. The default is 30. |
Details
This function removes all observations belonging to levels of a group variable with fewer than min_obs_in_subgroup observations, e.g. blood pressure measurements performed by an examiner with fewer than e.g. 50 measurements. It displays a warning if samples/rows are removed and returns the modified study data frame.
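The filtering rule described above can be sketched in plain base R (a conceptual stand-in, not the package's implementation; the examiner data are invented):

```r
# Drop all observations from levels with too few observations
study_data <- data.frame(
  examiner = rep(c("A", "B", "C"), times = c(40, 35, 5)),
  sbp      = rnorm(80, mean = 128, sd = 12)
)
min_obs_in_subgroup <- 30
# levels of the group variable meeting the minimum observation count
keep_levels <- names(which(table(study_data$examiner) >= min_obs_in_subgroup))
subset_data <- study_data[study_data$examiner %in% keep_levels, ]
table(subset_data$examiner)  # examiner "C" (5 observations) has been dropped
```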
Value
a data frame with:
a subsample of original data
Open a data frame in Excel
Description
Open a data frame in Excel
Usage
prep_open_in_excel(dfr)
Arguments
dfr |
the data frame |
Details
If the file cannot be read back in when the function exits, NULL will be returned.
Value
potentially modified data frame after dialog was closed
Support function for a parallel pmap
Description
parallel version of purrr::pmap
Usage
prep_pmap(.l, .f, ..., cores = 0)
Arguments
.l |
data.frame with one call per line and one function argument per column |
.f |
|
... |
additional, static arguments for calling |
cores |
number of cpu cores to use or a (named) list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. Set to 0 to run without parallelization. |
Value
list of results of the function calls
Author(s)
S Struckmann
See Also
purrr::pmap
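The calling convention follows from the Arguments table: one row of .l per call, one column per argument of .f. A sketch of that semantics (with cores = 0, i.e. no parallelization, as documented above; the base-R Map line only illustrates the equivalent sequential behavior):

```r
# One call per row, one function argument per column
calls <- data.frame(x = 1:3, y = c(10, 20, 30))
# equivalent to list(f(1, 10), f(2, 20), f(3, 30))
dataquieR::prep_pmap(calls, function(x, y) x + y, cores = 0)
# a base-R stand-in for the same semantics:
Map(function(x, y) x + y, calls$x, calls$y)
```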
Prepare and verify study data with metadata
Description
This function ensures that a data frame ds1 with suitable variable names exists and that study_data and meta_data exist as base data.frames.
Usage
prep_prepare_dataframes(
.study_data,
.meta_data,
.label_col,
.replace_hard_limits,
.replace_missings,
.sm_code = NULL,
.allow_empty = FALSE,
.adjust_data_type = TRUE,
.amend_scale_level = TRUE,
.apply_factor_metadata = FALSE,
.apply_factor_metadata_inadm = FALSE,
.internal = rlang::env_inherits(rlang::caller_env(), parent.env(environment()))
)
Arguments
.study_data |
if provided, use this data set as study_data |
.meta_data |
if provided, use this data set as meta_data |
.label_col |
if provided, use this as label_col |
.replace_hard_limits |
replace |
.replace_missings |
replace missing codes, defaults to |
.sm_code |
missing code for |
.allow_empty |
allow |
.adjust_data_type |
ensure that the data type of variables in the study data corresponds to their data type specified in the metadata |
.amend_scale_level |
ensure that |
.apply_factor_metadata |
logical convert categorical variables to labeled factors. |
.apply_factor_metadata_inadm |
logical convert categorical variables
to labeled factors keeping
inadmissible values. Implies, that
.apply_factor_metadata will be set
to |
.internal |
logical internally called, modify caller's environment. |
Details
This function defines ds1 and modifies study_data and meta_data in the environment of its caller (see eval.parent). It also defines or modifies the object label_col in the calling environment. Almost all functions exported by dataquieR call this function initially, so that aspects common to all functions live here, e.g. testing whether an argument meta_data has been given and really is a data.frame. It verifies the existence of required metadata attributes (VARATT_REQUIRE_LEVELS). It can also replace missing codes by NAs, and calls prep_study2meta to generate a minimum set of metadata from the study data on the fly (this set should be amended, so on-the-fly-calling is not recommended for an instructive use of dataquieR).
The function also detects tibbles, which are then converted to base-R data.frames, which are expected by dataquieR.
If .internal is TRUE, unlike the other utility functions that work in their caller's environment, this function modifies objects in the calling function's environment: it defines a new object ds1, and it modifies study_data and/or meta_data and label_col.
Value
ds1
the study data with mapped column names
See Also
acc_margins
Examples
## Not run:
acc_test1 <- function(resp_variable, aux_variable,
time_variable, co_variables,
group_vars, study_data, meta_data) {
prep_prepare_dataframes()
invisible(ds1)
}
acc_test2 <- function(resp_variable, aux_variable,
time_variable, co_variables,
group_vars, study_data, meta_data, label_col) {
ds1 <- prep_prepare_dataframes(study_data, meta_data)
invisible(ds1)
}
environment(acc_test1) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
environment(acc_test2) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
acc_test3 <- function(resp_variable, aux_variable, time_variable,
co_variables, group_vars, study_data, meta_data,
label_col) {
prep_prepare_dataframes()
invisible(ds1)
}
acc_test4 <- function(resp_variable, aux_variable, time_variable,
co_variables, group_vars, study_data, meta_data,
label_col) {
ds1 <- prep_prepare_dataframes(study_data, meta_data)
invisible(ds1)
}
environment(acc_test3) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
environment(acc_test4) <- asNamespace("dataquieR")
# perform this inside the package (not needed for functions that have been
# integrated with the package already)
meta_data <- prep_get_data_frame("meta_data")
study_data <- prep_get_data_frame("study_data")
try(acc_test1())
try(acc_test2())
acc_test1(study_data = study_data)
try(acc_test1(meta_data = meta_data))
try(acc_test2(study_data = 12, meta_data = meta_data))
print(head(acc_test1(study_data = study_data, meta_data = meta_data)))
print(head(acc_test2(study_data = study_data, meta_data = meta_data)))
print(head(acc_test3(study_data = study_data, meta_data = meta_data)))
print(head(acc_test3(study_data = study_data, meta_data = meta_data,
label_col = LABEL)))
print(head(acc_test4(study_data = study_data, meta_data = meta_data)))
print(head(acc_test4(study_data = study_data, meta_data = meta_data,
label_col = LABEL)))
try(acc_test2(study_data = NULL, meta_data = meta_data))
## End(Not run)
Clear data frame cache
Description
Clear data frame cache
Usage
prep_purge_data_frame_cache()
Value
nothing
See Also
Other data-frame-cache:
prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(),
prep_load_folder_with_metadata(), prep_load_workbook_like_file(),
prep_remove_from_cache()
Remove a specified element from the data frame cache
Description
Remove a specified element from the data frame cache
Usage
prep_remove_from_cache(object_to_remove)
Arguments
object_to_remove |
character name of the object to be removed as character string (quoted), or character vector containing the names of the objects to remove from the cache |
Value
nothing
See Also
Other data-frame-cache:
prep_add_data_frames(), prep_get_data_frame(), prep_list_dataframes(),
prep_load_folder_with_metadata(), prep_load_workbook_like_file(),
prep_purge_data_frame_cache()
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2") #load metadata in the cache
ls(.dataframe_environment()) #get the list of dataframes in the cache
#remove cross-item_level from the cache
prep_remove_from_cache("cross-item_level")
#remove dataframe_level and expected_id from the cache
prep_remove_from_cache(c("dataframe_level", "expected_id"))
#remove missing_table and segment_level from the cache
x<- c("missing_table", "segment_level")
prep_remove_from_cache(x)
## End(Not run)
Create a ggplot2 pie chart
Description
Create a ggplot2 pie chart
Usage
prep_render_pie_chart_from_summaryclasses_ggplot2(
data,
meta_data = "item_level"
)
Arguments
data |
data as returned by |
meta_data |
Value
a ggplot2::ggplot2 plot
See Also
Other summary_functions:
prep_combine_report_summaries(), prep_extract_classes_by_functions(),
prep_extract_summary(), prep_extract_summary.dataquieR_result(),
prep_extract_summary.dataquieR_resultset2(),
prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(),
util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(),
util_get_category_for_result(), util_get_colors(),
util_get_labels_grading_class(), util_get_message_for_result(),
util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(),
util_html_table(), util_sort_by_order()
Create a plotly pie chart
Description
Create a plotly pie chart
Usage
prep_render_pie_chart_from_summaryclasses_plotly(
data,
meta_data = "item_level"
)
Arguments
data |
data as returned by |
meta_data |
Value
an htmltools compatible object
See Also
Other summary_functions:
prep_combine_report_summaries(), prep_extract_classes_by_functions(),
prep_extract_summary(), prep_extract_summary.dataquieR_result(),
prep_extract_summary.dataquieR_resultset2(),
prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_summary_to_classes(),
util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(),
util_get_category_for_result(), util_get_colors(),
util_get_labels_grading_class(), util_get_message_for_result(),
util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(),
util_html_table(), util_sort_by_order()
Guess the data type of a vector
Description
Guess the data type of a vector
Usage
prep_robust_guess_data_type(x, k = 50, it = 200)
Arguments
x |
a vector with characters |
k |
numeric sample size, if less than |
it |
integer number of iterations when taking samples |
Value
a guess of the data type of x. An attribute orig_type is also attached to give the more detailed guess returned by readr::guess_parser().
Algorithm
This function takes x and tries to guess the data type of random subsets of this vector using readr::guess_parser(). The RNG is initialized with a constant, so the function stays deterministic. It performs such sub-sample-based checks it times; the majority among the detected data types determines the guessed data type.
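The majority-vote algorithm described above can be sketched as follows. This is a simplified stand-in, not the package's actual implementation: the seed value 42 and the NA handling are assumptions; only the overall scheme (repeated sampling, readr::guess_parser() as per-sample oracle, majority vote) follows the text.

```r
# Sketch: deterministic, sub-sample-based majority vote over guessed types
robust_guess <- function(x, k = 50, it = 200) {
  votes <- withr::with_seed(42, {      # constant seed keeps results deterministic
    replicate(it, {
      smp <- sample(x, size = min(k, length(x)))
      readr::guess_parser(as.character(smp))  # e.g. "double", "character", ...
    })
  })
  names(which.max(table(votes)))       # the most frequent guess wins
}
robust_guess(c("1", "2", "3", "x"))
```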
Save a dq_report2
Description
Save a dq_report2
Usage
prep_save_report(report, file, compression_level = 3)
Arguments
report |
dataquieR_resultset2 the report |
file |
character the file name to write to |
compression_level |
integer from=0 to=9. Compression level. 9 is very slow. |
Value
invisible(NULL)
Heuristics to amend a SCALE_LEVEL column and a UNIT column in the metadata
Description
...if missing
Usage
prep_scalelevel_from_data_and_metadata(
resp_vars = lifecycle::deprecated(),
study_data,
item_level = "item_level",
label_col = LABEL,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list deprecated, the function always addresses all variables. |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
data.frame modified metadata
Examples
## Not run:
prep_load_workbook_like_file("meta_data_v2")
prep_scalelevel_from_data_and_metadata(study_data = "study_data")
## End(Not run)
Change the back-end of a report
Description
With this function, you can move a report from/to a storr storage.
Usage
prep_set_backend(r, storr_factory = NULL, amend = FALSE)
Arguments
r |
dataquieR_resultset2 the report |
storr_factory |
|
amend |
logical if there is already data in. |
Value
dataquieR_resultset2 but now with the desired back-end
Guess a metadata data frame from study data.
Description
Guess a minimum metadata data frame from study data. Minimum required variable attributes are:
Usage
prep_study2meta(
study_data,
level = c(VARATT_REQUIRE_LEVELS$REQUIRED, VARATT_REQUIRE_LEVELS$RECOMMENDED),
cumulative = TRUE,
convert_factors = FALSE,
guess_missing_codes = getOption("dataquieR.guess_missing_codes",
dataquieR.guess_missing_codes_default)
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
level |
enum levels to provide (see also VARATT_REQUIRE_LEVELS) |
cumulative |
logical include attributes of all levels up to level |
convert_factors |
logical convert factor columns to coded integers. if selected, then also the study data will be updated and returned. |
guess_missing_codes |
logical try to guess missing codes from the data |
Details
dataquieR:::util_get_var_att_names_of_level(VARATT_REQUIRE_LEVELS$REQUIRED)
#>            VAR_NAMES            DATA_TYPE   MISSING_LIST_TABLE
#>          "VAR_NAMES"          "DATA_TYPE" "MISSING_LIST_TABLE"
The function also tries to detect missing codes.
Value
a meta_data data frame or a list with study data and metadata, if convert_factors == TRUE.
Examples
## Not run:
dataquieR::prep_study2meta(Orange, convert_factors = FALSE)
## End(Not run)
Classify metrics from a report summary table
Description
Classify metrics from a report summary table
Usage
prep_summary_to_classes(report_summary)
Arguments
report_summary |
|
Value
data.frame classes for the report summary table, long format
See Also
Other summary_functions:
prep_combine_report_summaries(), prep_extract_classes_by_functions(),
prep_extract_summary(), prep_extract_summary.dataquieR_result(),
prep_extract_summary.dataquieR_resultset2(),
prep_render_pie_chart_from_summaryclasses_ggplot2(),
prep_render_pie_chart_from_summaryclasses_plotly(),
util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(),
util_get_category_for_result(), util_get_colors(),
util_get_labels_grading_class(), util_get_message_for_result(),
util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(),
util_html_table(), util_sort_by_order()
Prepare a label as part of a title text for RMD files
Description
Prepare a label as part of a title text for RMD files
Usage
prep_title_escape(s, html = FALSE)
Arguments
s |
the label |
html |
prepare the label for direct |
Value
the escaped label
Remove data disclosing details
Description
new function: no warranty, so far.
Usage
prep_undisclose(x)
Arguments
x |
an object to un-disclose, a |
Value
undisclosed object
Combine all missing and value lists to one big table
Description
Combine all missing and value lists to one big table
Usage
prep_unsplit_val_tabs(meta_data = "item_level", val_tab = NULL)
Arguments
meta_data |
data.frame item level meta data to be used, defaults to
|
val_tab |
character name of the table being created: This table will
be added to the data frame cache (or overwritten). If |
Value
data.frame the combined table
Get value labels from data
Description
Detects factors and converts them to compatible metadata/study data.
Usage
prep_valuelabels_from_data(resp_vars = colnames(study_data), study_data)
Arguments
resp_vars |
variable names of the variables to fetch the value labels from the data |
study_data |
data.frame the data frame that contains the measurements |
Value
a list with:
- VALUE_LABELS: vector of value labels and modified study data
- ModifiedStudyData: study data with factors as integers
Examples
## Not run:
dataquieR::prep_valuelabels_from_data(study_data = iris)
## End(Not run)
Print a DataSlot object
Description
Print a DataSlot object
Usage
## S3 method for class 'DataSlot'
print(x, ...)
Arguments
x |
the object |
... |
not used |
Value
see print
print implementation for the class ReportSummaryTable
Description
Use this function to print results objects of the class ReportSummaryTable.
Usage
## S3 method for class 'ReportSummaryTable'
print(
x,
relative = lifecycle::deprecated(),
dt = FALSE,
fillContainer = FALSE,
displayValues = FALSE,
view = TRUE,
...,
flip_mode = "auto"
)
Arguments
x |
|
relative |
deprecated |
dt |
logical use |
fillContainer |
logical if |
displayValues |
logical if |
view |
logical if |
... |
not used, yet |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped or
auto-flipped. Not all options are always supported.
In general, this can be controlled by
setting the |
Value
the printed object
See Also
base::print
Print a Slot object
Description
Displays all warnings and messages, then it prints x.
Usage
## S3 method for class 'Slot'
print(x, ...)
Arguments
x |
the object |
... |
not used |
Value
calls the next print method
Print a StudyDataSlot object
Description
Print a StudyDataSlot object
Usage
## S3 method for class 'StudyDataSlot'
print(x, ...)
Arguments
x |
the object |
... |
not used |
Value
see print
Print a TableSlot object
Description
Print a TableSlot object
Usage
## S3 method for class 'TableSlot'
print(x, ...)
Arguments
x |
the object |
... |
not used |
Value
see print
Print a dataquieR result returned by dq_report2
Description
Print a dataquieR result returned by dq_report2
Usage
## S3 method for class 'dataquieR_result'
print(x, ...)
Arguments
x |
list a dataquieR result from dq_report2 or util_eval_to_dataquieR_result |
... |
passed to print. Additionally, the argument |
Value
see print
See Also
Generate a RMarkdown-based report from a dataquieR report
Description
Generate a RMarkdown-based report from a dataquieR report
Usage
## S3 method for class 'dataquieR_resultset'
print(...)
Arguments
... |
deprecated |
Value
deprecated
Generate a HTML-based report from a dataquieR report
Description
Generate a HTML-based report from a dataquieR report
Usage
## S3 method for class 'dataquieR_resultset2'
print(
x,
dir,
view = TRUE,
disable_plotly = FALSE,
block_load_factor = 4,
advanced_options = list(),
dashboard = NA,
...
)
Arguments
x |
|
dir |
character directory to store the rendered report's files, a temporary one, if omitted. Directory will be created, if missing, files may be overwritten inside that directory |
view |
logical display the report |
disable_plotly |
logical do not use |
block_load_factor |
numeric multiply size of parallel compute blocks by this factor. |
advanced_options |
list options to set during report computation,
see |
dashboard |
logical dashboard mode: |
... |
additional arguments: |
Value
file names of the generated report's HTML files
Print a dataquieR summary
Description
Print a dataquieR summary
Usage
## S3 method for class 'dataquieR_summary'
print(
x,
...,
grouped_by = c("call_names", "indicator_metric"),
dont_print = FALSE,
folder_of_report = NULL
)
Arguments
x |
the |
... |
not yet used |
grouped_by |
define the columns of the resulting matrix. It can be either "call_names", one column per function, or "indicator_metric", one column per indicator or both c("call_names", "indicator_metric"). The last combination is the default |
dont_print |
suppress the actual printing, just return a printable
object derived from |
folder_of_report |
a named vector with the location of variable and call_names |
Value
invisible html object
print implementation for the class interval
Description
Such objects, for now, only occur in REDCap rules, so this function is meant for internal use, mostly – for now.
Usage
## S3 method for class 'interval'
print(x, ...)
Arguments
x |
|
... |
not used yet |
Value
the printed object
See Also
base::print
Print a list of dataquieR_result objects
Description
Print a list of dataquieR_result objects
Usage
## S3 method for class 'list'
print(x, ...)
Arguments
x |
|
... |
passed to other implementations |
Value
undefined
Print a master_result object
Description
Print a master_result object
Usage
## S3 method for class 'master_result'
print(x, ...)
Arguments
x |
the object |
... |
not used |
Value
invisible(NULL)
Check applicability of DQ functions on study data
Description
Checks applicability of DQ functions based on study data and metadata characteristics
Usage
pro_applicability_matrix(
study_data,
item_level = "item_level",
split_segments = FALSE,
label_col,
max_vars_per_plot = 20,
meta_data_segment,
meta_data_dataframe,
flip_mode = "noflip",
meta_data_v2,
meta_data = item_level,
segment_level,
dataframe_level
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
split_segments |
logical return one matrix per study segment |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
max_vars_per_plot |
integer from=0. The maximum number of variables per single plot. |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional: Data frame level metadata |
flip_mode |
enum default | flip | noflip | auto. Should the plot be
in default orientation, flipped, not flipped or
auto-flipped. Not all options are always supported.
In general, this can be controlled by
setting the |
meta_data_v2 |
character path to workbook like metadata file, see
|
meta_data |
data.frame old name for |
segment_level |
data.frame alias for |
dataframe_level |
data.frame alias for |
Details
This is a preparatory support function that compares study data with associated metadata. A prerequisite of this function is that the no. of columns in the study data complies with the no. of rows in the metadata.
For each existing R-implementation, the function searches for necessary static metadata and returns a heatmap like matrix indicating the applicability of each data quality implementation.
In addition, the data type defined in the metadata is compared with the observed data type in the study data.
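A hedged usage sketch, using the "study_data" and "meta_data" example frames referenced elsewhere in this manual (their availability in the data frame cache is an assumption); argument names follow the Usage section above:

```r
# Applicability check: study data columns must correspond to metadata rows
library(dataquieR)
study_data <- prep_get_data_frame("study_data")
meta_data  <- prep_get_data_frame("meta_data")
appl <- pro_applicability_matrix(
  study_data     = study_data,
  item_level     = meta_data,
  split_segments = TRUE          # one matrix per study segment
)
appl$ApplicabilityPlot           # heatmap-like matrix of applicability classes
```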
Value
a list with:
- SummaryTable: data frame about the applicability of each indicator function (each function in a column). Its integer values can be one of the following five categories: 0. Non-matching datatype + Incomplete metadata, 1. Non-matching datatype + complete metadata, 2. Matching datatype + Incomplete metadata, 3. Matching datatype + complete metadata, 4. Not applicable according to data type
- ApplicabilityPlot: ggplot2::ggplot2 heatmap plot, graphical representation of SummaryTable
- ApplicabilityPlotList: list of plots per (maybe artificial) segment
- ReportSummaryTable: data frame underlying ApplicabilityPlot
Combine ReportSummaryTable outputs
Description
Using this rbind implementation, you can combine different heatmap-like results of the class ReportSummaryTable.
Usage
## S3 method for class 'ReportSummaryTable'
rbind(...)
Arguments
... |
|
See Also
Return names of result slots (e.g., 3rd dimension of dataquieR results)
Description
Return names of result slots (e.g., 3rd dimension of dataquieR results)
Usage
resnames(x)
Arguments
x |
the objects |
Value
character vector with names
Return names of result slots (e.g., 3rd dimension of dataquieR results)
Description
Return names of result slots (e.g., 3rd dimension of dataquieR results)
Usage
## S3 method for class 'dataquieR_resultset2'
resnames(x)
Arguments
x |
the objects |
Value
character vector with names
Data frame with the study data whose quality is being assessed
Description
Study data is expected in wide format. It should contain all variables for all segments in one large table, even if some variables are not measured for all observational units (study participants).
Summarize a dataquieR report
Description
Deprecated
Usage
## S3 method for class 'dataquieR_resultset'
summary(...)
Arguments
... |
Deprecated |
Value
Deprecated
Generate a report summary table
Description
Generate a report summary table
Usage
## S3 method for class 'dataquieR_resultset2'
summary(
object,
aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
FUN,
collapse = "\n<br />\n",
...
)
Arguments
object |
a square result set |
aspect |
an aspect/problem category of results |
FUN |
function to apply to the cells of the result table |
collapse |
passed to |
... |
not used |
Value
a summary of a dataquieR
report
Examples
## Not run:
util_html_table(summary(report),
filter = "top", options = list(scrollCollapse = TRUE, scrollY = "75vh"),
is_matrix_table = TRUE, rotate_headers = TRUE, output_format = "HTML"
)
## End(Not run)
Utility function for 3SD deviations rule
Description
This function calculates outliers according to the rule of 3SD deviations.
Usage
util_3SD(x)
Arguments
x |
numeric data to check for outliers |
Value
binary vector
See Also
Other outlier_functions:
util_hubert(), util_sigmagap(), util_tukey()
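The 3SD rule named above can be sketched in a few lines of base R; the internal util_3SD() may differ in details (e.g. NA handling), so treat this as an illustration of the rule, not of the implementation:

```r
# Flag values more than 3 standard deviations away from the mean
three_sd_outliers <- function(x) {
  as.integer(abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE))
}
x <- c(rnorm(100), 25)       # 100 ordinary values plus one gross outlier
res <- three_sd_outliers(x)  # binary vector: 1 = outlier, 0 = inlier
res[101]                     # the gross outlier is flagged
```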
Abbreviate snake_case function names to shortened CamelCase
Description
Abbreviate snake_case function names to shortened CamelCase
Usage
util_abbreviate(x)
Arguments
x |
a vector of indicator function names |
Value
abbreviations
See Also
Other process_functions:
util_all_is_integer(), util_attach_attr(), util_bQuote(),
util_backtickQuote(), util_coord_flip(), util_extract_matches(),
util_par_pmap(), util_setup_rstudio_job(), util_suppress_output()
Abbreviate a vector of strings
Description
Abbreviate a vector of strings
Usage
util_abbreviate_unique(initial, max_value_label_len)
Arguments
initial |
character vector with stuff to abbreviate |
max_value_label_len |
integer maximum length (may not strictly
be met, if not possible keeping a maybe
detected uniqueness of |
Value
character uniquely abbreviated initial
See Also
Other string_functions:
util_filter_names_by_regexps(), util_pretty_vector_string(),
util_set_dQuoteString(), util_set_sQuoteString(),
util_sub_string_left_from_.(), util_sub_string_right_from_.(),
util_translate()
Utility function for smoothed longitudinal trends from logistic regression models
Description
This function is under development. It computes a logistic regression for
binary variables and visualizes smoothed time trends of the residuals by
LOESS or GAM. The function can also be called for non-binary outcome
variables. These will be transformed to binary variables, either using
user-specified groups in the metadata columns RECODE_CASES
and/or
RECODE_CONTROL
(see util_dichotomize
), or it will attempt to recode the
variables automatically. For nominal variables, it will consider the most
frequent category as 'cases' and every other category as 'control', if there
are more than two categories. Nominal variables with only two distinct values
will be transformed by assigning the less frequent category to 'cases' and
the more frequent category to 'control'. For variables of other statistical
data types, values inside the interquartile range are considered as
'control', values outside this range as 'cases'. Variables with few
different values are transformed in a simplified way to obtain two groups.
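The IQR-based automatic recoding described above (values inside the interquartile range become 'control', values outside become 'cases') can be sketched as follows. This is a conceptual stand-in; the package's dichotomization (see util_dichotomize and the RECODE_CASES/RECODE_CONTROL metadata) may differ in details:

```r
# Dichotomize a continuous variable by the interquartile range
dichotomize_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  as.integer(x < q[1] | x > q[2])   # 1 = case (outside IQR), 0 = control
}
table(dichotomize_iqr(rnorm(1000)))  # roughly half cases, half controls
```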
Usage
util_acc_loess_bin(
resp_vars,
label_col = NULL,
study_data,
item_level = "item_level",
group_vars = NULL,
time_vars,
co_vars = NULL,
min_obs_in_subgroup = 30,
resolution = 80,
plot_format = getOption("dataquieR.acc_loess.plot_format",
dataquieR.acc_loess.plot_format_default),
meta_data = item_level,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
dataquieR.max_group_var_levels_in_plot_default),
enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
exclude_constant_subgroups =
getOption("dataquieR.acc_loess.exclude_constant_subgroups",
dataquieR.acc_loess.exclude_constant_subgroups.default),
min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
dataquieR.acc_loess.min_bw.default),
min_proportion = getOption("dataquieR.acc_loess.min_proportion",
dataquieR.acc_loess.min_proportion.default)
)
Arguments
resp_vars |
variable the name of the (binary) measurement variable |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
group_vars |
variable the name of the observer, device or reader variable |
time_vars |
variable the name of the variable giving the time of measurement |
co_vars |
variable list a vector of co-variables, e.g. age and sex for adjustment |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
resolution |
integer the maximum number of time points used for plotting the trend lines |
plot_format |
enum AUTO | COMBINED | FACETS | BOTH. Return the plot
as one combined plot for all groups or as
facet plots (one figure per group). |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
n_group_max |
integer maximum number of categories to be displayed
individually for the grouping variable ( |
enable_GAM |
logical Can LOESS computations be replaced by general additive models to reduce memory consumption for large datasets? |
exclude_constant_subgroups |
logical Should subgroups with constant values be excluded? |
min_bandwidth |
numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line. |
min_proportion |
numeric lower limit for the proportion of the smaller group (cases or controls) for creating a LOESS figure, should be greater than 0 and less than 0.4. |
Details
Value
a list with:
- SummaryPlotList: a plot.
Utility function that smoothes and plots adjusted longitudinal measurements
Description
The following R implementation executes calculations for quality indicator "Unexpected location" (see here). Local regression (LOESS) is a versatile statistical method to explore an averaged course of time series measurements (Cleveland, Devlin, and Grosse 1988). In the context of epidemiological data, repeated measurements using the same measurement device or by the same examiner can be considered a time series. LOESS allows exploring changes in these measurements over time.
Usage
util_acc_loess_continuous(
resp_vars,
label_col = NULL,
study_data,
item_level = "item_level",
group_vars = NULL,
time_vars,
co_vars = NULL,
min_obs_in_subgroup = 30,
resolution = 80,
comparison_lines = list(type = c("mean/sd", "quartiles"), color = "grey30", linetype =
2, sd_factor = 0.5),
mark_time_points = getOption("dataquieR.acc_loess.mark_time_points",
dataquieR.acc_loess.mark_time_points_default),
plot_observations = getOption("dataquieR.acc_loess.plot_observations",
dataquieR.acc_loess.plot_observations_default),
plot_format = getOption("dataquieR.acc_loess.plot_format",
dataquieR.acc_loess.plot_format_default),
meta_data = item_level,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot",
dataquieR.max_group_var_levels_in_plot_default),
enable_GAM = getOption("dataquieR.GAM_for_LOESS", dataquieR.GAM_for_LOESS.default),
exclude_constant_subgroups =
getOption("dataquieR.acc_loess.exclude_constant_subgroups",
dataquieR.acc_loess.exclude_constant_subgroups.default),
min_bandwidth = getOption("dataquieR.acc_loess.min_bw",
dataquieR.acc_loess.min_bw.default)
)
Arguments
resp_vars |
variable the name of the continuous (or binary) measurement variable |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
group_vars |
variable the name of the observer, device or reader variable |
time_vars |
variable the name of the variable giving the time of measurement |
co_vars |
variable list a vector of co-variables for adjustment, for example age and sex. Can be NULL (default) for no adjustment. |
min_obs_in_subgroup |
integer (optional argument) If |
resolution |
integer the maximum number of time points used for plotting the trend lines |
comparison_lines |
list type and style of lines with which trend
lines are to be compared. Can be mean +/- 0.5
standard deviation (the factor can be specified
differently in |
mark_time_points |
logical mark time points with observations (caution, there may be many marks) |
plot_observations |
logical show observations as scatter plot in the
background. If there are |
plot_format |
enum AUTO | COMBINED | FACETS | BOTH. Return the plot
as one combined plot for all groups or as
facet plots (one figure per group). |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
n_group_max |
integer maximum number of categories to be displayed
individually for the grouping variable ( |
enable_GAM |
logical Can LOESS computations be replaced by generalized additive models to reduce memory consumption for large datasets? |
exclude_constant_subgroups |
logical Should subgroups with constant values be excluded? |
min_bandwidth |
numeric lower limit for the LOESS bandwidth, should be greater than 0 and less than or equal to 1. In general, increasing the bandwidth leads to a smoother trend line. |
Details
If mark_time_points
or plot_observations
is selected, but would result in
plotting more than 400 points, only a sample of the data will be displayed.
Limitations
The application of LOESS requires model fitting, i.e. the smoothness
of a model is subject to a smoothing parameter (span).
Particularly in the presence of interval-based missing data, high
variability of measurements combined with a low number of
observations in one level of the group_vars
may distort the fit.
Since our approach handles data without knowledge
of such underlying characteristics, finding the best fit is complicated if
computational costs should be minimal. The default of
LOESS in R uses a span of 0.75, which in most cases provides reasonable fits.
The function util_acc_loess_continuous
adapts the span for
each level of the group_vars
(with at least as many observations as specified in min_obs_in_subgroup
and with at least three time points) based on the respective
number of observations.
LOESS consumes a lot of memory for larger datasets.
That is why util_acc_loess_continuous
switches to a generalized additive model with integrated smoothness
estimation (gam
by mgcv
) if there are 1000 observations or more for
at least one level of the group_vars
(similar to geom_smooth
from ggplot2
).
Value
a list with:
-
SummaryPlotList
: list with two plots if plot_format = "BOTH"
, otherwise one of the two figures described below: -
Loess_fits_facets
: The plot contains LOESS-smoothed curves for each level of the group_vars
in a separate panel. Added trend lines represent mean and standard deviation or quartiles (specified in comparison_lines
) for moving windows over the whole data. -
Loess_fits_combined
: This plot combines all curves into one panel. Given a low number of levels in the group_vars
, this plot eases comparisons. However, if the number increases, this plot may be too crowded and unclear.
-
See Also
Estimates variance components
Description
Variance-based models and intraclass correlations (ICC) are approaches to examine the impact of so-called process variables on the measurements. This implementation is model-based.
NB: The term ICC is frequently used to describe the agreement between
different observers, examiners or even devices. In respective settings a good
agreement is pursued. ICC-values can vary between [-1;1]
and an ICC close
to 1 is desired (Koo and Li 2016, Müller and Büttner 1994).
However, in multi-level analysis the ICC is interpreted differently (Snijders and Bosker 1999). In this context, the proportion of variance explained by the respective group levels indicates an influence of (at least one) level of the respective group_vars. An ICC close to 0 is desired.
Usage
util_acc_varcomp(
resp_vars = NULL,
label_col = NULL,
study_data,
item_level = "item_level",
group_vars,
co_vars = NULL,
min_obs_in_subgroup = 30,
min_subgroups = 5,
meta_data = item_level,
meta_data_v2
)
Arguments
resp_vars |
variable list the names of the continuous measurement variables |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
study_data |
data.frame the data frame that contains the measurements |
item_level |
data.frame the data frame that contains metadata attributes of study data |
group_vars |
variable list the names of the resp. observer, device or reader variables |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
min_obs_in_subgroup |
integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum no. of observations that is required to include a subgroup (level) of the "group_var" in the analysis. Subgroups with fewer observations are excluded. The default is 30. |
min_subgroups |
integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum no. of subgroups (levels) included in the "group_var". If the variable defined in "group_var" has fewer subgroups it is not used for analysis. The default is 5. |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
Value
a list with:
-
SummaryTable
: data frame with ICCs per rvs
-
SummaryData
: data frame with ICCs per rvs
-
ScalarValue_max_icc
: maximum variance contribution value by group_vars -
ScalarValue_argmax_icc
: variable with maximum variance contribution by group_vars
ALGORITHM OF THIS IMPLEMENTATION:
This implementation is yet restricted to data of type float.
Missing codes are removed from resp_vars (if defined in the metadata)
Deviations from limits, as defined in the metadata, are removed
A linear mixed-effects model is estimated for resp_vars using co_vars and group_vars for adjustment.
An output data frame is generated for group_vars indicating the ICC.
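The variance-share interpretation of the ICC described above can be sketched numerically. The following Python sketch (illustrative only; the function name `icc_oneway` and the toy data are mine, and dataquieR actually fits a linear mixed-effects model rather than using this plain method-of-moments decomposition) shows the ICC as the between-group share of total variance:

```python
# Illustrative only: the multi-level ICC as the share of between-group
# variance, sigma2_between / (sigma2_between + sigma2_within).
# dataquieR estimates this from a linear mixed-effects model; this is a
# plain method-of-moments sketch on toy, balanced grouped data.
from statistics import mean, pvariance

def icc_oneway(groups):
    """groups: list of lists, one inner list of measurements per group level."""
    group_means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    var_between = pvariance(group_means, mu=grand)
    var_within = mean(pvariance(g, mu=mean(g)) for g in groups)
    return var_between / (var_between + var_within)

# identical group means -> no between-group variance -> ICC = 0 (desired here)
print(icc_oneway([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))  # 0.0
```

An ICC near 1 would instead indicate that group membership (e.g., observer or device) explains most of the variability.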
See Also
Adjust the data types of study data, if needed
Description
Adjust the data types of study data, if needed
Usage
util_adjust_data_type(study_data, meta_data, relevant_vars_for_warnings)
Arguments
study_data |
data.frame the study data |
meta_data |
meta_data |
relevant_vars_for_warnings |
Value
data.frame modified study data
Place all geom_texts also in plotly
right from the x position
Description
Place all geom_texts also in plotly
right from the x position
Usage
util_adjust_geom_text_for_plotly(plotly)
Arguments
plotly |
the |
Value
modified plotly
-built object
Create a caption from an alias name of a dq_report2
result
Description
Create a caption from an alias name of a dq_report2
result
Usage
util_alias2caption(alias, long = FALSE)
Arguments
alias |
alias name |
long |
return result based on |
Value
caption
See Also
Other reporting_functions:
util_copy_all_deps()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
All indicator functions of dataquieR
Description
All indicator functions of dataquieR
Usage
util_all_ind_functions()
Value
character names of all indicator functions
Get all PART_VARS
for a response variable (from item-level metadata)
Description
Get all PART_VARS
for a response variable (from item-level metadata)
Usage
util_all_intro_vars_for_rv(
rv,
study_data,
meta_data,
label_col = LABEL,
expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)
Arguments
rv |
character the response variable's name |
study_data |
|
meta_data |
|
label_col |
character the metadata attribute to map |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. How should
|
Value
character all PART_VARS
for rv
from item level metadata.
For expected_observations = HIERARCHY
, the more general PART_VARS
(i.e., up, in the hierarchy) are more left in the vector, e.g.:
PART_STUDY, PART_PHYSICAL_EXAMINATIONS, PART_BLOODPRESSURE
See Also
Other missing_functions:
util_count_expected_observations()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_observation_expected()
,
util_remove_empty_rows()
,
util_replace_codes_by_NA()
convenience function to abbreviate all(util_is_integer(...))
Description
convenience function to abbreviate all(util_is_integer(...))
Usage
util_all_is_integer(x)
Arguments
x |
the object to test |
Value
TRUE
, if all entries are integer-like, FALSE
otherwise
See Also
Other process_functions:
util_abbreviate()
,
util_attach_attr()
,
util_bQuote()
,
util_backtickQuote()
,
util_coord_flip()
,
util_extract_matches()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
Test, if package anytime
is installed
Description
Test, if package anytime
is installed
Usage
util_anytime_installed()
Value
TRUE
if anytime
is installed.
See Also
https://forum.posit.co/t/how-can-i-make-testthat-think-i-dont-have-a-package-installed/33441/2
utility function for the applicability of contradiction checks
Description
Test for applicability of contradiction checks
Usage
util_app_cd(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
See Also
utility function for the applicability of contradiction checks
Description
Test for applicability of contradiction checks
Usage
util_app_con_contradictions_redcap(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
See Also
utility function for the applicability of distribution plots
Description
Test for applicability of distribution plots
Usage
util_app_dc(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function to test for applicability of detection limits checks
Description
Test for applicability of detection limits checks
Usage
util_app_dl(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of end digits preferences checks
Description
Test for applicability of end digits preferences checks
Usage
util_app_ed(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function to test for applicability of hard limits checks
Description
Test for applicability of hard limits checks
Usage
util_app_hl(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of categorical admissibility
Description
Test for applicability of categorical admissibility
Usage
util_app_iac(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of numeric admissibility
Description
Test for applicability of numeric admissibility
Usage
util_app_iav(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of item missingness
Description
Test for applicability of item missingness
Usage
util_app_im(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
See Also
utility function for applicability of LOESS smoothed time course plots
Description
Test for applicability of LOESS smoothed time course plots
Usage
util_app_loess(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function to test for applicability of marginal means plots
Description
Test for applicability of marginal means plots
Usage
util_app_mar(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1 = matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of multivariate outlier detection
Description
Test for applicability of multivariate outlier detection
Usage
util_app_mol(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of outlier detection
Description
Test for applicability of univariate outlier detection
Usage
util_app_ol(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function to test for applicability of soft limits checks
Description
Test for applicability of soft limits checks
Usage
util_app_sl(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of segment missingness
Description
Test for applicability of segment missingness
Usage
util_app_sm(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
See Also
utility function for the applicability of distribution function's shape or scale check
Description
Test for applicability of checks for deviation from expected probability distribution shapes/scales
Usage
util_app_sos(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
utility function for the applicability of variance components
Description
Test for applicability of ICC
Usage
util_app_vc(x, dta)
Arguments
x |
data.frame metadata |
dta |
logical vector, 1=matching data type, 0 = non-matching data type |
Value
factor 0-3 for each variable in metadata
0 data type mismatch and not applicable
1 data type mismatches but applicable
2 data type matches but not applicable
3 data type matches and applicable
4 not applicable because of not suitable data type
See Also
Convert a category to an ordered factor (1:5
)
Description
Convert a category to an ordered factor (1:5
)
Usage
util_as_cat(category)
Arguments
category |
vector with categories |
Value
an ordered factor
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Convert a category to a number (1:5
)
Description
Convert a category to a number (1:5
)
Usage
util_as_integer_cat(category)
Arguments
category |
vector with categories |
Value
an integer
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
Convert factors to label-corresponding numeric values
Description
Converts a factor to the numeric values corresponding to its labels, ensuring that the underlying numeric values are not scrambled.
Usage
util_as_numeric(v, warn)
Arguments
v |
the vector |
warn |
if not missing: character with error message stating conversion error |
Value
the converted vector
Return the pre-computed plotly
from a dataquieR
result
Description
Return the pre-computed plotly
from a dataquieR
result
Usage
util_as_plotly_from_res(res, ...)
Arguments
res |
the |
... |
not used |
Value
a plotly
object
Convert x
to valid missing codes
Description
Convert x
to valid missing codes
Usage
util_as_valid_missing_codes(x)
Arguments
x |
character a vector of values |
Value
converted x
See Also
Other robustness_functions:
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
utility function to assign labels to levels
Description
function to assign labels to levels of a variable
Usage
util_assign_levlabs(
variable,
string_of_levlabs,
splitchar,
assignchar,
ordered = TRUE,
variable_name = "",
warn_if_inadmissible = TRUE
)
Arguments
variable |
vector vector with values of a study variable |
string_of_levlabs |
character len=1. value labels,
e.g. |
splitchar |
character len=1. splitting character(s) in
|
assignchar |
character len=1. assignment operator character(s) in
|
ordered |
the function converts |
variable_name |
character the name of the variable being converted for warning messages |
warn_if_inadmissible |
logical warn on con_inadmissible_categorical values |
Details
DEPRECATED from v2.5.0
Value
a factor with labels assigned to categorical variables (if available)
See Also
Other data_management:
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Attach attributes to an object and return it
Description
Attach attributes to an object and return it
Usage
util_attach_attr(x, ...)
Arguments
x |
the object |
... |
named arguments, each becomes an attribute |
Value
x
, having the desired attributes attached
See Also
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_bQuote()
,
util_backtickQuote()
,
util_coord_flip()
,
util_extract_matches()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
Put in back-ticks
Description
also escape potential back-ticks in x
Usage
util_bQuote(x)
Arguments
x |
a string |
Value
x in back-ticks
See Also
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_attach_attr()
,
util_backtickQuote()
,
util_coord_flip()
,
util_extract_matches()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
utility function to set string in backticks
Description
Quote a set of variable names with backticks
Usage
util_backtickQuote(x)
Arguments
x |
variable names |
Value
quoted variable names
See Also
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_attach_attr()
,
util_bQuote()
,
util_coord_flip()
,
util_extract_matches()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
Utility function to create bar plots
Description
A helper function for simple bar plots. The layout is intended for data with positive numbers only (e.g., counts/frequencies).
Usage
util_bar_plot(
plot_data,
cat_var,
num_var,
relative = FALSE,
show_numbers = TRUE,
fill_var = NULL,
colors = "#2166AC",
show_color_legend = FALSE,
flip = FALSE
)
Arguments
plot_data |
the data for the plot. It should consist of one column
specifying the categories, and a second column giving the
respective numbers / counts per category. It may contain
another column to specify the coloring of the bars
( |
cat_var |
column name of the categorical variable in |
num_var |
column name of the numerical variable in |
relative |
if |
show_numbers |
if |
fill_var |
column name of the variable in |
colors |
vector of colors, or a single color |
show_color_legend |
if |
flip |
if |
Value
a bar plot
Data frame leaves haven
Description
if df
is/contains a haven
labelled
or tibble
object, convert it to
a base R data frame
Usage
util_cast_off(df, symb, .dont_cast_off_cols = FALSE)
Arguments
df |
data.frame may have or contain non-standard classes |
symb |
character name of the data frame for error messages |
.dont_cast_off_cols |
logical internal use, only. |
Value
data.frame with all known non-standard classes removed
Verify the data type of a value
Description
Function to verify the data type of a value.
Usage
util_check_data_type(
x,
type,
check_convertible = FALSE,
threshold_value = 0,
return_percentages = FALSE,
check_conversion_stable = FALSE,
robust_na = FALSE
)
Arguments
x |
the value |
type |
expected data type |
check_convertible |
logical also try, if a conversion to the declared data type would work. |
threshold_value |
numeric from=0 to=100. percentage of failing conversions allowed. |
return_percentages |
logical return the percentage of mismatches. |
check_conversion_stable |
logical do not distinguish values that are convertible without issues from those that are convertible but with issues |
robust_na |
logical treat white-space-only-values as |
Value
if return_percentages
: if not check_convertible
, the percentage
of mismatches instead of logical value,
if check_convertible
, return a named
vector with the percentages of all cases
(names of the vector are
match
, convertible_mismatch_stable
,
convertible_mismatch_unstable
,
nonconvertible_mismatch
)
if not return_percentages
: if check_convertible
is FALSE
,
logical whether x
is of the expected type
if check_convertible
is TRUE
integer with the states 0, 1, 2, 3
: 0 = Mismatch, not convertible
1 = Match
2 = Mismatch, but convertible
3 = Mismatch, convertible,
but with issues (e.g.,
loss of decimal places)
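The four state codes above can be sketched as a small classifier. This Python sketch (illustrative only; the function name `type_state` is mine and this is not the package's R implementation) mimics the logic for a declared integer data type:

```python
# Illustrative sketch of the 0-3 state codes above for a declared "integer"
# data type: match, convertible mismatch (with or without issues), or
# non-convertible mismatch. Not the package's R implementation.
def type_state(value):
    if isinstance(value, int) and not isinstance(value, bool):
        return 1                       # match
    try:
        converted = int(float(value))  # attempt conversion
    except (TypeError, ValueError):
        return 0                       # mismatch, not convertible
    if float(value) == converted:
        return 2                       # mismatch, but convertible
    return 3                           # convertible, but loses decimal places

print([type_state(v) for v in [5, "7", "7.25", "abc"]])  # [1, 2, 3, 0]
```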
See Also
Other data_management:
util_assign_levlabs()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Check data for observer levels
Description
Check data for observer levels
Usage
util_check_group_levels(
study_data,
group_vars,
min_obs_in_subgroup = -Inf,
max_obs_in_subgroup = +Inf,
min_subgroups = -Inf,
max_subgroups = +Inf
)
Arguments
study_data |
data.frame the data frame that contains the measurements |
group_vars |
variable the name of the observer, device or reader variable |
min_obs_in_subgroup |
integer from=0. optional argument if
|
max_obs_in_subgroup |
integer from=0. optional argument if
|
min_subgroups |
integer from=0. optional argument if a "group_var" is used. This argument specifies the minimum no. of subgroups (levels) included in the "group_var". If the variable defined in "group_var" has fewer subgroups it is split for analysis. |
max_subgroups |
integer from=0. optional argument if a "group_var" is used. This argument specifies the maximum no. of subgroups (levels) included in the "group_var". If the variable defined in "group_var" has more subgroups it is split for analysis. |
Value
modified study data frame
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Examples
## Not run:
study_data <- prep_get_data_frame("study_data")
meta_data <- prep_get_data_frame("meta_data")
prep_prepare_dataframes(.label_col = LABEL)
util_check_group_levels(ds1, "CENTER_0")
dim(util_check_group_levels(ds1, "USR_BP_0", min_obs_in_subgroup = 400))
## End(Not run)
Check for one value only
Description
utility function to identify variables with one value only.
Usage
util_check_one_unique_value(x)
Arguments
x |
vector with values |
Value
logical(1): TRUE, if (ignoring NA) exactly one value
is observed in x
,
FALSE otherwise
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Get Function called for a Call Name
Description
get aliases from report attributes and then replace them by the actual function name
Usage
util_cll_nm2fkt_nm(cll_names, report)
Arguments
cll_names |
character the systematic function call name for which to fetch the function name |
report |
dataquieR_resultset2 the report |
Value
character the function name
Return hex code colors from color names or STATAReporter
syntax
Description
Return hex code colors from color names or STATAReporter
syntax
Usage
util_col2rgb(colors)
Arguments
colors |
the colors, e.g., "255 0 0" or "red" or "#ff0000" |
Value
character vector with colors using HTML
hexadecimal encoding, e.g.,
"#ff0000" for "red"
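The conversion from the space-separated triplet syntax to an HTML hex string can be sketched as follows (Python, illustrative only; the function name `triplet_to_hex` is mine, and resolving R color names like "red", which util_col2rgb also handles, needs a lookup table and is omitted):

```python
# Illustrative sketch: turn an "R G B" triplet string into an HTML hex
# color, as in the example "255 0 0" -> "#ff0000". Named colors omitted.
def triplet_to_hex(color):
    if color.startswith("#"):
        return color.lower()           # already hex, just normalize case
    r, g, b = (int(part) for part in color.split())
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

print(triplet_to_hex("255 0 0"))  # #ff0000
```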
Get description for a call
Description
Get description for a call
Usage
util_col_description(cn)
Arguments
cn |
the call name |
Value
the description
Collect all errors, warnings, or messages so that they are combined for a combined result
Description
Collect all errors, warnings, or messages so that they are combined for a combined result
Usage
util_collapse_msgs(class, all_of_f)
Create a data frame containing all the results from summaries of reports
Description
Create a data frame containing all the results from summaries of reports
Usage
util_combine_list_report_summaries(
to_combine,
type = c("unique_vars", "repeated_vars")
)
Arguments
to_combine |
vector a list containing the summaries of reports
obtained with |
type |
character if |
Value
a summary of summaries of dataquieR
reports
Combine results for Single Variables
Description
Combines results, e.g., into a data frame with one row per variable or a similar heat-map,
see print.ReportSummaryTable()
.
Usage
util_combine_res(all_of_f)
Arguments
all_of_f |
all results of a function |
Value
row-bound combined results
Combine two value lists
Description
Combine two value lists
Usage
util_combine_value_label_tables(vlt1, vlt2)
Arguments
vlt1 |
|
vlt2 |
Value
Examples
## Not run:
util_combine_value_label_tables(
tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test", 17L, "TestX"),
tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test", 17L, "TestX"))
## End(Not run)
Compares study data data types with the ones expected according to the metadata
Description
Utility function to compare data type of study data with those defined in metadata
Usage
util_compare_meta_with_study(
sdf,
mdf,
label_col,
check_convertible = FALSE,
threshold_value = 0,
return_percentages = FALSE,
check_conversion_stable = FALSE
)
Arguments
sdf |
the data.frame of study data |
mdf |
the data.frame of associated static metadata |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
check_convertible |
logical also try, if a conversion to the declared data type would work. |
threshold_value |
numeric from=0 to=100. percentage failing
conversions allowed if |
return_percentages |
logical return the percentage of mismatches. |
check_conversion_stable |
logical do not distinguish "convertible" from "convertible, but with issues" |
Value
for return_percentages == FALSE: if check_convertible is FALSE,
a binary vector (0, 1) stating whether the data type applies;
if check_convertible is TRUE, a vector with the states 0, 1, 2, 3:
0 = Mismatch, not convertible; 1 = Match; 2 = Mismatch, but convertible;
3 = Mismatch, convertible, but with issues (e.g., loss of decimal places).
For return_percentages == TRUE: a data frame with percentages of
non-matching data types; each column is a variable, the rows follow the
vectors returned by util_check_data_type.
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Remove specific classes from a ggplot plot_env
environment
Description
Useful to remove large objects before writing to disk with qs
or rds
.
It also deletes the parent environment of the plot environment and
removes unneeded variables.
Usage
util_compress_ggplots_in_res(r)
Arguments
r |
the object |
Compute SE.Skewness
Description
Compute SE.Skewness
Usage
util_compute_SE_skewness(x, skewness = util_compute_skewness(x))
Arguments
x |
data |
skewness |
if already known |
Value
the standard error of skewness
Compute Kurtosis
Description
Compute Kurtosis
Usage
util_compute_kurtosis(x)
Arguments
x |
data |
Value
the Kurtosis
Compute the Skewness
Description
Compute the Skewness
Usage
util_compute_skewness(x)
Arguments
x |
data |
Value
the Skewness
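The three moment functions above are internal, so their exact estimators are not shown here. A common moment-based sketch (hypothetical helper names; the package's `util_compute_skewness()` and `util_compute_SE_skewness()` may use different estimators) looks like this:

```r
# Hypothetical sketch of moment-based sample skewness and the usual
# closed-form standard error of skewness; NOT the package's internals.
sketch_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^(3 / 2)
}

sketch_se_skewness <- function(x) {
  n <- sum(!is.na(x))
  sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
}

sketch_skewness(c(1, 2, 3, 4, 100))  # strongly right-skewed, > 0
```

Symmetric data yields a skewness of (numerically) zero, since the third central moment vanishes.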
Produce a condition function
Description
Produce a condition function
Usage
util_condition_constructor_factory(
.condition_type = c("error", "warning", "message")
)
Arguments
.condition_type |
character the type of the conditions being created and signaled by the function, "error", "warning", or "message" |
See Also
Other condition_functions:
util_deparse1()
,
util_error()
,
util_find_external_functions_in_stacktrace()
,
util_find_first_externally_called_functions_in_stacktrace()
,
util_find_indicator_function_in_callers()
,
util_message()
,
util_suppress_warnings()
,
util_warning()
Extract condition from try error
Description
Extract condition from try error
Usage
util_condition_from_try_error(x)
Arguments
x |
the try-error object |
Value
condition of the try-error
Can a vector be converted to a defined DATA_TYPE
Description
The function also checks whether the conversion is perfect, whether
something is lost (e.g., decimal places), or whether something is strange
(like arbitrary suffixes in a date; just note that
as.POSIXct("2020-01-01 12:00:00 CET asdf")
does not fail in R, but
util_conversion_stable("2020-01-01 12:00:00 CET asdf", DATA_TYPES$DATETIME)
will).
Usage
util_conversion_stable(vector, data_type, return_percentages = FALSE)
Arguments
vector |
vector input vector, |
data_type |
enum The type, to what the conversion should be tried. |
return_percentages |
logical return the percentage of stable conversions or matches. |
Details
HINT:
util_conversion_stable(.Machine$integer.max + 1, DATA_TYPES$INTEGER)
seems
to work correctly, although is.integer(.Machine$integer.max + 1)
returns FALSE
.
Value
numeric ratio of convertible entries in vector
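The lossy conversions this function guards against can be illustrated in plain base R (a sketch, not the package's implementation):

```r
# Converting decimal strings through as.integer() silently drops the
# fractional part -- "convertible, but with issues":
x <- c("1.5", "2", "3.25")
as.integer(as.numeric(x))

# Base R accepts trailing garbage in date-time strings, as noted above:
as.POSIXct("2020-01-01 12:00:00 CET asdf", tz = "UTC")  # does not fail

# A stability check can compare a round trip with the original input:
round_trip <- as.character(as.integer(as.numeric(x)))
round_trip == x  # FALSE where information was lost
```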
return a flip term for ggplot2
plots, if desired.
Description
return a flip term for ggplot2
plots, if desired.
Usage
util_coord_flip(w, h, p, ref_env, ...)
Arguments
w |
width of the plot to determine its aspect ratio |
h |
height of the plot to determine its aspect ratio |
p |
the |
ref_env |
environment of the actual entry function, so that the correct formals can be detected. |
... |
additional arguments for |
Value
coord_flip
or coord_cartesian
See Also
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_attach_attr()
,
util_bQuote()
,
util_backtickQuote()
,
util_extract_matches()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
Copy default dependencies to the report's lib directory
Description
Copy default dependencies to the report's lib directory
Usage
util_copy_all_deps(dir, pages, ...)
Arguments
dir |
report directory |
pages |
all pages to write |
... |
additional |
Value
invisible(NULL)
See Also
Other reporting_functions:
util_alias2caption()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
Check referred variables
Description
This function operates in the environment of its caller
(using eval.parent, similar to function-like C preprocessor macros).
Different from the other utility functions that work
in the caller's environment (prep_prepare_dataframes), it has no side
effects except that the argument
of the calling function specified in arg_name
is normalized (set to its
default or a general default if missing; variable names consisting only
of white space are replaced by NAs).
It expects two objects in the caller's environment: ds1
and meta_data
.
meta_data
is the metadata data frame and ds1
is produced by a preceding
call of prep_prepare_dataframes using meta_data
and study_data
.
So this function can only be used after calling the function
prep_prepare_dataframes
Usage
util_correct_variable_use(
arg_name,
allow_na,
allow_more_than_one,
allow_null,
allow_all_obs_na,
allow_any_obs_na,
min_distinct_values,
need_type,
need_scale,
role = "",
overwrite = TRUE,
do_not_stop = FALSE,
remove_not_found = TRUE
)
util_correct_variable_use2(
arg_name,
allow_na,
allow_more_than_one,
allow_null,
allow_all_obs_na,
allow_any_obs_na,
min_distinct_values,
need_type,
need_scale,
role = arg_name,
overwrite = TRUE,
do_not_stop = FALSE,
remove_not_found = TRUE
)
Arguments
arg_name |
character Name of a function argument of the caller of util_correct_variable_use |
allow_na |
logical default = FALSE. allow NAs in the variable names
argument given in |
allow_more_than_one |
logical default = FALSE. allow more than one
variable names in |
allow_null |
logical default = FALSE. allow an empty variable name
vector in the argument |
allow_all_obs_na |
logical default = TRUE. check observations for not
being all |
allow_any_obs_na |
logical default = TRUE. check observations for
being complete without any |
min_distinct_values |
integer Minimum number of distinct observed values of a study variable |
need_type |
character if not |
need_scale |
character if not |
role |
character variable-argument role. Set different defaults for
all |
overwrite |
logical overwrite vector of variable names
to match the labels given in |
do_not_stop |
logical do not throw an error, if one of the variables
violates |
remove_not_found |
TODO: Not yet implemented |
Details
util_correct_variable_use and util_correct_variable_use2 differ only in
the default of the argument role
.
util_correct_variable_use and util_correct_variable_use2 put strong
effort into producing comprehensible
error messages for the caller's caller (who is typically an end user of
a dataquieR
function).
The function ensures that a specified argument of its caller that refers to variable names (one or more, as a character vector) matches some expectations.
This function accesses the caller's environment!
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Count Expected Observations
Description
Count participants, if an observation was expected, given the
PART_VARS
from item-level metadata
Usage
util_count_expected_observations(
resp_vars,
study_data,
meta_data,
label_col = LABEL,
expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)
Arguments
resp_vars |
character the response variables, for that a value may be expected |
study_data |
|
meta_data |
|
label_col |
character mapping attribute |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. How should
|
Value
a vector with the number of expected observations for each
resp_vars
.
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_observation_expected()
,
util_remove_empty_rows()
,
util_replace_codes_by_NA()
Create an HTML file for the dq_report2
Description
Create an HTML file for the dq_report2
Usage
util_create_page_file(
page_nr,
pages,
rendered_pages,
dir,
template_file,
report,
logo,
loading,
packageName,
deps,
progress_msg,
progress,
title,
by_report
)
Arguments
page_nr |
the number of the page being created |
pages |
list with all page-contents named by their desired file names |
rendered_pages |
list with all rendered ( |
dir |
target directory |
template_file |
the report template file to use |
report |
the output of dq_report2 |
logo |
logo |
loading |
loading animation div |
packageName |
the name of the current package |
deps |
dependencies, as pre-processed by
|
progress_msg |
closure to call with progress information |
progress |
closure to call with progress information |
title |
character the web browser's window name |
by_report |
logical this report html is part of a set of reports, add a back-link |
Value
invisible(file_name)
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
Create an overview of the reports created with dq_report_by
Description
Create an overview of the reports created with dq_report_by
Usage
util_create_report_by_overview(
output_dir,
strata_column,
segment_column,
strata_column_label,
subgroup,
mod_label
)
Arguments
output_dir |
character the directory in which all reports are searched and the overview is saved |
strata_column |
character name of a study variable to stratify the report by. It can be null |
segment_column |
character name of a metadata attribute usable to split the report in sections of variables. It can be null |
strata_column_label |
character the label of the variable used as strata_column |
subgroup |
character optional, to define subgroups of cases |
mod_label |
list |
Value
an overview of all dataquieR
reports created with dq_report_by
Create a dashboard-table from a report summary
Description
Create a dashboard-table from a report summary
Usage
util_dashboard_table(repsum)
Arguments
repsum |
a report summary from |
See Also
Other html:
util_extract_all_ids()
,
util_generate_pages_from_report()
,
util_get_hovertext()
Data type conversion
Description
Utility function to convert a study variable to match the data type given in the metadata, if possible.
Usage
util_data_type_conversion(x, type)
Arguments
x |
the value |
type |
expected data type |
Value
the transformed values (if possible)
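The conversion can be pictured as a dispatch on the declared data type (a hypothetical helper; the internal `util_data_type_conversion()` may handle more cases, e.g., value labels and specific date formats):

```r
# Sketch of type-driven conversion; "INTEGER"/"FLOAT"/"STRING"/"DATETIME"
# mirror typical DATA_TYPES entries, but this is NOT the package code.
convert_to_type <- function(x, type) {
  switch(toupper(type),
    INTEGER  = suppressWarnings(as.integer(x)),
    FLOAT    = suppressWarnings(as.numeric(x)),
    STRING   = as.character(x),
    DATETIME = suppressWarnings(as.POSIXct(x, tz = "UTC")),
    stop("unknown data type: ", type)
  )
}

convert_to_type(c("1", "2", "x"), "integer")  # NA where conversion fails
```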
Expression De-Parsing
Description
Turn unevaluated expressions into character strings.
Arguments
expr |
any R expression. |
collapse |
a string, passed to |
width.cutoff |
integer in [20, 500] determining the cutoff (in bytes) at which line-breaking is tried. |
... |
further arguments passed to |
Details
This is a simple utility function for R < 4.0.0 to ensure a string
result (character vector of length one),
typically used in name construction, as util_deparse1(substitute(.))
.
This avoids a dependency on backports
and on R >= 4.0.0.
Value
the deparsed expression
See Also
Other condition_functions:
util_condition_constructor_factory()
,
util_error()
,
util_find_external_functions_in_stacktrace()
,
util_find_first_externally_called_functions_in_stacktrace()
,
util_find_indicator_function_in_callers()
,
util_message()
,
util_suppress_warnings()
,
util_warning()
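On R >= 4.0.0 this backport is equivalent to `base::deparse1()`; the pre-4.0.0 sketch is a one-liner:

```r
# Minimal backport sketch in the spirit of util_deparse1(): collapse
# deparse()'s possibly multi-line output into a single string.
deparse1_sketch <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
  paste(deparse(expr, width.cutoff, ...), collapse = collapse)
}

deparse1_sketch(quote(x + y))  # "x + y"
```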
Detect cores
Description
See parallel::detectCores
for further details.
Usage
util_detect_cores()
Value
number of available CPU cores.
See Also
Other system_functions:
util_user_hint()
,
util_view_file()
Escape characters for HTML in a data frame
Description
Escape characters for HTML in a data frame
Usage
util_df_escape(x)
Arguments
x |
data.frame to be escaped |
Value
data.frame with html escaped content
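Escaping all character columns of a data frame can be sketched as follows (hypothetical helper; `util_df_escape()` may delegate to a dedicated HTML escaper such as `htmltools::htmlEscape`):

```r
# Escape the three HTML metacharacters; '&' must be replaced first so
# that the entities introduced for '<' and '>' are not double-escaped.
escape_html <- function(s) {
  s <- gsub("&", "&amp;", s, fixed = TRUE)
  s <- gsub("<", "&lt;",  s, fixed = TRUE)
  gsub(">", "&gt;", s, fixed = TRUE)
}

df <- data.frame(a = c("1 < 2", "A & B"), stringsAsFactors = FALSE)
df[] <- lapply(df, function(col)
  if (is.character(col)) escape_html(col) else col)
df$a  # "1 &lt; 2" "A &amp; B"
```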
Utility function to dichotomize variables
Description
This function uses the metadata attributes RECODE_CASES
and/or
RECODE_CONTROL
to dichotomize the data. 'Cases' will be recoded to 1,
'controls' to 0. The recoding can be specified by an interval (for metric
variables) or by a list of categories separated by the 'SPLIT_CHAR'. Recoding
will be used for data quality checks that include a regression model.
Usage
util_dichotomize(study_data, meta_data, label_col = VAR_NAMES)
Arguments
study_data |
study data without jump/missing codes as specified in the code conventions |
meta_data |
metadata as specified in the code conventions |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
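The category-based branch of the recoding rule described above can be sketched like this (hypothetical helper; the real function reads `RECODE_CASES`/`RECODE_CONTROL` from the item-level metadata and also supports intervals for metric variables):

```r
# Recode observations matching the 'cases' set to 1 and the 'controls'
# set to 0; everything else becomes NA.
dichotomize_by_categories <- function(x, cases, controls) {
  out <- rep(NA_integer_, length(x))
  out[x %in% cases]    <- 1L
  out[x %in% controls] <- 0L
  out
}

dichotomize_by_categories(c("a", "b", "c"),
                          cases = "a",
                          controls = c("b", "c"))  # 1 0 0
```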
Utility function to characterize study variables
Description
This function summarizes some properties of measurement variables.
Usage
util_dist_selection(study_data, val_lab = lifecycle::deprecated())
Arguments
study_data |
study data, pre-processed with |
val_lab |
deprecated |
Value
data frame with one row for each variable in the study data and the
following columns:
Variables
contains the names of the variables
IsInteger
contains a check whether the variable contains integer values
only (variables coded as factor will be converted to integers)
IsMultCat
contains a check for variables with integer or string values
whether there are more than two categories
NCategory
contains the number of distinct values for variables with
values coded as integers or strings (excluding NA
and
empty entries)
AnyNegative
contains a check whether the variable contains any negative
values
NDistinct
contains the number of distinct values
PropZeroes
reports the proportion of zeroes
See Also
Other metadata_management:
util_find_free_missing_code()
,
util_find_var_by_meta()
,
util_get_var_att_names_of_level()
,
util_get_vars_in_segment()
,
util_looks_like_missing()
,
util_no_value_labels()
,
util_validate_known_meta()
,
util_validate_missing_lists()
Create an environment with several alias names for the study data variables
Description
generates an environment similar to as.environment(ds1)
, but makes
variables available by their VAR_NAME
, LABEL
, and label_col
- names.
Usage
util_ds1_eval_env(study_data, meta_data = "item_level", label_col = LABEL)
Arguments
study_data |
data.frame the data frame that contains the measurements |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata
with labels of variables. If
|
Test, if values of x are empty, i.e. NA or whitespace characters
Description
Test, if values of x are empty, i.e. NA or whitespace characters
Usage
util_empty(x)
Arguments
x |
the vector to test |
Value
a logical vector of the same length as x; TRUE, if the respective element of x is "empty"
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
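The documented behavior (NA or whitespace-only counts as "empty") amounts to a one-liner in base R (a sketch; the internal `util_empty()` may handle more edge cases):

```r
# TRUE for NA and for strings that are empty after trimming whitespace.
empty_sketch <- function(x) is.na(x) | trimws(as.character(x)) == ""

empty_sketch(c("a", " ", "", NA))  # FALSE TRUE TRUE TRUE
```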
convert a value to character
Description
convert a value to character
Usage
util_ensure_character(x, error = FALSE, error_msg, ...)
Arguments
x |
the value |
error |
logical if |
error_msg |
error message to be displayed, if conversion was not possible |
... |
additional arguments passed to util_error or util_warning
respectively in case of an error, and if an |
Value
as.character(x)
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
similar to match.arg
Description
will only warn and return a cleaned x
.
Usage
util_ensure_in(x, set, err_msg, error = FALSE, applicability_problem = NA)
Arguments
x |
character vector of needles |
set |
character vector representing the haystack |
err_msg |
character optional error message. Use %s twice, once for the missing elements and once for proposals |
error |
logical if |
applicability_problem |
logical error indicates unsuitable resp_vars |
Value
character invisible(intersect(x, set))
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
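The lenient matching described above (warn and clean, instead of `match.arg()`'s hard error) can be sketched as follows (hypothetical helper; the real `util_ensure_in()` additionally supports custom messages and did-you-mean proposals):

```r
# Drop entries of x that are not in 'set', warning about what was removed,
# and invisibly return the cleaned intersection.
ensure_in_sketch <- function(x, set) {
  bad <- setdiff(x, set)
  if (length(bad) > 0)
    warning("removed unknown entries: ", paste(bad, collapse = ", "))
  invisible(intersect(x, set))
}

res <- ensure_in_sketch(c("SEX", "AGE", "TYPO"), c("SEX", "AGE", "BMI"))
# warns about "TYPO"; res is c("SEX", "AGE")
```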
Utility function ensuring valid labels and variable names
Description
Valid labels must not be empty, must be unique, and must not exceed a certain length.
Usage
util_ensure_label(meta_data, label_col, max_label_len = MAX_LABEL_LEN)
Arguments
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
max_label_len |
integer maximum length for the labels, defaults to 30. |
Value
a list containing the study data, possibly with adapted column names, the metadata, possibly with adapted labels, and a string and a table informing about the changes
Support function to stop, if an optional package is not installed
Description
This function stops, if a package is not installed but needed for using an
optional feature of dataquieR
.
Usage
util_ensure_suggested(
pkg,
goal = ifelse(is.null(rlang::caller_call()), "work", paste("call",
sQuote(rlang::call_name(rlang::caller_call())))),
err = TRUE,
and_import = c()
)
Arguments
pkg |
needed package |
goal |
feature description for error message. |
err |
logical Should the function throw an error (default) or a warning? |
and_import |
import the listed function to the caller's environment |
Value
TRUE
if all packages in pkg
are available, FALSE
if at least
one of the packages is missing.
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Examples
## Not run: # internal use, only
f <- function() {
util_ensure_suggested <- get("util_ensure_suggested",
asNamespace("dataquieR"))
util_ensure_suggested("ggplot2", "Test",
and_import = "(ggplot|geom_.*|aes)")
print(ggplot(cars, aes(x = speed)) + geom_histogram())
}
f()
## End(Not run)
Produce an error message with a useful short stack trace. Then it stops the execution.
Description
Produce an error message with a useful short stack trace. Then it stops the execution.
Usage
util_error(
m,
...,
applicability_problem = NA,
intrinsic_applicability_problem = NA,
integrity_indicator = "none",
level = 0,
immediate,
title = "",
additional_classes = c()
)
Arguments
m |
error message or a condition |
... |
arguments for sprintf on m, if m is a character |
applicability_problem |
logical |
intrinsic_applicability_problem |
logical |
integrity_indicator |
character if the message concerns an integrity problem, this gives the indicator abbreviation. |
level |
integer level of the error message (defaults to 0). Higher levels are more severe. |
immediate |
logical not used. |
additional_classes |
character additional classes the thrown condition object should inherit from, first. |
Value
nothing, its purpose is to stop.
See Also
Other condition_functions:
util_condition_constructor_factory()
,
util_deparse1()
,
util_find_external_functions_in_stacktrace()
,
util_find_first_externally_called_functions_in_stacktrace()
,
util_find_indicator_function_in_callers()
,
util_message()
,
util_suppress_warnings()
,
util_warning()
Evaluate a parsed redcap rule for given study data
Description
also allows to use VAR_NAMES
in the rules,
if other labels have been selected
Usage
util_eval_rule(
rule,
ds1,
meta_data = "item_level",
use_value_labels,
replace_missing_by = "NA",
replace_limits = TRUE
)
Arguments
rule |
the redcap rule (parsed, already) |
ds1 |
the study data as prepared by |
meta_data |
the metadata |
use_value_labels |
map columns with |
replace_missing_by |
enum LABEL | INTERPRET | NA . Missing codes should
be replaced by the missing labels, the
|
replace_limits |
logical replace hard limit violations by |
Value
the result of the parsed rule
See Also
Other redcap:
util_get_redcap_rule_env()
Evaluate an expression and create a dataquieR_result
object from
its evaluated value
Description
If an error occurs, the function returns a corresponding object representing that error. All conditions are recorded and replayed whenever the result is printed by print.dataquieR_result.
Usage
util_eval_to_dataquieR_result(
expression,
env = parent.frame(),
filter_result_slots,
nm,
function_name,
my_call = expression,
my_storr_object = NULL,
init = FALSE,
called_in_pipeline = TRUE
)
Arguments
expression |
the expression |
env |
the environment to evaluate the expression in |
filter_result_slots |
character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed. |
nm |
character name for the computed result |
function_name |
character name of the function to be executed |
my_call |
the call being executed (equivalent to |
my_storr_object |
a |
init |
logical is this an initial call to compute dummy results? |
called_in_pipeline |
logical if the evaluation should be considered as part of a pipeline. |
Value
a dataquieR_result
object
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_create_page_file()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
Generate a full DQ report, v2
Description
Generate a full DQ report, v2
Usage
util_evaluate_calls(
all_calls,
study_data,
meta_data,
label_col,
meta_data_segment,
meta_data_dataframe,
meta_data_cross_item,
resp_vars,
filter_result_slots,
cores,
debug_parallel,
mode = c("default", "futures", "queue", "parallel"),
mode_args,
my_storr_object = NULL
)
Arguments
all_calls |
list a list of calls |
study_data |
data.frame the data frame that contains the measurements |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional: Data frame level metadata |
meta_data_cross_item |
data.frame – optional: cross-item level metadata |
resp_vars |
variable list the name of the measurement variables for the report. |
filter_result_slots |
character regular expressions, only if an indicator function's result's name matches one of these, it'll be used for the report. If of length zero, no filtering is performed. |
cores |
integer number of cpu cores to use or a named list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. Can also be a cluster. |
debug_parallel |
logical print blocks currently evaluated in parallel |
mode |
character work mode for parallel execution. default is
"default", the values mean:
- default: use |
mode_args |
list of arguments for the selected |
Value
a dataquieR_resultset2. Can be printed creating a RMarkdown-report.
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
Verify, that argument is a data frame
Description
Stops with an error, if not. Adds the columns and returns the resulting
extended data frame, also updating the original data frame in the
calling environment, if x
is empty (data frames easily break to
0 columns in R, if they have no rows, e.g., using some split
/rbind
pattern)
Usage
util_expect_data_frame(
x,
col_names,
convert_if_possible,
custom_errors,
dont_assign,
keep_types = FALSE
)
Arguments
x |
an object that is verified to be a |
col_names |
column names x must contain or named list of predicates to check the columns (e.g., list(AGE=is.numeric, SEX=is.character)) |
convert_if_possible |
if given, for each column, a lambda can be given
similar to |
custom_errors |
list with error messages, specifically per column. names of the list are column names, values are messages (character). |
dont_assign |
set |
keep_types |
logical keep types as possibly defined in a file, if the
data frame is loaded from one. set |
Value
invisible
data frame
check, if a scalar/vector function argument matches expectations
Description
check, if a scalar/vector function argument matches expectations
Usage
util_expect_scalar(
arg_name,
allow_more_than_one = FALSE,
allow_null = FALSE,
allow_na = FALSE,
min_length = -Inf,
max_length = Inf,
check_type,
convert_if_possible,
conversion_may_replace_NA = FALSE,
dont_assign = FALSE,
error_message
)
Arguments
arg_name |
the argument |
allow_more_than_one |
allow vectors |
allow_null |
allow NULL |
allow_na |
allow |
min_length |
minimum length of the argument's value |
max_length |
maximum length of the argument's value |
check_type |
a predicate function, that must return |
convert_if_possible |
if given, a lambda can be given
similar to |
conversion_may_replace_NA |
if set to |
dont_assign |
set |
error_message |
if |
Value
the value of arg_name – but this is updated in the calling frame anyway.
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Examples
## Not run:
f <- function(x) {
util_expect_scalar(x, check_type = is.integer)
}
f(42L)
try(f(42))
g <- function(x) {
util_expect_scalar(x, check_type = is.integer, convert_if_possible =
as.integer)
}
g(42L)
g(42)
## End(Not run)
Extract all ids from a list of htmltools
objects
Description
Extract all ids from a list of htmltools
objects
Usage
util_extract_all_ids(pages)
Arguments
pages |
the list of objects |
Value
a character vector with valid targets
See Also
Other html:
util_dashboard_table()
,
util_generate_pages_from_report()
,
util_get_hovertext()
Extract columns of a SummaryTable
(or Segment, ...)
Description
Extract columns of a SummaryTable
(or Segment, ...)
Usage
util_extract_indicator_metrics(Table)
Arguments
Table |
data.frame, a table |
Value
data.frame columns with indicator metrics from Table
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_html_table()
,
util_sort_by_order()
return all matches of an expression
Description
return all matches of an expression
Usage
util_extract_matches(data, pattern)
Arguments
data |
a character vector |
pattern |
a character string containing a regular expression |
Value
A list with matching elements or NULL (in case of non-matching elements)
Author(s)
Josh O'Brien
See Also
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_attach_attr()
,
util_bQuote()
,
util_backtickQuote()
,
util_coord_flip()
,
util_par_pmap()
,
util_setup_rstudio_job()
,
util_suppress_output()
Examples
## Not run: # not exported, so not tested
dat0 <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)",
"another sentence without reference")
pat <- "Ref. (\\d+)"
util_extract_matches(dat0, pat)
## End(Not run)
Filter a MISSING_LIST_TABLE
for rows matching the variable rv
Description
In MISSING_LIST_TABLE
, a column resp_vars
may be specified. If so,
and if, for a row, this column is not empty, then that row only affects the
one variable specified in that cell
Usage
util_filter_missing_list_table_for_rv(table, rv, rv2 = rv)
Arguments
table |
cause_label_df a data frame with missing codes and
optionally |
rv |
variable the response variable to filter the missing list for specified by a label. |
rv2 |
variable the response variable to filter the missing list for
specified by a |
Value
data.frame the row-wise bound data frames as one data frame
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_count_expected_observations()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_observation_expected()
,
util_remove_empty_rows()
,
util_replace_codes_by_NA()
Filter collection based on its names()
using regular expressions
Description
Filter collection based on its names()
using regular expressions
Usage
util_filter_names_by_regexps(collection, regexps)
Arguments
collection |
a named collection (list, vector, ...) |
regexps |
character a vector of regular expressions |
Value
collection, reduced to the entries whose names match at least one of the expressions in regexps
See Also
Other string_functions: util_abbreviate_unique(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()
Examples
## Not run: # internal function
util_filter_names_by_regexps(iris, c("epa", "eta"))
## End(Not run)
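The assumed semantics can be sketched in base R (the helper name filter_names_by_regexps below is a stand-in for the internal function, which is not exported): keep entries whose names match at least one of the regular expressions.

```r
# Keep collection entries whose names match any of the given regexps.
filter_names_by_regexps <- function(collection, regexps) {
  keep <- vapply(names(collection),
                 function(n) any(vapply(regexps, grepl, logical(1), x = n)),
                 logical(1))
  collection[keep]
}
# "epa" matches the Sepal columns, "eta" the Petal columns; Species drops out
names(filter_names_by_regexps(iris, c("epa", "eta")))
```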
Function that calculates height and width values for script_iframe
Description
Function that calculates height and width values for script_iframe
Usage
util_finalize_sizing_hints(sizing_hints)
Arguments
sizing_hints |
list containing information for setting
the size of the |
Value
a list with figure_type_id, w, and h; sizes are given as CSS, existing elements are kept, and w_in_cm and h_in_cm are estimates of the size in centimeters on a typical computer display (as of 2024)
Find externally called function in the stack trace
Description
intended use: error messages for the user
Usage
util_find_external_functions_in_stacktrace(
sfs = rev(sys.frames()),
cls = rev(sys.calls())
)
Arguments
sfs |
reverse sys.frames to search in |
cls |
reverse sys.calls to search in |
Value
logical vector stating, for each index, whether it was called externally
See Also
Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()
Find first externally called function in the stack trace
Description
intended use: error messages for the user
Usage
util_find_first_externally_called_functions_in_stacktrace(
sfs = rev(sys.frames()),
cls = rev(sys.calls())
)
Arguments
sfs |
reverse sys.frames to search in |
cls |
reverse sys.calls to search in |
Value
reverse sys.frames index of first non-dataquieR function in this stack
See Also
Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings(), util_warning()
Find a free missing code
Description
Find a missing code that does not yet occur in x
Usage
util_find_free_missing_code(x)
Arguments
x |
a vector of missing codes |
Value
a missing code not in x
See Also
Other metadata_management: util_dist_selection(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
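One plausible way to obtain a code that is guaranteed not to be in x is to go beyond the largest code in use (a hypothetical illustration; the internal function is not exported and may choose differently):

```r
# Return a numeric code that is not yet contained in x.
find_free_missing_code <- function(x) {
  max(x, na.rm = TRUE) + 1
}
find_free_missing_code(c(99980, 99981, 99982))
```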
Search for a formal in the stack trace
Description
Similar to dynGet()
, find a symbol in the closest data quality indicator
function and return its value. Can stop()
, if symbol evaluation causes a
stop.
Usage
util_find_indicator_function_in_callers(symbol = "resp_vars")
Arguments
symbol |
symbol to find |
Value
value of the symbol, if available, NULL
otherwise
See Also
Other condition_functions:
util_condition_constructor_factory()
,
util_deparse1()
,
util_error()
,
util_find_external_functions_in_stacktrace()
,
util_find_first_externally_called_functions_in_stacktrace()
,
util_message()
,
util_suppress_warnings()
,
util_warning()
Try hard to map a variable
Description
Does not warn on ambiguities, nor if a variable is not found (in the latter case, it returns ifnotfound)
Usage
util_find_var_by_meta(
resp_vars,
meta_data = "item_level",
label_col = LABEL,
allowed_sources = c(VAR_NAMES, label_col, LABEL, LONG_LABEL, "ORIGINAL_VAR_NAMES",
"ORIGINAL_LABEL"),
target = VAR_NAMES,
ifnotfound = NA_character_
)
Arguments
resp_vars |
variables to map from |
meta_data |
metadata |
label_col |
label-col to map from, if not |
allowed_sources |
allowed names to map from (as metadata columns) |
target |
metadata attribute to map to |
ifnotfound |
list A list of values to be used if the item is not found: it will be coerced to a list if necessary. |
Value
vector of mapped target names of resp_vars
See Also
Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
Move the first row of a data frame to its column names
Description
Move the first row of a data frame to its column names
Usage
util_first_row_to_colnames(dfr)
Arguments
dfr |
Value
data.frame with first row as column names
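The documented behavior can be sketched in base R (assumed, not the package implementation), for a data frame whose first row holds the intended column names:

```r
# Promote the first row of a data frame to its column names.
first_row_to_colnames <- function(dfr) {
  colnames(dfr) <- vapply(dfr[1, , drop = FALSE], as.character, character(1))
  dfr[-1, , drop = FALSE]
}
dfr <- data.frame(V1 = c("id", "1", "2"), V2 = c("sex", "m", "f"))
first_row_to_colnames(dfr)  # columns are now "id" and "sex"
</imports>
```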
Fix results from merge
Description
This function handles the result of merge() calls, if no.dups = TRUE and suffixes = c("", "")
Usage
util_fix_merge_dups(dfr, stop_if_incompatible = TRUE)
Arguments
dfr |
data frame to fix |
stop_if_incompatible |
logical stop, if the data frame cannot be fixed |
See Also
Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor(), util_table_of_vct()
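With empty suffixes, merge() can yield duplicated column names that downstream code cannot handle. A minimal sketch of such a fixer (hypothetical, not the package implementation): drop duplicated columns that are exact copies, and stop otherwise.

```r
# Collapse duplicated columns after a merge; stop if the copies disagree.
fix_merge_dups <- function(dfr, stop_if_incompatible = TRUE) {
  dup <- duplicated(colnames(dfr))
  for (i in which(dup)) {
    first <- match(colnames(dfr)[i], colnames(dfr))
    if (!identical(dfr[[first]], dfr[[i]]) && stop_if_incompatible)
      stop("duplicated column ", colnames(dfr)[i], " differs between copies")
  }
  dfr[, !dup, drop = FALSE]
}
d <- data.frame(id = 1:2, v = 3:4, v = 3:4, check.names = FALSE)
fix_merge_dups(d)  # one "v" column remains
```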
RStudio crashes on parallel calls in some versions on Darwin-based operating systems with R 4
Description
RStudio crashes on parallel calls in some versions on Darwin-based operating systems with R 4
Usage
util_fix_rstudio_bugs()
Value
invisible null
See Also
Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not(), util_warn_unordered()
Ensure that the sizing hint sticks to the dqr only
Description
Ensure that the sizing hint sticks to the dqr only
Usage
util_fix_sizing_hints(dqr, x)
Arguments
dqr |
a |
x |
a plot object |
Value
a list with dqr and x, but fixed
Fix a storr
object, if it features the factory-attribute
Description
Fix a storr
object, if it features the factory-attribute
Usage
util_fix_storr_object(my_storr_object)
Arguments
my_storr_object |
a |
Value
a (hopefully) working storr_object
Return a single-page navigation menu floating on the right
Description
if displayed in a dq_report2
Usage
util_float_index_menu(index_menu_table, object)
Arguments
index_menu_table |
data.frame columns: links, hovers, texts |
object |
|
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()
Examples
## Not run:
util_float_index_menu(tibble::tribble(
~ links, ~ hovers, ~ texts,
"http://www.google.de/#xxx", "This is Google", "to Google",
"http://www.uni-giessen.de/#xxx", "This is Gießen", "cruising on the A45"
))
## End(Not run)
Plots simple HTML tables with background color scale
Description
Plots simple HTML tables with background color scale
Usage
util_formattable(
tb,
min_val = min(tb, na.rm = TRUE),
max_val = max(tb, na.rm = TRUE),
min_color = c(0, 0, 255),
max_color = c(255, 0, 0),
soften = function(x) stats::plogis(x, location = 0.5, scale = 0.1),
style_header = "font-weight: bold;",
text_color_mode = c("bw", "gs"),
hover_texts = NULL,
escape_all_content = TRUE
)
Arguments
tb |
data.frame the table as data.frame with mostly numbers |
min_val |
numeric minimum value for the numbers in |
max_val |
numeric maximum value for the numbers in |
min_color |
numeric vector with the RGB color values for the minimum color, values between 0 and 255 |
max_color |
numeric vector with the RGB color values for the maximum color, values between 0 and 255 |
soften |
function to be applied to the relative values between 0 and 1 before mapping them to a color |
style_header |
character to be applied to style the HTML header of the table |
text_color_mode |
enum bw | gs. Should the text be displayed in black and white or using a grey scale? In both cases, the color will be adapted to the background. |
hover_texts |
data.frame if not |
escape_all_content |
logical if |
Value
htmltools
compatible object
Examples
## Not run:
tb <- as.data.frame(matrix(ncol = 5, nrow = 5))
tb[] <- sample(1:100, prod(dim(tb)), replace = TRUE)
tb[, 1] <- paste("case", 1:nrow(tb))
htmltools::browsable(util_formattable(tb))
htmltools::browsable(util_formattable(tb[, -1]))
## End(Not run)
Get description for an indicator function
Description
Get description for an indicator function
Usage
util_function_description(fname)
Arguments
fname |
the function name |
Value
the description
Generate a link to a specific result
Description
for dq_report2
Usage
util_generate_anchor_link(
varname,
callname,
order_context = c("variable", "indicator"),
name,
title
)
Arguments
varname |
variable to create a link to |
callname |
function call to create a link to |
order_context |
link created to variable overview or indicator overview page |
name |
replaces |
title |
optional, replaces auto-generated link title |
Value
the htmltools
tag
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()
Generate a tag for a specific result
Description
for dq_report2
Usage
util_generate_anchor_tag(
varname,
callname,
order_context = c("variable", "indicator"),
name
)
Arguments
varname |
variable to create an anchor for |
callname |
function call to create an anchor for |
order_context |
anchor created on variable overview or indicator overview page |
name |
replaces |
Value
the htmltools
tag
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()
Generate an execution/calling plan for computing a report from the metadata
Description
Generate an execution/calling plan for computing a report from the metadata
Usage
util_generate_calls(
dimensions,
meta_data,
label_col,
meta_data_segment,
meta_data_dataframe,
meta_data_cross_item,
specific_args,
arg_overrides,
resp_vars,
filter_indicator_functions
)
Arguments
dimensions |
dimensions Vector of dimensions to address in the report. Allowed values in the vector are Completeness, Consistency, and Accuracy. The generated report will only cover the listed data quality dimensions. Accuracy is computationally expensive, so this dimension is not enabled by default. Completeness should be included if Consistency is included, and Consistency should be included if Accuracy is included, to avoid misleading detections of, e.g., missing codes as outliers; please refer to the data quality concept for more details. Integrity is always included. |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional: Data frame level metadata |
meta_data_cross_item |
data.frame – optional: Cross-item level metadata |
specific_args |
list named list of arguments specifically for one of the called functions; the names of the list elements correspond to the indicator functions whose calls should be modified. The elements are lists of arguments. |
arg_overrides |
list arguments to be passed to all called indicator functions if applicable. |
resp_vars |
variables to be respected, |
filter_indicator_functions |
character regular expressions; an indicator function is used for the report only if its name matches one of these. If of length zero, no filtering is performed. |
Value
a list of calls
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()
Generate function calls for a given indicator function
Description
new reporting pipeline v2.0
Usage
util_generate_calls_for_function(
fkt,
meta_data,
label_col,
meta_data_segment,
meta_data_dataframe,
meta_data_cross_item,
specific_args,
arg_overrides,
resp_vars
)
Arguments
fkt |
the indicator function's name |
meta_data |
the item level metadata data frame |
label_col |
the label column |
meta_data_segment |
segment level metadata |
meta_data_dataframe |
data frame level metadata |
meta_data_cross_item |
cross-item level metadata |
specific_args |
argument overrides for specific functions |
arg_overrides |
general argument overrides |
resp_vars |
variables to be respected |
Value
function calls for the given function
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order(), util_set_size()
Convert a dataquieR report v2 to a named list of web pages
Description
Convert a dataquieR report v2 to a named list of web pages
Usage
util_generate_pages_from_report(
report,
template,
disable_plotly,
progress = progress,
progress_msg = progress_msg,
block_load_factor,
dir,
my_dashboard
)
Arguments
report |
|
template |
character template to use, only the name, not the path |
disable_plotly |
logical do not use |
progress |
|
progress_msg |
|
block_load_factor |
numeric multiply size of parallel compute blocks by this factor. |
dir |
character output directory for potential |
my_dashboard |
list of class |
Value
named list, each entry becomes a file with the name of the entry. The contents are HTML objects as used by htmltools.
See Also
Other html: util_dashboard_table(), util_extract_all_ids(), util_get_hovertext()
Examples
## Not run:
devtools::load_all()
prep_load_workbook_like_file("meta_data_v2")
report <- dq_report2("study_data", dimensions = NULL, label_col = "LABEL");
save(report, file = "report_v2.RData")
report <- dq_report2("study_data", label_col = "LABEL");
save(report, file = "report_v2_short.RData")
## End(Not run)
Create a table summarizing the number of indicators and descriptors in the report
Description
Create a table summarizing the number of indicators and descriptors in the report
Usage
util_generate_table_indicators_descriptors(report)
Arguments
report |
a report |
Value
a table containing the number of indicators and descriptors created in the report, separated by data quality dimension.
Return the category for a result
Description
messages do not cause any category, warnings are cat3, errors are cat5
Usage
util_get_category_for_result(
result,
aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
...
)
Arguments
result |
a |
aspect |
an aspect/problem category of results (error, applicability error) |
... |
not used |
Value
a category, see util_as_cat()
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
Fetch a missing code list from the metadata
Description
Get missing codes from metadata (e.g., MISSING_LIST or JUMP_LIST)
Usage
util_get_code_list(
x,
code_name,
split_char = SPLIT_CHAR,
mdf,
label_col = VAR_NAMES,
warning_if_no_list = TRUE,
warning_if_unsuitable_list = TRUE
)
Arguments
x |
variable the name of the variable to retrieve code lists for. Only one variable at a time is supported; this function is not vectorized. |
code_name |
variable attribute JUMP_LIST or MISSING_LIST: Which codes to retrieve. |
split_char |
character len = 1. Character(s) used to separate
different codes in the metadata, usually |
mdf |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
warning_if_no_list |
logical len = 1. If |
warning_if_unsuitable_list |
logical len = 1. If |
Value
numeric vector of missing codes.
See Also
Other missing_functions: util_all_intro_vars_for_rv(), util_count_expected_observations(), util_filter_missing_list_table_for_rv(), util_is_na_0_empty_or_false(), util_observation_expected(), util_remove_empty_rows(), util_replace_codes_by_NA()
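The metadata convention this function reads can be illustrated with base R (assumed convention: codes stored as a SPLIT_CHAR-separated string, where SPLIT_CHAR is "|" in dataquieR; the column names below are for illustration):

```r
# Split a "|"-separated MISSING_LIST cell into a numeric code vector.
mdf <- data.frame(VAR_NAMES = "SBP_0",
                  MISSING_LIST = "99980 | 99981 | 99982")
codes <- as.numeric(trimws(
  strsplit(mdf$MISSING_LIST, "|", fixed = TRUE)[[1]]))
codes  # the three missing codes as numbers
```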
Get colors for each russet DQ
category
Description
Get colors for each russet DQ
category
Usage
util_get_colors()
Value
named vector of colors, names are categories (e.g., "1" to "5"), values are colors as HTML RGB hexadecimal strings
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
Read additional concept tables
Description
Read additional concept tables
Usage
util_get_concept_info(filename, ...)
Arguments
filename |
RDS-file name without extension to read from |
... |
passed to subset |
Value
a data frame
Get encoding from metadata or guess it from data
Description
Get encoding from metadata or guess it from data
Usage
util_get_encoding(
resp_vars = colnames(study_data),
study_data,
label_col,
meta_data,
meta_data_dataframe
)
Arguments
resp_vars |
variable the names of the measurement variables, if
missing or |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
Value
named vector of valid encoding strings matching resp_vars
Find a foreground color for a background
Description
black or white
Usage
util_get_fg_color(cl)
Arguments
cl |
colors |
Value
black or white for each cl
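The usual approach for such a helper can be sketched in base R (assumed behavior, not the exported code): choose black or white text depending on the background's relative luminance.

```r
# Pick a readable foreground color for each background color.
get_fg_color <- function(cl) {
  rgb <- grDevices::col2rgb(cl)
  lum <- (0.299 * rgb["red", ] +
          0.587 * rgb["green", ] +
          0.114 * rgb["blue", ]) / 255
  ifelse(lum > 0.5, "black", "white")
}
get_fg_color(c("#FFFFFF", "#000000"))  # black on white, white on black
```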
Import vector of hover text for tables in the report
Description
Import vector of hover text for tables in the report
Usage
util_get_hovertext(x)
Arguments
x |
name of the tables. They are |
Value
named vector containing the hover text from the file metadata-hovertext.rds
in the inst folder. Names correspond to column names in the metadata
tables
See Also
Other html: util_dashboard_table(), util_extract_all_ids(), util_generate_pages_from_report()
Get labels for each russet DQ
category
Description
Get labels for each russet DQ
category
Usage
util_get_labels_grading_class()
Value
named vector of labels, names are categories (e.g., "1" to "5"), values are labels
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
Return messages/warnings/notes/error messages for a result
Description
Return messages/warnings/notes/error messages for a result
Usage
util_get_message_for_result(
result,
aspect = c("applicability", "error", "anamat", "indicator_or_descriptor"),
collapse = "\n<br />\n",
...
)
Arguments
result |
a |
aspect |
an aspect/problem category of results |
collapse |
either a lambda function or a separator for combining multiple messages for the same result |
... |
not used |
Value
hover texts for results with data quality issues, run-time errors, warnings or notes (aka messages)
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
An environment with functions available for REDcap rules
Description
An environment with functions available for REDcap rules
Usage
util_get_redcap_rule_env()
Value
environment
See Also
Other redcap: util_eval_rule()
Get rule sets for DQ
grading
Description
Get rule sets for DQ
grading
Usage
util_get_rule_sets()
Value
named list, names are the ruleset names, values are data.frames featuring the columns GRADING_RULESET, dqi_parameterstub, indicator_metric, dqi_catnum, and dqi_cat_1 to dqi_cat_<dqi_catnum>
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table(), util_sort_by_order()
Get formats for DQ
categories
Description
Get formats for DQ
categories
Usage
util_get_ruleset_formats()
Value
data.frame columns: categories (e.g., "1" to "5"), color (e.g., "33 102 172", "67 147 195", "227 186 20", "214 96 77", "178 23 43"), label (e.g., "OK", "unclear", "moderate", "important", "critical")
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_thresholds(), util_html_table(), util_sort_by_order()
Get namespace for attributes
Description
Get namespace for attributes
Usage
util_get_storr_att_namespace(my_storr_object)
Arguments
my_storr_object |
the |
Value
the namespace name
Get the storr
object backing a report
Description
Get the storr
object backing a report
Usage
util_get_storr_object_from_report(r)
Arguments
r |
the dataquieR_resultset2 / report |
Value
the storr
object holding the results, or NULL if the report lives in memory only
Get namespace specifically for summary attributes for speed-up
Description
Get namespace specifically for summary attributes for speed-up
Usage
util_get_storr_summ_namespace(my_storr_object)
Arguments
my_storr_object |
the |
Value
the namespace name
Get the thresholds for grading
Description
Get the thresholds for grading
Usage
util_get_thresholds(indicator_metric, meta_data)
Arguments
indicator_metric |
which indicator metric to be classified |
meta_data |
the item level metadata |
Value
named list (names are VAR_NAMES, values are named vectors of intervals, names in the vectors are the category numbers)
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_html_table(), util_sort_by_order()
Get variable attributes of a certain provision level
Description
This function returns all variable attribute names of a certain metadata provision level or of more than one level.
Usage
util_get_var_att_names_of_level(level, cumulative = TRUE)
Arguments
level |
level(s) of requirement |
cumulative |
include all names from more basic levels |
Value
all matching variable attribute names
See Also
Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
Return all variables in the segment segment
Description
Return all variables in the segment segment
Usage
util_get_vars_in_segment(segment, meta_data = "item_level", label_col = LABEL)
Arguments
segment |
character the segment as specified in |
meta_data |
data.frame the metadata |
label_col |
character the metadata attribute used for naming the variables |
Value
vector of variable names
See Also
Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta(), util_validate_missing_lists()
Get the Table with Known Vocabularies
Description
Get the Table with Known Vocabularies
Usage
util_get_voc_tab(.data_frame_list = .dataframe_environment())
Arguments
.data_frame_list |
environment cache for loaded data frames |
Value
data.frame the (combined) table with known vocabularies
Add labels to ggplot
Description
EXPERIMENTAL
Usage
util_gg_var_label(
...,
meta_data = get("meta_data", parent.frame()),
label_col = get("label_col", parent.frame())
)
Arguments
... |
EXPERIMENTAL |
meta_data |
the metadata |
label_col |
the label columns |
Value
a modified ggplot
Utility function to check whether a variable has no grouping variable assigned
Description
Utility function to check whether a variable has no grouping variable assigned
Usage
util_has_no_group_vars(resp_vars, label_col = LABEL, meta_data = "item_level")
Arguments
resp_vars |
variable list the name of a measurement variable |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame old name for |
Value
boolean
Utility Function Heatmap with 1 Threshold
Description
Function to create a heatmap-like plot given one threshold – works for percentages for now.
Usage
util_heatmap_1th(
df,
cat_vars,
values,
threshold,
right_intv,
invert,
cols,
strata
)
Arguments
df |
data.frame with data to display as a heatmap. |
cat_vars |
variable list len=1-2. Variables to group by. Up to 2 group levels supported. |
values |
variable the name of the percentage variable |
threshold |
numeric lowest acceptable value |
right_intv |
logical len=1. If |
invert |
logical len=1. If |
cols |
deprecated, ignored. |
strata |
variable optional, the name of a variable
used for stratification
|
Value
a list with:
- SummaryPlot: ggplot2::ggplot object with the heatmap
See Also
Other figure_functions: util_optimize_histogram_bins()
If on Windows, hide a file
Description
If on Windows, hide a file
Usage
util_hide_file_windows(fn)
Arguments
fn |
the file path + name |
Value
invisible(NULL)
Utility function to create histograms
Description
A helper function for simple histograms.
Usage
util_histogram(
plot_data,
num_var = colnames(plot_data)[1],
fill_var = NULL,
facet_var = NULL,
nbins_max = 100,
colors = "#2166AC",
is_datetime = FALSE
)
Arguments
plot_data |
a |
num_var |
column name of the numerical or datetime variable
in |
fill_var |
column name of the categorical variable in |
facet_var |
column name of the categorical variable in |
nbins_max |
the maximum number of bins for the histogram (see
|
colors |
vector of colors, or a single color |
is_datetime |
if |
Value
a histogram
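The nbins_max capping described above can be sketched with base R (assumed behavior; the actual helper draws a ggplot2 histogram, and Sturges' rule is only an assumed default bin suggestion):

```r
# Cap the number of histogram bins at a maximum (the nbins_max idea).
set.seed(1)
x <- rnorm(500)
nbins <- min(100, grDevices::nclass.Sturges(x))
h <- hist(x, breaks = nbins, plot = FALSE)
sum(h$counts)  # all 500 observations fall into the bins
```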
Escape "
Description
Escape "
Usage
util_html_attr_quote_escape(s)
Arguments
s |
haystack |
Value
s with " replaced by &quot;
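The escaping itself is a one-liner with gsub() (a sketch of the assumed behavior; the helper name below is a stand-in for the internal function):

```r
# Replace literal double quotes with the HTML entity &quot;.
html_attr_quote_escape <- function(s) gsub('"', "&quot;", s, fixed = TRUE)
html_attr_quote_escape('say "hi"')  # 'say &quot;hi&quot;'
```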
Create a dynamic dimension related page for the report
Description
Create a dynamic dimension related page for the report
Usage
util_html_for_dims(
report,
use_plot_ly,
template,
block_load_factor,
repsum,
dir
)
Arguments
report |
dataquieR_resultset2 a |
use_plot_ly |
logical use |
template |
character template to use for the |
block_load_factor |
numeric multiply size of parallel compute blocks by this factor. |
repsum |
the |
dir |
character output directory for potential |
Value
list of arguments for append_single_page()
defined locally in
util_generate_pages_from_report()
.
Create a dynamic single variable page for the report
Description
Create a dynamic single variable page for the report
Usage
util_html_for_var(
report,
cur_var,
use_plot_ly,
template,
note_meta = c(),
rendered_repsum,
dir
)
Arguments
report |
dataquieR_resultset2 a |
cur_var |
character variable name for single variable pages |
use_plot_ly |
logical use |
template |
character template to use for the |
note_meta |
character notes on the metadata for a single variable (if needed) |
rendered_repsum |
the |
dir |
character output directory for potential |
Value
list of arguments for append_single_page()
defined locally in
util_generate_pages_from_report()
.
The jack of all trades device for tables
Description
The jack of all trades device for tables
Usage
util_html_table(
tb,
filter = "top",
columnDefs = NULL,
autoWidth = FALSE,
hideCols = character(0),
rowCallback = DT::JS("function(r,d) {$(r).attr('height', '2em')}"),
copy_row_names_to_column = !is.null(tb) && length(rownames(tb)) == nrow(tb) &&
!is.integer(attr(tb, "row.names")) && !all(seq_len(nrow(tb)) == rownames(tb)),
link_variables = TRUE,
tb_rownames = FALSE,
meta_data,
rotate_headers = FALSE,
fillContainer = TRUE,
...,
colnames,
descs,
options = list(),
is_matrix_table = FALSE,
colnames_aliases2acronyms = is_matrix_table && !cols_are_indicatormetrics,
cols_are_indicatormetrics = FALSE,
label_col = LABEL,
output_format = c("RMD", "HTML"),
dl_fn = "*",
rotate_for_one_row = FALSE,
title = dl_fn,
messageTop = NULL,
messageBottom = NULL,
col_tags = NULL,
searchBuilder = FALSE,
initial_col_tag,
init_search,
additional_init_args,
additional_columnDefs
)
Arguments
tb |
the table as data.frame |
filter |
passed to |
columnDefs |
column specifications for the |
autoWidth |
passed to the |
hideCols |
columns to hide (by name) |
rowCallback |
passed to the |
copy_row_names_to_column |
add a column 0 with |
link_variables |
considering row names being variables, convert row names to links to the variable specific reports |
tb_rownames |
number of columns from the left considered as row-names |
meta_data |
the data dictionary for labels and similar stuff |
rotate_headers |
rotate headers by 90 degrees |
fillContainer |
see |
... |
passed to |
colnames |
column names for the table (defaults to |
descs |
character descriptions of the columns for the hover-box shown
for the column names, if not missing, this overrides
the existing description stuff from known column
names. If you have an attribute "description" of the |
options |
individually overwrites defaults in |
is_matrix_table |
create a heat map like table without padding |
colnames_aliases2acronyms |
abbreviate column names, considering them analysis matrix columns, by their acronyms defined in square. |
cols_are_indicatormetrics |
logical cannot be |
label_col |
label col used for mapping labels in case of
|
output_format |
target format |
dl_fn |
file name for downloaded table – see https://datatables.net/reference/button/excel |
rotate_for_one_row |
logical rotate one-row-tables |
title |
character title for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html |
messageTop |
character subtitle for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html |
messageBottom |
character footer for download formats, see https://datatables.net/extensions/buttons/examples/html5/titleMessage.html |
col_tags |
list if not |
searchBuilder |
logical if |
initial_col_tag |
character |
init_search |
list object to initialize |
additional_init_args |
list if not missing or |
additional_columnDefs |
list additional |
Value
the table to be added to an rmd/html file as
htmlwidgets::htmlwidgets
See Also
Other summary_functions:
prep_combine_report_summaries()
,
prep_extract_classes_by_functions()
,
prep_extract_summary()
,
prep_extract_summary.dataquieR_result()
,
prep_extract_summary.dataquieR_resultset2()
,
prep_render_pie_chart_from_summaryclasses_ggplot2()
,
prep_render_pie_chart_from_summaryclasses_plotly()
,
prep_summary_to_classes()
,
util_as_cat()
,
util_as_integer_cat()
,
util_extract_indicator_metrics()
,
util_get_category_for_result()
,
util_get_colors()
,
util_get_labels_grading_class()
,
util_get_message_for_result()
,
util_get_rule_sets()
,
util_get_ruleset_formats()
,
util_get_thresholds()
,
util_sort_by_order()
Utility function for the outlier rule of Hubert and Vandervieren 2008
Description
Function to calculate outliers according to the rule of Hubert and
Vandervieren (2008). This function requires the package robustbase.
Usage
util_hubert(x)
Arguments
x |
numeric data to check for outliers |
Value
binary vector
See Also
Other outlier_functions:
util_3SD()
,
util_sigmagap()
,
util_tukey()
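A hedged sketch of the Hubert and Vandervieren (2008) rule using robustbase::adjboxStats(), whose skewness-adjusted (medcouple-based) whisker fences implement this idea; the wrapper `hubert_flags` is a hypothetical stand-in, not the package's own code:

```r
## Sketch: values beyond the medcouple-adjusted whisker fences of the
## adjusted boxplot count as outliers (binary result, 1 = outlier).
library(robustbase)
hubert_flags <- function(x) {
  fence <- adjboxStats(x)$fence  # lower and upper adjusted fence
  as.integer(x < fence[1] | x > fence[2])
}
x <- c(1:10, 1000)   # one gross outlier
hubert_flags(x)
```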
Make it scalable, if it is a figure
Description
This function writes figures to helper files and embeds them in a returned
object, which is a scalable iframe. Other objects are passed through
unchanged.
Usage
util_iframe_it_if_needed(it, dir, nm, fkt, sizing_hints, ggthumb)
Arguments
it |
|
dir |
character output directory for potential |
nm |
character name for the |
fkt |
character function name of the indicator function that created
|
sizing_hints |
|
ggthumb |
|
Value
htmltools::tag()
compatible object, maybe now in an iframe
Extract all properties of a ReportSummaryTable
Description
Extract all properties of a ReportSummaryTable
Usage
util_init_respum_tab(x)
Arguments
x |
|
Value
list with all properties
Integer breaks for ggplot2
Description
creates integer-only breaks
Usage
util_int_breaks_rounded(x, n = 5)
Arguments
x |
the values |
n |
integer giving the desired number of intervals. Non-integer values are rounded down. |
Value
breaks suitable for the breaks
argument of scale_*_continuous
Examples
## Not run:
library(ggplot2)
big_numbers1 <- data.frame(x = 1:5, y = c(0:1, 0, 1, 0))
big_numbers2 <- data.frame(x = 1:5, y = c(0:1, 0, 1, 0) + 1000000)
big_numbers_plot1 <- ggplot(big_numbers1, aes(x = x, y = y)) +
geom_point()
big_numbers_plot2 <- ggplot(big_numbers2, aes(x = x, y = y)) +
geom_point()
big_numbers_plot1 + scale_y_continuous()
big_numbers_plot1 + scale_y_continuous(breaks = util_int_breaks_rounded)
big_numbers_plot2 + scale_y_continuous()
big_numbers_plot2 + scale_y_continuous(breaks = util_int_breaks_rounded)
## End(Not run)
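The idea of integer-only breaks can be sketched in base R (the helper `int_breaks_rounded` below is an assumed, simplified re-implementation, not the package's own code):

```r
## Hypothetical sketch of integer-only axis breaks: compute pretty
## breaks for the data range, then round and deduplicate them.
int_breaks_rounded <- function(x, n = 5) {
  unique(round(pretty(x, n = floor(n))))
}
int_breaks_rounded(c(0.2, 4.8))
```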
Check for duplicated content
Description
This function tests for duplicate entries in the data set. It is possible to check for duplicated entries by study segments or to consider only selected segments.
Usage
util_int_duplicate_content_dataframe(
level = c("dataframe"),
identifier_name_list,
id_vars_list,
unique_rows,
meta_data_dataframe = "dataframe_level",
...,
dataframe_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
identifier_name_list |
vector the vector that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments. |
id_vars_list |
list the list containing the identifier variables names to be used in the assessment. |
unique_rows |
vector named. For each data frame, either true/false or |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
... |
Not used. |
dataframe_level |
data.frame alias for |
Value
a list with
- SegmentData: data frame with the results of the quality check for duplicated entries
- SegmentTable: data frame with selected duplicated entries check results, used for the data quality report
- Other: vector with row indices of duplicated entries, if any, otherwise NULL
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_segment()
,
util_int_duplicate_ids_dataframe()
,
util_int_duplicate_ids_segment()
,
util_int_unexp_records_set_dataframe()
,
util_int_unexp_records_set_segment()
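The core of the duplicated-content check can be sketched with base R's duplicated(): checking from both directions marks every copy of a repeated row, not only the later ones. This is an illustrative sketch, not the package's implementation:

```r
## Base-R sketch of a whole-row duplicate check.
dat <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))
dup <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
which(dup)   # rows 1 and 2 are duplicates of each other
```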
Check for duplicated content
Description
This function tests for duplicate entries in the data set. It is possible to check for duplicated entries by study segments or to consider only selected segments.
Usage
util_int_duplicate_content_segment(
level = c("segment"),
identifier_name_list,
id_vars_list,
unique_rows,
study_data,
meta_data,
meta_data_segment = "segment_level",
segment_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
identifier_name_list |
vector the vector that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments. |
id_vars_list |
list the list containing the identifier variables names to be used in the assessment. |
unique_rows |
vector named. For each segment, either true/false or |
study_data |
data.frame the data frame that contains the measurements, mandatory. |
meta_data |
data.frame the data frame that contains metadata attributes of the study data, mandatory. |
meta_data_segment |
data.frame – optional: Segment level metadata |
segment_level |
data.frame alias for |
Value
a list with
- SegmentData: data frame with the results of the quality check for duplicated entries
- SegmentTable: data frame with selected duplicated entries check results, used for the data quality report
- Other: vector with row indices of duplicated entries, if any, otherwise NULL
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_dataframe()
,
util_int_duplicate_ids_dataframe()
,
util_int_duplicate_ids_segment()
,
util_int_unexp_records_set_dataframe()
,
util_int_unexp_records_set_segment()
Check for duplicated IDs
Description
This function tests for duplicate entries in identifiers. It is possible to check for duplicated identifiers by study segments or to consider only selected segments.
Usage
util_int_duplicate_ids_dataframe(
level = c("dataframe"),
id_vars_list,
identifier_name_list,
repetitions,
meta_data_dataframe = "dataframe_level",
...,
dataframe_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
id_vars_list |
list id variable names for each segment or data frame |
identifier_name_list |
vector the segments or data frame names being assessed |
repetitions |
vector an integer vector indicating the number of allowed repetitions in the |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
... |
not used. |
dataframe_level |
data.frame alias for |
Value
a list with
- DataframeData: data frame with the results of the quality check for duplicated identifiers
- DataframeTable: data frame with selected duplicated identifiers check results, used for the data quality report
- Other: named list of inner lists of unique cases, each containing the row indices of duplicated identifiers separated by "|", if any; the outer names are the names of the data frames
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_dataframe()
,
util_int_duplicate_content_segment()
,
util_int_duplicate_ids_segment()
,
util_int_unexp_records_set_dataframe()
,
util_int_unexp_records_set_segment()
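The duplicated-ID idea can be sketched in base R by treating the id variables as a key and flagging every row whose key occurs more than once (an illustrative sketch, not the package's code):

```r
## Base-R sketch of a duplicated-ID check over a combination of id variables.
dat <- data.frame(id = c(1, 2, 2, 3), visit = c("a", "a", "a", "b"))
key <- dat[c("id", "visit")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
which(dup)   # rows 2 and 3 share the key (2, "a")
```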
Check for duplicated IDs
Description
This function tests for duplicate entries in identifiers. It is possible to check for duplicated identifiers by study segments or to consider only selected segments.
Usage
util_int_duplicate_ids_segment(
level = c("segment"),
id_vars_list,
study_segment,
repetitions,
study_data,
meta_data,
meta_data_segment = "segment_level",
segment_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
id_vars_list |
list id variable names for each segment or data frame |
study_segment |
vector the segments or data frame names being assessed |
repetitions |
vector an integer vector indicating the number of allowed repetitions in the id_vars. Currently, no repetitions are supported. |
study_data |
data.frame the data frame that contains the measurements, mandatory. |
meta_data |
data.frame the data frame that contains metadata attributes of the study data, mandatory. |
meta_data_segment |
data.frame – optional: Segment level metadata |
segment_level |
data.frame alias for |
Value
a list with
- SegmentData: data frame with the results of the quality check for duplicated identifiers
- SegmentTable: data frame with selected duplicated identifiers check results, used for the data quality report
- Other: named list of inner lists of unique cases, each containing the row indices of duplicated identifiers separated by "|", if any; the outer names are the names of the segments. Use prep_get_study_data_segment()
to get the data frame the indices refer to.
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_dataframe()
,
util_int_duplicate_content_segment()
,
util_int_duplicate_ids_dataframe()
,
util_int_unexp_records_set_dataframe()
,
util_int_unexp_records_set_segment()
Check for unexpected data record set
Description
This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.
Usage
util_int_unexp_records_set_dataframe(
level = c("dataframe"),
id_vars_list,
identifier_name_list,
valid_id_table_list,
meta_data_record_check_list,
meta_data_dataframe = "dataframe_level",
...,
dataframe_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
id_vars_list |
list the list containing the identifier variables names to be used in the assessment. |
identifier_name_list |
list the list that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments. |
valid_id_table_list |
list the reference list with the identifier variable values. |
meta_data_record_check_list |
character a character vector indicating the type of check to conduct, either "subset" or "exact". |
meta_data_dataframe |
data.frame the data frame that contains the metadata for the data frame level |
... |
not used |
dataframe_level |
data.frame alias for |
Value
a list with
- SegmentData: data frame with the results of the quality check for unexpected data elements
- SegmentTable: data frame with selected unexpected data elements check results, used for the data quality report
- UnexpectedRecords: vector with row indices of unexpected records, if any, otherwise NULL
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_dataframe()
,
util_int_duplicate_content_segment()
,
util_int_duplicate_ids_dataframe()
,
util_int_duplicate_ids_segment()
,
util_int_unexp_records_set_segment()
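The record-set comparison behind the "subset" and "exact" check types can be sketched with base-R set operations (illustrative only; the package's actual logic is richer):

```r
## Sketch: observed ids versus a valid reference set.
observed <- c(1, 2, 3, 9)
valid    <- 1:5
unexpected <- setdiff(observed, valid)  # fails both check types: 9
absent     <- setdiff(valid, observed)  # fails the "exact" check only: 4 5
```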
Check for unexpected data record set
Description
This function tests that the identifiers match a provided record set. It is possible to check for unexpected data record sets by study segments or to consider only selected segments.
Usage
util_int_unexp_records_set_segment(
level = c("segment"),
id_vars_list,
identifier_name_list,
valid_id_table_list,
meta_data_record_check_list,
study_data,
label_col,
meta_data,
item_level,
meta_data_segment = "segment_level",
segment_level
)
Arguments
level |
character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment"). |
id_vars_list |
list the list containing the identifier variables names to be used in the assessment. |
identifier_name_list |
list the list that contains the name of the identifier to be used in the assessment. For the study level, corresponds to the names of the different data frames. For the segment level, indicates the name of the segments. |
valid_id_table_list |
list the reference list with the identifier variable values. |
meta_data_record_check_list |
character a character vector indicating the type of check to conduct, either "subset" or "exact". |
study_data |
data.frame the data frame that contains the measurements, mandatory. |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
meta_data |
data.frame the data frame that contains metadata attributes of the study data, mandatory. |
item_level |
data.frame the data frame that contains metadata attributes of study data |
meta_data_segment |
data.frame – optional: Segment level metadata |
segment_level |
data.frame alias for |
Value
a list with
- SegmentData: data frame with the results of the quality check for unexpected data elements
- SegmentTable: data frame with selected unexpected data elements check results, used for the data quality report
- UnexpectedRecords: vector with row indices of unexpected records, if any, otherwise NULL
See Also
Other integrity_indicator_functions:
util_int_duplicate_content_dataframe()
,
util_int_duplicate_content_segment()
,
util_int_duplicate_ids_dataframe()
,
util_int_duplicate_ids_segment()
,
util_int_unexp_records_set_dataframe()
Utility function to interpret mathematical interval notation
Description
Utility function to split limit definitions into interpretable elements
Usage
util_interpret_limits(mdata)
Arguments
mdata |
data.frame the data frame that contains metadata attributes of study data |
Value
augments metadata by interpretable limit columns
See Also
Other parser_functions:
util_parse_assignments()
,
util_parse_interval()
,
util_parse_redcap_rule()
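Splitting a limit definition such as "[0;10)" into interpretable elements can be sketched as follows (the helper `parse_interval` is a hypothetical simplification, assuming the common bracket/semicolon notation; the package's parser handles more cases):

```r
## Sketch of interval-notation parsing: "[" / "]" mean inclusive limits.
parse_interval <- function(s) {
  s <- gsub(" ", "", s, fixed = TRUE)
  n <- nchar(s)
  bounds <- strsplit(substr(s, 2, n - 1), ";", fixed = TRUE)[[1]]
  list(low = as.numeric(bounds[1]), upp = as.numeric(bounds[2]),
       inc_low = substr(s, 1, 1) == "[", inc_upp = substr(s, n, n) == "]")
}
parse_interval("[0;10)")   # low 0 (inclusive), upp 10 (exclusive)
```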
Check for integer values
Description
This function checks if a variable is integer.
Usage
util_is_integer(x, tol = .Machine$double.eps^0.5)
Arguments
x |
the object to test |
tol |
precision of the detection. Values deviating more than |
Value
TRUE
or FALSE
See Also
is.integer
Copied from the documentation of is.integer:
is.integer detects whether the storage mode of an R object is
integer. Usually, users want to know whether the values are integer. As
suggested by is.integer's documentation, is.wholenumber
does so.
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
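The value-based check described above rests on the is.wholenumber idea from ?is.integer, which can be sketched in base R:

```r
## A value is "integer" if it deviates from round(x) by less than tol.
is_wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}
all(is_wholenumber(c(1, 2, 3)))   # TRUE: the values are whole numbers
is.integer(c(1, 2, 3))            # FALSE: the storage mode is double
```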
Detect falsish values
Description
Detect falsish values
Usage
util_is_na_0_empty_or_false(x)
Arguments
x |
a value/vector of values |
Value
vector of logical values:
TRUE
, wherever x is somehow empty
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_count_expected_observations()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_observation_expected()
,
util_remove_empty_rows()
,
util_replace_codes_by_NA()
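A hypothetical sketch of such a "falsish" test, assuming NA, 0, "", and FALSE all count as somehow empty (the helper `is_falsish` is illustrative, not the package's implementation):

```r
## Element-wise test for "somehow empty" values.
is_falsish <- function(x) {
  vapply(x, function(v) {
    is.na(v) || identical(v, 0) || identical(v, 0L) ||
      identical(v, "") || isFALSE(v)
  }, logical(1))
}
is_falsish(list(NA, 0, "", FALSE, 1, "a"))
## TRUE TRUE TRUE TRUE FALSE FALSE
```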
Create a predicate function to check for certain numeric properties
Description
Useful, e.g., for util_expect_data_frame and util_expect_scalar. The
generated function returns only TRUE
or FALSE
, even if called with a
vector.
Usage
util_is_numeric_in(
min = -Inf,
max = +Inf,
whole_num = FALSE,
finite = FALSE,
set = NULL
)
Arguments
min |
if given, minimum for numeric values |
max |
if given, maximum for numeric values |
whole_num |
if TRUE, expect a whole number |
finite |
Are |
set |
if given, a set, the value must be in (see util_match_arg) |
Value
a function that checks an x
for the properties.
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Examples
## Not run:
util_is_numeric_in(min = 0)(42)
util_is_numeric_in(min = 43)(42)
util_is_numeric_in(max = 3)(42)
util_is_numeric_in(whole_num = TRUE)(42)
util_is_numeric_in(whole_num = TRUE)(42.1)
util_is_numeric_in(set = c(1, 3, 5))(1)
util_is_numeric_in(set = c(1, 3, 5))(2)
## End(Not run)
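The predicate-factory pattern behind this function can be sketched as a closure that captures the constraints and always yields a single TRUE or FALSE (`numeric_in` is an assumed simplification, not the package's code):

```r
## Sketch: build a numeric-property predicate from the given constraints.
numeric_in <- function(min = -Inf, max = Inf, whole_num = FALSE) {
  function(x) {
    ok <- is.numeric(x) && all(x >= min & x <= max)
    if (ok && whole_num)
      ok <- all(abs(x - round(x)) < .Machine$double.eps^0.5)
    isTRUE(ok)
  }
}
numeric_in(min = 0)(42)             # TRUE
numeric_in(whole_num = TRUE)(42.1)  # FALSE
```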
Detect un-disclosed ggplot
Description
Detect un-disclosed ggplot
Usage
util_is_svg_object(x)
Arguments
x |
the object to check |
Value
TRUE
or FALSE
Check if x
is a try-error
Description
Check if x
is a try-error
Usage
util_is_try_error(x)
Arguments
x |
Value
logical()
if it is a try-error
Check if x
contains valid missing codes
Description
Check if x
contains valid missing codes
Usage
util_is_valid_missing_codes(x)
Arguments
x |
a vector of values |
Value
TRUE
or FALSE
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_match_arg()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
Called by the active binding function for .manual
Description
Called by the active binding function for .manual
Usage
util_load_manual(
rebuild = FALSE,
target = "inst/manual.RData",
target2 = "inst/indicator_or_descriptor.RData",
man_hash = ""
)
Arguments
rebuild |
rebuild the cache |
target |
file for |
target2 |
file for |
man_hash |
internal use: hash-sum over the manual to prevent rebuild if not changed. |
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_make_data_slot_from_table_slot()
,
util_order_by_order()
,
util_set_size()
Check for repetitive values using the digits 8 or 9 only
Description
Values that are not finite (see is.finite
) are also reported as missing
codes. Also, all missing codes must be composed of the digits 8 and
9, and they must be the largest values of a variable.
Usage
util_looks_like_missing(x, n_rules = 1)
Arguments
x |
|
n_rules |
|
Value
logical
indicates for each value in x
, if it looks like a
missing code
See Also
Other metadata_management:
util_dist_selection()
,
util_find_free_missing_code()
,
util_find_var_by_meta()
,
util_get_var_att_names_of_level()
,
util_get_vars_in_segment()
,
util_no_value_labels()
,
util_validate_known_meta()
,
util_validate_missing_lists()
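A rough sketch of the 8/9-digit heuristic (the real function additionally requires candidates to be the largest values of the variable, which this simplified `looks_like_missing` omits):

```r
## Flag non-finite values and values written only with the digits 8 and 9
## (e.g. 99, 888, 9999) as possible undeclared missing codes.
looks_like_missing <- function(x) {
  chr <- sub("^-", "", format(x, trim = TRUE, scientific = FALSE))
  !is.finite(x) | grepl("^[89]+$", chr)
}
looks_like_missing(c(12, 99, 888, NA))
## FALSE TRUE TRUE TRUE
```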
Rename columns of a SummaryTable
(or Segment, ...) to look nice
Description
Rename columns of a SummaryTable
(or Segment, ...) to look nice
Usage
util_make_data_slot_from_table_slot(Table)
Arguments
Table |
data.frame, a table |
Value
renamed table
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_order_by_order()
,
util_set_size()
Maps label column metadata onto study data variable names
Description
Maps a certain label column from the metadata to the study data frame.
Usage
util_map_all(label_col = VAR_NAMES, study_data, meta_data)
Arguments
label_col |
the variable of the metadata that contains the variable names of the study data |
study_data |
the name of the data frame that contains the measurements |
meta_data |
the name of the data frame that contains metadata attributes of study data |
Value
list with slot df
with a study data frame with mapped column
names
See Also
Other mapping:
util_map_by_largest_prefix()
,
util_map_labels()
,
util_recode()
Map based on largest common prefix
Description
Map based on largest common prefix
Usage
util_map_by_largest_prefix(
needle,
haystack,
split_char = "_",
remove_var_suffix = TRUE
)
Arguments
needle |
character |
haystack |
character items to find the entry sharing the largest
prefix with |
split_char |
character |
remove_var_suffix |
logical |
Value
character(1)
with the fitting function name or NA_character_
See Also
Other mapping:
util_map_all()
,
util_map_labels()
,
util_recode()
Examples
## Not run: # internal function
util_map_by_largest_prefix(
"acc_distributions_loc_ecdf_observer_time",
names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
"acc_distributions_loc_observer_time",
names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
"acc_distributions_loc_ecdf",
names(dataquieR:::.manual$titles)
)
util_map_by_largest_prefix(
"acc_distributions_loc",
names(dataquieR:::.manual$titles)
)
## End(Not run)
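The prefix-matching idea can be re-implemented as a short sketch: split the needle at the split character and drop trailing parts until some haystack entry matches (an assumed simplification; the package's version also knows about variable suffixes):

```r
## Find the haystack entry sharing the largest "_"-separated prefix
## with the needle, or NA_character_ if none matches.
map_by_largest_prefix <- function(needle, haystack, split_char = "_") {
  parts <- strsplit(needle, split_char, fixed = TRUE)[[1]]
  for (k in rev(seq_along(parts))) {
    cand <- paste(parts[seq_len(k)], collapse = split_char)
    if (cand %in% haystack) return(cand)
  }
  NA_character_
}
map_by_largest_prefix("acc_distributions_loc_ecdf",
                      c("acc_distributions", "con_limits"))
## "acc_distributions"
```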
Support function to allocate labels to variables
Description
Map variables to certain attributes, e.g. by default their labels.
Usage
util_map_labels(
x,
meta_data = "item_level",
to = LABEL,
from = VAR_NAMES,
ifnotfound,
warn_ambiguous = FALSE
)
Arguments
x |
character variable names, character vector, see parameter from |
meta_data |
data.frame old name for |
to |
character variable attribute to map to |
from |
character variable identifier to map from |
ifnotfound |
list A list of values to be used if the item is not found: it will be coerced to a list if necessary. |
warn_ambiguous |
logical print a warning if mapping variables from
|
Details
This function basically calls colnames(study_data) <- meta_data$LABEL
,
ensuring correct merging/joining of study data columns to the corresponding
metadata rows, even if the orders differ. If a variable/study_data-column
name is not found in meta_data[[from]]
(default from = VAR_NAMES
),
either stop is called or, if ifnotfound
has been assigned a value, that
value is returned. See mget
, which is internally used by this function.
The function not only maps to the LABEL
column; to
can be any
metadata variable attribute, so the function can also be used to get, e.g.,
all HARD_LIMITS
from the metadata.
Value
a character vector with:
mapped values
See Also
Other mapping:
util_map_all()
,
util_map_by_largest_prefix()
,
util_recode()
Examples
## Not run:
meta_data <- prep_create_meta(
VAR_NAMES = c("ID", "SEX", "AGE", "DOE"),
LABEL = c("Pseudo-ID", "Gender", "Age", "Examination Date"),
DATA_TYPE = c(DATA_TYPES$INTEGER, DATA_TYPES$INTEGER, DATA_TYPES$INTEGER,
DATA_TYPES$DATETIME),
MISSING_LIST = ""
)
stopifnot(all(prep_map_labels(c("AGE", "DOE"), meta_data) == c("Age",
"Examination Date")))
## End(Not run)
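The mapping step itself can be sketched in base R: a named vector built from the metadata (VAR_NAMES to LABEL) serves as the lookup table. This sketch assumes only the two metadata columns shown; the package's function adds error handling via mget:

```r
## Base-R sketch of mapping variable names to labels.
meta <- data.frame(VAR_NAMES = c("ID", "SEX", "AGE"),
                   LABEL = c("Pseudo-ID", "Gender", "Age"))
lookup <- setNames(meta$LABEL, meta$VAR_NAMES)
unname(lookup[c("AGE", "ID")])
## "Age" "Pseudo-ID"
```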
Utility function to create a margins plot for binary variables
Description
Utility function to create a margins plot for binary variables
Usage
util_margins_bin(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
threshold_type = NULL,
threshold_value,
min_obs_in_subgroup = 5,
min_obs_in_cat = 5,
caption = NULL,
ds1,
label_col,
adjusted_hint = "",
title = "",
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default),
include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
dataquieR.acc_margins_num_default)
)
Arguments
resp_vars |
variable the name of the binary measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
threshold_type |
enum empirical | user | none. See |
threshold_value |
numeric see |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
min_obs_in_cat |
integer This optional argument specifies the minimum
number of observations that is required to include
a category (level) of the outcome ( |
caption |
string a caption for the plot (optional, typically used to report the coding of cases and control group) |
ds1 |
data.frame the data frame that contains the measurements, after
replacing missing value codes by |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
adjusted_hint |
character hint, if adjusted for |
title |
character title for the plot |
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)? |
include_numbers_in_figures |
logical Should the figure report the number of observations for each level of the grouping variable? |
Value
A table and a matching plot.
Utility function to create a margins plot from linear regression models
Description
Utility function to create a margins plot from linear regression models
Usage
util_margins_lm(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
threshold_type = NULL,
threshold_value,
min_obs_in_subgroup = 5,
ds1,
label_col,
levels = NULL,
adjusted_hint = "",
title = "",
n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
dataquieR.max_group_var_levels_with_violins_default),
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default),
include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
dataquieR.acc_margins_num_default)
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
threshold_type |
enum empirical | user | none. See |
threshold_value |
numeric see |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
ds1 |
data.frame the data frame that contains the measurements, after
replacing missing value codes by |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
levels |
|
adjusted_hint |
character hint, if adjusted for |
title |
character title for the plot |
n_violin_max |
integer from=0. This optional argument specifies
the maximum number of levels of the |
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)? |
include_numbers_in_figures |
logical Should the figure report the number of observations for each level of the grouping variable? |
Value
A table and a matching plot.
Utility function to create a plot similar to the margins plots for nominal variables
Description
This function is still under development. It uses the nnet
package to
compute multinomial logistic regression models.
Usage
util_margins_nom(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
min_obs_in_subgroup = 5,
min_obs_in_cat = 5,
ds1,
label_col,
adjusted_hint = "",
title = "",
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default)
)
Arguments
resp_vars |
variable the name of the nominal measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
min_obs_in_cat |
integer This optional argument specifies the minimum
number of observations that is required to include
a category (level) of the outcome ( |
ds1 |
data.frame the data frame that contains the measurements, after
replacing missing value codes by |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
adjusted_hint |
character hint, if adjusted for |
title |
character title for the plot |
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)? |
Value
A table and a matching plot.
Utility function to create a plot similar to the margins plots for ordinal variables
Description
This function is still under development. It uses the ordinal
package to
compute ordered regression models.
Usage
util_margins_ord(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
min_obs_in_subgroup = 5,
min_subgroups = 5,
ds1,
label_col,
adjusted_hint = "",
title = "",
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default)
)
Arguments
resp_vars |
variable the name of the ordinal measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
min_subgroups |
integer from=3. The model provided by the |
ds1 |
data.frame the data frame that contains the measurements, after
replacing missing value codes by |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
adjusted_hint |
character hint, if adjusted for |
title |
character title for the plot |
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)? |
Value
A table and a matching plot.
Utility function to create a margins plot from Poisson regression models
Description
Utility function to create a margins plot from Poisson regression models
Usage
util_margins_poi(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
threshold_type = NULL,
threshold_value,
min_obs_in_subgroup = 5,
ds1,
label_col,
adjusted_hint = "",
title = "",
sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
dataquieR.acc_margins_sort_default),
include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
dataquieR.acc_margins_num_default)
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
threshold_type |
enum empirical | user | none. See |
threshold_value |
numeric see |
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
ds1 |
data.frame the data frame that contains the measurements, after
replacing missing value codes by |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
adjusted_hint |
character hint, if adjusted for |
title |
character title for the plot |
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations (in the figure)? |
include_numbers_in_figures |
logical Should the figure report the number of observations for each level of the grouping variable? |
Value
A table and a matching plot.
dataquieR
version of match.arg
Description
Unlike match.arg, this version does not support partial matching, but it will display the most likely match in a warning/error.
Usage
util_match_arg(arg, choices, several_ok = FALSE, error = TRUE)
Arguments
arg |
the argument |
choices |
the choices |
several_ok |
allow more than one entry in |
error |
|
Value
"cleaned" arg
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_observations_in_subgroups()
,
util_stop_if_not()
,
util_warn_unordered()
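As a sketch of the behavior described above (the helper name is hypothetical; the actual util_match_arg implementation differs), a strict variant of match.arg could reject partial matches while still suggesting the closest choice:

```r
# Sketch only: a strict match.arg that refuses partial matching but
# reports the closest choice via an edit-distance lookup.
strict_match_arg <- function(arg, choices) {
  if (arg %in% choices) return(arg)
  closest <- choices[which.min(utils::adist(arg, choices))]
  stop(sprintf("%s is not a valid choice; did you mean %s?",
               dQuote(arg), dQuote(closest)))
}
strict_match_arg("median", c("mean", "median"))  # "median"
strict_match_arg("med", c("mean", "median"))     # error with a suggestion
```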
Combine data frames by merging
Description
This is an extension of merge
working for a list of data frames.
Usage
util_merge_data_frame_list(data_frames, id_vars)
Arguments
data_frames |
list of data.frames |
id_vars |
character the variable(s) to merge the data frames by. Each of them must exist in all data frames. |
Value
data.frame combination of data frames
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_rbind()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
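Conceptually, merging a list of data frames by shared ID variables can be expressed with base R's Reduce() and merge(); util_merge_data_frame_list wraps this idea with additional checks:

```r
# Illustrative sketch of merging a list of data frames by an ID column.
dfs <- list(
  data.frame(id = 1:3, x = c("a", "b", "c")),
  data.frame(id = 2:4, y = c(10, 20, 30))
)
merged <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), dfs)
merged  # ids 1..4, with NA where a frame had no matching row
```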
Produce a condition message with a useful short stack trace.
Description
Produce a condition message with a useful short stack trace.
Usage
util_message(
m,
...,
applicability_problem = NA,
intrinsic_applicability_problem = NA,
integrity_indicator = "none",
level = 0,
immediate,
title = "",
additional_classes = c()
)
Arguments
m |
a message or a condition |
... |
arguments for sprintf on m, if m is a character |
applicability_problem |
logical |
intrinsic_applicability_problem |
logical |
integrity_indicator |
character if the message concerns an integrity problem, this gives the indicator abbreviation. |
level |
integer level of the message (defaults to 0). Higher levels are more severe. |
immediate |
logical not used. |
additional_classes |
character additional classes the thrown condition object should inherit from, first. |
Value
condition the condition object, if the execution is not stopped
See Also
Other condition_functions:
util_condition_constructor_factory()
,
util_deparse1()
,
util_error()
,
util_find_external_functions_in_stacktrace()
,
util_find_first_externally_called_functions_in_stacktrace()
,
util_find_indicator_function_in_callers()
,
util_suppress_warnings()
,
util_warning()
Select really numeric variables
Description
Reduce resp_vars
to those, which are either float
or integer
without
VALUE_LABELS, i.e. likely numeric but not a factor
Usage
util_no_value_labels(resp_vars, meta_data, label_col, warn = TRUE, stop = TRUE)
Arguments
resp_vars |
variable list len=1-2. the name of the continuous measurement variable |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
warn |
logical warn about removed variable names |
stop |
logical stop on no matching |
Value
character vector of matching resp_vars.
See Also
Other metadata_management:
util_dist_selection()
,
util_find_free_missing_code()
,
util_find_var_by_meta()
,
util_get_var_att_names_of_level()
,
util_get_vars_in_segment()
,
util_looks_like_missing()
,
util_validate_known_meta()
,
util_validate_missing_lists()
Distribute CODE_LIST_TABLE
in item level metadata
Description
fills the columns MISSING_LIST_TABLE
and VALUE_LABEL_TABLE
from
CODE_LIST_TABLE
, if applicable
Usage
util_normalize_clt(meta_data)
Arguments
meta_data |
data.frame old name for |
Value
meta_data, but CODE_LIST_TABLE
column is distributed to the
columns VALUE_LABEL_TABLE
and MISSING_LIST_TABLE
, respectively.
Normalize and check cross-item-level metadata
Description
Normalize and check cross-item-level metadata
Usage
util_normalize_cross_item(
meta_data = "item_level",
meta_data_cross_item = "cross-item_level",
label_col = LABEL
)
Arguments
meta_data |
|
meta_data_cross_item |
|
label_col |
character label column to use for variable naming |
Value
normalized and checked cross-item-level metadata
See Also
Other meta_data_cross:
ASSOCIATION_DIRECTION
,
ASSOCIATION_FORM
,
ASSOCIATION_METRIC
,
ASSOCIATION_RANGE
,
CHECK_ID
,
CHECK_LABEL
,
CONTRADICTION_TERM
,
CONTRADICTION_TYPE
,
DATA_PREPARATION
,
GOLDSTANDARD
,
MULTIVARIATE_OUTLIER_CHECK
,
MULTIVARIATE_OUTLIER_CHECKTYPE
,
N_RULES
,
REL_VAL
,
VARIABLE_LIST
,
meta_data_cross
Convert VALUE_LABELS
to separate tables
Description
Convert VALUE_LABELS
to separate tables
Usage
util_normalize_value_labels(
meta_data = "item_level",
max_value_label_len = getOption("dataquieR.MAX_VALUE_LABEL_LEN",
dataquieR.MAX_VALUE_LABEL_LEN_default)
)
Arguments
meta_data |
data.frame old name for |
max_value_label_len |
integer maximum length for value labels |
Value
data.frame metadata with VALUE_LABEL_TABLE
instead of
VALUE_LABELS
(or none of these, if absent)
Examples
## Not run:
prep_purge_data_frame_cache()
prep_load_workbook_like_file("meta_data_v2")
util_normalize_value_labels()
prep_add_data_frames(test_labs =
tibble::tribble(~ CODE_VALUE, ~ CODE_LABEL, 17L, "Test", 19L, "Test",
17L, "TestX"))
il <- prep_get_data_frame("item_level")
if (!VALUE_LABEL_TABLE %in% colnames(il)) {
il$VALUE_LABEL_TABLE <- NA_character_
}
il$VALUE_LABEL_TABLE[[1]] <- "test_labs"
il$VALUE_LABELS[[1]] <- "17 = TestY"
prep_add_data_frames(item_level = il)
util_normalize_value_labels()
## End(Not run)
Detect Expected Observations
Description
For each participant, check whether an observation was expected, given the
PART_VARS
from item-level metadata
Usage
util_observation_expected(
rv,
study_data,
meta_data,
label_col = LABEL,
expected_observations = c("HIERARCHY", "ALL", "SEGMENT")
)
Arguments
rv |
character the response variable, for that a value may be expected |
study_data |
|
meta_data |
|
label_col |
character mapping attribute |
expected_observations |
enum HIERARCHY | ALL | SEGMENT. How should
|
Value
a vector with TRUE
or FALSE
for each row of study_data
, if for
study_data[rv]
a value is expected.
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_count_expected_observations()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_remove_empty_rows()
,
util_replace_codes_by_NA()
Utility function observations in subgroups
Description
This function uses !is.na
to count the non-missing observations in subgroups of
the data and in a set of user-defined response variables. Some applications
require the number of observations per subgroup (e.g., per factor level) to
exceed a user-defined minimum.
Usage
util_observations_in_subgroups(x, rvs)
Arguments
x |
data frame |
rvs |
variable names |
Value
matrix of flags
See Also
Other robustness_functions:
util_as_valid_missing_codes()
,
util_check_one_unique_value()
,
util_correct_variable_use()
,
util_empty()
,
util_ensure_character()
,
util_ensure_in()
,
util_ensure_suggested()
,
util_expect_scalar()
,
util_fix_rstudio_bugs()
,
util_is_integer()
,
util_is_numeric_in()
,
util_is_valid_missing_codes()
,
util_match_arg()
,
util_stop_if_not()
,
util_warn_unordered()
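The counting described above can be illustrated in base R (a simplified stand-in, not the actual implementation):

```r
# Count non-missing observations per subgroup with !is.na(), then flag
# subgroups that reach a required minimum number of observations.
min_obs <- 30
counts <- tapply(!is.na(iris$Sepal.Length), iris$Species, sum)
counts >= min_obs  # TRUE for each species (50 complete observations each)
```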
Creates a Link to our Website
Description
i.e., to a vignette on the website
Usage
util_online_ref(fkt_name)
Arguments
fkt_name |
character function name to generate a link for |
Value
character the link
Utility function to compute and optimize bin breaks for histograms
Description
Utility function to compute and optimize bin breaks for histograms
Usage
util_optimize_histogram_bins(
x,
interval_freedman_diaconis = NULL,
nbins_max = 100,
cuts = NULL
)
Arguments
x |
a vector of data values (numeric or datetime) |
interval_freedman_diaconis |
range of values which should be included to
calculate the Freedman-Diaconis bandwidth (e.g., for
|
nbins_max |
the maximum number of bins for the histogram. Strong
outliers can cause too many narrow bins, which might be
even too narrow to be plotted. This also results in large
files and rendering problems. So it is sensible to limit
the number of bins. The function will produce a message if
it reduces the number of bins in such a case. Reasons
could be unspecified missing value codes, minimum or
maximum values far away from most of the data values, a small
number of unique values, or (for |
cuts |
a vector of values at which breaks between bins should occur |
Value
a list with bin breaks, if needed separated for each segment of the plot
See Also
Other figure_functions:
util_heatmap_1th()
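The Freedman-Diaconis bandwidth mentioned above can be sketched in base R (simplified; the actual function also handles datetimes, user-supplied cuts, and further edge cases):

```r
# Freedman-Diaconis bin width: 2 * IQR / n^(1/3), with the resulting
# number of bins capped at nbins_max.
fd_breaks <- function(x, nbins_max = 100) {
  x <- x[is.finite(x)]
  bw <- 2 * stats::IQR(x) / length(x)^(1/3)
  n_bins <- min(nbins_max, max(1, ceiling(diff(range(x)) / bw)))
  seq(min(x), max(x), length.out = n_bins + 1)
}
hist(faithful$eruptions, breaks = fd_breaks(faithful$eruptions))
```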
Utility function to distribute points across a time variable
Description
Utility function to distribute points across a time variable
Usage
util_optimize_sequence_across_time_var(
time_var_data,
n_points,
prop_grid = 0.5
)
Arguments
time_var_data |
vector of the data points of the time variable |
n_points |
maximum number of points to distribute across the time variable (minimum: 3) |
prop_grid |
proportion of points given in |
Value
a sequence of points in datetime format
Get the order of a vector with general order given in some other vector
Description
Get the order of a vector with general order given in some other vector
Usage
util_order_by_order(x, order, ...)
Arguments
x |
the vector |
order |
the "order vector |
... |
additional arguments passed to |
See Also
Other reporting_functions:
util_alias2caption()
,
util_copy_all_deps()
,
util_create_page_file()
,
util_eval_to_dataquieR_result()
,
util_evaluate_calls()
,
util_float_index_menu()
,
util_generate_anchor_link()
,
util_generate_anchor_tag()
,
util_generate_calls()
,
util_generate_calls_for_function()
,
util_load_manual()
,
util_make_data_slot_from_table_slot()
,
util_set_size()
Examples
## Not run:
util_order_by_order(c("a", "b", "a", "c", "d"), letters)
## End(Not run)
Utility function parallel version of purrr::pmap
Description
Parallel version of purrr::pmap
.
Usage
util_par_pmap(
.l,
.f,
...,
cores = list(mode = "socket", cpus = util_detect_cores(), logging = FALSE,
load.balancing = TRUE),
use_cache = FALSE
)
Arguments
.l |
data.frame with one call per line and one function argument per column |
.f |
|
... |
additional, static arguments for calling |
cores |
number of cpu cores to use or a (named) list with arguments for parallelMap::parallelStart or NULL, if parallel has already been started by the caller. |
use_cache |
logical set to FALSE to omit re-using already distributed study- and metadata on a parallel cluster |
Value
list of results of the function calls
Author(s)
S Struckmann
See Also
purrr::pmap
Other process_functions:
util_abbreviate()
,
util_all_is_integer()
,
util_attach_attr()
,
util_bQuote()
,
util_backtickQuote()
,
util_coord_flip()
,
util_extract_matches()
,
util_setup_rstudio_job()
,
util_suppress_output()
Utility function to parse assignments
Description
This function parses labels & level assignments in the format
1 = male | 2 = female
. The function also handles m = male | f = female
,
but this would not match the metadata concept. The split-character can
be given if the default SPLIT_CHAR should not be used, but this
would also violate the metadata concept.
Usage
util_parse_assignments(
text,
split_char = SPLIT_CHAR,
multi_variate_text = FALSE,
split_on_any_split_char = FALSE
)
Arguments
text |
Text to be parsed |
split_char |
Character separating assignments, may be a vector, then
all will be tried and the most likely matching one will
be returned as attribute |
multi_variate_text |
don't paste text but parse element-wise |
split_on_any_split_char |
split on any split |
Value
the parsed assignments as a named list
See Also
Other parser_functions:
util_interpret_limits()
,
util_parse_interval()
,
util_parse_redcap_rule()
Examples
## Not run:
md <- prep_get_data_frame("meta_data")
vl <- md$VALUE_LABELS
vl[[50]] <- "low<medium < high"
a <- util_parse_assignments(vl, split_char = c(SPLIT_CHAR, "<"),
multi_variate_text = TRUE)
b <- util_parse_assignments(vl, split_char = c(SPLIT_CHAR, "<"),
split_on_any_split_char = TRUE, multi_variate_text = TRUE)
is_ordered <- vapply(a, attr, "split_char", FUN.VALUE = character(1)) == "<"
md$VALUE_LABELS[[50]] <- "low<medium < high"
md$VALUE_LABELS[[51]] <- "1 = low< 2=medium < 3=high"
md$VALUE_LABELS[[49]] <- "2 = medium< 1=low < 3=high" # counter intuitive
with_sl <- prep_scalelevel_from_data_and_metadata(study_data = "study_data",
meta_data = md)
View(with_sl[, union(SCALE_LEVEL, colnames(with_sl))])
## End(Not run)
Utility function to parse intervals
Description
Utility function to parse intervals
Usage
util_parse_interval(int)
Arguments
int |
an interval as string, e.g., "[0;Inf)" |
Value
the parsed interval with elements inc_l
(Is the lower limit
included?), low
(the value of the lower limit), inc_u
(Is the upper
limit included?), upp
(the value of the upper limit)
See Also
Other parser_functions:
util_interpret_limits()
,
util_parse_assignments()
,
util_parse_redcap_rule()
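A simplified stand-in for the parsing described above (the real util_parse_interval is more thorough about validation; the helper name here is hypothetical):

```r
# Parse an interval string like "[0;Inf)" into its four components.
parse_interval_sketch <- function(int) {
  list(
    inc_l = startsWith(int, "["),              # "[" includes the lower limit
    low   = as.numeric(sub("^[][()]?([^;]*);.*$", "\\1", int)),
    inc_u = endsWith(int, "]"),                # "]" includes the upper limit
    upp   = as.numeric(sub("^.*;([^])]*)[])]$", "\\1", int))
  )
}
str(parse_interval_sketch("[0;Inf)"))
# inc_l TRUE, low 0, inc_u FALSE, upp Inf
```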
Interpret a REDcap
-style rule and create an expression, that represents this rule
Description
Interpret a REDcap
-style rule and create an expression, that represents this rule
Usage
util_parse_redcap_rule(
rule,
debug = 0,
entry_pred = "REDcapPred",
must_eof = FALSE
)
Arguments
rule |
character |
debug |
integer debug level (0 = off, 1 = log, 2 = breakpoints) |
entry_pred |
character for debugging reasons: The production rule used entry point for the parser |
must_eof |
logical if |
Value
expression the interpreted rule
REDcap rules 1, REDcap rules 2, REDcap rules 3
For resolving left-recursive rules, StackOverflow helps with understanding the grammar below, in case theoretical computer science is not fresh in your mind.
See Also
Other parser_functions:
util_interpret_limits()
,
util_parse_assignments()
,
util_parse_interval()
Examples
## Not run:
# rules:
# pregnancies <- 9999 ~ SEX == 'm' | is.na(SEX)
# pregnancies <- 9998 ~ AGE < 12 | is.na(AGE)
# pregnancies = 9999 ~ dist > 2 | speed == 0
data.frame(target = "SEX_0",
rule = '[speed] > 5 and [dist] > 42 or 1 = "2"',
CODE = 99999, LABEL = "PREGNANCIES_NOT_ASSESSED FOR MALES",
class = "JUMP")
# ModifiedStudyData <- replace in SEX_0 where SEX_0 is empty, if rule fits
# ModifiedMetaData <- add missing codes with labels and class here
subset(study_data, eval(pregnancies[[3]]))
rule <-
paste0('[con_consentdt] <> "" and [sda_osd1dt] <> "" and',
' datediff([con_consentdt],[sda_osd1dt],"d",true) < 0')
x <- data.frame(con_consentdt = c(as.POSIXct("2020-01-01"),
as.POSIXct("2020-10-20")),
sda_osd1dt = c(as.POSIXct("2020-01-20"),
as.POSIXct("2020-10-01")))
eval(util_parse_redcap_rule(paste0(
'[con_consentdt] <> "" and [sda_osd1dt] <> "" and ',
'datediff([con_consentdt],[sda_osd1dt],"d", "Y-M-D",true) < 10')),
x, util_get_redcap_rule_env())
util_parse_redcap_rule("[a] = 12 or [b] = 13")
cars[eval(util_parse_redcap_rule(
rule = '[speed] > 5 and [dist] > 42 or 1 = "2"'), cars,
util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
rule = '[speed] > 5 and [dist] > 42 or 2 = "2"'), cars,
util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
rule = '[speed] > 5 or [dist] > 42 and 1 = "2"'), cars,
util_get_redcap_rule_env()), ]
cars[eval(util_parse_redcap_rule(
rule = '[speed] > 5 or [dist] > 42 and 2 = "2"'), cars,
util_get_redcap_rule_env()), ]
util_parse_redcap_rule(rule = '(1 = "2" or true) and (false)')
eval(util_parse_redcap_rule(rule =
'[dist] > sum(1, +(2, [dist] + 5), [speed]) + 3 + [dist]'),
cars, util_get_redcap_rule_env())
## End(Not run)
Paste strings but keep NA (paste0
)
Description
Paste strings but keep NA (paste0
)
Usage
util_paste0_with_na(...)
Arguments
... |
other arguments passed to |
Value
character pasted strings
Paste strings but keep NA
Description
Paste strings but keep NA
Usage
util_paste_with_na(...)
Arguments
... |
other arguments passed to |
Value
character pasted strings
Plot to un-disclosed ggplot
object
Description
Plot to un-disclosed ggplot
object
Usage
util_plot2svg_object(expr, w = 21.2, h = 15.9, sizing_hints)
Arguments
expr |
plot expression |
w |
width in cm |
h |
height in cm |
Value
ggplot
object, but rendered (no original data included)
Utility function to create plots for categorical variables
Description
Depending on the required level of complexity, this helper function creates various plots for categorical variables. Next to basic bar plots, it also enables group comparisons (for example for device/examiner effects) and longitudinal views.
Usage
util_plot_categorical_vars(
resp_vars,
group_vars = NULL,
time_vars = NULL,
study_data,
meta_data,
n_cat_max = 6,
n_group_max = getOption("dataquieR.max_group_var_levels_in_plot", 20),
n_data_min = 20
)
Arguments
resp_vars |
name of the categorical variable |
group_vars |
name of the grouping variable |
time_vars |
name of the time variable |
study_data |
the data frame that contains the measurements |
meta_data |
the data frame that contains metadata attributes of study data |
n_cat_max |
maximum number of categories to be displayed individually
for the categorical variable ( |
n_group_max |
maximum number of categories to be displayed individually
for the grouping variable ( |
n_data_min |
minimum number of data points to create a time course plot
for an individual category of the |
Value
a figure
Plot a ggplot2
figure without plotly
Description
Plot a ggplot2
figure without plotly
Usage
util_plot_figure_no_plotly(x, sizing_hints = NULL)
Arguments
x |
ggplot2::ggplot2 object |
sizing_hints |
|
Value
htmltools
compatible object
Plot a ggplot2
figure using plotly
Description
Plot a ggplot2
figure using plotly
Usage
util_plot_figure_plotly(x, sizing_hints = NULL)
Arguments
x |
ggplot2::ggplot2 object |
sizing_hints |
|
Value
htmltools
compatible object
Replacement for htmltools::plotTag
Description
the function is specifically designed for fully scalable SVG
figures.
Usage
util_plot_svg_to_uri(expr, w = 800, h = 600)
Arguments
expr |
plot expression |
w |
width |
h |
height |
Value
htmltools
compatible object
Plotly
to un-disclosed ggplot
object
Description
Plotly
to un-disclosed ggplot
object
Usage
util_plotly2svg_object(plotly, w = 21.2, h = 15.9, sizing_hints)
Arguments
plotly |
the object |
w |
width in cm |
h |
height in cm |
Value
ggplot
object, but rendered (no original data included)
Utility function to prepare the metadata for location checks
Description
Utility function to prepare the metadata for location checks
Usage
util_prep_location_check(
resp_vars,
meta_data,
report_problems = c("error", "warning", "message"),
label_col = VAR_NAMES
)
Arguments
resp_vars |
variable list the names of the measurement variables |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
report_problems |
enum Should missing metadata information be reported as error, warning or message? |
Value
a list with the location metric (mean or median) and expected range for the location check
See Also
Other lookup_functions:
util_prep_proportion_check()
,
util_variable_references()
Utility function to prepare the metadata for proportion checks
Description
Utility function to prepare the metadata for proportion checks
Usage
util_prep_proportion_check(
resp_vars,
meta_data,
ds1,
report_problems = c("error", "warning", "message"),
label_col = attr(ds1, "label_col")
)
Arguments
resp_vars |
variable list the names of the measurement variables |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
ds1 |
data.frame the data frame that contains the measurements
(hint: missing value codes should be excluded,
so the function should be called with |
report_problems |
enum Should missing metadata information be reported as error, warning or message? |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
Value
a list with the expected range for the proportion check
See Also
Other lookup_functions:
util_prep_location_check()
,
util_variable_references()
Convert single dataquieR
result to an htmltools
compatible object
Description
Convert single dataquieR
result to an htmltools
compatible object
Usage
util_pretty_print(
dqr,
nm,
is_single_var,
meta_data,
label_col,
use_plot_ly,
dir,
...
)
Arguments
dqr |
dataquieR_result an output (indicator) from |
nm |
character the name used in the report, the alias name of the function call plus the variable name |
is_single_var |
logical we are creating a single variable overview page or an indicator summary page |
meta_data |
meta_data the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
use_plot_ly |
logical use |
dir |
character output directory for potential |
... |
further arguments passed through, if applicable |
Value
htmltools
compatible object with rendered dqr
Prepare a vector for output
Description
Prepare a vector for output
Usage
util_pretty_vector_string(v, quote = dQuote, n_max = length(v))
Arguments
v |
the vector |
quote |
function, used for quoting – |
n_max |
maximum number of elements of |
Value
the "pretty" collapsed vector as a string.
See Also
Other string_functions:
util_abbreviate_unique()
,
util_filter_names_by_regexps()
,
util_set_dQuoteString()
,
util_set_sQuoteString()
,
util_sub_string_left_from_.()
,
util_sub_string_right_from_.()
,
util_translate()
Bind data frames row-based
Description
if not all data frames share all columns, missing columns will be filled with
NA
s.
Usage
util_rbind(..., data_frames_list = list())
Arguments
... |
data.frame zero or more data frames |
data_frames_list |
list optional, a list of data frames |
Value
data.frame all data frames appended
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_remove_na_records()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Examples
## Not run:
util_rbind(head(cars), head(iris))
util_rbind(head(cars), tail(cars))
util_rbind(head(cars)[, "dist", FALSE], tail(cars)[, "speed", FALSE])
## End(Not run)
Can we really be sure that we are running RStudio
Description
JetBrains' IDEA and other IDEs pretend to be RStudio by putting "RStudio" into
.Platform$GUI
.
Usage
util_really_rstudio()
Value
TRUE
, if we can really be sure to be running RStudio,
FALSE
otherwise.
Map a vector of values based on an assignment table
Description
Map a vector of values based on an assignment table
Usage
util_recode(values, mapping_table, from, to, default = NULL)
Arguments
values |
vector the vector |
mapping_table |
data.frame a table with the mapping table |
from |
character the name of the column with the "old values" |
to |
character the name of the column with the "new values" |
default |
character either one character or one character per value,
used if an entry from |
Value
the mapped values
See Also
Other mapping:
util_map_all()
,
util_map_by_largest_prefix()
,
util_map_labels()
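The mapping above is conceptually a match()-based lookup; a minimal base-R sketch:

```r
# Look up each value in the "from" column and return the matching "to"
# entry; values without a mapping become NA (or the default, in the real
# function).
mapping <- data.frame(code = c(0, 1), label = c("female", "male"))
values  <- c(1, 0, 1, 2)
mapping$label[match(values, mapping$code)]
# "male" "female" "male" NA
```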
For a group of variables (original) the function provides all original plus referred variables in the metadata and a new item_level metadata including information on the original variables and the referred variables
Description
For a group of variables (original) the function provides all original plus referred variables in the metadata and a new item_level metadata including information on the original variables and the referred variables
Usage
util_referred_vars(
resp_vars,
id_vars = character(0),
vars_in_subgroup = character(0),
meta_data,
meta_data_segment = NULL,
meta_data_dataframe = NULL,
meta_data_cross_item = NULL,
meta_data_item_computation = NULL,
strata_column = NULL
)
Arguments
resp_vars |
variable list the name of the original variables. |
id_vars |
variable a vector containing the name/s of the variables containing ids |
vars_in_subgroup |
variable a vector containing the name/s of the variable/s mentioned inside the subgroup rule |
meta_data |
data.frame old name for |
meta_data_segment |
data.frame – optional: Segment level metadata |
meta_data_dataframe |
data.frame – optional if |
meta_data_cross_item |
data.frame – optional: Cross-item level metadata |
meta_data_item_computation |
data.frame – optional: Computed items metadata |
strata_column |
variable name of a study variable used to stratify the report by and to add as referred variable |
Value
a named list containing the referred variables and a new item_level metadata including information on the original variables and the referred variables
removes empty rows from x
Description
removes empty rows from x
Usage
util_remove_empty_rows(x, id_vars = character(0))
Arguments
x |
data.frame a data frame to be cleaned |
id_vars |
character column names that will be treated as empty |
Value
data.frame reduced x
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_count_expected_observations()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_observation_expected()
,
util_replace_codes_by_NA()
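A base-R sketch of the row removal (simplified; here the id_vars are merely excluded when deciding emptiness):

```r
# Drop rows where every non-ID column is NA.
x <- data.frame(id = 1:3, a = c(1, NA, 3), b = c("x", NA, "z"))
check_cols <- setdiff(names(x), "id")
x[rowSums(!is.na(x[check_cols])) > 0, ]
# keeps rows 1 and 3
```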
remove all records that have at least one NA
in any of the given variables
Description
remove all records that have at least one NA
in any of the given variables
Usage
util_remove_na_records(study_data, vars = colnames(study_data))
Arguments
study_data |
the study data frame |
vars |
the variables being checked for |
Value
modified study_data data frame
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_replace_hard_limit_violations()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Examples
## Not run:
dta <- iris
dim(util_remove_na_records(dta))
dta$Species[4:6] <- NA
dim(util_remove_na_records(dta))
dim(util_remove_na_records(dta, c("Sepal.Length", "Petal.Length")))
## End(Not run)
Render a table summarizing dataquieR results
Description
Render a table summarizing dataquieR results
Usage
util_render_table_dataquieR_summary(
x,
grouped_by = c("call_names", "indicator_metric"),
folder_of_report = NULL,
var_uniquenames = NULL
)
Arguments
x |
a report summary ( |
grouped_by |
define the columns of the resulting matrix. It can be either
"call_names", one column per function, or "indicator_metric",
one column per indicator or both
|
folder_of_report |
a named vector with the location of variable and call_names |
var_uniquenames |
a data frame with the original variable names and their unique names, for reports created with dq_report_by that contain the same variable in several reports (e.g., reports stratified by sex) |
Value
something, htmltools
can render
Utility function to replace missing codes by NA
s
Description
Substitute all missing codes in a data.frame by NA
.
Usage
util_replace_codes_by_NA(
study_data,
meta_data = "item_level",
split_char = SPLIT_CHAR,
sm_code = NULL
)
Arguments
study_data |
Study data including jump/missing codes as specified in the code conventions |
meta_data |
Metadata as specified in the code conventions |
split_char |
Character separating missing codes |
sm_code |
missing code for Codes are expected to be numeric. |
Value
a list with a modified data frame and some counts
See Also
Other missing_functions:
util_all_intro_vars_for_rv()
,
util_count_expected_observations()
,
util_filter_missing_list_table_for_rv()
,
util_get_code_list()
,
util_is_na_0_empty_or_false()
,
util_observation_expected()
,
util_remove_empty_rows()
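The substitution itself can be illustrated with a minimal sketch (the packaged function additionally reads the codes from the metadata and returns counts):

```r
# Replace jump/missing codes in a study variable by NA.
study <- data.frame(age = c(42, 9999, 35, 8888))
missing_codes <- c(9999, 8888)  # assumed numeric codes from the metadata
study$age[study$age %in% missing_codes] <- NA
study$age  # 42 NA 35 NA
```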
Replace limit violations (HARD_LIMITS) by NAs
Description
Replace limit violations (HARD_LIMITS) by NAs
Usage
util_replace_hard_limit_violations(study_data, meta_data, label_col)
Arguments
study_data |
|
meta_data |
|
label_col |
variable attribute the name of the column in the metadata with labels of variables |
Value
modified study_data
See Also
Other data_management:
util_assign_levlabs()
,
util_check_data_type()
,
util_check_group_levels()
,
util_compare_meta_with_study()
,
util_dichotomize()
,
util_fix_merge_dups()
,
util_merge_data_frame_list()
,
util_rbind()
,
util_remove_na_records()
,
util_round_to_decimal_places()
,
util_study_var2factor()
,
util_table_of_vct()
Import a data frame
Description
see rio::import
, but with argument keep_types
and modified error
handling.
Usage
util_rio_import(fn, keep_types, ...)
Arguments
fn |
the file name to load. |
keep_types |
logical keep types as possibly defined in the file.
set |
... |
additional arguments for rio::import |
Value
data.frame as in rio::import
Import list of data frames
Description
see rio::import_list
, but with argument keep_types
and modified error
handling.
Usage
util_rio_import_list(fn, keep_types, ...)
Arguments
fn |
the file name to load. |
keep_types |
logical keep types as possibly defined in the file.
set |
... |
additional arguments for rio::import_list |
Value
list as in rio::import_list
Round values to 3 decimal places if all values lie between 0.001 and 9999.999; otherwise (if at least one value of the vector is outside these limits), use scientific notation for all the values in the vector
Description
Round values to 3 decimal places if all values lie between 0.001 and 9999.999; otherwise (if at least one value of the vector is outside these limits), use scientific notation for all the values in the vector
Usage
util_round_to_decimal_places(x, digits = 3)
Arguments
x |
a numeric vector to be rounded |
digits |
a numeric value indicating the number of desired decimal places |
See Also
Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_study_var2factor(), util_table_of_vct()
Utility function to put strings in quotes
Description
This function wraps each element of the character vector in double quotes
Usage
util_set_dQuoteString(string)
Arguments
string |
Character vector |
Value
quoted string
See Also
Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()
Utility function to put strings in single quotes
Description
This function wraps each element of the character vector in single quotes.
Usage
util_set_sQuoteString(string)
Arguments
string |
Character vector |
Value
quoted string
See Also
Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.(), util_translate()
Attaches attributes about the recommended minimum absolute sizes to the plot p
Description
Attaches attributes about the recommended minimum absolute sizes to the plot p
Usage
util_set_size(p, width_em = NA_integer_, height_em = NA_integer_)
Arguments
p |
ggplot2::ggplot the plot |
width_em |
numeric len=1. the minimum width hint in |
height_em |
numeric len=1. the minimum height in |
Value
p the modified plot
See Also
Other reporting_functions: util_alias2caption(), util_copy_all_deps(), util_create_page_file(), util_eval_to_dataquieR_result(), util_evaluate_calls(), util_float_index_menu(), util_generate_anchor_link(), util_generate_anchor_tag(), util_generate_calls(), util_generate_calls_for_function(), util_load_manual(), util_make_data_slot_from_table_slot(), util_order_by_order()
Set up an RStudio job
Description
Also defines a progress function and a progress_msg function in the caller's environment.
Usage
util_setup_rstudio_job(job_name = "Job")
Arguments
job_name |
a name for the job |
Details
In RStudio, its job system will be used; for shiny::withProgress based calls, this will require min and max being set to 0 and 1 (the defaults). If cli is available, it will be used; in all other cases, just messages will be created.
Value
list: the progress function and the progress_msg function
See Also
Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_suppress_output()
Examples
## Not run:
test <- function() {
util_setup_rstudio_job("xx")
Sys.sleep(5)
progress(50)
progress_msg("halfway through")
Sys.sleep(5)
progress(100)
Sys.sleep(1)
}
test()
## End(Not run)
Utility function for outliers according to the rule of Huber et al.
Description
This function calculates outliers according to the rule of Huber et al.
Usage
util_sigmagap(x)
Arguments
x |
numeric data to check for outliers |
Value
binary vector
See Also
Other outlier_functions: util_3SD(), util_hubert(), util_tukey()
Sort a vector by order given in some other vector
Description
Sort a vector by order given in some other vector
Usage
util_sort_by_order(x, order, ...)
Arguments
x |
the vector |
order |
the "order" vector |
... |
additional arguments passed to |
See Also
Other summary_functions: prep_combine_report_summaries(), prep_extract_classes_by_functions(), prep_extract_summary(), prep_extract_summary.dataquieR_result(), prep_extract_summary.dataquieR_resultset2(), prep_render_pie_chart_from_summaryclasses_ggplot2(), prep_render_pie_chart_from_summaryclasses_plotly(), prep_summary_to_classes(), util_as_cat(), util_as_integer_cat(), util_extract_indicator_metrics(), util_get_category_for_result(), util_get_colors(), util_get_labels_grading_class(), util_get_message_for_result(), util_get_rule_sets(), util_get_ruleset_formats(), util_get_thresholds(), util_html_table()
Examples
## Not run:
util_sort_by_order(c("a", "b", "a", "c", "d"), letters)
## End(Not run)
Split table with mixed code/missing lists to single tables
Description
Resulting tables are populated into the data frame cache.
Usage
util_split_val_tab(val_tab = CODE_LIST_TABLE)
Arguments
val_tab |
data.frame tables in one long data frame. |
Value
invisible(NULL)
Compute something comparable from an ordinal variable
Description
interpolates categories of an ordinal variable
Usage
util_standardise_ordinal_codes(codes, maxlevel_old, maxlevel_new)
Arguments
codes |
the codes of the ordinal variable to convert |
maxlevel_old |
the highest category code of the original scale |
maxlevel_new |
the highest category code of the target scale |
Value
integer() of n values in {1, ..., maxlevel_new}
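A plausible reading of this interpolation is a linear rescaling of the code range. The following Python sketch is an assumption about the mapping, not dataquieR's verified algorithm, and it ignores edge cases such as maxlevel_old == 1:

```python
def standardise_ordinal_codes(codes, maxlevel_old, maxlevel_new):
    # Linearly map codes from {1, ..., maxlevel_old} onto {1, ..., maxlevel_new}.
    # Assumes maxlevel_old > 1; ties are resolved by Python's round().
    return [round((c - 1) * (maxlevel_new - 1) / (maxlevel_old - 1)) + 1
            for c in codes]
```

For example, rescaling a 5-point scale to a 3-point scale maps the endpoints to the endpoints and the midpoint to the midpoint.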
String check for results/combined results
Description
Detect if x starts with <prefix>. or equals <prefix>, if results have been combined
Usage
util_startsWith_prefix._or_equals_prefix(x, prefix, sep = ".")
Arguments
x |
character haystack |
prefix |
character needle |
sep |
character separation string |
Value
logical whether entries in x start with prefix followed by a dot, or equal prefix
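The described semantics can be sketched in Python (illustrative only, not the R source):

```python
def starts_with_prefix_dot_or_equals(x, prefix, sep="."):
    # True where an entry equals `prefix` or starts with `prefix` + sep,
    # e.g. results renamed to "<prefix>.<suffix>" after combining reports.
    return [s == prefix or s.startswith(prefix + sep) for s in x]
```

Note that a bare common prefix without the separator does not match, which distinguishes combined-result names from merely similar names.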
Verify assumptions made by the code, that must be TRUE
Description
Verify assumptions made by the code, that must be TRUE
Usage
util_stop_if_not(..., label, label_only)
Arguments
... |
see |
label |
character a label for the assumptions, can be missing |
label_only |
logical if |
Value
invisible(FALSE), if not stopped.
See Also
Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_warn_unordered()
Create a storr object with a storr_factory attribute
Description
also does basic validity checks
Usage
util_storr_factory(my_storr_object, my_storr_factory)
Arguments
my_storr_object |
a |
my_storr_factory |
a function creating the/a |
Value
storr-object with the factory attribute and (hopefully) valid.
Create a storr-object using the factory
Description
also performs checks.
Usage
util_storr_object(
my_storr_factory = function() {
storr::storr_environment()
}
)
Arguments
my_storr_factory |
a function returning a |
Value
a storr object
Utility function for judging whether a character vector does not appear to be a categorical variable
Description
The function considers the following properties:
- the maximum number of characters (to identify free text fields with long entries),
- the relative frequency of punctuation and space characters per element (to identify, e.g., JSON or XML elements, which are structured by those characters),
- the relative frequency of elements (categorical variables would have a low proportion of unique values in comparison to other variables).
Usage
util_string_is_not_categorical(vec)
Arguments
vec |
a character vector |
Value
TRUE or FALSE
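The three properties can be combined as in this Python sketch; the thresholds (max_chars, punct_prop, uniq_prop) are illustrative assumptions, not dataquieR's actual defaults:

```python
import string

def string_is_not_categorical(vec, max_chars=50, punct_prop=0.25, uniq_prop=0.5):
    # Thresholds are illustrative assumptions, not dataquieR's defaults.
    vec = [s for s in vec if s]
    if not vec:
        return False
    # 1) very long entries suggest free text
    if max(len(s) for s in vec) > max_chars:
        return True
    # 2) many punctuation/space characters suggest structured text (JSON/XML)
    punct = set(string.punctuation + " ")
    mean_punct = sum(sum(ch in punct for ch in s) / len(s) for s in vec) / len(vec)
    if mean_punct > punct_prop:
        return True
    # 3) mostly unique values suggest a non-categorical variable
    return len(set(vec)) / len(vec) > uniq_prop
```

A short vector of repeated single letters passes all three checks, while JSON-like strings trip the punctuation heuristic.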
Convert a study variable to a factor
Description
Convert a study variable to a factor
Usage
util_study_var2factor(
resp_vars = NULL,
study_data,
meta_data = "item_level",
label_col = LABEL,
assume_consistent_codes = TRUE,
have_cause_label_df = FALSE,
code_name = c(JUMP_LIST, MISSING_LIST),
include_sysmiss = TRUE
)
Arguments
resp_vars |
variable list the name of the measurement variables |
study_data |
data.frame the data frame that contains the measurements |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
assume_consistent_codes |
logical assume, that missing codes are consistent for all variables |
have_cause_label_df |
logical is a missing-code table available |
code_name |
character all lists from the meta_data to use for the coding. |
include_sysmiss |
logical add also a factor level for data values that were |
Value
study_data converted to factors using the coding provided in code_name
See Also
Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_table_of_vct()
Get sub-string left from first .
Description
Get sub-string left from first .
Usage
util_sub_string_left_from_.(x)
Arguments
x |
the string with at least one . |
See Also
Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_right_from_.(), util_translate()
Examples
## Not run:
util_sub_string_left_from_.(c("a.b", "asdf.xyz", "asdf.jkl.zuio"))
## End(Not run)
Get sub-string right from first .
Description
Get sub-string right from first .
Usage
util_sub_string_right_from_.(x)
Arguments
x |
the string with at least one . |
See Also
Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_translate()
Examples
## Not run:
util_sub_string_right_from_.(c("a.b", "asdf.xyz"))
util_sub_string_right_from_.(c("a.b", "asdf.xy.z"))
util_sub_string_right_from_.(c("ab", "asdxy.z"))
## End(Not run)
Suppress any output to stdout using sink()
Description
Suppress any output to stdout using sink()
Usage
util_suppress_output(expr)
Arguments
expr |
expression to evaluate |
Value
invisible() result of expr
See Also
Other process_functions: util_abbreviate(), util_all_is_integer(), util_attach_attr(), util_bQuote(), util_backtickQuote(), util_coord_flip(), util_extract_matches(), util_par_pmap(), util_setup_rstudio_job()
Suppress warnings conditionally
Description
Suppress warnings conditionally
Usage
util_suppress_warnings(expr, classes = "warning")
Arguments
expr |
expression to evaluate |
classes |
character classes of warning-conditions to suppress |
Value
the result of expr
See Also
Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_warning()
Tabulate a vector
Description
Does the same as as.data.frame(table(x)) but guarantees a data frame with two columns is returned
Usage
util_table_of_vct(Var1)
Arguments
Var1 |
vector to tabulate |
Value
a data frame with columns Var1 and Freq
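The "always two columns" guarantee is the point of this helper. A Python illustration of that contract (the column names Var1 and Freq are taken from the text above; the tuple-row representation is a stand-in for a data frame):

```python
from collections import Counter

def table_of_vct(var1):
    # Frequency table as (Var1, Freq) rows; an empty input still yields
    # an empty two-column structure rather than something degenerate.
    return sorted(Counter(var1).items())
```

The empty-input case is exactly where naive tabulation tends to lose its shape, so the sketch returns an empty list of rows rather than failing.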
See Also
Other data_management: util_assign_levlabs(), util_check_data_type(), util_check_group_levels(), util_compare_meta_with_study(), util_dichotomize(), util_fix_merge_dups(), util_merge_data_frame_list(), util_rbind(), util_remove_na_records(), util_replace_hard_limit_violations(), util_round_to_decimal_places(), util_study_var2factor()
Rotate 1-row data frames to key-value data frames
Description
If nrow(tb) > 1, util_table_rotator just returns tb.
Usage
util_table_rotator(tb)
Arguments
tb |
data.frame a data frame |
Value
data.frame but transposed
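The rotation only fires for exactly one row. A Python sketch using a dict-of-columns as a minimal data-frame stand-in (the output column names Variable and Value are an assumption for illustration):

```python
def table_rotator(tb):
    # tb: dict mapping column name -> list of values.
    # A 1-row table becomes a key-value table; anything else is returned as-is.
    n_rows = max((len(v) for v in tb.values()), default=0)
    if n_rows != 1:
        return tb
    return {"Variable": list(tb.keys()),
            "Value": [v[0] for v in tb.values()]}
```

Rotating a wide 1-row table into key-value pairs makes long summary rows readable in a report.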
Get a translation
Description
Get a translation
Usage
util_translate(keys, ns = "general", lang = getOption("dataquieR.lang", ""))
Arguments
keys |
character translation keys |
ns |
character translation namespace |
lang |
character language to translate to |
Value
character translations
See Also
Other string_functions: util_abbreviate_unique(), util_filter_names_by_regexps(), util_pretty_vector_string(), util_set_dQuoteString(), util_set_sQuoteString(), util_sub_string_left_from_.(), util_sub_string_right_from_.()
Translate standard column names to readable ones
Description
TODO: Duplicate of util_make_data_slot_from_table_slot ??
Usage
util_translate_indicator_metrics(
colnames,
short = FALSE,
long = TRUE,
ignore_unknown = FALSE
)
Arguments
colnames |
character the names to translate |
short |
logical include unit letter in output |
long |
logical include unit description in output |
ignore_unknown |
logical do not replace unknown indicator metrics by |
Value
translated names
Utility function for the Tukey outlier rule
Description
This function calculates outliers according to the rule of Tukey.
Usage
util_tukey(x)
Arguments
x |
numeric data to check for outliers |
Value
binary vector
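The classic Tukey fences flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The following Python sketch implements that textbook rule with linear-interpolation quantiles (matching R's default quantile type 7); whether util_tukey uses exactly these quantiles and k = 1.5 is an assumption:

```python
def tukey_outliers(x, k=1.5):
    # Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (classic Tukey fences).
    xs = sorted(x)
    def quantile(p):  # simple linear-interpolation quantile (R type 7)
        idx = p * (len(xs) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
        return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [int(v < q1 - k * iqr or v > q3 + k * iqr) for v in x]
```

The 0/1 output mirrors the binary vector described above.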
See Also
Other outlier_functions: util_3SD(), util_hubert(), util_sigmagap()
Remove tables referred to by metadata and use SVG for most figures
Description
Remove tables referred to by metadata and use SVG for most figures
Usage
util_undisclose(x, ...)
Arguments
x |
an object to un-disclose |
... |
further arguments, used for pointing to the |
Value
undisclosed object
Detect base unit from composite units
Description
Detect base unit from composite units
Usage
util_unit2baseunit(
unit,
warn_ambiguities = !exists("warn_ambiguities", .unit2baseunitenv),
unique = TRUE
)
Arguments
unit |
character a unit |
warn_ambiguities |
logical warn about all ambiguous units |
unique |
logical choose the more |
Value
character all possible base units, or the preferable one (unique set TRUE). Can be character(0), if unit is invalid or uniqueness was requested, but even the precedence rules of SI-closeness do not help selecting the most suitable unit.
Examples
## Not run:
util_unit2baseunit("%")
util_unit2baseunit("d%")
# Invalid unit
util_unit2baseunit("aa%")
util_unit2baseunit("aa%", unique = FALSE)
util_unit2baseunit("a%")
# Invalid unit
util_unit2baseunit("e%")
util_unit2baseunit("e%", unique = FALSE)
util_unit2baseunit("E%")
util_unit2baseunit("Eg")
# Invalid unit
util_unit2baseunit("E")
util_unit2baseunit("E", unique = FALSE)
util_unit2baseunit("EC")
util_unit2baseunit("EK")
util_unit2baseunit("µg")
util_unit2baseunit("mg")
util_unit2baseunit("°C")
util_unit2baseunit("k°C")
util_unit2baseunit("kK")
util_unit2baseunit("nK")
# Ambiguous units, if used with unique = FALSE
util_unit2baseunit("kg")
util_unit2baseunit("cd")
util_unit2baseunit("Pa")
util_unit2baseunit("kat")
util_unit2baseunit("min")
# atto atom units or astronomical units, both in state "accepted"
util_unit2baseunit("au")
util_unit2baseunit("au", unique = FALSE)
# astronomical units or micro are, both in state "accepted"
util_unit2baseunit("ua")
util_unit2baseunit("ua", unique = FALSE)
util_unit2baseunit("kt")
# parts per trillion or pico US_liquid_pint, both in state "common",
# but in this case, plain count units will be preferred
util_unit2baseunit("ppt")
util_unit2baseunit("ppt", unique = FALSE)
util_unit2baseunit("ft")
util_unit2baseunit("yd")
util_unit2baseunit("pt")
# actually the same, but both only common, and to my knowledge not-so-common
# gram-force vs. kilogram-force (kilo pond)
util_unit2baseunit("kgf")
util_unit2baseunit("kgf", unique = FALSE)
util_unit2baseunit("at")
util_unit2baseunit("ph")
util_unit2baseunit("nt")
## End(Not run)
Save a hint to the user during package load
Description
Save a hint to the user during package load
Usage
util_user_hint(x)
Arguments
x |
character the hint |
Value
invisible(NULL)
See Also
Other system_functions: util_detect_cores(), util_view_file()
Utility function verifying syntax of known metadata columns
Description
This function goes through the metadata columns that dataquieR supports and verifies that they follow its metadata conventions.
Usage
util_validate_known_meta(meta_data)
Arguments
meta_data |
data.frame the data frame that contains metadata attributes of study data |
Value
data.frame possibly modified meta_data, invisible()
See Also
Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_missing_lists()
Validate code lists for missing and/or jump codes
Description
will warn/stop on problems
Usage
util_validate_missing_lists(
meta_data,
cause_label_df,
assume_consistent_codes = FALSE,
expand_codes = FALSE,
suppressWarnings = FALSE,
label_col
)
Arguments
meta_data |
data.frame the data frame that contains metadata attributes of study data |
cause_label_df |
data.frame missing code table. If missing codes have labels the respective data frame can be specified here, see cause_label_df |
assume_consistent_codes |
logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code will be the same for all variables. |
expand_codes |
logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere |
suppressWarnings |
logical warn about consistency issues with missing and jump lists |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
Value
list with entries:
- cause_label_df: updated data frame with labels for missing codes
See Also
Other metadata_management: util_dist_selection(), util_find_free_missing_code(), util_find_var_by_meta(), util_get_var_att_names_of_level(), util_get_vars_in_segment(), util_looks_like_missing(), util_no_value_labels(), util_validate_known_meta()
Verify the class ReportSummaryTable
Description
Verify the class ReportSummaryTable
Usage
util_validate_report_summary_table(tb, meta_data, label_col)
Arguments
tb |
data.frame object to be a |
meta_data |
data.frame the data frame that contains metadata attributes of study data. Used to translate variable names, if given. |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
Value
data.frame maybe fixed ReportSummaryTable
Utility function to compute the rank intraclass correlation
Description
This implementation uses the package rankICC
to compute the rank
intraclass correlation, a nonparametric version of the ICC (Tu et al., 2023).
In contrast to model-based ICC approaches, it is less sensitive to outliers
and skewed distributions. It can be applied to variables with an ordinal,
interval or ratio scale. However, it is not possible to adjust for
covariables with this approach. The calculated ICC can become negative,
like Fisher's ICC.
Usage
util_varcomp_robust(
resp_vars = NULL,
group_vars = NULL,
study_data = study_data,
meta_data = meta_data,
min_obs_in_subgroup = 10,
min_subgroups = 5,
label_col = NULL
)
Arguments
resp_vars |
the name of the response variable |
group_vars |
the name of the grouping variable |
study_data |
the data frame that contains the measurements |
meta_data |
the data frame that contains metadata attributes of study data |
min_obs_in_subgroup |
the minimum number of observations that is required to include a subgroup (level) of the grouping variable ( |
min_subgroups |
the minimum number of subgroups (levels) of the grouping variable ( |
label_col |
the name of the column in the metadata with labels of variables |
Value
a vector from rankICC::rankICC
Find all columns in item-level metadata that refer to some other variable
Description
Find all columns in item-level metadata that refer to some other variable
Usage
util_variable_references(meta_data = "item_level")
Arguments
meta_data |
data.frame the metadata |
Value
character all column names referring to variables from item-level metadata
See Also
Other lookup_functions: util_prep_location_check(), util_prep_proportion_check()
Verify encoding
Description
Verify encoding
Usage
util_verify_encoding(dt0, ref_encs)
Arguments
dt0 |
data.frame data to verify |
ref_encs |
character names are column names of |
Examples
## Not run:
dt0 <-
prep_get_data_frame(
file.path("~",
"rsync", "nako_mrt_qs$", "exporte", "NAKO_Datensatz_bereinigte_Daten",
"NatCoEdc_Export", "export_mannheim_30.csv"))
util_verify_encoding(dt0)
dt0$mrt_note[[1]] <- iconv("Härbärt", "UTF-8", "cp1252")
util_verify_encoding(dt0)
dt0$mrt_note[[15]] <- iconv("Härbärt", "UTF-8", "cp1252")
util_verify_encoding(dt0)
dt0$mrt_note[[1]] <- "Härbärt"
util_verify_encoding(dt0)
dt0$mrt_note[[17]] <- iconv("Härbärt", "UTF-8", "latin3")
util_verify_encoding(dt0)
## End(Not run)
Test for likely misspelled data frame references
Description
Checks if some data frame names may have typos.
Usage
util_verify_names(name_of_study_data = character(0))
Arguments
name_of_study_data |
character names of study data that are expected |
Value
invisible(NULL), messages / warns only.
View a file in most suitable viewer
Description
View a file in most suitable viewer
Usage
util_view_file(file)
Arguments
file |
the file to view |
Value
invisible(file)
See Also
Other system_functions: util_detect_cores(), util_user_hint()
Warn about a problem in varname, if x has no natural order
Description
Also warns if R does not have a comparison operator for x.
Usage
util_warn_unordered(x, varname)
Arguments
x |
vector of data |
varname |
character len=1. Variable name for warning messages |
Value
invisible(NULL)
See Also
Other robustness_functions: util_as_valid_missing_codes(), util_check_one_unique_value(), util_correct_variable_use(), util_empty(), util_ensure_character(), util_ensure_in(), util_ensure_suggested(), util_expect_scalar(), util_fix_rstudio_bugs(), util_is_integer(), util_is_numeric_in(), util_is_valid_missing_codes(), util_match_arg(), util_observations_in_subgroups(), util_stop_if_not()
Produce a warning message with a useful short stack trace.
Description
Produce a warning message with a useful short stack trace.
Usage
util_warning(
m,
...,
applicability_problem = NA,
intrinsic_applicability_problem = NA,
integrity_indicator = "none",
level = 0,
immediate,
title = "",
additional_classes = c()
)
Arguments
m |
warning message or a condition |
... |
arguments for sprintf on m, if m is a character |
applicability_problem |
logical |
intrinsic_applicability_problem |
logical |
integrity_indicator |
character the warning is an integrity problem, here is the indicator abbreviation. |
level |
integer level of the warning message (defaults to 0). Higher levels are more severe. |
immediate |
logical Display the warning immediately, not only when the interactive session comes back. |
additional_classes |
character additional classes the thrown condition object should inherit from, first. |
Value
condition the condition object, if the execution is not stopped
See Also
Other condition_functions: util_condition_constructor_factory(), util_deparse1(), util_error(), util_find_external_functions_in_stacktrace(), util_find_first_externally_called_functions_in_stacktrace(), util_find_indicator_function_in_callers(), util_message(), util_suppress_warnings()
Data frame with labels for missing- and jump-codes: metadata about value and missing codes
Description
data.frame with the following columns:
- CODE_VALUE: numeric | DATETIME Missing or categorical code (the number or date representing a missing/category)
- CODE_LABEL: character a label for the missing code or category
- CODE_CLASS: enum JUMP | MISSING. For missing lists: Class of the missing code.
- CODE_INTERPRET: enum I | P | PL | R | BO | NC | O | UH | UO | NE. For missing lists: Class of the missing code according to AAPOR.
- resp_vars: character For missing lists: optional, if a missing code is specific for some variables, it is listed for each such variable with one entry in resp_vars. If NA, the code is assumed shared among all variables. For v1.0 metadata, you need to refer to VAR_NAMES here.
See Also
com_qualified_item_missingness(), com_qualified_segment_missingness()