Title: Efficient Tabulation with Stata-Like Output
Version: 1.0.0
Description: Efficient tabulation with Stata-like output. For each unique value of the variable, it shows the number of observations with that value, the proportion of observations with that value, and the cumulative proportion, in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab() uses data.table syntax.
Imports: assertthat, dplyr, data.table, magrittr, purrr, rlang, stats, stringr, tibble, tidyr
Depends: R (≥ 3.4.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
NeedsCompilation: no
Packaged: 2021-01-06 22:03:57 UTC; cesarlandin
Author: Sean Higgins [aut, cre]
Maintainer: Sean Higgins <sean.higgins@kellogg.northwestern.edu>
Repository: CRAN
Date/Publication: 2021-01-08 13:20:02 UTC
Efficient quantiles
Description
Produces quantiles of the selected variable(s): quantiles shows the quantile values at the probabilities given in probs.
Efficient with big data: if you give it a data.table, quantiles uses data.table syntax.
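To make the data.table path concrete, here is a minimal sketch of a quantile table computed inside data.table's j expression. It is not the package's code: the helper quantiles_sketch(), its col argument, and the prob/value output columns are hypothetical, and it assumes stats::quantile() with its default (type 7) estimator.

```r
library(data.table)

# Hypothetical helper quantiles_sketch(): NOT the package's implementation.
# Assumes quantiles() wraps stats::quantile() (default type 7) in data.table syntax.
quantiles_sketch <- function(dt, col, probs = seq(0, 1, 0.1), na.rm = FALSE) {
  # Evaluate stats::quantile() inside j; get() looks up the column by its name
  q <- dt[, stats::quantile(get(col), probs = probs, na.rm = na.rm)]
  # Return one row per requested probability
  data.table(prob = probs, value = unname(q))
}

a <- data.table(varname = 1:11)
res <- quantiles_sketch(a, "varname", probs = c(0, 0.5, 1))
res
```

For the values 1 to 11, this returns the minimum, the median and the maximum (1, 6 and 11).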
Usage
quantiles(df, ..., probs = seq(0, 1, 0.1), na.rm = FALSE)
Arguments
df: A data.table, tibble, or data.frame.
...: A column or set of columns (without quotation marks).
probs: Numeric vector of probabilities with values in [0, 1]. The default, seq(0, 1, 0.1), gives the minimum, the nine deciles, and the maximum.
na.rm: Logical; if TRUE, any NA and NaN values are removed from the variable before the quantiles are computed.
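The na.rm description follows the wording of stats::quantile(). Assuming quantiles() forwards na.rm to that function (an assumption, not stated above), this base R sketch shows what each setting does when a missing value is present:

```r
# Assumption: na.rm is forwarded to stats::quantile(), as its description suggests.
x <- c(3, 1, NA, 2)

# With NA present and na.rm = FALSE, stats::quantile() signals an error
res_keep <- tryCatch(stats::quantile(x, probs = 0.5), error = function(e) "error")

# With na.rm = TRUE the NA is dropped first: the median of 1, 2, 3 is 2
res_drop <- stats::quantile(x, probs = 0.5, na.rm = TRUE)
```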
Value
Quantile values.
Examples
# data.table
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% quantiles(varname)
# data.table: look at top 10% in more detail
a %>% quantiles(varname, probs = seq(0.9, 1, 0.01))
# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% quantiles(varname, na.rm = TRUE)
Efficient tabulation
Description
Produces a tabulation: for each unique group formed by the variable(s), tab shows the number of observations with that value, the proportion of observations with that value, and the cumulative proportion, in descending order of frequency.
Accepts data.table, tibble, or data.frame as input.
Efficient with big data: if you give it a data.table, tab uses data.table syntax.
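A minimal sketch of the three tabulated quantities in data.table syntax. The helper tab_sketch() and the output names N, prop and cum_prop are hypothetical; the actual tab() may name or format its columns differently.

```r
library(data.table)

# Hypothetical helper tab_sketch(): NOT the package's implementation.
# Output column names N, prop and cum_prop are assumptions.
tab_sketch <- function(dt, col) {
  out <- dt[, .N, by = col]        # number of observations per unique value
  setorder(out, -N)                # descending order of frequency
  out[, prop := N / sum(N)]        # proportion of observations
  out[, cum_prop := cumsum(prop)]  # cumulative proportion
  out[]
}

a <- data.table(varname = c("x", "y", "y", "z", "z", "z"))
res <- tab_sketch(a, "varname")
res
```

Sorting before taking cumsum() is what makes the cumulative proportion run from the most to the least frequent value.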
Usage
tab(df, ..., by, round)
Arguments
df: A data.table, tibble, or data.frame.
...: A column or set of columns (without quotation marks).
by: A variable by which you want to group observations before tabulating (without quotation marks).
round: An integer indicating the number of digits to which proportion and cumulative proportion are rounded.
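How by and round interact is not spelled out above. The sketch below assumes that by repeats the tabulation within each group (so proportions sum to 1 per group) and that round rounds the proportions to that many decimal places. tab_by_sketch() and its group and digits parameters are hypothetical stand-ins for tab()'s by and round.

```r
library(data.table)

# Hypothetical helper tab_by_sketch(): NOT the package's implementation.
# Assumes `by` tabulates within each group and `round` rounds the proportions.
tab_by_sketch <- function(dt, col, group, digits) {
  keys <- c(group, col)
  out <- dt[, .N, by = keys]                                # counts per (group, value)
  setorderv(out, c(group, "N"), order = c(1L, -1L))         # group ascending, N descending
  out[, prop := N / sum(N), by = group]                     # proportion within group
  out[, cum_prop := cumsum(prop), by = group]               # cumulative proportion within group
  out[, `:=`(prop = round(prop, digits),
             cum_prop = round(cum_prop, digits))]           # round only after accumulating
  out[]
}

a <- data.table(grp = c(1, 1, 1, 2, 2, 2),
                varname = c("x", "y", "y", "x", "x", "x"))
res <- tab_by_sketch(a, "varname", group = "grp", digits = 2)
res
```

Rounding after the cumulative sum avoids accumulating rounding error in cum_prop.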
Value
Tabulation (frequencies, proportion, cumulative proportion) for each unique value of the variables given in ... from df.
Examples
# data.table
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tab(varname)
# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tab(varname, round = 1)
# data.frame
d <- data.frame(varname = sample.int(20, size = 1000000, replace = TRUE))
d %>% tab(varname)
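Since the package imports dplyr, tibble and data.frame inputs plausibly go through dplyr verbs. The following sketch builds an equivalent tabulation with dplyr::count(); it is an assumption about, not a copy of, tab()'s non-data.table path, and the prop and cum_prop column names are hypothetical.

```r
library(dplyr)

# Hypothetical dplyr counterpart of tab() for tibble/data.frame input.
b <- tibble(varname = c("x", "y", "y", "z", "z", "z"))
res <- b %>%
  count(varname, sort = TRUE) %>%   # frequency column `n`, most frequent first
  mutate(prop = n / sum(n),         # proportion of observations
         cum_prop = cumsum(prop))   # cumulative proportion
res
```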
Count distinct categories
Description
Produces a count of unique categories: tabcount shows the number of unique categories of the selected variable(s).
Accepts data.table, tibble, or data.frame as input.
Efficient with big data: if you give it a data.table, tabcount uses data.table syntax.
Usage
tabcount(df, ...)
Arguments
df: A data.table, tibble, or data.frame.
...: A column or set of columns (without quotation marks).
Value
Count of the number of unique groups formed by the variables given in ... from df.
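Assuming tabcount() returns the number of distinct combinations of the selected columns, data.table::uniqueN() computes the same quantity. The small table here is made up for illustration:

```r
library(data.table)

# Assumption: tabcount() counts distinct combinations of the selected columns.
# data.table::uniqueN() computes that count directly.
a <- data.table(v1 = c(1, 1, 2, 2, 3),
                v2 = c("a", "a", "a", "b", "b"))

n_v1   <- uniqueN(a, by = "v1")            # distinct values of v1
n_pair <- uniqueN(a, by = c("v1", "v2"))   # distinct (v1, v2) combinations
```

With one column the count is 3 (values 1, 2, 3); with both columns it is 4, because (1, "a") appears twice.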
Examples
# data.table
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tabcount(varname)
# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tabcount(varname)