Type: Package
Title: Stratified Sampling and Labeling of Data in R
Version: 0.1.0
Description: Provides functions for stratified sampling and for assigning custom labels to data, with labels assigned at random within groups. The package supports stratified, cluster, and systematic sampling, and lets users apply transformations to numeric columns and customize the sampling process. It is intended for statistical analysis and data preparation tasks.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2024-09-17 17:36:52 UTC; nasru
Author: Duan Yuanheng [aut, cre]
Maintainer: Duan Yuanheng <yhyuanheng@gmail.com>
Repository: CRAN
Date/Publication: 2024-09-18 12:30:02 UTC
Cluster Sampling and Labeling
Description
This function performs cluster sampling on a data frame: it randomly selects a percentage of the clusters defined by group_col, labels every row in a selected cluster "Yes", and labels all other rows "No".
Usage
cluster_labels(df, group_col, yes_percentage)
Arguments
df: A data frame containing the data.
group_col: A character string naming the column whose distinct values define the clusters.
yes_percentage: A numeric value between 0 and 100 giving the percentage of clusters to label "Yes".
Value
A data frame with an additional column "Clustered_Yes_No" containing the cluster-sampled "Yes"/"No" labels.
Examples
result <- cluster_labels(iris, group_col = "Species", yes_percentage = 50)
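To make the procedure concrete, the sketch below shows one way cluster sampling with "Yes"/"No" labels can be written in base R. The helper cluster_label_sketch() is hypothetical and is not this package's code; rounding the number of clusters with round() is an assumption, so the package may select a different number of clusters when the percentage does not divide the cluster count evenly.

```r
# Hypothetical sketch of cluster sampling with Yes/No labels (not the package's code).
cluster_label_sketch <- function(df, group_col, yes_percentage) {
  stopifnot(group_col %in% names(df), yes_percentage >= 0, yes_percentage <= 100)
  # Each distinct value of group_col is one cluster.
  clusters <- unique(as.character(df[[group_col]]))
  # Assumption: the number of sampled clusters is rounded with round().
  n_yes <- round(length(clusters) * yes_percentage / 100)
  # Draw whole clusters at random, without replacement.
  chosen <- clusters[sample.int(length(clusters), n_yes)]
  # Every row of a chosen cluster is "Yes"; all other rows are "No".
  df$Clustered_Yes_No <- ifelse(as.character(df[[group_col]]) %in% chosen, "Yes", "No")
  df
}
```

For example, cluster_label_sketch(iris, "Species", 50) labels every row of two of the three species "Yes", because round(1.5) is 2 in R. The key property of cluster sampling is that the label is constant within each cluster.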
Apply Custom Transformation to Data Column
Description
This function applies a transformation (scaling, normalization, log transform, or a user-supplied function) to a specified numeric column.
Usage
custom_transform(df, selected_column, transformation_type)
Arguments
df: A data frame containing the data.
selected_column: A character string naming the numeric column to transform.
transformation_type: Either a character string naming a built-in transformation ("scale", "normalize", or "log") or a custom R function to apply to the column.
Value
A data frame with the transformed column.
Examples
result <- custom_transform(iris, selected_column = "Sepal.Length", transformation_type = "scale")
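The transformation types can be illustrated with a short base R sketch. The helper transform_sketch() is hypothetical, and its formulas are assumptions: "scale" is taken to mean z-score standardization, "normalize" to mean min-max rescaling to [0, 1], "log" to mean the natural logarithm, and the transformed values are assumed to replace the original column.

```r
# Hypothetical sketch of the transformation options (formulas are assumptions).
transform_sketch <- function(df, selected_column, transformation_type) {
  x <- df[[selected_column]]
  stopifnot(is.numeric(x))  # assumes a numeric column with no missing values
  if (is.function(transformation_type)) {
    y <- transformation_type(x)  # user-supplied function
  } else {
    y <- switch(transformation_type,
      scale     = (x - mean(x)) / sd(x),             # z-score standardization
      normalize = (x - min(x)) / (max(x) - min(x)),  # min-max rescaling to [0, 1]
      log       = log(x),                            # natural log; requires x > 0
      stop("unknown transformation_type: ", transformation_type)
    )
  }
  # Assumption: the transformed values overwrite the selected column.
  df[[selected_column]] <- y
  df
}
```

Passing a function, for example transformation_type = function(x) sqrt(x), routes around the built-in options, which mirrors the custom-function case described above.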
Stratify and Assign Custom Labels to Data
Description
This function stratifies data by a specified grouping column and, within each group, assigns one of two custom labels according to a given percentage.
Usage
stratified_custom_labels(df, group_col, label_percentage, label1, label2)
Arguments
df: A data frame to be stratified.
group_col: A character string specifying the column name to group by.
label_percentage: A numeric value between 0 and 100 giving the percentage of rows within each group that receive label1.
label1: A character string giving the first label.
label2: A character string giving the second label, assigned to the remaining rows in each group.
Value
A data frame with an additional column "Custom_Labels" containing the stratified custom labels.
Examples
result <- stratified_custom_labels(iris, group_col = "Species",
label_percentage = 50,
label1 = "High", label2 = "Low")
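A minimal base R sketch of stratified labeling follows, assuming that within each group the number of label1 rows is round(n * label_percentage / 100) and that those rows are drawn uniformly at random. stratified_label_sketch() is a hypothetical helper written for illustration, not the package's implementation.

```r
# Hypothetical sketch of stratified labeling with two custom labels.
stratified_label_sketch <- function(df, group_col, label_percentage,
                                    label1, label2) {
  stopifnot(label_percentage >= 0, label_percentage <= 100)
  # Start with label2 everywhere; selected rows are overwritten with label1.
  labels <- rep(label2, nrow(df))
  # Row indices split by group: each list element is one stratum.
  strata <- split(seq_len(nrow(df)), df[[group_col]])
  for (rows in strata) {
    if (length(rows) == 0) next
    # Assumption: the per-group count is rounded with round().
    k <- round(length(rows) * label_percentage / 100)
    picked <- rows[sample.int(length(rows), k)]
    labels[picked] <- label1
  }
  df$Custom_Labels <- labels
  df
}
```

Because the draw is made separately inside every group, each group receives its own share of label1, rather than the share holding only across the whole data frame; this is what distinguishes stratified labeling from simple random labeling.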
Stratify and Assign Labels to Data
Description
This function stratifies data based on a specified grouping column and assigns "Yes" or "No" labels according to a given percentage.
Usage
stratified_labels(df, group_col, yes_percentage)
Arguments
df: A data frame to be stratified.
group_col: A character string specifying the column name to group by.
yes_percentage: A numeric value between 0 and 100 indicating the percentage of "Yes" labels to assign within each group.
Value
A data frame with an additional column "Sampled_Yes_No" containing the stratified "Yes"/"No" labels.
Examples
# Example with the iris dataset
result <- stratified_labels(iris, group_col = "Species", yes_percentage = 50)
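To check that labels really are stratified, one can compute the share of "Yes" labels within each group. The helper yes_share_by_group() below is a hypothetical base R sketch, not part of the package.

```r
# Hypothetical helper: share of "Yes" labels within each group.
yes_share_by_group <- function(df, group_col, label_col, yes = "Yes") {
  # The mean of a logical vector is the proportion of TRUE values.
  tapply(df[[label_col]] == yes, df[[group_col]], mean)
}
```

Assuming the package is attached, yes_share_by_group(result, "Species", "Sampled_Yes_No") on the example result should report a share close to 0.5 for each species. Since every iris species has 50 rows, a share of exactly 0.5 would be expected if the function assigns round(n * yes_percentage / 100) "Yes" labels per group, which is an assumption.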
Systematic Sampling and Labeling
Description
This function performs systematic sampling on a data frame and labels rows "Yes" or "No", selecting the "Yes" rows at the specified sampling interval.
Usage
systematic_labels(df, group_col, sampling_interval)
Arguments
df: A data frame containing the data.
group_col: A character string specifying the column to use for grouping.
sampling_interval: A numeric value giving the sampling interval k for systematic sampling (one row in every k is selected).
Value
A data frame with an additional column "Systematic_Yes_No" containing the systematically sampled "Yes"/"No" labels.
Examples
result <- systematic_labels(iris, group_col = "Species", sampling_interval = 2)
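The sketch below illustrates systematic sampling within groups in base R: for each group, a random start r between 1 and k is drawn, and the rows at positions r, r + k, r + 2k, and so on within that group are labeled "Yes". systematic_label_sketch() is hypothetical; whether the package samples within groups and uses a random start are both assumptions.

```r
# Hypothetical sketch of systematic sampling within groups.
systematic_label_sketch <- function(df, group_col, sampling_interval) {
  k <- as.integer(sampling_interval)
  stopifnot(!is.na(k), k >= 1)
  labels <- rep("No", nrow(df))
  for (rows in split(seq_len(nrow(df)), df[[group_col]])) {
    if (length(rows) == 0) next
    start <- sample.int(k, 1)            # random start in 1..k (an assumption)
    if (start > length(rows)) next       # group shorter than the start offset
    # Every k-th row of the group, beginning at the random start.
    picked <- rows[seq.int(start, length(rows), by = k)]
    labels[picked] <- "Yes"
  }
  df$Systematic_Yes_No <- labels
  df
}
```

With sampling_interval = 2 on iris, each species (50 rows) would receive exactly 25 "Yes" labels under this sketch, whichever start is drawn.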