Title: Dump 'R' Package Source, Documentation, and Vignettes into One File
Version: 0.1.0
Description: Dump source code, documentation and vignettes of an 'R' package into a single file. Supports installed packages, tar.gz archives, and package source directories. If the package is not installed, only its source is automatically downloaded from CRAN for processing. The output is a single plain text file or a character vector, which is useful to ingest complete package documentation and source into a large language model (LLM) or pass it further to other tools, such as 'ragnar' https://github.com/tidyverse/ragnar to create a Retrieval-Augmented Generation (RAG) workflow.
License: MIT + file LICENSE
URL: https://github.com/e-kotov/rdocdump, https://www.ekotov.pro/rdocdump/
BugReports: https://github.com/e-kotov/rdocdump/issues
Suggests: curl, quarto, testthat (≥ 3.0.0), withr
VignetteBuilder: quarto
Config/testthat/edition: 3
Encoding: UTF-8
Language: en
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-06-17 15:15:54 UTC; ek
Author: Egor Kotov
Maintainer: Egor Kotov <kotov.egor@gmail.com>
Repository: CRAN
Date/Publication: 2025-06-18 11:50:06 UTC
Cleanup Temporary Files
Description
Clean up temporary package archive and extracted files according to a keep_files policy.
Usage
cleanup_files(pkg_info, keep_files)
Arguments
pkg_info |
A list returned by |
keep_files |
A
|
Value
Invisibly returns NULL. If there are any issues with file deletion, warnings are issued.
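The cleanup policy can be illustrated with a minimal base R sketch. This is not the package's implementation: the function name `cleanup_sketch` is hypothetical, and the policy values "tarball", "extracted", and "both" are assumptions (only "none" appears as a default elsewhere in this manual). The `tar_path` and `extracted_path` fields follow the list described for resolve_pkg_path.

```r
# Hypothetical sketch of a keep_files cleanup policy (not rdocdump's own code).
# Assumed policy values: "none", "tarball", "extracted", "both".
cleanup_sketch <- function(pkg_info,
                           keep_files = c("none", "tarball", "extracted", "both")) {
  keep_files <- match.arg(keep_files)

  # Delete a path if it exists; warn instead of failing when deletion fails.
  drop <- function(path, recursive = FALSE) {
    if (is.null(path) || !file.exists(path)) {
      return(invisible(NULL))
    }
    status <- unlink(path, recursive = recursive, force = TRUE)
    if (status != 0) {
      warning("Could not delete: ", path)
    }
    invisible(NULL)
  }

  # Remove the tarball unless the policy keeps it.
  if (keep_files %in% c("none", "extracted")) {
    drop(pkg_info$tar_path)
  }
  # Remove the extracted directory unless the policy keeps it.
  if (keep_files %in% c("none", "tarball")) {
    drop(pkg_info$extracted_path, recursive = TRUE)
  }
  invisible(NULL)
}
```

Returning NULL invisibly and reporting failures as warnings mirrors the behaviour described in the Value section above.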
Combine Rd files into a single character vector. This function reads the Rd files from a package source directory or an installed package and combines them into a single string.
Description
Combine Rd files into a single character vector. This function reads the Rd files from a package source directory or an installed package and combines them into a single string.
Usage
combine_rd(pkg_path, is_installed = FALSE, pkg_name = NULL)
Arguments
pkg_path |
Path to the package source directory or the installed package. |
is_installed |
Logical indicating whether the package is installed ( |
pkg_name |
Optional package name if the package is installed. |
Value
A single string containing the combined Rd documentation.
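As a rough illustration of combining Rd pages into one string, the sketch below uses only the base `tools` package and handles the installed-package case. The helper name `combine_rd_sketch` is hypothetical, and whether rdocdump uses `tools::Rd_db()` and `tools::Rd2txt()` internally is an assumption, not something this manual states.

```r
# Hypothetical sketch: render every Rd page of an installed package to text
# and join the pages into a single string.
combine_rd_sketch <- function(pkg_name) {
  rd_db <- tools::Rd_db(pkg_name)  # named list of parsed Rd objects

  pages <- vapply(rd_db, function(rd) {
    tmp <- tempfile(fileext = ".txt")
    on.exit(unlink(tmp), add = TRUE)
    # Render one help page as plain text; disable backspace-based underlining.
    tools::Rd2txt(rd, out = tmp, options = list(underline_titles = FALSE))
    paste(readLines(tmp, warn = FALSE), collapse = "\n")
  }, character(1))

  paste(pages, collapse = "\n\n")
}

docs <- combine_rd_sketch("splines")
cat(substr(docs, 1, 300))
```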
Helper function to combine package vignettes
Description
Helper function to combine package vignettes
Usage
combine_vignettes(pkg_path)
Arguments
pkg_path |
Path to the package source directory. |
Value
A single string containing the combined vignettes from the package.
Extract code from an installed package using its namespace. This function retrieves all functions from the package namespace and deparses them to get their source code.
Description
Extract code from an installed package using its namespace. This function retrieves all functions from the package namespace and deparses them to get their source code.
Usage
extract_code_installed(pkg_name)
Arguments
pkg_name |
The name of the installed package. |
Value
A single string containing the source code of all functions in the package.
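The namespace-and-deparse approach described above can be sketched in base R as follows. The helper name `extract_code_sketch` is hypothetical and the details (sorting, the `name <- function` layout) are assumptions made for illustration. Deparsed code is reconstructed from the function objects, so comments and original formatting are generally not preserved.

```r
# Hypothetical sketch: deparse every function in an installed package's namespace.
extract_code_sketch <- function(pkg_name) {
  ns <- asNamespace(pkg_name)
  obj_names <- sort(ls(ns, all.names = TRUE))

  chunks <- lapply(obj_names, function(nm) {
    obj <- get(nm, envir = ns, inherits = FALSE)
    if (!is.function(obj)) {
      return(NULL)  # skip data objects and other non-functions
    }
    paste0(nm, " <- ", paste(deparse(obj), collapse = "\n"))
  })

  paste(unlist(chunks), collapse = "\n\n")
}

code <- extract_code_sketch("splines")
cat(substr(code, 1, 300))
```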
Helper function to extract code from package source files. This function reads all .R files in the R directory and optionally includes files from the tests directory. It can also exclude roxygen2 documentation lines.
Description
Helper function to extract code from package source files. This function reads all .R files in the R directory and optionally includes files from the tests directory. It can also exclude roxygen2 documentation lines.
Usage
extract_code_source(pkg_path, include_tests = FALSE, include_roxygen = FALSE)
Arguments
pkg_path |
Path to the package source directory. |
include_tests |
|
include_roxygen |
|
Value
A single string containing the source code from the package's R files.
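A minimal base R sketch of this kind of source extraction is shown below. The helper name `extract_source_sketch` is hypothetical, and treating any line that starts with `#'` as roxygen2 documentation is an assumption about how the filtering might work.

```r
# Hypothetical sketch: read .R files from R/ (and optionally tests/),
# optionally dropping roxygen2 comment lines.
extract_source_sketch <- function(pkg_path,
                                  include_tests = FALSE,
                                  include_roxygen = FALSE) {
  dirs <- file.path(pkg_path, "R")
  if (include_tests) {
    dirs <- c(dirs, file.path(pkg_path, "tests"))
  }
  dirs <- dirs[dir.exists(dirs)]

  files <- list.files(dirs, pattern = "\\.R$", full.names = TRUE, recursive = TRUE)

  chunks <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    if (!include_roxygen) {
      lines <- lines[!grepl("^\\s*#'", lines)]  # assumed roxygen2 marker
    }
    paste(lines, collapse = "\n")
  }, character(1))

  paste(chunks, collapse = "\n\n")
}
```

Calling the sketch on a directory that has no R folder returns an empty string rather than an error, which keeps it safe to use on arbitrary paths.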
Extract R Source Code from a Package
Description
This function extracts the R source code from a package. For installed packages, it retrieves the package namespace and deparses all functions found in the package. For package source directories or archives (non-installed packages), it reads all .R files from the R directory and, optionally, from the tests directory. Optionally, it can include roxygen2 documentation from these files.
Usage
rdd_extract_code(
  pkg,
  file = NULL,
  include_tests = FALSE,
  include_roxygen = FALSE,
  force_fetch = FALSE,
  cache_path = getOption("rdocdump.cache_path"),
  keep_files = "none",
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
file |
Optional. Save path for the output text file. If set, the function will return the path to the file instead of the combined text. Defaults to |
include_tests |
|
include_roxygen |
|
force_fetch |
|
cache_path |
A |
keep_files |
A
|
repos |
A |
Value
A single string containing the combined R source code (and, optionally, roxygen2 documentation) from the package.
Examples
# Extract only R source code (excluding roxygen2 documentation) from an installed package.
code <- rdd_extract_code("splines")
cat(substr(code, 1, 1000))
# Extract R source code including roxygen2 documentation from a package source directory.
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
local({
  code_with_roxygen <- rdd_extract_code(
    "ini",
    include_roxygen = TRUE,
    force_fetch = TRUE,
    repos = c("CRAN" = "https://cran.r-project.org")
  )
  cat(substr(code_with_roxygen, 1, 1000))
})
# Extract R source code from a package source directory,
# including test files but excluding roxygen2 docs.
local({
  code_with_tests <- rdd_extract_code(
    "ini",
    include_roxygen = FALSE,
    include_tests = TRUE,
    force_fetch = TRUE,
    repos = c("CRAN" = "https://cran.r-project.org")
  )
  cat(substr(code_with_tests, 1, 1000))
})
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE, force = TRUE)
Get Current rdocdump Repository Options
Description
This function returns the current repository URLs used by rdocdump. The default is set to the CRAN repository at "https://cloud.r-project.org". This does not affect the repositories used by install.packages() in your current R session and/or project. To set repository options, use rdd_set_repos.
Usage
rdd_get_repos()
Value
A character vector of repository URLs.
Examples
# Get current rdocdump repository options
rdd_get_repos()
Set rdocdump Cache Path in the Current R Session
Description
This function sets the cache path used by rdocdump to store temporary files (downloaded tar.gz archives and/or extracted directories) for the current R session. The cache path is stored in the option "rdocdump.cache_path", which can be checked with getOption("rdocdump.cache_path"). The path is created if it does not exist.
Usage
rdd_set_cache_path(path)
Arguments
path |
A |
Value
Invisibly returns the new cache path.
Examples
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE)
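The behaviour described for this function (create the directory if needed, store the path in an option, return it invisibly) can be sketched in base R. The helper name `set_cache_path_sketch` is hypothetical; only the option name "rdocdump.cache_path" comes from this manual, and the sketch makes no claim about how the package itself validates the path.

```r
# Hypothetical sketch of a session-level cache path setter.
set_cache_path_sketch <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)  # create missing parent directories too
  }
  options(rdocdump.cache_path = path)   # option name taken from the manual
  invisible(path)
}

old <- options(rdocdump.cache_path = NULL)  # remember the previous setting
cache_dir <- file.path(tempdir(), "rdocdump_cache_sketch")
set_cache_path_sketch(cache_dir)
getOption("rdocdump.cache_path")
unlink(cache_dir, recursive = TRUE)
options(old)                                # restore the previous setting
```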
Set rdocdump Repository Options
Description
This function sets the package repository URLs used by rdocdump when fetching package sources. May be useful for setting custom repositories or mirrors. This does not affect the repositories used by install.packages() in your current R session and/or project.
Usage
rdd_set_repos(repos)
Arguments
repos |
A character vector of repository URLs. |
Value
Invisibly returns the new repository URLs.
Examples
# Set rdocdump repository options
rdd_set_repos(c("CRAN" = "https://cloud.r-project.org"))
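The default `repos` argument in the usage signatures of rdd_extract_code and rdd_to_txt is `getOption("rdocdump.repos", getOption("repos"))`. The base R sketch below shows how such a nested lookup behaves, and why a package-specific option leaves the session-wide "repos" option untouched. That rdd_set_repos stores its value in the "rdocdump.repos" option is an assumption suggested by that default, not something this page states.

```r
# Sketch of the nested option lookup, using base R only.
old <- options(
  rdocdump.repos = NULL,                          # start with no package-specific option
  repos = c(CRAN = "https://cloud.r-project.org") # session-wide repositories
)

# With no package-specific option, the lookup falls back to the session repos.
fallback <- getOption("rdocdump.repos", getOption("repos"))

# Setting the package-specific option changes the lookup result only.
options(rdocdump.repos = c(CRAN = "https://cran.r-project.org"))
specific <- getOption("rdocdump.repos", getOption("repos"))
session_repos <- getOption("repos")  # unchanged by the line above

options(old)  # restore the previous settings
```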
Dump Package Source, Documentation, and Vignettes into Plain Text
Description
This function produces a single text output for an R package by processing its documentation (Rd files from the package source or the documentation from already installed packages), vignettes, and/or R source code.
Usage
rdd_to_txt(
  pkg,
  file = NULL,
  content = "all",
  force_fetch = FALSE,
  keep_files = "none",
  cache_path = getOption("rdocdump.cache_path"),
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
file |
Optional. Save path for the output text file. If set, the function will return the path to the file instead of the combined text. Defaults to |
content |
A character vector specifying which components to include in the output. Possible values are:
You can specify multiple options (e.g., |
force_fetch |
|
keep_files |
A
|
cache_path |
A |
repos |
A |
Value
A single string containing the combined package documentation, vignettes, and/or code as specified by the content argument. If the file argument is set, returns the path to the file.
Examples
# Extract documentation for built-in `splines` package (both docs and vignettes).
docs <- rdd_to_txt("splines")
cat(substr(docs, 1, 500))
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
# Extract only documentation for rJavaEnv by downloading its source from CRAN
docs <- rdd_to_txt(
  "rJavaEnv",
  force_fetch = TRUE,
  content = "docs",
  repos = c("CRAN" = "https://cran.r-project.org")
)
lines <- unlist(strsplit(docs, "\n"))
# Print the first 3 lines
cat(head(lines, 3), sep = "\n")
# Print the last 3 lines
cat(tail(lines, 3), sep = "\n")
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE, force = TRUE)
Resolve the path to a package directory or tarball
Description
This function resolves the path to a package directory or tarball, handling both installed packages and source packages from CRAN.
Usage
resolve_pkg_path(
  pkg,
  cache_path = NULL,
  force_fetch = FALSE,
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
cache_path |
A |
force_fetch |
|
repos |
A |
Value
A list containing:
- pkg_path: Path to the package directory or tarball.
- extracted_path: Path to the extracted package directory (if applicable).
- tar_path: Path to the tarball if it was downloaded.
- is_installed: Logical indicating if the package is installed.
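The package description lists three kinds of input (installed packages, tar.gz archives, and source directories), with a download from a repository as the fallback. A hedged base R sketch of how such an input might be classified is given below. The helper name `classify_pkg_sketch`, its return labels, and the order of the checks are all hypothetical and are not taken from the package.

```r
# Hypothetical sketch: decide what kind of input `pkg` is.
classify_pkg_sketch <- function(pkg) {
  if (dir.exists(pkg)) {
    "source_dir"                                   # local source directory
  } else if (file.exists(pkg) && grepl("\\.tar\\.gz$", pkg)) {
    "tarball"                                      # local tar.gz archive
  } else if (nzchar(system.file(package = pkg))) {
    "installed"                                    # found in the library paths
  } else {
    "remote"                                       # would need to be fetched
  }
}

classify_pkg_sketch("splines")
classify_pkg_sketch(tempdir())
```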