Help for package inDAGO

Title:

A GUI for Dual and Bulk RNA-Sequencing Analysis

Version:

1.0.0

Description:

A 'shiny' app that supports both dual and bulk RNA-seq, with the dual RNA-seq functionality offering the flexibility to perform either a sequential approach (where reads are mapped separately to each genome) or a combined approach (where reads are aligned to a single merged genome). The user-friendly interface automates the analysis process, providing step-by-step guidance, making it easy for users to navigate between different analysis steps, and download intermediate results and publication-ready plots.

License:

GPL (≥ 3)

URL:

https://github.com/inDAGOverse/inDAGO

BugReports:

https://github.com/inDAGOverse/inDAGO/issues

Encoding:

UTF-8

RoxygenNote:

7.3.2

Imports:

bigtabulate, BiocGenerics, Biostrings, bsicons, bslib, callr, checkmate, data.table, dplyr, DT, edgeR, fs, ggplot2, ggrepel, grDevices, heatmaply, Hmisc, htmltools, HTSFilter, limma, magrittr, matrixStats, memuse, methods, paletteer, parallel, pheatmap, plotly, R.devices, readr, reshape2, Rfastp, rintrojs, Rsamtools, Rsubread, rtracklayer, S4Vectors, seqinr, shiny, shinycssloaders, shinyFiles, shinyjs, shinyWidgets, ShortRead, spsComps, stats, tibble, tidyr, tools, upsetjs, UpSetR, utils, XVector

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-06-30 09:20:24 UTC; gaeta

Author:

Carmine Fruggiero [aut, cre], Gaetano Aufiero [aut]

Maintainer:

Carmine Fruggiero <fruggierocarmine3@gmail.com>

Repository:

CRAN

Date/Publication:

2025-07-05 15:00:02 UTC

BaseAverageQualityPlot

Description

BaseAverageQualityPlot

Usage

BaseAverageQualityPlot(input_data)

Arguments

input_data

folder containing data

interactive BaseAverageQualityPlot

Description

interactive BaseAverageQualityPlot

Usage

BaseAverageQualityPlotly(input_data)

Arguments

input_data

folder containing data

BaseCompositionAreaChartPlot

Description

BaseCompositionAreaChartPlot

Usage

BaseCompositionAreaChartPlot(input_data)

Arguments

input_data

folder containing data

BaseCompositionLinePlot

Description

BaseCompositionLinePlot

Usage

BaseCompositionLinePlot(input_data)

Arguments

input_data

folder containing data

BaseQualityBoxplotPlot

Description

BaseQualityBoxplotPlot

Usage

BaseQualityBoxplotPlot(input_data)

Arguments

input_data

folder containing data

Bulk alignment function

Description

Bulk alignment function

Usage

BulkAlignment(
  lalista,
  nodes,
  readsPath,
  GenomeIndex,
  outBam,
  threads,
  outFormat,
  phredScore,
  maxExtractedSubreads,
  consensusVote,
  mismatchMax,
  uniqueOnly,
  maxMultiMapped,
  indelLength,
  fragmentMinLength,
  fragmentMaxLength,
  matesOrientation,
  readOrderConserved,
  coordinatesSorting,
  allJunctions,
  tempfolder
)

Arguments

lalista

list of samples

nodes

logic cores

readsPath

sample folders

GenomeIndex

genome index

outBam

output folder

threads

processes

outFormat

BAM or SAM

phredScore

quality score

maxExtractedSubreads

number of subreads

consensusVote

consensus

mismatchMax

mismatch

uniqueOnly

no multimapping

maxMultiMapped

multimapping

indelLength

indel

fragmentMinLength

fragment minumum length

fragmentMaxLength

fragment maximum length

matesOrientation

mate orientation

readOrderConserved

read order

coordinatesSorting

sorting

allJunctions

junctions

tempfolder

temporary folder

Title

Description

Title

Usage

CombinedAlignment(
  lalista,
  nodes,
  readsPath,
  GenomeConcIndex,
  outBam,
  threads,
  outFormat,
  phredScore,
  maxExtractedSubreads,
  consensusVote,
  mismatchMax,
  uniqueOnly,
  maxMultiMapped,
  indelLength,
  fragmentMinLength,
  fragmentMaxLength,
  matesOrientation,
  readOrderConserved,
  coordinatesSorting,
  allJunctions,
  tempfolder,
  readsAlignedBlock
)

Arguments

lalista

list of samples

nodes

logic cores

readsPath

sample folders

GenomeConcIndex

genome index

outBam

output folder

threads

processes

outFormat

BAM or SAM

phredScore

quality score

maxExtractedSubreads

number of subreads

consensusVote

consensus

mismatchMax

mismatch

uniqueOnly

no multimapping

maxMultiMapped

multimapping

indelLength

indel

fragmentMinLength

fragment minumum length

fragmentMaxLength

fragment maximum length

matesOrientation

mate orientation

readOrderConserved

read order

coordinatesSorting

sorting

allJunctions

junctions

tempfolder

temporary folder

readsAlignedBlock

chunks

CorrPlotHeatmap

Description

Plot a correlation heatmap of top variable genes across samples.

Usage

CorrPlotHeatmap(
  x,
  scale,
  Color,
  type,
  display,
  round_number,
  cutree_rows,
  cutree_cols,
  cluster,
  show_names,
  NumGenes
)

Arguments

x

Numeric matrix of log-CPM values (genes × samples), e.g., from "edgeR::cpm()".

scale

Character. Scaling mode for the heatmap: "row", "column", or "none".

Color

Character. Name of a continuous palette from the "paletteer" package.

type

Character. Correlation method passed to "Hmisc::rcorr()": "pearson", "spearman", or "kendall".

display

Character. Which matrix to display: "correlation" (coefficients) or "pvalue".

round_number

Integer. Number of decimal places to round displayed numbers.

cutree_rows

Integer. Number of clusters to cut for row dendrogram.

cutree_cols

Integer. Number of clusters to cut for column dendrogram.

cluster

Character. Clustering mode: one of "both", "row", "column", or "none".

show_names

Character. One of "both", "row", "column", or "none" to display row/column labels.

NumGenes

Integer. Number of top-variance genes to include in the correlation.

Details

This function selects the highest-variance genes from a log-CPM matrix, computes pairwise correlation coefficients (or p-values) with "Hmisc::rcorr()", and renders a heatmap via "pheatmap", with options for clustering, scaling, and number display.

Compute per-gene variance and select the top "NumGenes".
Subset the matrix and compute correlations (and p-values) via "Hmisc::rcorr()".
Choose to display correlation coefficients or p-values, rounded to "round_number".
Determine clustering and label visibility from cluster and "show_names".
Render the heatmap with "pheatmap::pheatmap()", passing in custom distance, color, clustering, and "display" number settings, saving to a temporary file to suppress autosave.

Value

A "pheatmap" object representing the correlation heatmap with clustering.

CorrPlotHeatmaply

Description

Create an interactive correlation heatmap of top variable genes using Heatmaply.

Usage

CorrPlotHeatmaply(x, Color, type, cluster, scale, show_names, NumGenes)

Arguments

x

Numeric matrix of log-CPM values (genes × samples), e.g., from "edgeR::cpm()".

Color

Character. Name of a continuous palette from the "paletteer" package.

type

Character. Correlation method passed to "Hmisc::rcorr()": "pearson", "spearman", or "kendall".

cluster

Character or logical. Clustering option for dendrogram: "both", "row", "column", or "none".

scale

Character. Scaling mode for the heatmap: "row", "column", or "none".

show_names

Character. One of "both", "row", "column", or "none" to display row/column labels.

NumGenes

Integer. Number of top-variance genes to include in the correlation.

Details

This function selects the highest-variance genes from a log-CPM matrix, computes pairwise correlation coefficients (and p-values) with "Hmisc::rcorr()", and renders an interactive correlation heatmap via "heatmaply::heatmaply_cor()", using clustering and scaling options derived from "pheatmap" call.

Compute per-gene variance and select the top "NumGenes".
Subset the matrix and compute correlations (and p-values) via "Hmisc::rcorr()".
Generate a temporary static heatmap with "pheatmap" to extract dendrograms.
Render an interactive heatmap with "heatmaply::heatmaply_cor()", passing in color, clustering, scaling, tick-label visibility, and point size based on -log10(p-value).

Value

A Plotly object (heatmaply) representing the interactive correlation heatmap.

Server function for DEGs module in Shiny application

Description

Server function for DEGs module in Shiny application

Usage

DEGsServerLogic(id)

Arguments

id

Shiny module identifier

UI function for DEGs module in Shiny application

Description

UI function for DEGs module in Shiny application

Usage

DEGsUserInterface(id)

Arguments

id

Shiny module identifier

Server function for EDA module in Shiny application

Description

Server function for EDA module in Shiny application

Usage

EDAServerLogic(id)

Arguments

id

Shiny module identifier

UI function for EDA module in Shiny application

Description

UI function for EDA module in Shiny application

Usage

EDAUserInterface(id)

Arguments

id

Shiny module identifier

EdgerDEG

Description

Perform differential expression analysis on RNA-seq count data using edgeR.

Usage

EdgerDEG(
  gr,
  WD_samples,
  WD_DEGs,
  colIDgene,
  colCounts,
  skip_preN,
  grContrast,
  filter,
  model,
  normMethod,
  min_count,
  min_total_count,
  large_n,
  min_prop,
  adjustPvalue,
  Th_logFC,
  Th_Pvalue
)

Arguments

gr

Data frame. Sample metadata with columns Samples and Groups.

WD_samples

Character. Directory containing raw count .tab files.

WD_DEGs

Character. Directory in which to write results and logs.

colIDgene

Integer. Column index in each count file for gene IDs.

colCounts

Integer. Column index in each count file for raw counts.

skip_preN

Integer. Number of header lines to skip when reading count files.

grContrast

Data frame. Two-column table with Test and Baseline group names for contrasts.

filter

Character. Filtering method: "filterByExpr" or "HTSFilter".

model

Character. Statistical test: "exactTest", "glmQLFTest", or "glmLRT".

normMethod

Character. Normalization method for edgeR (e.g., "TMM", "RLE").

min_count

Numeric. Minimum count per gene for "filterByExpr".

min_total_count

Numeric. Minimum total count per gene for "filterByExpr".

large_n

Integer. Sample size threshold for "filterByExpr".

min_prop

Numeric. Proportion threshold for "filterByExpr".

adjustPvalue

Character. P-value adjustment method (e.g., "fdr", "holm", "none").

Th_logFC

Numeric. Absolute log-fold-change threshold to call differential expression.

Th_Pvalue

Numeric. Adjusted p-value threshold to call differential expression.

Details

This function reads raw count tables, applies expression filtering (via "filterByExpr" or "HTSFilter"), normalizes library sizes, estimates dispersion, fits statistical models ("exactTest", "glmQLFTest", or "glmLRT"), and writes per-contrast results and diagnostic plots.

Reads in per-sample count files and generate a DGEList.
Builds the design matrix and contrast definitions from "grContrast".
Filters lowly expressed genes, normalizes library sizes, and logs filtering summary.
Estimates dispersion (standard or quasi-likelihood).
Runs chosen differential test per contrast, annotates each gene as "UP", "DOWN", or "NO", and writes CSV output files named by filter, model, and contrast.
Captures and saves BCV and QL dispersion plots as SVGs in WD_DEGs.

Value

A list invisibly returned containing any captured plots and log messages; primary results are written to CSV files in "WD_DEGs".

Filtering

Description

Filter paired-end FASTQ files in parallel based on quality and adapter trimming criteria.

Usage

Filtering(
  Nodes,
  X,
  UploadPath,
  DownloadPath,
  qualityType,
  minLen,
  trim,
  trimValue,
  n,
  Adapters,
  Lpattern,
  Rpattern,
  max.Lmismatch,
  max.Rmismatch,
  kW,
  left,
  right,
  halfwidthAnalysis,
  halfwidth,
  compress
)

Arguments

Nodes

Integer. Number of parallel processing nodes (e.g., CPU cores).

X

List of character vectors. Each element is a character vector of paired file names (e.g., c("sample_1.fq", "sample_2.fq")).

UploadPath

Character. Path to directory containing raw FASTQ files.

DownloadPath

Character. Path to directory where filtered files will be saved.

qualityType

Character. Type of quality score encoding, e.g., "Sanger" or "Illumina".

minLen

Integer. Minimum length of reads to retain after filtering.

trim

Logical. Whether to perform quality-based trimming of reads.

trimValue

Integer. Minimum Phred score threshold for trimming.

n

Integer. Number of reads to stream per chunk (default typically set to 1e6).

Adapters

Logical. Whether to remove adapters from reads.

Lpattern

Character. Adapter sequence to remove from the 5' end (left).

Rpattern

Character. Adapter sequence to remove from the 3' end (right).

max.Lmismatch

Integer. Maximum mismatches allowed for the left adapter.

max.Rmismatch

Integer. Maximum mismatches allowed for the right adapter.

kW

Integer. Minimum number of low-quality scores in a window to trigger trimming (sliding window analysis).

left

Logical. Whether to allow trimming from the left end.

right

Logical. Whether to allow trimming from the right end.

halfwidthAnalysis

Logical. Whether to perform sliding window-based trimming.

halfwidth

Integer. Half-width of the sliding window.

compress

Logical. Whether to compress the output FASTQ files.

Details

This function processes raw paired-end FASTQ files to remove low-quality bases, trim adapters, and filter out short reads. It supports quality-based end trimming, sliding window trimming, and adapter removal. The processing is done in parallel across multiple nodes to enhance performance when working with large datasets.

Paired FASTQ files must be named consistently, distinguished by "_1" and "_2" for forward and reverse reads.
This function uses the "ShortRead" and "Biostrings" packages for FASTQ processing and quality filtering.
Filtered files in FASTQ format".
Log files containing read counts before and after filtering are written per sample.

Value

Filtered FASTQ files written to "DownloadPath"; one log file per sample.

Server function for filtering module in Shiny application

Description

Server function for filtering module in Shiny application

Usage

FilteringServerLogic(id)

Arguments

id

Shiny module identifier

UI function for filtering module in Shiny application

Description

UI function for filtering module in Shiny application

Usage

FilteringUserInterface(id)

Arguments

id

Shiny module identifier

GCcontentDistributionPlot

Description

GCcontentDistributionPlot

Usage

GCcontentDistributionPlot(input_data)

Arguments

input_data

samples folder

interactive GCcontentDistributionPlot

Description

interactive GCcontentDistributionPlot

Usage

GCcontentDistributionPlotly(input_data)

Arguments

input_data

samples folder

GetEdgerY

Description

Calculate and return filtered DGEList object and log-CPM matrices using edgeR and optional HTSFilter

Usage

GetEdgerY(
  gr,
  WDpn,
  colIDgene,
  colCounts,
  skip_preN,
  filterMethod,
  min_count,
  min_total_count,
  large_n,
  min_prop,
  normMethod
)

Arguments

gr

Data frame with sample metadata, including sample names and group labels

WDpn

Directory containing count files (*.tab)

colIDgene

Column index of gene IDs in count files

colCounts

Column index of counts in count files

skip_preN

Number of header lines to skip in count files

filterMethod

Either "filterByExpr" or "HTSFilter"

min_count

Minimum count per gene (filterByExpr)

min_total_count

Minimum total count per gene (filterByExpr)

large_n

Number of samples per group to consider as "large" (filterByExpr)

min_prop

Minimum proportion of samples with expression (filterByExpr)

normMethod

Normalization method (e.g., "TMM", "RLE")

Value

A list with total/kept gene counts, filtered DGEList objects, and log-CPM matrices

HeatmapExp

Description

Plot a heatmap of the top variable genes across samples.

Usage

HeatmapExp(
  x,
  ColorPanel,
  scale,
  cutree_rows,
  cutree_cols,
  cluster,
  show_names,
  NumGenes
)

Arguments

x

Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm().

ColorPanel

Character. Name of a continuous palette from the paletteer package.

scale

Character. Scaling mode for heatmap: "row", "column", or "none".

cutree_rows

Integer. Number of clusters for rows (genes).

cutree_cols

Integer. Number of clusters for columns (samples).

cluster

Character. One of "both", "row", "column", or "none" to specify clustering.

show_names

Character. One of "both", "row", "column", or "none" to show row/col names.

NumGenes

Integer. Number of top-variance genes to include in the heatmap.

Details

This function selects the highest-variance genes from a log-CPM matrix, transposes the data, and renders a heatmap with customizable clustering, scaling, and color palettes using pheatmap.

Compute per-gene variance and select the top "NumGenes".
Transpose the subsetted matrix so samples are rows.
Apply the specified color palette (n = 50) via paletteer::paletteer_c().
Determine clustering and name-display options from "cluster" and "show_names".
Render the heatmap with "pheatmap::pheatmap()", saving to a temporary file to suppress autosave.

Value

A "pheatmap" object containing the heatmap and clustering information.

HeatmapExpPlotly

Description

Create an interactive heatmap of top variable genes using Heatmaply.

Usage

HeatmapExpPlotly(x, ColorPanel, scale, cluster, show_names, NumGenes)

Arguments

x

Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm().

ColorPanel

Character. Name of a continuous palette from the paletteer package.

scale

Character. Scaling mode: "row", "column", or "none".

cluster

Character or logical. Clustering option for dendrogram: "both", "row", "column", or "none".

show_names

Character. One of "both", "row", "column", or "none" to display row/column labels.

NumGenes

Integer. Number of top-variance genes to include in the heatmap.

Details

This function selects the highest-variance genes from a log-CPM matrix, transposes the data, and renders an interactive heatmap via "heatmaply", using "pheatmap" call.

Compute per-gene variance and select the top NumGenes.
Transpose the subsetted matrix so samples are rows.
Generate a temporary static heatmap with pheatmap to extract dendrograms.
Render an interactive heatmap with heatmaply::heatmaply().

Value

A Plotly object (heatmaply) representing the interactive heatmap.

Bulk indexing

Description

Bulk indexing

Usage

IndexingBulk(basename, reference, gappedIndex, indexSplit, memory, TH_subread)

Arguments

basename

output basename

reference

reference genome

gappedIndex

gapped structure

indexSplit

split structure

memory

handling memory

TH_subread

threshold memory usage

Indexing bulk server logic

Description

Indexing bulk server logic

Usage

IndexingBulkServerLogic(id)

Arguments

id

Shiny module identifier

Indexing bulk ui

Description

Indexing bulk ui

Usage

IndexingBulkUserInterface(id)

Arguments

id

Shiny module identifier

Combined indexing

Description

Combined indexing

Usage

IndexingComb(
  basename,
  reference,
  gappedIndex,
  indexSplit,
  memory,
  TH_subread,
  gen1,
  gen2,
  outfolder,
  tempfolder = file.path(fs::path_temp(), "TempDirSum_3738"),
  tag1,
  tag2
)

Arguments

basename

output basename

reference

reference genome

gappedIndex

gapped structure

indexSplit

split structure

memory

handling memory

TH_subread

threshold memory usage

gen1

first reference genome

gen2

second reference genome

outfolder

output folder

tempfolder

temporary folder

tag1

first genome label

tag2

second genome label

Indexing combined server logic

Description

Indexing combined server logic

Usage

IndexingCombinedServerLogic(id)

Arguments

id

Shiny module identifier

Indexing combined ui

Description

Indexing combined ui

Usage

IndexingCombinedUserInterface(id)

Arguments

id

Shiny module identifier

Indexing sequential parallel

Description

Indexing sequential parallel

Usage

IndexingSequentialParallel(
  basename,
  reference,
  gappedIndex,
  indexSplit,
  memory,
  TH_subread
)

Arguments

basename

output basename

reference

reference genome

gappedIndex

gapped structure

indexSplit

split structure

memory

handling memory

TH_subread

threshold memory usage

Indexing sequential progressive

Description

Indexing sequential progressive

Usage

IndexingSequentialProgressive(
  outfolder1,
  outfolder2,
  refgen1,
  refgen2,
  gappedIndex,
  indexSplit,
  memory,
  TH_subread
)

Arguments

outfolder1

first output folder

outfolder2

second output folder

refgen1

first reference genome

refgen2

second reference genome

gappedIndex

gapped structure

indexSplit

split structure

memory

handling memory

TH_subread

threshold memory usage

Indexing sequential server logic

Description

Indexing sequential server logic

Usage

IndexingSequentialServerLogic(id)

Arguments

id

Shiny module identifier

Indexing sequential ui

Description

Indexing sequential ui

Usage

IndexingSequentialUserInterface(id)

Arguments

id

Shiny module identifier

QUALITY CONTROL ANALYSIS

Description

QUALITY CONTROL ANALYSIS

Usage

QualityCheckAnalysis(
  directoryInput,
  inputFormat,
  Nodes,
  ReadsNumber,
  directoryOutput,
  tempFolder
)

Arguments

directoryInput

sample directory

inputFormat

raw read format

Nodes

cores

ReadsNumber

chunk

directoryOutput

output folder

tempFolder

temporary folder

Saturation

Description

Generate a saturation curve plot showing gene detection versus sequencing depth.

Usage

Saturation(matrix, method, max_reads, palette)

Arguments

matrix

Numeric matrix or object coercible to matrix (genes × samples), e.g., log-counts or raw counts. Genes are rows; samples are columns.

method

Character. Estimation method: "division" or "sampling".

max_reads

Numeric. Maximum number of reads to include in the rarefaction (default: Inf).

palette

Character. Name of a discrete color palette from the "paletteer " package for curve colors.

Details

This function estimates how many genes are detected at increasing read depths using a rarefaction-based approach ( "estimate_saturation() from RNAseQC package https://github.com/BenaroyaResearch/RNAseQC.git"), and plots the saturation curves for each sample. It supports two estimation methods: “division” for a fast analytic approximation and “sampling” for more realistic approach.

Internally, "extract_counts() " (from countSubsetNorm) extracts a counts matrix from various input classes (matrix, DGEList, EList, ExpressionSet).
"estimate_saturation() " (from RNAseQC package https://github.com/BenaroyaResearch/RNAseQC.git) rarefies each library at multiple depths:

“division” divides counts by scale factors;
“sampling” performs repeated random sampling to simulate read down sampling.

The resulting data frame contains one row per sample per depth, with the number of detected genes ( "sat ") and, for sampling, its variance ( "sat.var ").
The function then plots gene saturation curves ( "sat" vs. "depth") colored by sample.

Extract counts matrix from different types of expression objects

Estimate saturation of genes based on rarefaction of reads

Value

A "ggplot " object showing saturation (genes detected) versus sequencing depth for each sample.

SequenceLengthDistributionPlot

Description

SequenceLengthDistributionPlot

Usage

SequenceLengthDistributionPlot(input_data)

Arguments

input_data

result tables folder

interactive SequenceLengthDistributionPlot

Description

interactive SequenceLengthDistributionPlot

Usage

SequenceLengthDistributionPlotly(input_data)

Arguments

input_data

result tables folder

Sequential alignment function

Description

Sequential alignment function

Usage

SequentialAlignment(
  lalista,
  nodes,
  readsPath,
  GenomeFirstIndex,
  GenomeSecondIndex,
  outBam1,
  outBam2,
  threads,
  outFormat,
  phredScore,
  maxExtractedSubreads,
  consensusVote,
  mismatchMax,
  uniqueOnly,
  maxMultiMapped,
  indelLength,
  fragmentMinLength,
  fragmentMaxLength,
  matesOrientation,
  readOrderConserved,
  coordinatesSorting,
  allJunctions,
  tempfolder,
  readsAlignedBlock
)

Arguments

lalista

list of samples

nodes

logic cores

readsPath

sample folders

GenomeFirstIndex

first genome index

GenomeSecondIndex

second genome index

outBam1

first output folder

outBam2

second output folder

threads

processes

outFormat

BAM or SAM

phredScore

quality score

maxExtractedSubreads

number of subreads

consensusVote

consensus

mismatchMax

mismatch

uniqueOnly

no multimapping

maxMultiMapped

multimapping

indelLength

indel

fragmentMinLength

fragment minumum length

fragmentMaxLength

fragment maximum length

matesOrientation

mate orientation

readOrderConserved

read order

coordinatesSorting

sorting

allJunctions

junctions

tempfolder

temporary folder

readsAlignedBlock

chunks

Summarization

Description

Summarizes read counts from multiple BAM/SAM files in parallel using feature annotations.

Usage

Summarization(
  NodesSum,
  Xsum,
  UploadPathSum,
  DownloadPathSum,
  annot.ext,
  isGTFAnnotationFile,
  GTF.featureType,
  GTF.attrType,
  useMetaFeatures,
  allowMultiOverlap,
  minOverlap,
  fracOverlap,
  fracOverlapFeature,
  largestOverlap,
  countMultiMappingReads,
  fraction,
  minMQS,
  primaryOnly,
  ignoreDup,
  strandSpecific,
  requireBothEndsMapped,
  checkFragLength,
  minFragLength,
  maxFragLength,
  countChimericFragments,
  autosort,
  nthreads,
  tmpDir,
  verbose
)

Arguments

NodesSum

Integer. Number of parallel R nodes (e.g., CPU cores) to spawn.

Xsum

Character vector. Filenames of BAM or SAM files to process.

UploadPathSum

Character. Directory containing the raw input files.

DownloadPathSum

Character. Directory into which all output files will be written.

annot.ext

Character. Path to an external annotation file (e.g., GTF/GFF).

isGTFAnnotationFile

Logical. Should annot.ext be treated as a GTF file?

GTF.featureType

Character. Feature type (e.g., "exon").

GTF.attrType

Character. GTF attribute (e.g., "gene_id").

useMetaFeatures

Logical. Collapse sub-features into meta-features before counting.

allowMultiOverlap

Logical. Allow reads overlapping multiple features to be counted.

minOverlap

Integer. Minimum number of overlapping bases to assign a read.

fracOverlap

Numeric. Minimum fraction of read that must overlap a feature.

fracOverlapFeature

Numeric. Minimum fraction of feature that must be covered by a read.

largestOverlap

Logical. When overlapping multiple features, assign based on largest overlap.

countMultiMappingReads

Logical. Count reads that map to multiple locations.

fraction

Logical. Distribute counts fractionally for multi-mapping reads.

minMQS

Integer. Minimum mapping quality score for reads to be counted.

primaryOnly

Logical. Count only the primary alignments of multi-mapping reads.

ignoreDup

Logical. Exclude PCR duplicates from counting.

strandSpecific

Integer. Strand-specific counting mode (0 = unstranded, 1 = stranded, 2 = reversely stranded).

requireBothEndsMapped

Logical. In paired-end mode, require both mates to map.

checkFragLength

Logical. Enforce fragment length checks on paired-end reads.

minFragLength

Numeric. Minimum fragment length to keep.

maxFragLength

Numeric. Maximum fragment length to keep.

countChimericFragments

Logical. Count discordant or chimeric read pairs.

autosort

Logical. Automatically sort input files if not already sorted.

nthreads

Integer. Number of threads per featureCounts call.

tmpDir

Character. Directory for temporary files (e.g., large intermediate files).

verbose

Logical. Print verbose messages during execution.

Details

This function run Rsubread::featureCounts() on each input file, capturing count statistics, annotation data, and per-sample summary logs. Results are written to the specified output directory.

A socket cluster of NodesSum workers is created.
Each worker invokes featureCounts() on one sample, using the annotation and counting parameters.
Outputs per sample:
- A text summary (⁠*_summary.txt⁠) capturing the console output.
- A CSV of count statistics (⁠*_stat.csv⁠).
- A CSV of feature annotations (⁠*_annotation.csv⁠).
- A tab-delimited count matrix saved under ⁠Counts/<sample>.tab⁠.
The cluster is terminated once all samples complete.

Value

Writes files to DownloadPathSum.

Server function for Summarization module in Shiny application

Description

Server function for Summarization module in Shiny application

Usage

SummarizationServerLogic(id)

Arguments

id

Shiny module identifier

UI function for Summarization module in Shiny application

Description

UI function for Summarization module in Shiny application

Usage

SummarizationUserInterface(id)

Arguments

id

Shiny module identifier

UpSetPlot

Description

Generate an UpSet plot of overlapping DEGs across multiple contrasts.

Usage

UpSetPlot(
  WD_samples,
  Th_logFC,
  Th_Pvalue,
  collapseName,
  nintersects,
  st_significance,
  scale
)

Arguments

WD_samples

Character. Directory containing DEG result CSV files.

Th_logFC

Numeric. Absolute log2 fold-change threshold to include a gene.

Th_Pvalue

Numeric. P-value threshold for significance (0 < Th_Pvalue <= 1).

collapseName

Logical. If TRUE, strip method/model prefixes from file names when labeling sets.

nintersects

Integer. Maximum number of intersections to display.

st_significance

Character. Which p-value to use: "adjustPvalue" (FDR or FWER) or "PValue".

scale

Numeric. Text scaling factor for plot labels and annotations.

Details

This function reads DEG CSV files from a directory, filters genes by log-FC and p-value thresholds (adjusted or raw), optionally simplifies file names, and visualizes the intersections of gene sets using an UpSet plot.

Validates thresholds (Th_logFC >= 0, 0 < Th_Pvalue <= 1).
Lists all CSV files in WD_samples and reads each into a data frame.
Checks for duplicate IDs and standardizes to columns ID, logFC, and adjustPvalue or PValue.
Filters each set of results by |logFC| >= Th_logFC and p-value < Th_Pvalue.
Renames each gene-ID column to the (optionally collapsed) file name.
Converts the list of filtered ID sets to an UpSetR input and calls UpSetR::upset().

Value

An UpSet plot.

UpsetjsPlot

Description

Create an interactive UpSet plot of overlapping DEGs using "UpsetJS".

Usage

UpsetjsPlot(
  WD_samples,
  Th_logFC,
  Th_Pvalue,
  collapseName,
  nintersects,
  st_significance
)

Arguments

WD_samples

Character. Directory containing DEG result CSV files.

Th_logFC

Numeric. Absolute log2 fold-change threshold to include a gene.

Th_Pvalue

Numeric. P-value threshold for significance (0 < Th_Pvalue <= 1).

collapseName

Logical. If TRUE, strip method/model prefixes from file names when labeling sets.

nintersects

Integer. Maximum number of intersections to display.

st_significance

Character. Which p-value to use: "adjustPvalue" (FDR or FWER) or "PValue".

Details

Lists all CSV files in "WD_samples" and reads each into a data frame.
Checks for duplicate IDs and selects "ID", "logFC", and either "adjustPvalue" or "PValue".
Filters each set by "|logFC| >= Th_logFC" and p-value < "Th_Pvalue".
Renames each gene-ID list to the (optionally collapsed) file name.
Feeds the list of gene sets into "upsetjs::upsetjs()"

Value

An interactive "UpsetJS" object.

Server function for workflow module in Shiny application

Description

Server function for workflow module in Shiny application

Usage

WorkflowServerLogic(id)

Arguments

id

Shiny module identifier

UI function for workflow module in Shiny application

Description

UI function for workflow module in Shiny application

Usage

WorkflowUserInterface(id)

Arguments

id

Shiny module identifier

barplotExp

Description

Create a barplot of library sizes per sample, optionally using effective library sizes.

Usage

barplotExp(x, palette, main, selectOrder, effecLibSize)

Arguments

x

A DGEList object from "edgeR".

palette

Character. Name of a discrete color palette from the "paletteer" package.

main

Character. Title for the barplot.

selectOrder

Character. Either "Groups" (order samples by group) or "Samples" (order by sample name).

effecLibSize

Logical. If TRUE, use effective library size (norm factors × raw size); otherwise use raw size.

Details

This function extracts library size information from an "edgeR" "DGEList", computes effective library sizes if requested, orders samples by group or name, and plots library sizes (in millions) colored by group.

Extracts or computes (effecLibSize = TRUE) the library size for each sample.
Orders samples by group or sample name per selectOrder.
Plots bar heights as library size (×10^6) with white fill and colored borders.

Value

A "ggplot" object showing per-sample barplots of library size in millions.

boxplotExp

Description

Generate a boxplot of log-CPM expression values per sample, colored by group.

Usage

boxplotExp(x, y, palette, main, selectOrder)

Arguments

x

A DGEList object from "edgeR".

y

Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm().

palette

Character. Name of a discrete palette from the paletteer package.

main

Character. Title for the boxplot.

selectOrder

Character. Either "Groups" (order samples by group) or "Samples" (order by sample name).

Details

This function orders samples by group or sample name, and produces a ggplot2 boxplot with a horizontal line at the overall median.

Extract sample metadata (Samples, Groups) from "x$samples".
Order columns of y by group or sample name per "selectOrder".
Melt the ordered matrix to long format and join with metadata.
Plot boxplots with no outliers, colored by group, and include a dashed line at the overall median.

Value

A ggplot object showing per-sample boxplots of log-CPM values.

checkMetadata

Description

Validate and extract non-empty annotation fields from a GTF file.

Usage

checkMetadata(gtfPath, typeFilter)

Arguments

gtfPath

Character. Path to the directory or file location of the GTF file.

typeFilter

Character. The feature type to filter on (e.g., "gene", "exon").

Details

This function imports a GTF file, filters entries by a specified feature type, and identifies metadata columns that contain at least one non-missing value.

Imports the GTF into a data frame via "rtracklayer::import()".
Filters rows by "type" == typeFilter.
Tests each column for all-NA or empty-string entries.
Returns names of columns with at least one non-missing, non-empty value.

Value

Character vector of column names in the GTF annotation that are not entirely NA or empty.

COUNTING SEQUENCES

Description

COUNTING SEQUENCES

Usage

counting_Reads(input_data)

Arguments

input_data

sample folder

getDegMerged

Description

Merge multiple DEG result CSVs with GTF annotations into a single data frame.

Usage

getDegMerged(path, gtfPath, columns, collapseName, typeFilter, selectUpDown)

Arguments

path

Character. Directory containing DEG result CSV files.

gtfPath

Character. Path to the GTF annotation file.

columns

Character vector. Names of annotation columns to include from the GTF.

collapseName

Logical. If TRUE, strip method/model prefixes from file names when prefixing columns.

typeFilter

Character. GTF feature type to filter (e.g., "gene" or "transcript").

selectUpDown

Logical. If TRUE, only include IDs with "diffExp" == UP or DOWN.

Details

This function reads all CSV files in a directory, validates presence of required columns ("ID", and optionally "diffExp"), filters for up/down regulated genes if requested, extracts annotation fields from a GTF, and returns a merged table of selected annotation columns alongside all DEG metrics (with optional file-based column prefixes).

Value

A combined data frame

inDAGO

Description

A Shiny app for dual and bulk RNA‑sequencing analysis.

Usage

inDAGO()

Details

This function allows to launch inDAGO Shiny interface.

Value

No return value, called for side effects

Mapping bulk server logic

Description

Mapping bulk server logic

Usage

mappingBulkServerLogic(id)

Arguments

id

Shiny module identifier

Mapping bulk ui

Description

Mapping bulk ui

Usage

mappingBulkUserInterface(id)

Arguments

id

Shiny module identifier

Mapping combined server logic

Description

Mapping combined server logic

Usage

mappingCombinedServerLogic(id)

Arguments

id

Shiny module identifier

Mapping combined ui

Description

Mapping combined ui

Usage

mappingCombinedUserInterface(id)

Arguments

id

Shiny module identifier

Mapping sequential server logic

Description

Mapping sequential server logic

Usage

mappingSequentialServerLogic(id)

Arguments

id

Shiny module identifier

Mapping sequential ui

Description

Mapping sequential ui

Usage

mappingSequentialUserInterface(id)

Arguments

id

Shiny module identifier

mdsPlot

Description

Generate a multidimensional scaling (MDS) plot based on expression data.

Usage

mdsPlot(
  x,
  Sample,
  Group,
  title,
  palette,
  maxOverlaps,
  sizeLabel,
  top,
  gene.selection
)

Arguments

x

DGEList object from edgeR.

Sample

A character vector of sample labels (one per column in "x ").

Group

A factor or character vector specifying the group/class of each sample.

title

Plot title as a character string.

palette

Name of a palette from the "paletteer " package for coloring groups.

maxOverlaps

Maximum number of overlapping labels allowed by "geom_text_repel ".

sizeLabel

Numeric value for label font size.

top

Integer. Number of top most variable genes to include in MDS.

gene.selection

Method for gene selection: one of "pairwise", "common", or "logFC".

Details

This function performs MDS analysis using limma's "plotMDS() " and visualizes the sample relationships in two dimensions using "ggplot2 " and "ggrepel ".

Value

A "ggplot " object representing the MDS plot.

mdsPlottly

Description

Generate an interactive MDS plot using Plotly based on expression data.

Usage

mdsPlottly(x, Sample, Group, title, palette, top, gene.selection)

Arguments

x

A DGEList object from edgeR.

Sample

Character vector. Sample names corresponding to columns of "x ".

Group

Factor or character vector. Group or condition for each sample.

title

Character. Title for the plot.

palette

Character. Name of a discrete palette from the "paletteer " package.

top

Integer. Number of top most variable genes (by logFC) to include in MDS.

gene.selection

Character. Gene selection method: one of ""pairwise" ", ""common" ", or ""logFC" ".

Details

This function computes multidimensional scaling (MDS) coordinates with limma's "plotMDS() " and then renders an interactive scatterplot via "plotly::ggplotly() ".

Compute MDS on the input data with "limma::plotMDS() ".
Extract eigenvalues and first two dimensions for variance annotation.
Build a ggplot2 scatterplot with axis labels showing percent variance explained.
Convert the ggplot to an interactive Plotly graph.

Value

A Plotly object ( "plotly::ggplotly ") representing the interactive MDS scatterplot.

mdsinfo

Description

Compute MDS coordinates for expression data using limma's plotMDS.

Usage

mdsinfo(matrix, top, gene.selection)

Arguments

matrix

A DGEList object.

top

Integer. Number of top most variable genes to include in MDS.

gene.selection

Method for gene selection: one of "pairwise", "common", or "logFC".

Details

This function performs multidimensional scaling (MDS) on a DGEList or log-expression matrix using limma's "plotMDS() " function. It returns the MDS object containing coordinates and eigenvalues without generating a plot.

Value

A list object from "plotMDS() " containing MDS coordinates and eigenvalues.

pcaPlot

Description

Create a PCA scatter plot from log-expression data with sample labels.

Usage

pcaPlot(
  logcounts,
  Sample,
  Group,
  title,
  palette,
  maxOverlaps,
  sizeLabel,
  center,
  scale
)

Arguments

logcounts

Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm.

Sample

Character vector of sample names corresponding to the columns of "logcounts".

Group

Factor or character vector denoting group/condition for each sample.

title

Character. Title for the PCA plot.

palette

Character. Name of a discrete color palette from the "paletteer" package.

maxOverlaps

Integer. Maximum number of overlapping labels allowed by "ggrepel".

sizeLabel

Numeric. Font size for sample labels.

center

Logical. If TRUE, center variables before PCA.

scale

Logical. If TRUE, scale variables to unit variance before PCA.

Details

This function performs Principal Component Analysis (PCA) on a log-count matrix and visualizes the first two principal components using ggplot2 and ggrepel. Each point represents a sample, colored by group, with hover labels.

Transposes the "logcounts" matrix so samples are rows.
Runs PCA via "stats::prcomp()" with centering and scaling options.
Calculates percent variance explained by PC1 and PC2.
Builds a scatter plot with black‐bordered points and non‐overlapping labels.

Value

A "ggplot" object displaying the PCA scatter plot of PC1 vs PC2.

pcaPlottly

Description

Create an interactive PCA scatter plot using Plotly from log-expression data.

Usage

pcaPlottly(logcounts, Sample, Group, title, palette, center, scale)

Arguments

logcounts

Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm.

Sample

Character vector of sample names corresponding to the columns of "logcounts ".

Group

Factor or character vector denoting group/condition for each sample.

title

Character. Title for the PCA plot.

palette

Character. Name of a discrete color palette from the "paletteer" package.

center

Logical. If TRUE, center variables (genes) before PCA.

scale

Logical. If TRUE, scale variables to unit variance before PCA.

Details

This function performs Principal Component Analysis (PCA) on a log-count matrix and generates an interactive plot of the first two principal components via "plotly::ggplotly()".

Transposes the "logcounts " matrix so samples are rows.
Runs PCA with "stats::prcomp() ", using centering and scaling as specified.
Computes percent variance explained by PC1 and PC2.
Builds a ggplot2 scatterplot and converts it to an interactive Plotly graph.

Value

A Plotly object ( "plotly::ggplotly ") representing the interactive PCA scatterplot.

pcainfo

Description

Perform Principal Component Analysis (PCA) on log-expression data.

Usage

pcainfo(logcounts, center, scale)

Arguments

logcounts

Numeric matrix. Log-CPM values (genes × samples), e.g., from edgeR::cpm..

center

Logical. If TRUE, center variables by subtracting the mean (default: TRUE).

scale

Logical. If TRUE, scale variables to unit variance (default: FALSE).

Details

This function transposes a log-count matrix (samples as columns, genes as rows) and runs PCA using "stats::prcomp() ", with options to center and scale variables.

Value

An object of class "prcomp " containing the PCA results, including loadings, scores, and explained variance.

Quality control server logic

Description

Quality control server logic

Usage

qualityControlServerLogic(id)

Arguments

id

Shiny module identifier

Quality control ui

Description

Quality control ui

Usage

qualityControlUserInterface(id)

Arguments

id

Shiny module identifier

volcanoPlot

Description

Create a volcano plot of differential expression results.

Usage

volcanoPlot(
  x,
  palettePoint,
  maxOverlaps,
  sizeLabel,
  Th_logFC,
  Th_Pvalue,
  subsetGenes,
  st_significance
)

Arguments

x

Character. File path to a CSV containing DEG results, with at least columns "ID", "logFC", and one of "PValue", "FDR", or "FWER".

palettePoint

Character. Name of a discrete palette from the "paletteer" package, supplying colors for "UP", "DOWN", and "NO".

maxOverlaps

Integer. Maximum allowed label overlaps passed to "ggrepel::geom_text_repel()".

sizeLabel

Numeric. Font size for gene labels in the plot.

Th_logFC

Numeric. Absolute log2 fold-change threshold to call a gene "UP" or "DOWN".

Th_Pvalue

Numeric. P-value threshold to call significance (uses "FDR"/"FWER" if "st_significance = "adjustPvalue"", otherwise raw "PValue").

subsetGenes

Integer or "Inf". If numeric, only the top "subsetGenes" genes by p-value are shown and labeled.

st_significance

Character. Which p-value column to use: "adjustPvalue" (FDR or FWER) or "PValue".

Details

This function reads a CSV of DEGs, classifies genes as up/down/no change based on log-fold change and p-value thresholds, and plots –log10(p-value) versus log-FC using ggplot2.

Reads the input CSV and checks for duplicate IDs.
Standardizes columns to "ID", "logFC", and "adjustPvalue" or "PValue".
Optionally subsets to the top N genes by p-value.
Classifies each gene as "UP", "DOWN", or "NO" based on thresholds.
Plots points with manual fill, size, and alpha scales, adds threshold lines, and repels labels using "ggrepel".

Value

A "ggplot" object displaying the volcano plot.

volcanoPlottly

Description

Create an interactive volcano plot of differential expression results using "Plotly".

Usage

volcanoPlottly(
  x,
  palettePoint,
  Th_logFC,
  Th_Pvalue,
  subsetGenes,
  st_significance
)

Arguments

x

Character. File path to a CSV containing DEG results, with at least columns "ID", "logFC", and one of "PValue", "FDR", or "FWER".

palettePoint

Character. Name of a discrete palette from the "paletteer" package, supplying colors for "UP", "DOWN", and "NO".

Th_logFC

Numeric. Absolute log2 fold-change threshold to call a gene "UP" or "DOWN".

Th_Pvalue

Numeric. P-value threshold to call significance (uses "FDR"/"FWER" if "st_significance = "adjustPvalue"", otherwise raw "PValue").

subsetGenes

Integer or "Inf". If numeric, only the top "subsetGenes" genes by p-value are included in the plot.

st_significance

Character. Which p-value column to use: "adjustPvalue" (FDR or FWER) or "PValue".

Details

This function reads a CSV of DEGs, classifies genes as up/down/no change based on log-fold change and p-value thresholds, and renders an interactive volcano plot via "plotly::ggplotly()".

Reads the input CSV and checks for duplicate IDs.
Standardizes columns to "ID", "logFC", and "adjustPvalue" or "PValue".
Optionally subsets to the top N genes by p-value.
Classifies each gene as "UP", "DOWN", or "NO" based on thresholds.
Plots points with manual fill, size, and alpha scales, adds threshold lines, and converts to an interactive Plotly graph.

Value

A Plotly object ("plotly::ggplotly") representing the interactive volcano plot.