Title: | Assessing Predisposition Between Phenotypes using Polygenic Scores |
Version: | 1.0.0 |
Description: | Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package makes it possible to investigate shared predisposition between different conditions, run fast association analyses, and export plots and views of the PGS distribution as 'ggplot2' objects. |
Depends: | R (≥ 3.5.0) |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Imports: | ggplot2, stats, utils, MASS, nnet, parallel, ivreg |
LazyData: | true |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-07-15 14:26:52 UTC; vincentp |
Author: | Vincent Pascat |
Maintainer: | Vincent Pascat <vincent.pascat@univ-lille.fr> |
Repository: | CRAN |
Date/Publication: | 2025-07-15 14:40:02 UTC |
Association of a PGS distribution with a Phenotype
Description
assoc() takes a distribution of PGS, a Phenotype and optional confounders, and returns a data frame showing the association of the PGS with the Phenotype.
Usage
assoc(
df = NULL,
prs_col = "SCORESUM",
phenotype_col = "Phenotype",
scale = TRUE,
covar_col = NA,
verbose = TRUE,
log = ""
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) specifying whether to write messages to the console/log. |
log |
a connection, or a character string naming the file to print to. If "" (the default), output goes to the standard output connection, i.e. the console unless redirected by sink(). |
Value
Returns a data frame showing the association of the PGS with the Phenotype, with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and selects the appropriate method, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, the Beta coefficient of the linear regression; otherwise, the OR of the logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower bound of the confidence interval of the related Effect (Beta or OR)
upper_CI: upper bound of the confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
Examples
results <- assoc(
df = comorbidData,
prs_col = "ldl_PGS",
phenotype_col = "log_ldl",
scale = TRUE,
covar_col = c("age", "sex", "gen_array")
)
print(results)
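To show what a single association involves, the base-R sketch below optionally standardizes the PGS, fits a linear model for a continuous Phenotype or a binary logistic model for a 0/1 Phenotype, and reports an effect, SE, Wald 95% CI and P-value. It illustrates the general technique only and is not the comorbidPGS implementation: the helper name `pgs_assoc_sketch`, the "only 0/1 values means Cases/Controls" rule and the normal-approximation CI are assumptions, and ordinal/multinomial phenotypes are not covered.

```r
# Hypothetical helper (not part of comorbidPGS): a minimal sketch of a PGS
# association for a continuous or a 0/1 Cases/Controls phenotype.
pgs_assoc_sketch <- function(df, prs_col, phenotype_col,
                             covar_col = character(0), scale = TRUE) {
  # Optionally standardize the PGS (mean 0, SD 1) so the effect is per SD
  if (scale) df[[prs_col]] <- as.numeric(base::scale(df[[prs_col]]))
  f <- stats::reformulate(c(prs_col, covar_col), response = phenotype_col)
  # Assumption: a phenotype coded only with 0/1 is treated as Cases/Controls
  binary <- all(stats::na.omit(df[[phenotype_col]]) %in% c(0, 1))
  fit <- if (binary) {
    stats::glm(f, data = df, family = stats::binomial())
  } else {
    stats::lm(f, data = df)
  }
  co <- summary(fit)$coefficients[prs_col, ]
  # Wald 95% CI on the coefficient (Beta or log-OR) scale
  ci <- co[[1]] + c(-1, 1) * stats::qnorm(0.975) * co[[2]]
  effect <- co[[1]]
  if (binary) {            # report the OR for binary outcomes
    effect <- exp(effect)
    ci <- exp(ci)
  }
  data.frame(PGS = prs_col, Phenotype = phenotype_col, N = stats::nobs(fit),
             Effect = effect, SE = co[[2]],  # SE stays on the Beta/log-OR scale
             lower_CI = ci[1], upper_CI = ci[2], P_value = co[[4]])
}

# Usage on simulated data (not the comorbidData dataset)
set.seed(1)
n <- 2000
sim <- data.frame(pgs = rnorm(n), age = runif(n, 40, 70))
sim$y_cont <- 0.5 * sim$pgs + 0.01 * sim$age + rnorm(n)
sim$y_bin <- rbinom(n, 1, plogis(-1 + 0.7 * sim$pgs))
res <- rbind(pgs_assoc_sketch(sim, "pgs", "y_cont", "age"),
             pgs_assoc_sketch(sim, "pgs", "y_bin", "age"))
print(res)
```

Reporting the OR while keeping the SE on the log-OR scale mirrors how logistic regression output is usually read; the package's own SE convention may differ.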
Multiple PGS Associations Plot
Description
assocplot() takes a data frame of associations from assoc() and returns a plot of the associations (a ggplot2 object or a list of ggplot2 objects).
Usage
assocplot(score_table = NULL, axis = "vertical", pval = FALSE)
Arguments
score_table |
a dataframe with association results with at least the following columns:
|
axis |
a character, |
pval |
a parameter specifying how to display the P-value
|
Value
Returns either:
a ggplot object representing the association results.
a list of two ggplot objects, accessible by $continuous_phenotype and $discrete_phenotype, if there are both Continuous Phenotypes and Discrete Phenotypes (i.e. "Categorical" or "Cases/Controls")
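The Value section implies a simple rule: a single plot when all phenotypes belong to one family, otherwise a list split into continuous and discrete phenotypes. The base-R sketch below reproduces only that splitting step on a toy table with placeholder numbers; the helper name `split_assoc_table` and the rule "anything not 'Continuous' is discrete" are assumptions, and the plotting itself (done with ggplot2 in the package) is not shown.

```r
# Hypothetical helper: split association results into continuous and discrete
# phenotypes, mirroring the list structure described for assocplot()
split_assoc_table <- function(score_table) {
  is_cont <- score_table$Phenotype_type == "Continuous"
  # Only one family of phenotypes: a single table (one plot) is enough
  if (all(is_cont) || !any(is_cont)) return(score_table)
  list(continuous_phenotype = score_table[is_cont, , drop = FALSE],
       discrete_phenotype   = score_table[!is_cont, , drop = FALSE])
}

# Toy input with placeholder numbers (not real results)
tab <- data.frame(
  PGS = c("ldl_PGS", "ldl_PGS", "t2d_PGS"),
  Phenotype = c("log_ldl", "t2d", "brc"),
  Phenotype_type = c("Continuous", "Cases/Controls", "Cases/Controls"),
  Effect = c(0.5, 1.2, 1.0),
  stringsAsFactors = FALSE
)
parts <- split_assoc_table(tab)
str(parts)
```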
Centiles Plot from a PGS Association
Description
centileplot() takes a distribution of PGS and a Phenotype, and returns a plot (ggplot2 object) with PGS centiles (or deciles if there are not enough individuals) on the x-axis and the Prevalence/Median/Mean of the Phenotype on the y-axis.
Usage
centileplot(
df = NULL,
prs_col = "SCORESUM",
phenotype_col = "Phenotype",
decile = FALSE,
continuous_metric = NA
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
decile |
a boolean specifying if centiles or deciles should be used |
continuous_metric |
an optional character specifying which metric to use for a continuous Phenotype; only three options: |
Value
Returns a figure of the results as a ggplot2 object
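The centile idea can be sketched in base R: cut the PGS at its empirical quantiles and summarize the Phenotype within each bin (the mean of a 0/1 variable is its prevalence). This is an illustrative sketch on simulated data, not the package's code; the helper name `centile_summary` is hypothetical, and the rule the package uses to fall back to deciles is not reproduced.

```r
# Hypothetical helper: summarize a phenotype within PGS quantile bins
centile_summary <- function(pgs, y, n_bins = 100, fun = mean) {
  breaks <- stats::quantile(pgs, probs = seq(0, 1, length.out = n_bins + 1),
                            na.rm = TRUE)
  # unique() guards against tied quantiles; labels = FALSE gives bin numbers
  bin <- cut(pgs, breaks = unique(breaks), include.lowest = TRUE,
             labels = FALSE)
  # mean of 0/1 = prevalence; pass fun = median for a continuous phenotype
  summ <- tapply(y, bin, fun)
  data.frame(bin = as.integer(names(summ)), value = as.numeric(summ))
}

set.seed(2)
pgs <- rnorm(5000)
y <- rbinom(5000, 1, plogis(-2 + pgs))          # simulated Cases/Controls
dec <- centile_summary(pgs, y, n_bins = 10)     # deciles
print(dec)
```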
Mock dataset for comorbidPGS package
Description
A dataset with sets of PGSs, Phenotypes and Covariates to demo the comorbidPGS package
Usage
comorbidData
Format
A data frame with 10,000 rows (individuals) and 16 columns:
- ID
Individual's identifier, characters
- sex
Sex of the individuals, binary numeric values
- age
Age of the individuals, numeric value
- gen_array
The genotypic array used for those individuals, factor values
- ethnicity
The ethnicity of the individuals; can also be used as a Categorical Phenotype; factor values
- brc_PGS, t2d_PGS, ldl_PGS
Three distributions of PGS for Breast Cancer, Type 2 Diabetes and low-density lipoprotein (LDL) respectively; numeric values
- brc, t2d, hypertension
Three Cases/Controls Phenotypes, representing Breast Cancer, Type 2 Diabetes and Hypertension respectively; binary values
- ldl, bmi, sbp
Three Continuous Phenotypes, representing low-density lipoprotein, body-mass index, and systolic blood pressure respectively; numeric values
- log_ldl
A continuous Phenotype, log(ldl), log-transformed to obtain an approximately normal distribution; numeric values
- sbp_cat
An Ordered Categorical Phenotype, with 3 possible outcomes: low, normal or high systolic blood pressure; factor values
Source
https://github.com/VP-biostat/comorbidPGS
Deciles BoxPlot from a PGS Association with a Continuous Phenotype
Description
decileboxplot() takes a distribution of PGS and a Continuous Phenotype, and returns a plot with PGS deciles on the x-axis and boxplots of the Phenotype on the y-axis.
Usage
decileboxplot(df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype")
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Continuous Phenotype column name |
Value
Returns a ggplot2 object
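Each box in such a plot summarizes the Phenotype within one PGS decile. The base-R sketch below computes those per-decile summaries with boxplot(..., plot = FALSE) on simulated data; it only illustrates the idea and is not how decileboxplot() is implemented.

```r
set.seed(3)
n <- 3000
pgs <- rnorm(n)
ldl <- 3 + 0.4 * pgs + rnorm(n, sd = 0.8)   # simulated continuous phenotype

# Assign each individual to a PGS decile (D1 = lowest, D10 = highest)
decile <- cut(pgs, breaks = quantile(pgs, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE, labels = paste0("D", 1:10))

# Five statistics behind each box: lower whisker, Q1, median, Q3, upper whisker
bx <- boxplot(ldl ~ decile, plot = FALSE)
medians <- bx$stats[3, ]
print(round(medians, 2))
```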
Density Plot from a PGS Association
Description
densityplot() takes a distribution of PGS and a Phenotype, and returns a plot of the PGS density (x-axis) for each category of the Phenotype.
Usage
densityplot(
df = NULL,
prs_col = "SCORESUM",
phenotype_col = "Phenotype",
scale = TRUE,
threshold = NA
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
scale |
a boolean specifying if scaling of PGS should be done before plotting |
threshold |
an optional numeric specifying, for a Continuous Phenotype, the threshold used to classify individuals as Cases/Controls as follows:
|
Value
Returns a ggplot2 object
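The threshold mechanism can be sketched in base R: dichotomize a continuous Phenotype at a cut-off, then estimate one PGS density per group. The cut-off value, and the assumption that values at or above it are Cases, are illustrative choices; the package's exact rule is given by the truncated description of `threshold` and is not reproduced here.

```r
set.seed(4)
n <- 4000
pgs <- rnorm(n)
sbp <- 130 + 5 * pgs + rnorm(n, sd = 15)   # simulated continuous phenotype

threshold <- 140                  # illustrative cut-off (not a package default)
pgs_z <- as.numeric(scale(pgs))   # what scale = TRUE is assumed to do
# Assumption: values at or above the threshold are Cases
status <- ifelse(sbp >= threshold, "Cases", "Controls")

# One kernel density estimate of the PGS per group
dens <- lapply(split(pgs_z, status), stats::density)
group_means <- vapply(split(pgs_z, status), mean, numeric(1))
print(round(group_means, 2))
```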
Mendelian Randomization Two-Stage Least Squares (2SLS) method with external PGS
Description
mr_2sls() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype), and returns a data frame with the result of the Mendelian Randomization 2SLS method using the PGS.
Usage
mr_2sls(
df = NULL,
prs_col = "SCORESUM",
exposure_col = NA,
outcome_col = NA,
scale = TRUE,
verbose = TRUE,
log = ""
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
exposure_col |
a character specifying the Exposure (Phenotype) column name |
outcome_col |
a character specifying the Outcome (Phenotype) column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
verbose |
a boolean (TRUE by default) specifying whether to write messages to the console/log. |
log |
a connection, or a character string naming the file to print to. If "" (the default), output goes to the standard output connection, i.e. the console unless redirected by sink(). |
Value
Returns a data frame with the Mendelian Randomization result using the 2SLS method, with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here 2SLS)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the 2SLS method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
Examples
result <- mr_2sls(
df = comorbidData,
prs_col = "ldl_PGS",
exposure_col = "log_ldl",
outcome_col = "bmi",
scale = TRUE
)
print(result)
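2SLS with a single PGS instrument can be written as two ordinary regressions: regress the Exposure on the PGS, then regress the Outcome on the fitted (genetically predicted) Exposure. The base-R sketch below does this on simulated data with an unmeasured confounder; it is an illustration, not the mr_2sls() code. With one instrument, the 2SLS point estimate coincides with the ratio estimate, which the sketch computes for comparison. The stage-2 standard error reported by lm() is not a valid 2SLS standard error; a dedicated instrumental-variable routine such as ivreg::ivreg() corrects for this (ivreg is listed in the package Imports, but whether mr_2sls() relies on it is not stated here).

```r
set.seed(5)
n <- 5000
u <- rnorm(n)                              # unmeasured confounder
pgs <- as.numeric(scale(rnorm(n)))         # instrument, standardized
exposure <- 0.5 * pgs + u + rnorm(n)
outcome <- 0.3 * exposure + u + rnorm(n)   # true causal effect = 0.3

# Stage 1: regress the exposure on the PGS; F-statistic = instrument strength
stage1 <- lm(exposure ~ pgs)
f_stat <- summary(stage1)$fstatistic[["value"]]

# Stage 2: regress the outcome on the genetically predicted exposure
stage2 <- lm(outcome ~ fitted(stage1))
beta_2sls <- unname(coef(stage2)[2])

# With a single instrument, 2SLS equals the ratio (Wald) estimate
beta_ratio <- unname(coef(lm(outcome ~ pgs))[2] / coef(stage1)[2])
# Naive regression, biased upwards by the confounder u
beta_ols <- unname(coef(lm(outcome ~ exposure))[2])
print(c(two_sls = beta_2sls, ratio = beta_ratio, ols = beta_ols, F = f_stat))
```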
Mendelian Randomization ratio method with external PGS
Description
mr_ratio() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype), and returns a data frame with the result of the Mendelian Randomization ratio method using the PGS.
Usage
mr_ratio(
df = NULL,
prs_col = "SCORESUM",
exposure_col = NA,
outcome_col = NA,
scale = TRUE,
verbose = TRUE,
log = ""
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
exposure_col |
a character specifying the Exposure (Phenotype) column name |
outcome_col |
a character specifying the Outcome (Phenotype) column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
verbose |
a boolean (TRUE by default) specifying whether to write messages to the console/log. |
log |
a connection, or a character string naming the file to print to. If "" (the default), output goes to the standard output connection, i.e. the console unless redirected by sink(). |
Value
Returns a data frame with the Mendelian Randomization result using the ratio method, with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here Ratio)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the ratio method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
Examples
result <- mr_ratio(
df = comorbidData,
prs_col = "ldl_PGS",
exposure_col = "log_ldl",
outcome_col = "bmi",
scale = TRUE
)
print(result)
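The ratio (Wald) estimator divides the PGS-Outcome coefficient by the PGS-Exposure coefficient. The base-R sketch below computes it on simulated data, together with the first-order and second-order delta-method standard errors and the F-statistic of the Exposure ~ PGS regression. The second-order formula shown is the common version that ignores the covariance between the two coefficients; whether mr_ratio() uses exactly this expression is an assumption.

```r
set.seed(6)
n <- 5000
u <- rnorm(n)                              # unmeasured confounder
pgs <- as.numeric(scale(rnorm(n)))         # instrument, standardized
exposure <- 0.5 * pgs + u + rnorm(n)
outcome <- 0.3 * exposure + u + rnorm(n)   # true causal effect = 0.3

fit_x <- summary(lm(exposure ~ pgs))       # Exposure ~ PGS
fit_y <- summary(lm(outcome ~ pgs))        # Outcome ~ PGS
b_x <- fit_x$coefficients["pgs", "Estimate"]
se_x <- fit_x$coefficients["pgs", "Std. Error"]
b_y <- fit_y$coefficients["pgs", "Estimate"]
se_y <- fit_y$coefficients["pgs", "Std. Error"]

ratio <- b_y / b_x
se_first <- se_y / abs(b_x)                               # first order
se_second <- sqrt(se_y^2 / b_x^2 + b_y^2 * se_x^2 / b_x^4) # second order, no covariance term
f_stat <- fit_x$fstatistic[["value"]]
print(c(MR_estimate = ratio, SE_first = se_first, SE_second = se_second,
        F_stat = f_stat))
```

The second-order SE is never smaller than the first-order one, because it adds a term for the uncertainty in the PGS-Exposure coefficient.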
Multiple PGS Associations from a Data Frame
Description
multiassoc() takes a data frame with distribution(s) of PGS and Phenotype(s), together with a table of the associations to run on this data frame, and returns a data frame showing the association results.
Usage
multiassoc(
df = NULL,
assoc_table = NULL,
scale = TRUE,
covar_col = NA,
verbose = TRUE,
log = "",
parallel = FALSE,
num_cores = NA
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
assoc_table |
a dataframe or matrix specifying the associations to make from df, with 2 columns: PGS and Phenotype (in this order) |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) specifying whether to write messages to the console/log. |
log |
a connection, or a character string naming the file to print to. If "" (the default), output goes to the standard output connection, i.e. the console unless redirected by sink(). If parallel = TRUE, the log will be incomplete |
parallel |
a boolean, if TRUE, |
num_cores |
an integer, if parallel = TRUE (default), |
Value
Returns a data frame showing the association of the PGS(s) with the Phenotype(s), with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and selects the appropriate method, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression, OR of logistic regression otherwise
SE: standard error of the related Effect (Beta or OR)
lower_CI: lower bound of the confidence interval of the related Effect (Beta or OR)
upper_CI: upper bound of the confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
Examples
assoc_table <- expand.grid(
c("t2d_PGS", "ldl_PGS"),
c("ethnicity","brc","t2d","log_ldl","sbp_cat")
)
results <- multiassoc(
df = comorbidData,
assoc_table = assoc_table,
covar_col = c("age", "sex", "gen_array"),
parallel = FALSE,
verbose = FALSE
)
print(results)
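Conceptually, multiassoc() runs one association per row of assoc_table, optionally spread over several cores. The base-R sketch below mimics that loop with parallel::mclapply(), fitting a plain linear model per PGS/Phenotype pair on simulated data. The helper `run_one()` is hypothetical, and it is an assumption that the package parallelizes per association in this way. mclapply() forks worker processes on Unix-alikes; on Windows only mc.cores = 1 is supported.

```r
library(parallel)

set.seed(7)
n <- 1000
df <- data.frame(pgs_a = rnorm(n), pgs_b = rnorm(n))
df$y1 <- 0.4 * df$pgs_a + rnorm(n)
df$y2 <- 0.2 * df$pgs_b + rnorm(n)

# Same layout as in the example above: column 1 = PGS, column 2 = Phenotype
assoc_table <- expand.grid(c("pgs_a", "pgs_b"), c("y1", "y2"),
                           stringsAsFactors = FALSE)

# Hypothetical worker: one scaled-PGS linear association per table row
run_one <- function(i) {
  prs <- assoc_table[i, 1]
  phe <- assoc_table[i, 2]
  fit <- summary(lm(df[[phe]] ~ as.numeric(scale(df[[prs]]))))
  data.frame(PGS = prs, Phenotype = phe,
             Effect = fit$coefficients[2, 1], P_value = fit$coefficients[2, 4],
             stringsAsFactors = FALSE)
}

# Increase mc.cores on Unix-alikes to run the associations in parallel
res <- do.call(rbind, mclapply(seq_len(nrow(assoc_table)), run_one,
                               mc.cores = 1))
print(res)
```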
Multiple PGS Associations from different Phenotypes
Description
multiphenassoc() takes a distribution of PGS, multiple Phenotypes and optional confounders, and returns a data frame showing the association results.
Usage
multiphenassoc(
df = NULL,
prs_col = "SCORESUM",
phenotype_col = "Phenotype",
scale = TRUE,
covar_col = NA,
verbose = TRUE,
log = ""
)
Arguments
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character vector specifying the Phenotype column names |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) specifying whether to write messages to the console/log. |
log |
a connection, or a character string naming the file to print to. If "" (the default), output goes to the standard output connection, i.e. the console unless redirected by sink(). |
Value
Returns a data frame showing the association of the PGS with the Phenotypes, with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and selects the appropriate method, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: lists all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, the Beta coefficient of the linear regression; otherwise, the OR of the logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower bound of the confidence interval of the related Effect (Beta or OR)
upper_CI: upper bound of the confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
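Because the Stat_method column follows from the detected Phenotype_type, running several Phenotypes at once amounts to classifying each column and picking the matching model. The base-R sketch below shows one plausible classification rule; the rule itself (ordered factor means ordinal, other factor or character means multinomial, numeric 0/1 means binary, any other numeric means linear) and the helper name `detect_phenotype_type` are assumptions, not the package's documented logic. Ordinal and multinomial models would typically be fitted with MASS::polr() and nnet::multinom(), both of which appear in the package Imports.

```r
# Hypothetical rule for classifying a phenotype column (an assumption)
detect_phenotype_type <- function(y) {
  y <- y[!is.na(y)]
  if (is.ordered(y)) return("Ordered Categorical")   # check before is.factor()
  if (is.factor(y) || is.character(y)) return("Categorical")
  if (is.numeric(y) && all(y %in% c(0, 1))) return("Cases/Controls")
  "Continuous"
}

# Method names as listed in the Stat_method column
stat_method <- c(
  "Continuous"          = "Linear regression",
  "Cases/Controls"      = "Binary logistic regression",
  "Ordered Categorical" = "Ordinal logistic regression",
  "Categorical"         = "Multinomial logistic regression"
)

# Simulated phenotypes named after comorbidData columns (not the real data)
set.seed(8)
n <- 500
toy <- data.frame(
  bmi = rnorm(n, 27, 4),
  t2d = rbinom(n, 1, 0.1),
  sbp_cat = factor(sample(c("low", "normal", "high"), n, replace = TRUE),
                   levels = c("low", "normal", "high"), ordered = TRUE),
  ethnicity = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
types <- vapply(toy, detect_phenotype_type, character(1))
plan <- data.frame(Phenotype = names(toy), Phenotype_type = unname(types),
                   Stat_method = unname(stat_method[types]),
                   stringsAsFactors = FALSE)
print(plan)
```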