Help for package sureLDA

Type: Package
Title: A Novel Multi-Disease Automated Phenotyping Method for the EHR
Version: 0.1.0-1
Description: A statistical learning method to simultaneously predict a range of target phenotypes using codified and natural language processing (NLP)-derived Electronic Health Record (EHR) data. See Ahuja et al (2020) JAMIA <doi:10.1093/jamia/ocaa079> for details.
URL: https://github.com/celehs/sureLDA
BugReports: https://github.com/celehs/sureLDA/issues
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.1.1
Depends: R (≥ 3.0), Matrix
Imports: pROC, glmnet, MAP, Rcpp, foreach, doParallel
LinkingTo: Rcpp, RcppArmadillo
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
LazyData: true
NeedsCompilation: yes
Packaged: 2020-11-05 08:29:06 UTC; yuriahuja
Author: Yuri Ahuja [aut, cre], Tianxi Cai [aut], PARSE LTD [aut]
Maintainer: Yuri Ahuja <Yuri_Ahuja@hms.harvard.edu>
Repository: CRAN
Date/Publication: 2020-11-10 10:00:02 UTC

sureLDA: A Novel Multi-Disease Automated Phenotyping Method for the Electronic Health Record

Description

Surrogate-guided ensemble Latent Dirichlet Allocation (sureLDA) is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm (or, optionally, MAP) to initialize phenotype probabilities from two surrogate features for each target disease (its main ICD and NLP counts), and then uses these probabilities to guide an LDA topic model toward phenotype-specific topics. Finally, it combines phenotype-feature counts with the surrogates via a clustering ensemble to yield final phenotype probabilities.
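For orientation, the LDA step that sureLDA builds on can be sketched with a standard collapsed Gibbs sampler, treating each patient as a document and each EHR feature count as repeated word tokens. This is a minimal sketch of plain LDA, not the package's implementation: it omits the surrogate-derived prior guidance, the feature weights and the 'empty' topics, and the function name lda_gibbs is hypothetical. Its burnin and ITER arguments mirror the documented ones (burn-in sweeps are discarded, later sweeps are used for inference); averaging the patient-topic proportions over those later sweeps is an assumption of this sketch.

```r
# Minimal collapsed Gibbs sampler for standard LDA (NOT sureLDA itself):
# patients act as documents, EHR features as vocabulary words.
lda_gibbs <- function(X, K, alpha, beta, burnin = 50, ITER = 150) {
  D <- nrow(X)
  V <- ncol(X)
  # Expand the count matrix into one token per occurrence
  idx  <- which(X > 0, arr.ind = TRUE)
  reps <- X[idx]
  doc  <- rep(idx[, 1], reps)
  word <- rep(idx[, 2], reps)
  N <- length(doc)

  # Random initial topic for every token, plus the count tables it implies
  z   <- sample.int(K, N, replace = TRUE)
  ndk <- matrix(0, D, K)  # patient-topic counts
  nkw <- matrix(0, K, V)  # topic-feature counts
  nk  <- numeric(K)       # tokens per topic
  for (i in seq_len(N)) {
    ndk[doc[i], z[i]]  <- ndk[doc[i], z[i]] + 1
    nkw[z[i], word[i]] <- nkw[z[i], word[i]] + 1
    nk[z[i]]           <- nk[z[i]] + 1
  }

  theta_sum <- matrix(0, D, K)
  for (it in seq_len(burnin + ITER)) {
    for (i in seq_len(N)) {
      d <- doc[i]; w <- word[i]; k <- z[i]
      # Remove token i from the counts
      ndk[d, k] <- ndk[d, k] - 1; nkw[k, w] <- nkw[k, w] - 1; nk[k] <- nk[k] - 1
      # Full conditional p(z_i = k | all other assignments)
      p <- (ndk[d, ] + alpha) * (nkw[, w] + beta) / (nk + V * beta)
      k <- sample.int(K, 1, prob = p)
      # Add token i back under its newly sampled topic
      z[i] <- k
      ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
    }
    # Discard burn-in sweeps; accumulate patient-topic proportions afterwards
    if (it > burnin) {
      theta_sum <- theta_sum + (ndk + alpha) / (rowSums(ndk) + K * alpha)
    }
  }
  theta_sum / ITER
}

# Usage on hypothetical toy counts: 20 patients x 10 features, 3 topics
set.seed(1)
X <- matrix(rpois(20 * 10, 2), nrow = 20)
theta <- lda_gibbs(X, K = 3, alpha = 1, beta = 0.1, burnin = 20, ITER = 30)
round(head(theta), 2)
```

Each sweep costs time proportional to the total number of tokens, so this pure-R loop is only practical for toy data.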

Simulated Dataset

Description

Simulated dataset for demonstrating sureLDA.

Usage

simdata

Format

An object of class list of length 6.

Examples

str(simdata)

Surrogate-guided ensemble Latent Dirichlet Allocation

Description

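Fits sureLDA to simultaneously predict multiple target phenotypes from EHR feature counts, main ICD and NLP surrogate counts, and healthcare utilization, returning patient-level phenotype probabilities.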

Usage

sureLDA(
  X,
  ICD,
  NLP,
  HU,
  filter,
  prior = "PheNorm",
  weight = "beta",
  nEmpty = 20,
  alpha = 100,
  beta = 100,
  burnin = 50,
  ITER = 150,
  phi = NULL,
  nCores = 1,
  labeled = NULL,
  verbose = FALSE
)

Arguments

X

nPatients x nFeatures matrix of EHR feature counts

ICD

nPatients x nPhenotypes matrix of main ICD surrogate counts

NLP

nPatients x nPhenotypes matrix of main NLP surrogate counts

HU

nPatients-dimensional vector containing the healthcare utilization feature

filter

nPatients x nPhenotypes binary matrix indicating filter-positives

prior

'PheNorm', 'MAP', or nPatients x nPhenotypes matrix of prior probabilities (defaults to PheNorm)

weight

'beta', 'uniform', or nPhenotypes x nFeatures matrix of feature weights (defaults to beta)

nEmpty

Number of 'empty' topics to include in the LDA step (defaults to 20)

alpha

LDA Dirichlet hyperparameter for patient-topic distribution (defaults to 100)

beta

LDA Dirichlet hyperparameter for topic-feature distribution (defaults to 100)

burnin

Number of burn-in Gibbs iterations (defaults to 50)

ITER

Number of Gibbs iterations after burn-in used for inference (defaults to 150)

phi

(optional) nPhenotypes x nFeatures pre-trained topic-feature distribution matrix

nCores

(optional) Number of parallel cores to use; parallelization applies only when phi is provided (defaults to 1)

labeled

(optional) nPatients x nPhenotypes matrix of a priori labels (set missing entries to NA)

verbose

(optional) Logical indicating whether to print verbose progress updates (defaults to FALSE)
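As a shape check for the arguments above, the sketch below builds small toy inputs from random Poisson counts. The data are entirely hypothetical; using total feature counts as HU and defining filter as "at least one main ICD code" are illustrative assumptions, not rules taken from the package. The call to sureLDA() is left commented out because random noise of this kind is unlikely to be meaningful input for PheNorm.

```r
# Hypothetical toy inputs with the shapes documented for sureLDA()
set.seed(42)
nPatients <- 100; nFeatures <- 30; nPhenotypes <- 2

X   <- matrix(rpois(nPatients * nFeatures, 1), nrow = nPatients)    # EHR feature counts
ICD <- matrix(rpois(nPatients * nPhenotypes, 1), nrow = nPatients)  # main ICD surrogate counts
NLP <- matrix(rpois(nPatients * nPhenotypes, 1), nrow = nPatients)  # main NLP surrogate counts
HU  <- rowSums(X)            # assumption: total feature count as a utilization proxy
filter <- (ICD > 0) * 1      # assumption: filter-positive = at least one main ICD code

# Optional a priori labels: only the first 10 patients are labeled for phenotype 1
labeled <- matrix(NA_real_, nPatients, nPhenotypes)
labeled[1:10, 1] <- rbinom(10, 1, 0.5)

# Consistency checks implied by the argument descriptions
stopifnot(
  nrow(ICD) == nPatients, nrow(NLP) == nPatients, length(HU) == nPatients,
  all(dim(filter) == dim(ICD)), all(filter %in% c(0, 1)),
  all(dim(labeled) == dim(ICD))
)

# Not run here; requires the installed package and realistic data:
# library(sureLDA)
# fit <- sureLDA(X, ICD, NLP, HU, filter, prior = "PheNorm", labeled = labeled)
# dim(fit$probs)  # nPatients x nPhenotypes
```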

Value

A list with the following components:

scores nPatients x nPhenotypes matrix of weighted patient-phenotype assignment counts from the LDA step

probs nPatients x nPhenotypes matrix of patient-phenotype posterior probabilities

ensemble Mean of sureLDA posterior and PheNorm/MAP prior

prior nPatients x nPhenotypes matrix of PheNorm/MAP phenotype probability estimates

phi nPhenotypes x nFeatures topic-feature distribution matrix from the LDA step

weights nPhenotypes x nFeatures matrix of topic-feature weights
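The ensemble component is documented as the mean of the sureLDA posterior and the PheNorm/MAP prior. Assuming this is an element-wise average of the two nPatients x nPhenotypes matrices, it can be reproduced on hypothetical values as follows; the 0.5 cutoff is an arbitrary illustration, not a package default.

```r
# Hypothetical posterior (probs) and prior matrices: 3 patients x 2 phenotypes
probs <- matrix(c(0.9, 0.2, 0.6,
                  0.1, 0.7, 0.4), nrow = 3)
prior <- matrix(c(0.7, 0.4, 0.5,
                  0.3, 0.5, 0.2), nrow = 3)

# Element-wise mean, matching the documented description of 'ensemble'
ensemble <- (probs + prior) / 2

# Illustrative downstream use: binary calls at an arbitrary 0.5 threshold
calls <- ensemble >= 0.5
ensemble
calls
```

With a fitted object fit, the same assumption could be checked with all.equal(fit$ensemble, (fit$probs + fit$prior) / 2).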