Help for package crso

Title:

Cancer Rule Set Optimization ('crso')

Version:

0.1.1

Author:

Michael Klein <michael.klein@yale.edu>

Maintainer:

Michael Klein <michael.klein@yale.edu>

Description:

An algorithm for identifying candidate driver combinations in cancer. CRSO is based on a theoretical model of cancer in which a cancer rule is defined to be a collection of two or more events (i.e., alterations) that are minimally sufficient to cause cancer. A cancer rule set is a set of cancer rules that collectively are assumed to account for all of the ways to cause cancer in the population. In CRSO every event is designated explicitly as a passenger or driver within each patient. Each event is associated with a patient-specific, event-specific passenger penalty, reflecting how unlikely the event would have been to happen by chance, i.e., as a passenger. CRSO evaluates each rule set by assigning all samples to a rule in the rule set, or to the null rule, and then calculating the total statistical penalty from all unassigned events. CRSO uses a three-phase procedure to find the best rule set of fixed size K for a range of Ks. A core rule set is then identified from among the best rule sets of size K as the rule set that best balances rule set size and statistical penalty. Users should consult the 'crso' vignette for an example walk-through of a full CRSO run. The full description of the CRSO algorithm is presented in: Klein MI, Cannataro V, Townsend J, Stern DF and Zhao H. "Identifying combinations of cancer drivers in individual patients." BioRxiv 674234 [Preprint]. June 19, 2019. <doi:10.1101/674234>. Please cite this article if you use 'crso'.

Depends:

R (≥ 3.5.0), foreach

Imports:

stats, utils

License:

GPL-2

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

6.1.1

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2019-07-04 20:36:20 UTC; michaelklein

Repository:

CRAN

Date/Publication:

2019-07-07 17:00:03 UTC

Make full rule library of all rules that satisfy minimum coverage threshold.

Description

Make full rule library of all rules that satisfy minimum coverage threshold.

Usage

buildRuleLibrary(D, rule.thresh, min.epr)

Arguments

D

Binary matrix of N events and M samples

rule.thresh

Minimum fraction of samples that a rule must cover. Default is 0.03.

min.epr

Minimum number of events per rule. Default is 2.

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # build rule library
dim(rm.full) # Should be matrix with dimension 60 x 71
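
The coverage threshold can be illustrated on a toy matrix in base R. This is a minimal sketch, not the package's implementation: it assumes that a rule covers a sample when every event in the rule is present in that sample, and that coverage is the fraction of samples covered. The names D.toy and ruleCoverage are hypothetical.

```r
# Toy binary event-by-sample matrix (hypothetical data, not from skcm.list)
D.toy <- matrix(c(1, 1, 0, 1, 0,
                  1, 0, 0, 1, 1,
                  0, 1, 1, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), paste0("s", 1:5)))

# Assumption: a rule covers a sample when all of its events occur in that sample
ruleCoverage <- function(D, events) {
  mean(colSums(D[events, , drop = FALSE]) == length(events))
}

cov.AB <- ruleCoverage(D.toy, c("A", "B"))  # A and B co-occur in s1 and s4
cov.AC <- ruleCoverage(D.toy, c("A", "C"))  # A and C co-occur in s2 only
```

Under this reading, a threshold of 0.3 would keep the rule A.B (coverage 0.4) and discard A.C (coverage 0.2).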

Evaluate list of rule set matrices

Description

Evaluate list of rule set matrices

Usage

evaluateListOfIMs(D, Q, rm, im.list)

Arguments

D

binary matrix of events by samples

Q

penalty matrix of events by samples

rm

matrix of rules ordered by phase one

im.list

list of rule set matrices

Value

list of Js for each rule set matrix

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
p2.im.list <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),max.stored=100,
              shouldPrint = TRUE)
p2.performance.list <- evaluateListOfIMs(D,Q,rm.full,p2.im.list)
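
The package Description says that a rule set is evaluated by assigning each sample to a rule in the rule set, or to the null rule, and totaling the penalty of the unassigned events. The sketch below shows one possible reading of that sentence on toy data in base R. It is not the package's scoring code. It assumes that penalties are non-negative, that a sample can only be assigned to a rule whose events it all carries, and that each sample takes the assignment with the lowest remaining penalty. The names D.toy, pen, rule.set and samplePenalty are hypothetical.

```r
# Hypothetical toy inputs: 3 events x 3 samples
dn <- list(c("A", "B", "C"), c("s1", "s2", "s3"))
D.toy <- matrix(c(1, 1, 0,
                  1, 0, 1,
                  1, 1, 1), nrow = 3, byrow = TRUE, dimnames = dn)
# Assumed non-negative passenger penalties (0 where the event is absent)
pen <- matrix(c(2, 3, 0,
                1, 0, 2,
                4, 1, 1), nrow = 3, byrow = TRUE, dimnames = dn)
rule.set <- list(r1 = c("A", "B"), r2 = c("A", "C"))

# Penalty of one sample under its best assignment
samplePenalty <- function(d, q, rules) {
  null.pen <- sum(q[d == 1])  # null rule: every observed event is a passenger
  rule.pens <- vapply(rules, function(ev) {
    # Events in an assigned rule are treated as drivers and are not penalized
    if (all(d[ev] == 1)) null.pen - sum(q[ev]) else Inf
  }, numeric(1))
  min(null.pen, rule.pens)
}

total.penalty <- sum(vapply(colnames(D.toy), function(s) {
  samplePenalty(D.toy[, s], pen[, s], rule.set)
}, numeric(1)))
```

In this toy case s1 and s2 are assigned to r2, s3 falls to the null rule, and the total penalty is 4.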

Get list of best rule sets of size K for all K

Description

Get list of best rule sets of size K for all K

Usage

getBestRsList(rm, tpl, til)

Arguments

rm

binary rule matrix

tpl

list of top performances

til

list of top rule set index matrices

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
          max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
best.rs.list <- getBestRsList(rm = rm.full,tpl = tpl.p2,til = til.p2)

Determine core K from phase 3 tpl and til

Description

Determine core K from phase 3 tpl and til

Usage

getCoreK(D, rm, tpl, til, cov.thresh, perf.thresh)

Arguments

D

input matrix D

rm

binary rule matrix

tpl

list of top performances

til

list of top rule set index matrices

cov.thresh

core coverage threshold, default is 95

perf.thresh

core performance threshold, default is 90

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
          max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
core.K <- getCoreK(D,rm.full,tpl.p2,til.p2)
# core.K should almost always be 3 for this example; run it a few times to confirm

Get core rules from phase 3 tpl and til

Description

Get core rules from phase 3 tpl and til

Usage

getCoreRS(D, rm, tpl, til, cov.thresh, perf.thresh)

Arguments

D

input matrix D

rm

binary rule matrix

tpl

list of top performances

til

list of top rule set index matrices

cov.thresh

core coverage threshold, default is 95

perf.thresh

core performance threshold, default is 90

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
          max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
core.rs <- getCoreRS(D,rm.full,tpl.p2,til.p2) # core.rs should be r1, r2, r3

Get Generalized Core Duos

Description

Get Generalized Core Duos

Usage

getGCDs(list.subset.cores)

Arguments

list.subset.cores

list of subset cores

Examples

list.subset.cores <- list(c("A.B.C","D.E","A.D"),c("A.C","B.C.D","D.E"),
c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCDs(list.subset.cores) # Confidence column should be 100, 100, 100, 75, 50, 25, 25

Get Generalized Core Events

Description

Get Generalized Core Events

Usage

getGCEs(list.subset.cores)

Arguments

list.subset.cores

list of subset cores

Examples

list.subset.cores <- list(c("A.B.C","D.E","A.D"),
c("A.C","B.C.D","D.E"),c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCEs(list.subset.cores) # Confidence column should be 100, 100, 100, 100, 100
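
The confidence values in the comment above are consistent with reading confidence as the percentage of subset cores in which an event appears in at least one rule. That reading is an assumption inferred from this example, not taken from the package source. A base R sketch that does not call getGCEs (core.events and event.conf are hypothetical names):

```r
list.subset.cores <- list(c("A.B.C", "D.E", "A.D"),
                          c("A.C", "B.C.D", "D.E"),
                          c("A.B.C", "D.E"),
                          c("A.B.C", "D.E", "B.C.D"))

# Events present in each subset core: split every rule string on "."
core.events <- lapply(list.subset.cores, function(core) {
  unique(unlist(strsplit(core, ".", fixed = TRUE)))
})

# Assumed confidence: percentage of subset cores that contain the event
events <- sort(unique(unlist(core.events)))
event.conf <- vapply(events, function(e) {
  100 * mean(vapply(core.events, function(ev) e %in% ev, logical(1)))
}, numeric(1))
```

Every one of the events A to E appears in all four subset cores, so each gets a confidence of 100 under this reading.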

Get Generalized Core Rules

Description

Get Generalized Core Rules

Usage

getGCRs(list.subset.cores)

Arguments

list.subset.cores

list of subset cores

Examples

list.subset.cores <- list(c("A.B.C","D.E","A.D"),c("A.C","B.C.D","D.E"),
c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCRs(list.subset.cores) # Confidence column should be 100, 75, 50, 25, 25
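
The confidence values in the comment above are consistent with reading confidence as the percentage of subset cores that contain a given rule. That reading is an assumption inferred from this example, not taken from the package source. A base R sketch that does not call getGCRs (rule.conf is a hypothetical name):

```r
list.subset.cores <- list(c("A.B.C", "D.E", "A.D"),
                          c("A.C", "B.C.D", "D.E"),
                          c("A.B.C", "D.E"),
                          c("A.B.C", "D.E", "B.C.D"))

# Assumed confidence: percentage of subset cores that contain the rule
rules <- unique(unlist(list.subset.cores))
rule.conf <- vapply(rules, function(r) {
  100 * mean(vapply(list.subset.cores, function(core) r %in% core, logical(1)))
}, numeric(1))
rule.conf <- sort(rule.conf, decreasing = TRUE)
```

Here D.E appears in all four subset cores, A.B.C in three, B.C.D in two, and A.D and A.C in one each.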

Get pool sizes for phase 2

Description

Get pool sizes for phase 2

Usage

getPoolSizes(rm.ordered, k.max, max.nrs.ee, max.compute)

Arguments

rm.ordered

binary rule matrix ordered from phase 1

k.max

maximum rule set size

max.nrs.ee

max number of rule sets per k

max.compute

maximum raw rule sets considered per k

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
rm.ordered <- rm.full # Skip phase one in this example
getPoolSizes(rm.ordered,k.max = 7,max.nrs.ee = 10000)
# [1] 60  60  40  23  18  16  15
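
The pool sizes printed above can be reproduced in base R by taking, for each k, the largest pool size n (capped at the number of rules) for which choose(n, k) stays below the limit. This is a sketch under that assumption only: poolSizeSketch is a hypothetical helper, it ignores max.compute, and it may differ from getPoolSizes for other inputs.

```r
# Hypothetical helper: largest n <= n.rules with choose(n, k) < max.nrs, for each k
poolSizeSketch <- function(n.rules, k.max, max.nrs) {
  vapply(seq_len(k.max), function(k) {
    n <- k:n.rules
    max(n[choose(n, k) < max.nrs])
  }, numeric(1))
}

sizes <- poolSizeSketch(n.rules = 60, k.max = 7, max.nrs = 10000)
```

For k = 3, for instance, choose(40, 3) is 9880 while choose(41, 3) is 10660, so the pool size is 40.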

Represent binary rule matrix as strings

Description

Represent binary rule matrix as strings

Usage

getRulesAsStrings(rm)

Arguments

rm

binary rule matrix

Value

vector of rules represented as strings

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.1) # Small rule library matrix, dimension: 5 x 71
getRulesAsStrings(rm.full)
# output should be: "BRAF-M.CDKN2A-MD"   "CDKN2A-MD.NRAS-M"
# "BRAF-M.PTEN-MD"    "ADAM18-M.BRAF-M" "ADAM18-M.CDKN2A-MD"
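
The conversion can be imitated on a toy matrix in base R. The sketch assumes that rows of the rule matrix are rules, that columns are events, and that event names are joined with "." as in the output above. The layout is inferred from the examples rather than confirmed from the package source, and rm.toy is hypothetical.

```r
# Hypothetical rule matrix: 2 rules (rows) x 3 events (columns)
rm.toy <- matrix(c(1, 1, 0,
                   0, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("r1", "r2"),
                                 c("BRAF-M", "CDKN2A-MD", "NRAS-M")))

# Join the names of the events that are switched on in each rule
rule.strings <- apply(rm.toy, 1, function(r) {
  paste(colnames(rm.toy)[r == 1], collapse = ".")
})
```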

Make filtered im list from phase 3 im list

Description

Make filtered im list from phase 3 im list

Usage

makeFilteredImList(D, Q, rm, til, filter.thresh)

Arguments

D

binary matrix of events by samples

Q

penalty matrix of events by samples

rm

matrix of rules ordered by phase one

til

im list from phase 3

filter.thresh

minimum percentage of samples assigned to each rule in rs

Value

filtered top im list

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
          pool.sizes=c(60,20,20),max.stored=100,shouldPrint = FALSE)
filtered.im.list <- makeFilteredImList(D,Q,rm.full,til.p2,filter.thresh = 0.05)

Order rules according to phase one importance ranking

Description

Order rules according to phase one importance ranking

Usage

makePhaseOneOrderedRM(D, rm.start, spr, Q, trn, n.splits, shouldPrint)

Arguments

D

Binary matrix of N events and M samples

rm.start

Starting binary rule matrix (i.e., rule library)

spr

Random rule sets per rule in each phase one iteration. Default is 40.

Q

Penalty matrix, negative log of passenger probability matrix.

trn

Target rule number for stopping iterating. Default is 16.

n.splits

number of splits for parallelization. Default is all available cpus.

shouldPrint

Print progress updates? Default is TRUE

Value

binary rule matrix ordered by phase one importance ranking

Examples

data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.06) # Rule library matrix, dimension: 36 x 71
rm.ordered <- makePhaseOneOrderedRM(D,rm.full,spr = 1,Q,trn = 34,shouldPrint = TRUE)
# note, for real applications, spr should be at least 40.

Make phase 3 im list from phase 2 im list

Description

Make phase 3 im list from phase 2 im list

Usage

makePhaseThreeImList(D, Q, rm.ordered, til.ee, pool.sizes, max.stored,
  max.nrs.borrow, shouldPrint)

Arguments

D

binary matrix of events by samples

Q

penalty matrix of events by samples

rm.ordered

matrix of rules ordered by phase one

til.ee

list of rule set matrices (im list) from phase two

pool.sizes

pool sizes for phase two

max.stored

max number of rule sets saved

max.nrs.borrow

max number of new rule sets per k, default is 10^5

shouldPrint

Print progress updates? Default is TRUE

Value

phase 3 top im list

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,10,10),
          max.stored=100,shouldPrint = FALSE)
til.p3 <- makePhaseThreeImList(D,Q,rm.ordered = rm.full,til.ee = til.p2, pool.sizes=c(60,20,20),
         max.stored=100,max.nrs.borrow=100,shouldPrint = TRUE)

Output list of top rule sets for each k in 1:k.max

Description

Output list of top rule sets for each k in 1:k.max

Usage

makePhaseTwoImList(D, Q, rm.ordered, k.max, pool.sizes, max.stored,
  shouldPrint)

Arguments

D

binary matrix of events by samples

Q

penalty matrix of events by samples

rm.ordered

matrix of rules ordered by phase one

k.max

max k

pool.sizes

vector of the number of top rules evaluated for each k

max.stored

max number of rule sets saved

shouldPrint

Print progress updates? Default is TRUE

Value

list of top rule set index matrices (im list), one for each k in 1:k.max

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
         pool.sizes=c(60,20,20),max.stored=100,shouldPrint = TRUE)

Get list of core rules from random subsets of samples

Description

Get list of core rules from random subsets of samples

Usage

makeSubCoreList(D, Q, rm, til, num.subsets, num.evaluated, shouldPrint)

Arguments

D

input matrix D

Q

input matrix Q

rm

binary rule matrix

til

list of top rule set index matrices

num.subsets

number of subset iterations, default is 100

num.evaluated

number of top rs considered per k per iteration, default is 1000

shouldPrint

Print progress updates? Default is TRUE

Examples

library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
          pool.sizes=c(60,20,20),max.stored=100,shouldPrint = FALSE)
subcore.list <- makeSubCoreList(D,Q,rm.full,til.p2,num.subsets=3,num.evaluated=50)

Example data set derived from TCGA skin cutaneous melanoma (SKCM) data.

Description

A dataset containing the processed inputs used in the melanoma analysis within the CRSO publication.

Usage

skcm.list

Format

A list with 3 items

D: Binary alteration matrix. Rows are candidate driver events, columns are samples.
P: Passenger probability matrix corresponding to D.
cnv.dictionary: Data frame containing copy number genes.

...

Source

Dataset derived from data generated by the TCGA Research Network: https://www.cancer.gov/tcga