Title: | Cancer Rule Set Optimization ('crso') |
Version: | 0.1.1 |
Author: | Michael Klein <michael.klein@yale.edu> |
Maintainer: | Michael Klein <michael.klein@yale.edu> |
Description: | An algorithm for identifying candidate driver combinations in cancer. CRSO is based on a theoretical model of cancer in which a cancer rule is defined to be a collection of two or more events (i.e., alterations) that are minimally sufficient to cause cancer. A cancer rule set is a set of cancer rules that collectively are assumed to account for all of ways to cause cancer in the population. In CRSO every event is designated explicitly as a passenger or driver within each patient. Each event is associated with a patient-specific, event-specific passenger penalty, reflecting how unlikely the event would have happened by chance, i.e., as a passenger. CRSO evaluates each rule set by assigning all samples to a rule in the rule set, or to the null rule, and then calculating the total statistical penalty from all unassigned event. CRSO uses a three phase procedure find the best rule set of fixed size K for a range of Ks. A core rule set is then identified from among the best rule sets of size K as the rule set that best balances rule set size and statistical penalty. Users should consult the 'crso' vignette for an example walk through of a full CRSO run. The full description, of the CRSO algorithm is presented in: Klein MI, Cannataro V, Townsend J, Stern DF and Zhao H. "Identifying combinations of cancer driver in individual patients." BioRxiv 674234 [Preprint]. June 19, 2019. <doi:10.1101/674234>. Please cite this article if you use 'crso'. |
Depends: | R (≥ 3.5.0), foreach |
Imports: | stats, utils |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.1 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2019-07-04 20:36:20 UTC; michaelklein |
Repository: | CRAN |
Date/Publication: | 2019-07-07 17:00:03 UTC |
Make full rule library of all rules that satisfy minimum coverage threshold.
Description
Make full rule library of all rules that satisfy minimum coverage threshold.
Usage
buildRuleLibrary(D, rule.thresh, min.epr)
Arguments
D |
Binary matrix of N events and M samples |
rule.thresh |
Minimum fraction of rules covered. Default is .03 |
min.epr |
minimum events per rule. Default is 2. |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # build rule library
dim(rm.full) # Should be matrix with dimension 60 x 71
Evaluate list of rule set matrices
Description
Evaluate list of rule set matrices
Usage
evaluateListOfIMs(D, Q, rm, im.list)
Arguments
D |
binary matrix of events by samples |
Q |
penalty matrix of events by samples |
rm |
matrix of rules ordered by phase one |
im.list |
list of rule set matrices |
Value
list of Js for each rule set matrix
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
p2.im.list <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),max.stored=100,
shouldPrint = TRUE)
p2.performance.list <- evaluateListOfIMs(D,Q,rm.full,p2.im.list)
Get list of best rule sets of size K for all K
Description
Get list of best rule sets of size K for all K
Usage
getBestRsList(rm, tpl, til)
Arguments
rm |
binary rule matrix |
tpl |
list of top performances |
til |
list of top rule set index matrices |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
best.rs.list <- getBestRsList(rm = rm.full,tpl = tpl.p2,til = til.p2)
Determine core K from phase 3 tpl and til
Description
Determine core K from phase 3 tpl and til
Usage
getCoreK(D, rm, tpl, til, cov.thresh, perf.thresh)
Arguments
D |
input matrix D |
rm |
binary rule matrix |
tpl |
list of top performances |
til |
list of top rule set index matrices |
cov.thresh |
core coverage threshold, defaults is 95 |
perf.thresh |
core performance threshold, default is 90 |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
core.K <- getCoreK(D,rm.full,tpl.p2,til.p2)
# core.K should be 3 almost always for this example, can run a few time to confirm
Get core rules from phase 3 tpl and til
Description
Get core rules from phase 3 tpl and til
Usage
getCoreRS(D, rm, tpl, til, cov.thresh, perf.thresh)
Arguments
D |
input matrix D |
rm |
binary rule matrix |
tpl |
list of top performances |
til |
list of top rule set index matrices |
cov.thresh |
core coverage threshold, defaults is 95 |
perf.thresh |
core performance threshold, default is 90 |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,20,20),
max.stored=100,shouldPrint = FALSE)
tpl.p2 <- evaluateListOfIMs(D,Q,rm.full,til.p2)
core.rs <- getCoreRS(D,rm.full,tpl.p2,til.p2) # core.rs should be r1, r2, r3
Get Generalized Core Duos
Description
Get Generalized Core Duos
Usage
getGCDs(list.subset.cores)
Arguments
list.subset.cores |
list of subset cores |
Examples
list.subset.cores <- list(c("A.B.C","D.E","A.D"),c("A.C","B.C.D","D.E"),
c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCDs(list.subset.cores) # Confidence column should be 100, 100, 100, 75, 50, 25, 25
Get Generalized Core Events
Description
Get Generalized Core Events
Usage
getGCEs(list.subset.cores)
Arguments
list.subset.cores |
list of subset cores |
Examples
list.subset.cores <- list(c("A.B.C","D.E","A.D"),
c("A.C","B.C.D","D.E"),c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCEs(list.subset.cores) # Confidence column should be 100, 100, 100, 100, 100
Get Generalized Core Rules
Description
Get Generalized Core Rules
Usage
getGCRs(list.subset.cores)
Arguments
list.subset.cores |
list of subset cores |
Examples
list.subset.cores <- list(c("A.B.C","D.E","A.D"),c("A.C","B.C.D","D.E"),
c("A.B.C","D.E"),c("A.B.C","D.E","B.C.D"))
getGCRs(list.subset.cores) # Confidence column should be 100, 75, 50, 25, 25
Get pool sizes for phase 2
Description
Get pool sizes for phase 2
Usage
getPoolSizes(rm.ordered, k.max, max.nrs.ee, max.compute)
Arguments
rm.ordered |
binary rule matrix ordered from phase 1 |
k.max |
maximum rule set size |
max.nrs.ee |
max number of rule sets per k |
max.compute |
maximum raw rule sets considered per k |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
rm.ordered <- rm.full # Skip phase one in this example
getPoolSizes(rm.ordered,k.max = 7,max.nrs.ee = 10000)
# [1] 60 60 40 23 18 16 15
Represent binary rule matrix as strings
Description
Represent binary rule matrix as strings
Usage
getRulesAsStrings(rm)
Arguments
rm |
binary rule matrix |
Value
vector or rules represented as strings
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
rm.full <- buildRuleLibrary(D,rule.thresh = 0.1) # Small rule library matrix, dimension: 5 x 71
getRulesAsStrings(rm.full)
# output should be: "BRAF-M.CDKN2A-MD" "CDKN2A-MD.NRAS-M"
# "BRAF-M.PTEN-MD" "ADAM18-M.BRAF-M" "ADAM18-M.CDKN2A-MD"
Make filtered im list from phase 3 im list
Description
Make filtered im list from phase 3 im list
Usage
makeFilteredImList(D, Q, rm, til, filter.thresh)
Arguments
D |
binary matrix of events by samples |
Q |
penalty matrix of events by samples |
rm |
matrix of rules ordered by phase one |
til |
im list from phase 3 |
filter.thresh |
minimum percentage of samples assigned to each rule in rs |
Value
filtered top im list
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
pool.sizes=c(60,20,20),max.stored=100,shouldPrint = FALSE)
filtered.im.list <- makeFilteredImList(D,Q,rm.full,til.p2,filter.thresh = 0.05)
Order rules according to phase one importance ranking
Description
Order rules according to phase one importance ranking
Usage
makePhaseOneOrderedRM(D, rm.start, spr, Q, trn, n.splits, shouldPrint)
Arguments
D |
Binary matrix of N events and M samples |
rm.start |
Starting binary rule matrix (i.e., rule library) |
spr |
Random rule sets per rule in each phase one iteration. Default is 40. |
Q |
Penalty matrix, negative log of passenger probability matrix. |
trn |
Target rule number for stopping iterating. Default is 16. |
n.splits |
number of splits for parallelization. Default is all available cpus. |
shouldPrint |
Print progress updates? Default is TRUE |
Value
binary rule matrix ordered by phase one importance ranking
Examples
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.06) # Rule library matrix, dimension: 36s x 71
rm.ordered <- makePhaseOneOrderedRM(D,rm.full,spr = 1,Q,trn = 34,shouldPrint = TRUE)
# note, for real applications, spr should be at least 40.
Make phase 3 im list from phase 2 im list
Description
Make phase 3 im list from phase 2 im list
Usage
makePhaseThreeImList(D, Q, rm.ordered, til.ee, pool.sizes, max.stored,
max.nrs.borrow, shouldPrint)
Arguments
D |
binary matrix of events by samples |
Q |
penalty matrix of events by samples |
rm.ordered |
matrix of rules ordered by phase one |
til.ee |
list of rule set matrices (im list) from phase two |
pool.sizes |
pool sizes for phase two |
max.stored |
max number of rule sets saved |
max.nrs.borrow |
max number of new rule sets per k, default is 10^5 |
shouldPrint |
Print progress updates? Default is TRUE |
Value
phase 3 top im list
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,pool.sizes=c(60,10,10),
max.stored=100,shouldPrint = FALSE)
til.p3 <- makePhaseThreeImList(D,Q,rm.ordered = rm.full,til.ee = til.p2, pool.sizes=c(60,20,20),
max.stored=100,max.nrs.borrow=100,shouldPrint = TRUE)
Output list of top rule sets for each k in 1:k.max
Description
Output list of top rule sets for each k in 1:k.max
Usage
makePhaseTwoImList(D, Q, rm.ordered, k.max, pool.sizes, max.stored,
shouldPrint)
Arguments
D |
binary matrix of events by samples |
Q |
penalty matrix of events by samples |
rm.ordered |
matrix of rules ordered by phase one |
k.max |
max k |
pool.sizes |
vector of the number of top rules evaluated for each k |
max.stored |
max number of rule sets saved |
shouldPrint |
Print progress updates? Default is TRUE |
Value
largest n such that n choose k < max.num.rs
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
pool.sizes=c(60,20,20),max.stored=100,shouldPrint = TRUE)
Get list of core rules from random subsets of samples
Description
Get list of core rules from random subsets of samples
Usage
makeSubCoreList(D, Q, rm, til, num.subsets, num.evaluated, shouldPrint)
Arguments
D |
input matrix D |
Q |
input matrix Q |
rm |
binary rule matrix |
til |
list of top rule set index matrices |
num.subsets |
number of subset iterations, default is 100 |
num.evaluated |
number of top rs considered per k per iteration, default is 1000 |
shouldPrint |
Print progress updates? Default is TRUE |
Examples
library(crso)
data(skcm)
list2env(skcm.list,envir=globalenv())
Q <- log10(P)
rm.full <- buildRuleLibrary(D,rule.thresh = 0.05) # Rule library matrix, dimension: 60 x 71
til.p2 <- makePhaseTwoImList(D,Q,rm.full,k.max = 3,
pool.sizes=c(60,20,20),max.stored=100,shouldPrint = FALSE)
subcore.list <- makeSubCoreList(D,Q,rm.full,til.p2,num.subsets=3,num.evaluated=50)
Example data set derived from TCGA skin cutaneous melanoma (SKCM) data.
Description
A dataset containing the processed inputs used in the melanoma analysis within the CRSO publication.
Usage
skcm.list
Format
A list with 3 items
- D
Binary alteration matrix. Rows are candidate driver events, columns are samples.
- P
Passenger probability matrix corresponding to D.
- cnv.dictionary
Data frame containing copy number genes.
...
Source
Dataset derived from data generated by the TCGA Research Network: https://www.cancer.gov/tcga