Assign patients to four prostate cancer DNA methylation subtypes
Usage
estimate.subtypes(
methy.data,
subtype.model = "RF",
prop.missing.cutoff = 0.3,
pamr.impute.using.all.cpgs = TRUE,
seed = 123
)
Arguments
- methy.data
A data.frame with patients as rows (rownames give patient ids) and CpGs as columns (column names give CpG ids).
- subtype.model
Which subtype model to use ('PAMR' or 'RF' for random forest). Although slower, we recommend 'RF' for its increased accuracy and intrinsic imputation for missing values. Further, if some of the required CpGs are completely missing, then you must use 'RF'.
- prop.missing.cutoff
The maximum proportion of missing values allowed for each required CpG.
- pamr.impute.using.all.cpgs
If using subtype.model = 'PAMR', should imputation be done using all CpGs in methy.data (TRUE) or only the CpGs required by subtype.model.pamr (FALSE)? When TRUE, imputation will be slower and use more memory, but should be more accurate.
- seed
Integer seed used for imputation.
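As a rough illustration of the input layout and of the quantity that prop.missing.cutoff is compared against, the sketch below builds a small toy data.frame and computes the proportion of missing values per CpG in base R. The patient ids, CpG ids and values are made up, and the strict `>` comparison is an assumption; the package's own check (see validate.subtype.model.cpgs) may differ in detail.

```r
# Toy input: patients as rows, CpGs as columns (all ids and values are hypothetical).
methy.data <- data.frame(
    cg0000001 = c(0.10, NA, 0.35, 0.80),
    cg0000002 = c(NA, NA, 0.55, 0.20),
    cg0000003 = c(0.90, 0.75, 0.60, 0.40),
    row.names = c('patient1', 'patient2', 'patient3', 'patient4')
    );

# Proportion of missing values for each CpG (column).
prop.missing <- colMeans(is.na(methy.data));

# CpGs whose missingness exceeds the default cutoff of 0.3
# (assumption: a strict '>' comparison, for illustration only).
prop.missing.cutoff <- 0.3;
high.missing.cpgs <- names(prop.missing)[prop.missing > prop.missing.cutoff];

print(prop.missing);
print(high.missing.cpgs);
```

In this toy data only the second CpG is missing in more than 30% of patients, so it is the only one flagged.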
Value
- subtypes: data.frame with the estimated subtypes and sample IDs (rownames of methy.data).
- validation: output from validate.subtype.model.cpgs to check if methy.data contains the required CpGs and whether any CpGs have high missingness.
Examples
### example CpG data
data('example.data');
subtypes <- estimate.subtypes(example.data);
# estimated subtypes
head(subtypes$subtypes);
#> subtype
#> TCGA-CH-5739 MS-3
#> TCGA-HC-7079 MS-2
# validation results:
# length(subtypes$validation$required.cpgs)
# length(subtypes$validation$required.cpgs.with.high.missing)
# length(subtypes$validation$missing.cpgs)