hlaCompareAllele {HIBAG} | R Documentation |
To evaluate the overall accuracy, sensitivity, specificity, positive predictive value, negative predictive value.
hlaCompareAllele(TrueHLA, PredHLA, allele.limit=NULL, call.threshold=NaN, match.threshold=NaN, max.resolution="", output.individual=FALSE, verbose=TRUE)
TrueHLA |
an object of |
PredHLA |
an object of |
allele.limit |
a list of HLA alleles, the validation samples are
limited to those having HLA alleles in |
call.threshold |
the call threshold for posterior probability, i.e.,
call or no call is determined by whether |
match.threshold |
the matching threshold for SNP haplotype similiarity, e.g., use 1% quantile of matching statistics of a training model |
max.resolution |
"2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "": "allele" = "2-digit", "protein" = "4-digit", "full" and "" indicating no limit on resolution |
output.individual |
if TRUE, output accuracy for each individual |
verbose |
if TRUE, show information |
Return a list(overall, confusion, detail)
, or
list(overall, confusion, detail, individual)
if
output.individual=TRUE
.
overall
(data.frame):
total.num.ind |
the total number of individuals |
crt.num.ind |
the number of individuals with correct HLA types |
crt.num.haplo |
the number of chromosomes with correct HLA alleles |
acc.ind |
the proportion of individuals with correctly predicted HLA types (i.e., both of alleles are correct, the accuracy of an individual is 0 or 1.) |
acc.haplo |
the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles.) |
call.threshold |
call threshold, if it is |
n.call |
the number of individuals with call |
call.rate |
overall call rate |
confusion
(matrix): a confusion matrix.
detail
(data.frame):
allele |
HLA alleles |
train.num |
the number of training haplotypes |
train.freq |
the training haplotype frequencies |
valid.num |
the number of validation haplotypes |
valid.freq |
the validation haplotype frequencies |
call.rate |
the call rates for HLA alleles |
accuracy |
allele accuracy |
sensitivity |
sensitivity |
specificity |
specificity |
ppv |
positive predictive value |
npv |
negative predictive value |
miscall |
the most likely miss-called alleles |
miscall.prop |
the proportions of the most likely miss-called allele in all miss-called alleles |
individual
(data.frame):
sample.id |
sample id |
true.hla |
the true HLA type |
pred.hla |
the prediction of HLA type |
accuracy |
accuracy, 0, 0.5, or 1 |
Xiuwen Zheng
hlaAttrBagging
, predict.hlaAttrBagClass
,
hlaReport
# make a "hlaAlleleClass" object hla.id <- "A" hla <- hlaAllele(HLA_Type_Table$sample.id, H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")], H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")], locus=hla.id, assembly="hg19") # divide HLA types randomly set.seed(100) hlatab <- hlaSplitAllele(hla, train.prop=0.5) names(hlatab) # "training" "validation" summary(hlatab$training) summary(hlatab$validation) # SNP predictors within the flanking region on each side region <- 500 # kb snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position, hla.id, region*1000, assembly="hg19") length(snpid) # 275 # training and validation genotypes train.geno <- hlaGenoSubset(HapMap_CEU_Geno, snp.sel=match(snpid, HapMap_CEU_Geno$snp.id), samp.sel=match(hlatab$training$value$sample.id, HapMap_CEU_Geno$sample.id)) test.geno <- hlaGenoSubset(HapMap_CEU_Geno, samp.sel=match(hlatab$validation$value$sample.id, HapMap_CEU_Geno$sample.id)) # train a HIBAG model set.seed(100) model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4, verbose.detail=TRUE) summary(model) # validation pred <- predict(model, test.geno) # compare (comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model, call.threshold=0)) (comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model, call.threshold=0.5))