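The functions deIdentify() and reIdentify() require the R package SeSAMe (Bioconductor package `sesame`). Data sanitization modifies IDAT files so that they can be released publicly without leaking private information about the donor. SeSAMe supports this through de-identification, which masks or scrambles the SNP probe intensities, and re-identification, which restores scrambled intensities when the secret key is known.
High-quality DNA methylation data on more than 10,000 human samples profiled on the HM450 platform are publicly available.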
## Method 1
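The first de-identification method sets the mean intensity of every SNP probe (the `rs` probes, whose signals reveal the donor's genotype) to zero. As a result, the beta value of each SNP probe, which estimates its allele frequency, becomes 0.5.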
```r
library(sesame)
## res_grn, res_red and dest_dir are assumed to be defined earlier:
## the downloaded example IDAT pair 3999492009_R01C01 and its folder
deIdentify(res_grn$dest_file, sprintf("%s/deidentified_Grn.idat", dest_dir))
deIdentify(res_red$dest_file, sprintf("%s/deidentified_Red.idat", dest_dir))
betas1 = getBetas(readIDATpair(sprintf("%s/3999492009_R01C01", dest_dir)))
betas2 = getBetas(readIDATpair(sprintf("%s/deidentified", dest_dir)))
## original data: rs probe betas vary with the donor's genotype
head(betas1[grep('^rs', names(betas1))])
## after deIdentify, every rs probe beta is 0.5
head(betas2[grep('^rs', names(betas2))])
```
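Why zeroed intensities give exactly 0.5 follows from how the beta value is floored. The sketch below assumes the beta value is computed as max(M, 1) / max(M + U, 2), where M and U are the two allele (or methylated/unmethylated) intensities and the floors prevent division by zero. The helper `floored_beta` and all intensities are hypothetical illustrations, not SeSAMe code or values from the example IDATs.

```r
## Hypothetical helper, assuming a floored beta formula:
## numerator floored at 1, denominator floored at 2.
floored_beta <- function(M, U) {
  pmax(M, 1) / pmax(M + U, 2)
}

## made-up intensities for three SNP probes before masking
M <- c(5200, 300, 2800)
U <- c(250, 6100, 2900)
floored_beta(M, U)     # genotype-dependent: close to 1, close to 0, close to 0.5

## after masking, both intensities are zero for every SNP probe
M0 <- rep(0, 3)
U0 <- rep(0, 3)
floored_beta(M0, U0)   # 0.5 0.5 0.5, since 1 / 2 for each probe
```

Because every masked probe collapses to the same value, the genotype signal is gone while the file keeps its original layout.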
## Method 2
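The second de-identification method scrambles the SNP probe intensities instead of masking them. The scrambling is driven by R's random number generator, seeded with a secret key through set.seed(); to enable it, call deIdentify() with randomize=TRUE. Keep the key private, because anyone who knows it can restore the original intensities with reIdentify().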
```r
my_secret <- 13412084
## re-seed before each file: reIdentify() must later start from the same seed
set.seed(my_secret)
deIdentify(res_grn$dest_file,
    sprintf("%s/deidentified_Grn.idat", dest_dir), randomize=TRUE)
set.seed(my_secret)
deIdentify(res_red$dest_file,
    sprintf("%s/deidentified_Red.idat", dest_dir), randomize=TRUE)
betas1 = getBetas(readIDATpair(sprintf("%s/3999492009_R01C01", dest_dir)))
betas2 = getBetas(readIDATpair(sprintf("%s/deidentified", dest_dir)))
## original data: rs probe betas vary with the donor's genotype
head(betas1[grep('^rs', names(betas1))])
## after deIdentify with randomize=TRUE, the rs probe values are scrambled
head(betas2[grep('^rs', names(betas2))])
```
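The repeated set.seed() calls matter. In the hypothetical sketch below, base R's sample() stands in for whatever random draw deIdentify(randomize=TRUE) performs internally (an assumption, not SeSAMe's code): re-seeding with the same key reproduces the same draw, whereas a second draw without re-seeding continues the random stream.

```r
## Hypothetical illustration with base R only; sample() stands in for the
## random draw that deIdentify(randomize=TRUE) makes internally.
my_secret <- 13412084
n_snp <- 10                          # pretend the array has 10 SNP probes

## re-seeding before each draw reproduces the same permutation
set.seed(my_secret); perm_grn <- sample(n_snp)
set.seed(my_secret); perm_red <- sample(n_snp)
identical(perm_grn, perm_red)        # TRUE

## without re-seeding, the next draw continues the random stream
set.seed(my_secret); first <- sample(n_snp)
second <- sample(n_snp)              # generally differs from `first`
```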
Scrambled IDATs can be re-identified: reIdentify() restores the original SNP intensities, provided the random number generator is seeded with the same secret key that was used for deIdentify(), again before each file.
```r
my_secret <- 13412084
## use the same key as deIdentify(), re-seeded before each file
set.seed(my_secret)
reIdentify(sprintf("%s/deidentified_Grn.idat", dest_dir),
    sprintf("%s/reidentified_Grn.idat", dest_dir))
set.seed(my_secret)
reIdentify(sprintf("%s/deidentified_Red.idat", dest_dir),
    sprintf("%s/reidentified_Red.idat", dest_dir))
betas1 = getBetas(readIDATpair(sprintf("%s/3999492009_R01C01", dest_dir)))
betas2 = getBetas(readIDATpair(sprintf("%s/reidentified", dest_dir)))
## original data: rs probe betas vary with the donor's genotype
head(betas1[grep('^rs', names(betas1))])
## after reIdentify, the rs probe betas are restored and match betas1
head(betas2[grep('^rs', names(betas2))])
```
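The principle behind reversing a keyed scramble can be illustrated with a permutation. This is a hypothetical sketch, assuming that scrambling means permuting the SNP intensities with a permutation drawn right after set.seed(key); `scramble()`, `unscramble()` and the intensity values are made up for illustration and are not SeSAMe functions.

```r
## Hypothetical sketch of keyed scrambling and its reversal.
## Note: both helpers reset the global RNG state via set.seed().
scramble <- function(x, key) {
  set.seed(key)
  perm <- sample(length(x))   # secret permutation derived from the key
  x[perm]
}

unscramble <- function(y, key) {
  set.seed(key)
  perm <- sample(length(y))   # same key, same permutation
  x <- numeric(length(y))
  x[perm] <- y                # invert: put each value back where it came from
  x
}

intensities <- c(5200, 300, 2800, 4100, 950)   # made-up SNP intensities
hidden <- scramble(intensities, key = 13412084)
restored <- unscramble(hidden, key = 13412084)
identical(restored, intensities)               # TRUE
```

With a different key, unscramble() would generally draw a different permutation and leave the values out of order, so the key must be kept private but never lost.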