setMissingGenotypes {GWASTools} | R Documentation |
setMissingGenotypes
copies an existing GDS or netCDF genotype file to a new
one, setting SNPs in specified regions to missing.
setMissingGenotypes(parent.file, new.file, regions,
                    file.type=c("gds", "ncdf"), sample.include=NULL,
                    compress="LZMA_RA", copy.attributes=TRUE,
                    verbose=TRUE)
parent.file |
Name of the parent file |
new.file |
Name of the new file |
regions |
Data.frame of chromosome regions with columns scanID, chromosome, left.base, right.base, whole.chrom |
file.type |
The type of |
sample.include |
Vector of sampleIDs to include in new.file |
compress |
The compression level for variables in a GDS file (see |
copy.attributes |
Logical value specifying whether to copy chromosome attributes to the new file. |
verbose |
Logical value specifying whether to show progress information. |
setMissingGenotypes removes chromosome regions by setting SNPs that fall within the anomaly regions to NA (i.e., the missing value in the netCDF/GDS file). Optionally, entire samples may be excluded from the netCDF/GDS file as well: if the sample.include argument is given, only the scanIDs in this vector will be written to the new file, so the sample dimension will be length(sample.include).
For regions with whole.chrom=TRUE, the entire chromosome will be set to NA for that sample. For other regions, only the region between left.base and right.base will be set to NA.
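The masking rules above can be illustrated on a toy in-memory genotype matrix. This is only a minimal base-R sketch, not the GWASTools implementation: mask_regions is a hypothetical helper, the data are made up, and treating left.base and right.base as inclusive bounds is an assumption.

```r
# Toy SNP annotation: chromosome and base-pair position for six SNPs.
snp <- data.frame(chromosome = c(21, 21, 21, 22, 22, 23),
                  position   = c(10e6, 15e6, 30e6, 20e6, 35e6, 5e6))

# Toy genotype matrix: SNPs in rows, scans in columns, every genotype set to 1.
scanID <- c(101, 102, 103, 104)
geno <- matrix(1L, nrow = nrow(snp), ncol = length(scanID),
               dimnames = list(NULL, as.character(scanID)))

# Regions in the same column layout that setMissingGenotypes expects.
regions <- data.frame(scanID      = c(101, 102),
                      chromosome  = c(21, 23),
                      left.base   = c(14e6, NA),
                      right.base  = c(28e6, NA),
                      whole.chrom = c(FALSE, TRUE))

# Hypothetical helper (not part of GWASTools) that mimics the documented rules.
mask_regions <- function(geno, snp, scanID, regions, sample.include = NULL) {
  # Drop scans that are not in sample.include, keeping the original order.
  if (!is.null(sample.include)) {
    keep   <- scanID %in% sample.include
    geno   <- geno[, keep, drop = FALSE]
    scanID <- scanID[keep]
  }
  for (i in seq_len(nrow(regions))) {
    col <- which(scanID == regions$scanID[i])
    if (length(col) == 0) next  # the scan for this region was excluded
    on.chrom <- snp$chromosome == regions$chromosome[i]
    if (regions$whole.chrom[i]) {
      hit <- on.chrom  # whole chromosome for this scan
    } else {
      # Assumption: both region boundaries are inclusive.
      hit <- on.chrom &
        snp$position >= regions$left.base[i] &
        snp$position <= regions$right.base[i]
    }
    geno[hit, col] <- NA
  }
  geno
}

masked <- mask_regions(geno, snp, scanID, regions,
                       sample.include = c(101, 102, 103))
dim(masked)  # 6 SNPs by 3 scans, i.e. length(sample.include) columns
```

In this sketch scan 101 loses only the chromosome 21 SNP that lies inside its region, scan 102 loses its single chromosome 23 SNP, and scan 104 is absent from the result because it is not in sample.include.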
Stephanie Gogarten
gdsSubset, anomSegStats for chromosome anomaly regions
library(GWASTools)

# Open the example GDS file and pick the first 10 scans to keep
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(gdsfile)
sample.sel <- getScanID(gds, index=1:10)
close(gds)

# One region per scan; the third masks the whole chromosome
regions <- data.frame("scanID"=sample.sel[1:3], "chromosome"=c(21,22,23),
  "left.base"=c(14000000, 30000000, NA), "right.base"=c(28000000, 450000000, NA),
  whole.chrom=c(FALSE, FALSE, TRUE))

# Write the new file, then clean up
newgds <- tempfile()
setMissingGenotypes(gdsfile, newgds, regions, file.type="gds", sample.include=sample.sel)
file.remove(newgds)