estimateOffTargets {TAPseq} | R Documentation |
Functions to use BLAST to align TAP-seq primers against a genome and chromosome reference to estimate potential off-target binding sites.
createBLASTDb( genome, annot, blastdb, standard_chromosomes = TRUE, tx_id = "transcript_id", tx_name = "transcript_name", gene_name = "gene_name", gene_id = "gene_id", title = "TAP-seq_GT_DB", verbose = FALSE, makeblastdb = getOption("TAPseq.makeblastdb") ) blastPrimers( object, blastdb, max_mismatch = 0, min_aligned = 0.75, primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"), tmpdir = tempdir(), blastn = getOption("TAPseq.blastn") ) ## S4 method for signature 'TsIO' blastPrimers( object, blastdb, max_mismatch = 0, min_aligned = 0.75, primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"), tmpdir = tempdir(), blastn = getOption("TAPseq.blastn") ) ## S4 method for signature 'TsIOList' blastPrimers( object, blastdb, max_mismatch = 0, min_aligned = 0.75, primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"), tmpdir = tempdir(), blastn = getOption("TAPseq.blastn") )
genome |
A |
annot |
A |
blastdb |
TAP-seq BLAST database created with
|
standard_chromosomes |
(logical) Specifies whether only standard chromosomes should be included in output genome sequences (e.g. chr1-22, chrX, chrY, chrM for homo sapiens). |
tx_id, tx_name, gene_name, gene_id |
(character) Column names in annot metadata containing transcript id, transcript name, gene name and gene id information. |
title |
Optional title for BLAST database. |
verbose |
(logical) If |
makeblastdb |
Path to the |
object |
|
max_mismatch |
Maximum number of mismatches allowed for off-target hits (default: 0). |
min_aligned |
Minimum portion of the primer sequence starting from the 3' end that must align for off-target hits (default: 0.75). |
primer_targets |
Specifies what should be used to identify primer targets for off-target
identification. I.e. to what does the |
tmpdir |
Directory needed to store temporary files. |
blastn |
Path (character) to the |
createBLASTDb
creates a BLAST database containing genome and transcriptome sequences,
which is required by blastPrimers
. The created database contains both sequence files for
BLAST and annotations to process the results.
Use blastPrimers
to align designed TAP-seq primers against the created database to
estimate off-target priming potential. Only hits where at least a specified portion of the
sequence involving the 3' end of the primer aligns with not more than a certain number of
mismatches are considered.
blastPrimers
counts the number of genes in which a primer has 1) exonic hits or 2)
intronic hits, or 3) the number of hits in intergenic regions of the genome. The exonic and
intronic counts should be interptreted as: "In how many genes does a primer have exonic
(or intronic) hits?".
If a BLAST hit falls in both intronic and exonic regions of a given gene (i.e. exonic for one transcript, intronic for another transcript), only the exonic hit is counted for that gene. If a primer has for instance 3 BLAST hits in one gene, 2 exonic and 1 intronic, then one exonic hit and one intronic hit is counted for that gene.
If sequence IDs of the designed primers (sequence_id
) refer to
the target gene/transcripts and can be found in the BLAST database annotations via
primer_targets
, then only off-target hits are counted. This is usually the case if input
for primer design was produced from target gene annotations.
For createBLASTDb
a directory containing the BLAST database. For
blastPrimers
a TsIO
or
TsIOList
object with the number of potential off-targets
added to the TAP-seq primer metadata.
createBLASTDb
: Create a genome and transcriptome TAP-seq BLAST database
blastPrimers,TsIO-method
: BLAST primers in a TsIO
object
blastPrimers,TsIOList-method
: BLAST primers in a TsIOList
object
## Not run: library(BSgenome) # human genome (hg38) BSgenome object hg38 <- getBSgenome("BSgenome.Hsapiens.UCSC.hg38") # get annotations for BLAST annot_url <- paste0("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/", "gencode.v32.annotation.gtf.gz") annot <- import(annot_url, format = "gtf") blast_exons <- annot[annot$type == "exon" & annot$gene_type == "protein_coding"] # build BLAST database blastdb <- file.path(tempdir(), "blastdb") createBLASTDb(genome = hg38, annot = blast_exons, blastdb = blastdb) # chr11 primers example data (already contains off-targets, but we can overwrite them) data("chr11_primers") chr11_primers <- chr11_primers[1:3] # only use a small subset for this example # run blast to identify potential off-targets chr11_primers <- blastPrimers(chr11_primers, blastdb = blastdb) tapseq_primers(chr11_primers) # allow 1 mismatch between primer and off-target chr11_primers <- blastPrimers(chr11_primers, blastdb = blastdb, max_mismatch = 1) tapseq_primers(chr11_primers) ## End(Not run)