matchDNAPattern {Biostrings} | R Documentation |
Generic that finds all matches of a pattern in a DNA string. Three algorithms are implemented. The first is an extension of the Boyer-Moore algorithm that allows some wildcards in addition to the symbols for the bases and gap. The second is a simple forward search that examines, from beginning to end, every substring of the full string that has the same length as the pattern. The third is the shift-or algorithm. See the algorithm argument for how the default is chosen.
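The forward search described above can be illustrated with a minimal base R sketch. This is not the Biostrings implementation: `forward_search` is a hypothetical helper that works on plain character strings, and it assumes that "N" is the only wildcard in the pattern.

```r
# Hypothetical helper, not part of Biostrings: a naive forward search.
# Assumption: only the wildcard "N" (any base) is supported.
forward_search <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  n <- length(s)
  starts <- integer(0)
  if (m == 0 || m > n) return(starts)
  # Slide a window of the pattern's length over the subject, left to right.
  for (i in seq_len(n - m + 1)) {
    window <- s[i:(i + m - 1)]
    # A position agrees if the pattern letter is "N" or equals the subject letter.
    if (all(p == "N" | p == window)) {
      starts <- c(starts, i)
    }
  }
  starts
}

forward_search("GCNNNAT", "AAGCGCGATATG")  # start positions 3 and 5
```

Every window costs up to one comparison per pattern letter, which is why this approach is only competitive for very simple patterns.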
matchDNAPattern(pattern, x, algorithm, mismatch)
pattern |
An object representing the pattern string. The string in
pattern can use any of the standard DNA pattern letters. See
DNAPatternAlphabet for all valid letters. |
x |
An object representing a DNA string. |
algorithm |
Currently the only valid values are
"boyer-moore", "forward-search"
and "shift-or". The forward search algorithm is often as
fast as the more sophisticated Boyer-Moore algorithm when the
patterns being matched are very simple. The shift-or algorithm is
even faster, but it can only be used for patterns of length at
most 32 or 64, depending on the number of bits in a machine word. The
shift-or algorithm can also do inexact matches for a given number of
mismatches. The default is "shift-or" where valid and "boyer-moore"
otherwise. |
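The pattern-length limit comes from storing one bit of search state per pattern position in a single machine word. The base R sketch below shows this idea in the shift-and form, which is the complement formulation of shift-or (set bits mark partial matches instead of cleared bits). It is an illustration only, not the package's code: `shift_and_search` is a hypothetical name, "N" is assumed to be the only wildcard, and the pattern is capped at 30 letters because R integers are signed 32-bit values.

```r
# Hypothetical helper, not part of Biostrings: bit-parallel exact matching
# in the shift-and form (complement formulation of shift-or).
# Assumptions: "N" is the only wildcard; pattern length is at most 30.
shift_and_search <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  stopifnot(m >= 1, m <= 30)
  bases <- c("A", "C", "G", "T")
  # bit[j] is the integer with only bit j-1 set, one bit per pattern position.
  bit <- as.integer(2^(seq_len(m) - 1))
  # masks[[b]] has a bit set wherever the pattern accepts base b.
  masks <- vapply(bases,
                  function(b) as.integer(sum(bit[p == b | p == "N"])),
                  integer(1))
  accept <- bit[m]  # set when the whole pattern has been matched
  state <- 0L
  starts <- integer(0)
  for (i in seq_along(s)) {
    mask <- if (s[i] %in% bases) masks[[s[i]]] else 0L
    # Extend every partial match by one letter and start a new one at bit 0.
    state <- bitwAnd(bitwOr(bitwShiftL(state, 1L), 1L), mask)
    if (bitwAnd(state, accept) != 0L) {
      starts <- c(starts, i - m + 1L)
    }
  }
  starts
}

shift_and_search("GCNNNAT", "AAGCGCGATATG")  # start positions 3 and 5
```

Each subject letter is processed with a constant number of word operations regardless of pattern length, which is the source of the speed advantage.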
mismatch |
An integer, the number of mismatches allowed. The default is 0. If mismatch is non-zero, an inexact match algorithm is used for matching. |
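To make the meaning of a mismatch budget concrete, here is a minimal base R sketch that counts disagreeing positions in each window. It uses a naive window scan, not the inexact shift-or matching the package relies on. `mismatch_search` is a hypothetical helper, and it assumes that a pattern "N" never counts as a mismatch.

```r
# Hypothetical helper, not part of Biostrings: naive matching with a
# mismatch budget (this is a plain window scan, not inexact shift-or).
# Assumption: a pattern letter "N" agrees with any subject letter.
mismatch_search <- function(pattern, subject, mismatch = 0L) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  n <- length(s)
  starts <- integer(0)
  if (m == 0 || m > n) return(starts)
  for (i in seq_len(n - m + 1)) {
    window <- s[i:(i + m - 1)]
    # Count the positions where pattern and subject disagree.
    n_mismatch <- sum(!(p == "N" | p == window))
    if (n_mismatch <= mismatch) {
      starts <- c(starts, i)
    }
  }
  starts
}

mismatch_search("GATC", "GATCGTTC", mismatch = 0)  # start position 1
mismatch_search("GATC", "GATCGTTC", mismatch = 1)  # start positions 1 and 5
```

With mismatch = 1 the window "GTTC" at position 5 is also reported, because it differs from "GATC" in a single letter.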
An object of class "BioString" whose length equals the number of
matches. Each element of the "BioString" object is a match. To obtain
the start and end points of the matches, use as.matrix on the
return value. See the documentation for the "BioString" class for more
details.
Saikat DebRoy
Dan Gusfield - Algorithms on strings, trees, and sequences
BioString-class for the type of the return value.
x <- DNAString("AAGCGCGATATG")
m1 <- matchDNAPattern("GCNNNAT", x)
m1
as.matrix(m1)
m2 <- matchDNAPattern("GCNNNAT", x, algorithm="forward-search")
m2
as.matrix(m2)

data('yeastSEQCHR1')
yeast1 <- DNAString(yeastSEQCHR1)
PpiI <- "GAACNNNNNCTC" # a restriction enzyme pattern
match1.PpiI <- matchDNAPattern(PpiI, yeast1)
match2.PpiI <- matchDNAPattern(PpiI, yeast1, algorithm="forward-search")
match1.PpiI
match2.PpiI
match3.PpiI <- matchDNAPattern(PpiI, yeast1, mismatch=1)
match3.PpiI