BioString-class {Biostrings}    R Documentation
Class "BioString" contains an encoded string representing a biological sequence over a particular alphabet (RNA, DNA or amino acid). An object represents zero or more substrings of the full string.
Objects can be created by calls of the form new("BioString", alphabet, end, start, values, initialized, ...). However, users should not call this directly. For now, use the function NucleotideString to create objects of class "BioString" that use a nucleotide alphabet (RNA or DNA), and the function DNAString for objects using the DNA alphabet.
alphabet: "BioAlphabet", the alphabet used in the sequence.
initialized: "logical", TRUE if the sequence has been initialized with values. Users should not modify this slot directly.
offsets: "matrix" with storage mode "integer"; this stores (in two columns) the start and end points of the substrings represented by the object. Rows with the first value 1 and the second value 0 represent empty substrings.
values: "externalptr"; this internally stores the actual encoded sequence as a vector. As objects of class "externalptr" are passed by reference in R, this saves copying of long sequences.

[...] x.
[...] x corresponding to index i.
[...] x corresponding to the index i. The index i must be of length 1.
[...] x. type is not used for now.
[...] object of class "BioString".
[...] substr(as.character(x), start, stop).
[...] substring(as.character(text), first, last).
[...] x against pattern using algorithm. The pattern can use the letters A, C, G, T and - (the last being the gap character), and also the wildcards N (matching A, C, G, T), V (matching A, G, C), R (matching A, G) and Y (matching C, T).
[...] x are entirely made up of the letter letter.
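The wildcard letters listed for pattern matching can be illustrated with a small base R sketch. This is not the algorithm used by the package: it simply expands each wildcard into a regular-expression character class, and the names wildcard_classes, pattern_to_regex and match_starts are hypothetical helpers introduced here for illustration. It also assumes that wildcards appear only in the pattern, with the subject treated as literal letters.

```r
# Hypothetical helpers (not part of Biostrings): expand the wildcard letters
# into regular-expression character classes and report all match starts.
wildcard_classes <- c(N = "[ACGT]", V = "[ACG]", R = "[AG]", Y = "[CT]")

pattern_to_regex <- function(pattern) {
  letters <- strsplit(pattern, "", fixed = TRUE)[[1]]
  # Wildcards become character classes; all other letters stay literal.
  expanded <- ifelse(letters %in% names(wildcard_classes),
                     wildcard_classes[letters], letters)
  paste(expanded, collapse = "")
}

match_starts <- function(pattern, subject) {
  # Wrapping the pattern in a lookahead lets overlapping matches be found.
  regex <- paste0("(?=", pattern_to_regex(pattern), ")")
  hits <- gregexpr(regex, subject, perl = TRUE)[[1]]
  as.integer(hits[hits > 0])
}

pattern_to_regex("GNY")           # "G[ACGT][CT]"
match_starts("RC", "AAGCTANA")    # 3
```

A regular expression is used only because it is compact; a bitwise comparison on the encoded sequence would be closer in spirit to the encoding this class uses.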
The values slot of the "BioString" class is of class "externalptr". It always contains an R vector object in its tag field. The other fields are not used at present. The vector in the tag field is either a CHARSXP or an INTSXP. The exact type depends on the length of the alphabet: INTSXP is used if the alphabet has more letters than the number of bits in a C char type, and CHARSXP is used otherwise.
We use the i-th bit in the char or int (depending on whether the vector is of type CHARSXP or INTSXP) to represent the i-th letter in the alphabet, where i = 0 represents the first bit. This effectively means that we can have at most 32 letters (including the gap) in our alphabets on all standard computer architectures.
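The one-bit-per-letter encoding described above can be sketched in base R as follows. The letter ordering, and the helper names encode and decode, are assumptions made for illustration; the real ordering is determined by the BioAlphabet object, and the real encoding is done in C on the vector held by the external pointer.

```r
# Assumed letter ordering, for illustration only; the real ordering is
# determined by the BioAlphabet object.
alphabet <- c("-", "A", "C", "G", "T")

# Letter i (counting from 0) is stored as the integer 2^i, which has only
# bit i set.
letter_codes <- as.integer(2^(seq_along(alphabet) - 1))
names(letter_codes) <- alphabet

# Hypothetical helpers: map a character string to bit codes and back.
encode <- function(s) unname(letter_codes[strsplit(s, "", fixed = TRUE)[[1]]])
decode <- function(codes) {
  paste(names(letter_codes)[match(codes, letter_codes)], collapse = "")
}

codes <- encode("AAGC")
codes            # 2 2 8 4
decode(codes)    # "AAGC"

# With one bit per letter, a set of letters is the bitwise OR of their codes,
# and testing membership is a single bitwise AND.
purine <- bitwOr(letter_codes[["A"]], letter_codes[["G"]])
bitwAnd(codes, purine) != 0    # TRUE TRUE TRUE FALSE
```

Because each letter occupies its own bit, a wildcard such as R (matching A, G) can be treated as a bit mask rather than a separate symbol, which is one plausible motivation for this layout.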
Saikat DebRoy
BioAlphabet-class and its subclasses for valid alphabet objects.
DNAString for creating objects of class "BioString" representing DNA sequences.
NucleotideString for creating objects of class "BioString" representing DNA or RNA sequences.
new("BioString", DNAAlphabet()) # creates an empty DNA string
x <- DNAString("AAGCTANA", gap="N")
x
as.character(x)
substr(x, 2, 4)
substring(x, 1, seq(length=nchar(x))) # all prefixes of x
substring(x, seq(length=nchar(x)), nchar(x)) # all suffixes of x
matchDNAPattern("GC", x)
x <- substring(x, 1:3, 3:5)
x[1:2]
x[-3] # same as x[1:2]
x[[3]]