BSgenome-class           package:BSgenome           R Documentation

_T_h_e _B_S_g_e_n_o_m_e _c_l_a_s_s

_D_e_s_c_r_i_p_t_i_o_n:

     A container for the complete genome sequence of a given species.

_D_e_t_a_i_l_s:

     [TODO: Put some details here]

_A_c_c_e_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a BSgenome object.

      'seqnames(x)': Returns the names (index) of the single sequences
          contained in 'x'. Each single sequence is stored in a BString
          (or derived) object and comes from a source file (FASTA) with
          a single record. The names returned by 'seqnames(x)' usually
          reflect the names of those source files, but a common prefix
          or suffix may have been removed to keep them as short as
          possible.

      'mseqnames(x)': Returns the names (index) of the multiple
          sequences contained in 'x'. Each multiple sequence is stored
          in a BStringViews object and comes from a source file (FASTA)
          with multiple records. The names returned by 'mseqnames(x)'
          usually reflect the names of those source files, but a common
          prefix or suffix may have been removed to keep them as short
          as possible.

      'names(x)': Returns the index of all sequences contained in 'x'.
          This is the same as 'c(seqnames(x), mseqnames(x))'.
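     The naming rule above can be illustrated with base R alone. This is
     a hypothetical sketch, not how BSgenome derives its names: the file
     names, the "ce2_" prefix and the ".fa" suffix are made up, and
     'strip_affixes' is an illustrative helper.

```r
## Hypothetical sketch: derive short sequence names from source file names
## by dropping a common prefix and suffix, then build the full index the
## way names(x) combines seqnames(x) and mseqnames(x).
## All file names and affixes below are made up for illustration.
single_files   <- c("ce2_chrI.fa", "ce2_chrII.fa", "ce2_chrV.fa")  # 1 record each
multiple_files <- c("ce2_upA.fa", "ce2_upB.fa")                    # n records each

strip_affixes <- function(files, prefix, suffix) {
  stopifnot(all(startsWith(files, prefix)), all(endsWith(files, suffix)))
  first <- nchar(prefix) + 1L                 # first char after the prefix
  last  <- nchar(files) - nchar(suffix)       # last char before the suffix
  substr(files, first, last)
}

seqnames_toy  <- strip_affixes(single_files,   "ce2_", ".fa")
mseqnames_toy <- strip_affixes(multiple_files, "ce2_", ".fa")
names_toy     <- c(seqnames_toy, mseqnames_toy)  # single names first, then multiple
names_toy
length(names_toy)                                # number of all sequences
```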

_S_t_a_n_d_a_r_d _g_e_n_e_r_i_c _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a BSgenome object and 'name' is
     the name of a sequence (character-string).

      'length(x)': Returns the length of 'x', i.e., the number of all
          sequences that it contains. This is the same as
          'length(names(x))'.

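      'x[[name]]': Returns the sequence named 'name'. The first call
          loads the sequence from its source file into memory, which can
          take a long time for a large chromosome; subsequent calls
          return the copy already in memory without loading it again.

      'x$name': Equivalent to 'x[["name"]]' (e.g. 'Celegans$chrV' is
          the same as 'Celegans[["chrV"]]').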
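     The load-once behaviour of 'x[[name]]' can be sketched in base R as
     a lazy cache. This is a hypothetical model, not BSgenome's
     implementation: 'make_lazy_genome' and 'read_seq' are made-up names,
     and 'read_seq' stands in for parsing a FASTA file.

```r
## Hypothetical sketch of lazy loading with a per-object cache.
## 'read_seq' is an assumed stand-in for reading a FASTA file; none of
## these names belong to the BSgenome API.
make_lazy_genome <- function(seq_names, read_seq) {
  cache   <- new.env(parent = emptyenv())  # sequences loaded so far
  n_reads <- 0L                            # how many times read_seq() ran
  list(
    names  = function() seq_names,
    length = function() length(seq_names), # same as length(names)
    get    = function(name) {
      if (!name %in% seq_names)
        stop("no sequence named '", name, "'")
      if (!exists(name, envir = cache, inherits = FALSE)) {
        n_reads <<- n_reads + 1L           # slow path: first access only
        assign(name, read_seq(name), envir = cache)
      }
      get(name, envir = cache, inherits = FALSE)  # fast path afterwards
    },
    reads  = function() n_reads
  )
}

g  <- make_lazy_genome(c("chrI", "chrV"),
                       read_seq = function(name) paste0(name, ":ACGT"))
s1 <- g$get("chrV")   # "loads" the sequence
s2 <- g$get("chrV")   # served from the cache, read_seq() not called again
g$reads()
```

     Keeping the cache in an environment gives it reference semantics, so
     every accessor closure sees the same loaded copies.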

_O_t_h_e_r _f_u_n_c_t_i_o_n_s _a_n_d _g_e_n_e_r_i_c_s:

     In the code snippets below, 'x' is a BSgenome object and 'name' is
     the name of a sequence (character-string).

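      'unload(x, name)': Removes the reference that 'x' holds to the
          loaded sequence named 'name'. The memory used by that sequence
          can be reclaimed by the garbage collector ('gc') only once all
          other objects referencing it have been removed as well (see
          the Examples).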
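     The reference rule behind 'unload' can be illustrated with base R
     alone. This is a hypothetical analogy, not BSgenome code: an
     environment plays the role of the in-memory sequence, a second
     environment ('genome_cache', a made-up name) plays the role of the
     BSgenome object, and 'reg.finalizer' reports when the memory is
     actually reclaimed.

```r
## Hypothetical analogy (base R only): several objects share one buffer,
## and the buffer is reclaimed only after the last reference is gone.
freed <- FALSE
buf <- new.env()
buf$letters <- strrep("ACGT", 5)
## Record the moment the buffer is actually garbage-collected.
invisible(reg.finalizer(buf, function(e) freed <<- TRUE))

genome_cache <- new.env()    # stands in for the genome object
genome_cache$chrV <- buf     # first reference, held by the "genome"
x <- buf                     # further references held by user objects
y <- buf
rm(buf)

## Remove the user-level references first...
rm(x, y)
invisible(gc())
freed_before_unload <- freed # FALSE: the cache still points to the buffer

## ...then drop the cache entry last, the analogue of unload().
rm(list = "chrV", envir = genome_cache)
invisible(gc())
freed                        # TRUE once no references are left
```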

_A_u_t_h_o_r(_s):

     H. Pages

_S_e_e _A_l_s_o:

     'available.genomes', BString, DNAString, BStringViews, 'getSeq',
     'matchPattern', 'rm', 'gc'

_E_x_a_m_p_l_e_s:

       library(BSgenome.Celegans.UCSC.ce2)   # This doesn't load the chromosome 
                                             # sequences into memory.
       length(Celegans)                      # Number of sequences in this genome.
       Celegans                              # Displays index of all the sequences
                                             # in this genome.
       mem0 <- gc()["Vcells", "(Mb)"]        # Current amount of data in memory (in
                                             # Mb).
       Celegans[["chrV"]]                    # Loads chromosome V into memory (hence
                                             # takes a long time).
       gc()["Vcells", "(Mb)"] - mem0         # Chromosome V occupies 20Mb of memory.
       Celegans[["chrV"]]                    # Much faster (sequence is already in
                                             # memory, hence it's not loaded again).
       Celegans$chrV                         # Equivalent to Celegans[["chrV"]].
       class(Celegans$chrV)                  # Chromosome V (like any other
                                             # chromosome sequence) is a DNAString
                                             # object.
       nchar(Celegans$chrV)                  # It has 20922231 letters (nucleotides).
       x <- Celegans$chrV                    # Very fast because a BString object
                                             # doesn't contain the sequence, only a
                                             # pointer to the sequence, hence chrV
                                             # seq is not duplicated in memory. But
                                             # we now have 2 objects pointing to the
                                             # same place in memory.
       y <- substr(x, 10, 100)               # A 3rd object pointing to chrV seq.
       
       ## We must remove all references to chrV seq if we want the 20Mb of memory
       ## used by it to be freed (note that it can be hard to keep track of all the
       ## references to a given sequence).
       ## IMPORTANT: The 1st reference to this seq (Celegans$chrV) should be removed
       ## last. This is achieved with unload(). All other references are removed by
       ## just removing the referencing object.
       rm(x)
       rm(y)
       unload(Celegans, "chrV")
       gc()["Vcells", "(Mb)"]

