findORFsFasta {ORFik}    R Documentation

Finds Open Reading Frames in fasta files.

Description

Intended for prokaryotic genomes or transcript sequences in fasta format. It makes no sense for whole eukaryotic genomes, since their transcripts are spliced. Searches each sequence in the fasta file and reports all ORFs found on BOTH the sense (+) and antisense (-) strand in all frames. Each fasta header is treated separately, and the name of the sequence is used as the seqname in the returned GRanges object. Circular genomes are supported.

Usage

findORFsFasta(filePath, startCodon = startDefinition(1),
  stopCodon = stopDefinition(1), longestORF = TRUE,
  minimumLength = 0, is.circular = FALSE)

Arguments

filePath

(character) Path to the fasta file. The sequences can be in either uppercase or lowercase.

startCodon

(character vector) Possible START codons to search for. See the helper function startDefinition.

stopCodon

(character vector) Possible STOP codons to search for. See the helper function stopDefinition.

longestORF

(logical) Default TRUE. Keep only the longest ORF per unique (seqname, strand, stopcodon) combination. You can also apply the function longestORFs to the ORFs after they are created to get the same result.

minimumLength

(integer) Default 0, which corresponds to START + STOP = 6 bp. Minimum length of the ORF in codons, not counting the 3 bp each of the START and STOP codons. For example, minimumLength = 8 requires ORFs of at least START + 8*3 (bp) + STOP = 30 bases. Use this parameter to restrict the search.

is.circular

(logical) Whether the genome in filePath is circular. Prokaryotic genomes are usually circular. Be careful if you want to extract sequences: seqlengths must be set, otherwise it is not known which base is the last one in the sequence before it wraps around.
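
The minimumLength arithmetic described above can be sketched in base R. The helper name min_orf_bp is hypothetical and not part of ORFik; it only restates the formula START + minimumLength*3 + STOP from the argument description.

```r
# Hypothetical helper (not an ORFik function): smallest ORF width in bp
# implied by a given minimumLength, counting 3 bp each for START and STOP.
min_orf_bp <- function(minimumLength) {
  stopifnot(is.numeric(minimumLength), all(minimumLength >= 0))
  3 + 3 * minimumLength + 3
}

min_orf_bp(0)  # default: START + STOP only
min_orf_bp(8)  # the example given for minimumLength
```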

Details

If the fasta file contains transcripts (transcript coordinates), remove all negative-strand ORFs afterwards, since they make no sense in that case: orfs <- orfs[strandBool(orfs)]. Seqnames are created from headers of the format: >name info, so the name must come directly after the "greater than" sign (>), with a space between name and info.
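
As a rough illustration of the header convention described above, the following base R sketch extracts the name from a header line. The helper header_to_seqname and the sample headers are hypothetical, and the parsing ORFik performs internally may differ.

```r
# Hypothetical helper (not an ORFik function): take the text between ">"
# and the first whitespace character as the sequence name.
header_to_seqname <- function(header) {
  sub("^>([^[:space:]]+).*$", "\\1", header)
}

# Made-up headers following the ">name info" format
headers <- c(">chr1 example genome", ">tx42 some transcript info")
header_to_seqname(headers)
```

A header without the separating space would be taken as a name in its entirety, which is why the space between name and info matters.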

Value

(GRanges) object of ORFs mapped from fasta file. Positions are relative to the fasta file.

See Also

Other findORFs: findMapORFs, findORFs, startDefinition, stopDefinition

Examples

# location of the example fasta file
example_genome <- system.file("extdata", "genome.fasta", package = "ORFik")
findORFsFasta(example_genome)


[Package ORFik version 1.4.0 Index]