
Introduction

Ribosome footprinting, developed by Jonathan Weissman and Nicholas Ingolia [1], measures translation by directly quantifying the coding sequences currently bound by the 80S ribosome (ribosome-protected fragments, RPFs) [2]. In eukaryotes, RPFs are around 28 nt long, and the P site of the ribosome is typically at position 13 from the 5’ end of the reads. In bacteria, Mohammad et al. identify the P site more accurately from the 3’ end of the reads [3].

Schematic representation of ribosome profiling.

Several Bioconductor packages are already available for analyzing ribosome footprinting data, including riboSeqR [4], RiboProfiling [5] and ORFik [6]. ORFik can also identify new transcription start sites using CAGE-seq data. RiboWaltz [7] is another popular package based on R and Bioconductor.

To help researchers quickly assess the quality of their ribosome profiling data, we have developed the ribosomeProfilingQC package. It can be used to easily make diagnostic plots that check the mapping quality and frameshifts. In addition, it can preprocess ribosome profiling data for subsequent differential analysis.

Please note that all following analyses are based on known Open Reading Frame (ORF) annotation.

1 Quick start

Here is an example using ribosomeProfilingQC with a subset of ribo-seq data.

First, install ribosomeProfilingQC and the other packages required to run the examples. Please note that the example dataset used here is from zebrafish. To analyze a dataset from a different species or assembly, please install the corresponding BSgenome and TxDb packages. For example, to analyze mouse data aligned to mm10, please install BSgenome.Mmusculus.UCSC.mm10 and TxDb.Mmusculus.UCSC.mm10.knownGene. You can also generate a TxDb object with makeTxDbFromGFF from a local GFF file, or with makeTxDbFromUCSC, makeTxDbFromBiomart or makeTxDbFromEnsembl from online resources; these functions are in the GenomicFeatures package.

library(BiocManager)
BiocManager::install(c("ribosomeProfilingQC", 
                       "AnnotationDbi", "Rsamtools",
                       "BSgenome.Drerio.UCSC.danRer10",
                       "TxDb.Drerio.UCSC.danRer10.refGene",
                       "motifStack"))
## load library
library(ribosomeProfilingQC)
library(AnnotationDbi)
library(Rsamtools)

1.1 Load genome

In this manual, we will use the zebrafish genome.

library(BSgenome.Drerio.UCSC.danRer10)
## set genome, Drerio is a shortname for BSgenome.Drerio.UCSC.danRer10
genome <- Drerio

If your assembly is human hg38, please load the human library:

library(BSgenome.Hsapiens.UCSC.hg38)
genome <- Hsapiens

If your assembly is mouse mm10, please load the mouse library:

library(BSgenome.Mmusculus.UCSC.mm10)
genome <- Mmusculus

1.2 Prepare annotation CDS

The function prepareCDS is used to prepare the information for downstream analysis from a TxDb object.

## the TxDb that corresponds to BSgenome.Drerio.UCSC.danRer10
library(TxDb.Drerio.UCSC.danRer10.refGene)
txdb <- TxDb.Drerio.UCSC.danRer10.refGene ## give it a short name
CDS <- prepareCDS(txdb)

If your assembly is human hg38, please load the corresponding library:

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene ## give it a short name
CDS <- prepareCDS(txdb)

If your assembly is mouse mm10, please load the corresponding library:

library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene ## give it a short name
CDS <- prepareCDS(txdb)

You can also create a TxDb object from a GTF file with the GenomicFeatures::makeTxDbFromGFF function. To get a GTF file, you can download it from Ensembl or get the online file information via AnnotationHub. Here we use a prepared TxDb object for testing.

## Create a small TxDb object that contains only chr1 information.
library(GenomicFeatures)
txdb <- makeTxDbFromGFF(system.file("extdata",
                                    "Danio_rerio.GRCz10.91.chr1.gtf.gz",
                                    package="ribosomeProfilingQC"),
                        organism = "Danio rerio",
                        chrominfo = seqinfo(Drerio)["chr1"],
                        taxonomyId = 7955)
CDS <- prepareCDS(txdb)

1.3 Inputs

The input of ribosomeProfilingQC is a BAM file. Unlike the riboSeqR package, which requires reads mapped to the transcriptome, ribosomeProfilingQC uses a BAM file of reads mapped to the whole genome. To get correctly mapped reads, first map the adaptor-trimmed sequences to the genome assembly with bowtie2 using the following parameters: --local --ma 5 --mp 8,4 --rdg 7,7 --rfg 7,7 --fr --nofw, and then filter out the reads mapped to rRNA, tRNA, snRNA, snoRNA and misc_RNA from the Ensembl and RepeatMasker annotations. After that, map the clean reads to the genome assembly with tophat2 using the following parameters: --library-type fr-firststrand --transcriptome-index=Transcriptome_data/genome. Because ribo-seq libraries are usually strand-specific, please make sure to map the reads with the correct library type.

library(Rsamtools)
## input the bamFile from the ribosomeProfilingQC package 
bamfilename <- system.file("extdata", "RPF.WT.1.bam",
                           package="ribosomeProfilingQC")
## For your own data, please set bamfilename as your file path.
## For example, your bam file is located at C:\mydata\a.bam
## you want to set bamfilename = "C:\\mydata\\a.bam"
## or you can change your working directory by
## setwd("C:\\mydata")
## and then set bamfilename = "a.bam"
yieldSize <- 10000000
bamfile <- BamFile(bamfilename, yieldSize = yieldSize)

1.4 Estimate P site

As shown in the figure above, the P site of the ribosome is at position 13 (if using RNase I). However, the P site may be shifted in different experiments due to experimental conditions such as the choice of enzyme and the cell type. The estimatePsite function can be used to check the P site. The estimatePsite function searches for start/stop codons that occur in the reads, and the bestPsite function meta-plots the distribution for each position. estimatePsite only considers 12, 13 or 14 as best P site candidates when searching from the 5’ end.

estimatePsite(bamfile, CDS, genome)
## [1] 13

It has been shown that for certain enzymes, such as MNase, estimating the P site from the 3’ end works much better [3]. estimatePsite considers 15, 16 or 17 as best P site candidates when searching from the 3’ end.

estimatePsite(bamfile, CDS, genome, anchor = "3end")
## [1] -16
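
The two estimates above describe the same nucleotide of a 28-nt read: base 13 counted from the 5’ end is base 16 counted from the 3’ end, because 13 + 16 - 1 = 28. The following is a minimal base-R sketch of that bookkeeping. The helper psite_position is hypothetical (it is not part of ribosomeProfilingQC), and it assumes 1-based offsets and that the sign of the 3’ estimate only marks the counting direction.

```r
## Hypothetical helper (not part of ribosomeProfilingQC): genomic coordinate
## of the P site of a read, given a 1-based offset counted from either end.
psite_position <- function(start, width, strand = "+",
                           offset = 13, anchor = c("5end", "3end")) {
  anchor <- match.arg(anchor)
  end <- start + width - 1
  ## on the minus strand the 5' end of a read is its rightmost base
  five_is_left <- strand == "+"
  if (anchor == "5end") {
    ifelse(five_is_left, start + offset - 1, end - offset + 1)
  } else {
    ifelse(five_is_left, end - offset + 1, start + offset - 1)
  }
}

## a made-up 28-nt plus-strand read starting at genomic position 101
p5 <- psite_position(101, 28, "+", offset = 13, anchor = "5end")
p3 <- psite_position(101, 28, "+", offset = 16, anchor = "3end")
c(p5, p3)  ## both are 113
```

For reads that are not 28 nt long the two anchors give different coordinates, which is why the choice of anchor matters for enzymes that trim one end less precisely.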

1.5 Plot start/stop windows

The readsEndPlot function plots the 5’ or 3’ ends of reads relative to the start/stop position of the CDS. For most reads, the assigned reading frame does not change whether the best P site is set to 13, 10 or 16 (from the 5’ end), because these offsets differ by multiples of 3. readsEndPlot can help users determine the true best P site. In the example below, the start codon is enriched at position -9 from the 5’ end of reads and at position 19 from the 3’ end of reads. This means that many ribosomes are docked at the translation start position and that most reads are 28 nt long.

Ribosome docking at TSS

readsEndPlot(bamfile, CDS, toStartCodon=TRUE)

readsEndPlot(bamfile, CDS, toStartCodon=TRUE, fiveEnd=FALSE)
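
The arithmetic behind these two plots can be checked in a few lines of base R. This is only a sketch: the peak positions are the ones quoted in the text, and we assume, as the text implies, that they are measured so that their difference equals the read length.

```r
## peak positions relative to the start codon, as quoted in the text
peak_5end <- -9
peak_3end <- 19
read_length <- peak_3end - peak_5end
read_length  ## 28

## P site offsets that differ by a multiple of 3 give the same frame
offsets <- c(10, 13, 16)
frame_shift <- (offsets - 13) %% 3
frame_shift  ## 0 0 0
```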

If you see the following distribution, it means that many genes are actively expressed.

Active expression

1.6 Read all P site coordinates

The getPsiteCoordinates function is used to read all P site coordinates. Ideally, the bestpsite should be 13. To test the data quality, we set bestpsite = 13.

pc <- getPsiteCoordinates(bamfile, bestpsite = 13)

1.7 Fragment size distribution

Ribosome-protected fragments should ideally be 27 to 29-nt long. To check the fragment size distribution, use the following function:

readsLen <- summaryReadsLength(pc)

1.7.1 Filter the reads by fragment size

To filter reads by their length for downstream analysis, use the following script:

## for QC we only use reads of length 28-29
pc.sub <- pc[pc$qwidth %in% c(28, 29)]
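
To make the size check and the filter concrete, here is a base-R sketch on a made-up vector of read lengths that stands in for pc$qwidth. The numbers are invented for illustration, and the sketch only mirrors the idea of a length distribution, not the output of summaryReadsLength.

```r
## made-up read lengths standing in for pc$qwidth
qwidth <- c(26, 27, 28, 28, 28, 29, 29, 30, 28, 27)

## fraction of reads at each length
len_frac <- prop.table(table(qwidth))
round(len_frac, 2)

## keep only the lengths used for QC
keep <- qwidth %in% c(28, 29)
n_kept <- sum(keep)      ## number of reads kept
frac_kept <- mean(keep)  ## fraction of reads kept
```

If only a small fraction of reads survives the filter, it is worth looking at the full length distribution before continuing.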

1.8 Sense/antisense strand plot

Most of the reads should map to the sense strand because the ribo-seq library is strand-specific.

strandPlot(pc.sub, CDS)

1.9 Genomic element distribution

For ribosome footprinting, most of the reads should map to the CDS region. The readsDistribution function shows the P site locations in different genomic elements: CDS, 5’UTR, 3’UTR, other types of exon, intron, promoter, downstream or intergenic region. A high downstream percentage indicates frequent use of alternative polyadenylation sites relative to the annotation. A high percentage in intronic regions indicates the possibility of intron-retaining transcripts.

pc.sub <- readsDistribution(pc.sub, txdb, las=2)

1.10 Metagene analysis plot for 5’UTR/CDS/3’UTR

A metagene plot shows the read distribution across the 5’UTR, CDS and 3’UTR regions.

cvgs.utr5 <- coverageDepth(RPFs = bamfilename, gtf = txdb, region="utr5")
cvgs.CDS <- coverageDepth(RPFs = bamfilename, gtf = txdb, region="cds")
cvgs.utr3 <- coverageDepth(RPFs = bamfilename, gtf = txdb, region="utr3")
metaPlot(cvgs.utr5, cvgs.CDS, cvgs.utr3, sample=1)
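
The idea behind a metagene profile is to rescale regions of different lengths to a common number of bins and then average across transcripts. The base-R sketch below illustrates this on made-up coverage vectors. The helper bin_coverage is hypothetical, and the binning and scaling choices are assumptions for illustration; metaPlot may differ in its details.

```r
## Hypothetical helper: average a per-base coverage vector into n_bins bins
bin_coverage <- function(cvg, n_bins = 10) {
  bin <- ceiling(seq_along(cvg) * n_bins / length(cvg))
  as.numeric(tapply(cvg, bin, mean))
}

## two made-up CDS coverage vectors of different lengths
tx1 <- c(rep(5, 15), rep(1, 15))  ## 30 nt, signal concentrated at the 5' half
tx2 <- rep(2, 60)                 ## 60 nt, flat signal

binned <- rbind(bin_coverage(tx1), bin_coverage(tx2))
## scale each transcript to sum to 1 so that highly covered
## transcripts do not dominate the average
binned <- binned / rowSums(binned)
meta <- colMeans(binned)
round(meta, 3)
```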

1.11 Reading frame

The assignReadingFrame function sets the reading frame for the P sites located within the known annotated CDS. The plotDistance2Codon function plots the reading frame usage around translation initiation or termination sites. The plotFrameDensity function collapses all the RPFs in each frame. These plots help you double-check whether the P site position is correct. If it is, most of the reads should be assigned to frame 0.

pc.sub <- assignReadingFrame(pc.sub, CDS)
plotDistance2Codon(pc.sub)

plotFrameDensity(pc.sub)
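
Conceptually, the reading frame of a P site is its distance to the first base of the start codon, modulo 3. The following base-R sketch shows this for a plus-strand CDS with made-up coordinates. The helper reading_frame is hypothetical, and it ignores introns, which the package has to handle.

```r
## Hypothetical helper: frame of a P site relative to the CDS start
## (plus strand, no introns)
reading_frame <- function(psite, cds_start) (psite - cds_start) %% 3

cds_start <- 1000  ## first base of the start codon (made up)
psites <- c(1000, 1003, 1004, 1012, 1017, 1021)
frames <- reading_frame(psites, cds_start)
frame_table <- table(factor(frames, levels = 0:2))
frame_table  ## frame 0: 4 reads, frame 1: 1 read, frame 2: 1 read
```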

To determine how many of the raw reads map with P sites in frame 0, run the same steps on the unfiltered reads:

pc <- assignReadingFrame(pc, CDS)
plotFrameDensity(pc)

Function plotTranscript can be used to view the reading frame distribution for given transcripts.

plotTranscript(pc.sub, c("ENSDART00000161781", "ENSDART00000166968",
                         "ENSDART00000040204", "ENSDART00000124837"))

1.12 ORFscore vs coverageRate

ORFscore [2] can be used to quantify the biased distribution of RPFs toward the first frame of a given CDS. The coverage rate for the whole CDS helps researchers check the distribution of RPFs along the whole CDS. Coverage is determined by measuring the proportion of in-frame CDS positions with >= 1 reads. If coverage is about 1, the whole CDS is covered by active ribosomes.

cvg <- frameCounts(pc.sub, coverageRate=TRUE)
ORFscore <- getORFscore(pc.sub)
#plot(cvg[names(ORFscore)], ORFscore,
#     xlab="coverage ORF1", ylab="ORF score",
#     type="p", pch=16, cex=.5, xlim=c(0, 1))
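
As a rough illustration of the two quantities, the sketch below computes an ORFscore following the formula published by Bazzini et al. (reference 2) and a coverage rate as defined in the text. Both helpers are hypothetical and use made-up counts; whether getORFscore and frameCounts implement exactly these steps is an assumption.

```r
## Hypothetical helper following the published ORFscore formula:
## log2 of a chi-square-like statistic over the three frames, with a
## negative sign when the first frame is not the dominant one.
orf_score <- function(f) {  ## f: read counts in frames 1, 2 and 3
  fbar <- mean(f)
  score <- log2(sum((f - fbar)^2 / fbar) + 1)
  if (f[1] < f[2] || f[1] < f[3]) -score else score
}

## Hypothetical helper: fraction of in-frame positions with >= 1 read
coverage_rate <- function(inframe_counts) mean(inframe_counts >= 1)

in_frame  <- orf_score(c(90, 6, 4))  ## reads concentrated in the first frame
off_frame <- orf_score(c(6, 90, 4))  ## same counts in the wrong frame
covered   <- coverage_rate(c(3, 0, 1, 2, 0, 5, 1, 1))  ## 6 of 8 positions
```

A CDS with a high positive ORFscore and a coverage rate close to 1 is the pattern expected for an actively translated ORF.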

2 Bad case

Here we show ribosome footprinting data of poor quality that should not be used for downstream analyses.

bamfilename <- system.file("extdata", "RPF.chr1.bad.bam",
                           package="ribosomeProfilingQC")
yieldSize <- 10000000
bamfile <- BamFile(bamfilename, yieldSize = yieldSize)
pc <- getPsiteCoordinates(bamfile, bestpsite = 13)
pc.sub <- pc[pc$qwidth %in% c(27, 28, 29)]
## this shows that most of the reads map to the antisense strand,
## which indicates that there may be an issue in the mapping step
strandPlot(pc.sub, CDS)

## this shows that most of the reads map to intergenic regions
pc.sub <- readsDistribution(pc.sub, txdb, las=2)

## If we assign the wrong P site position
pc <- getPsiteCoordinates(bamfile, 12)
pc.sub <- pc[pc$qwidth %in% c(27, 28, 29)]
pc.sub <- assignReadingFrame(pc.sub, CDS)
plotDistance2Codon(pc.sub)

plotFrameDensity(pc.sub)

3 Prepare for downstream analysis

3.1 RPFs only

3.1.1 Count for RPFs

Downstream analysis includes differential analysis, comparison with RNA-seq, and so on. The frameCounts function generates a count vector for each transcript or gene, which can be used for differential analysis. countReads can be used to count multiple ribo-seq files.

library(ribosomeProfilingQC)
library(AnnotationDbi)
path <- system.file("extdata", package="ribosomeProfilingQC")
RPFs <- dir(path, "RPF.*?\\.[12].bam$", full.names=TRUE)
gtf <- file.path(path, "Danio_rerio.GRCz10.91.chr1.gtf.gz")
cnts <- countReads(RPFs, gtf=gtf, level="gene",
                   bestpsite=13, readsLen=c(28,29))

To get a GTF file, you can download it from Ensembl or get the online file information via AnnotationHub.

BiocManager::install("AnnotationHub")
library(AnnotationHub)
ah = AnnotationHub()
## for human hg38
hg38 <- query(ah, c("Ensembl", "GRCh38", "gtf"))
hg38 <- hg38[length(hg38)]
gtf <- mcols(hg38)$sourceurl
## for mouse mm10
mm10 <- query(ah, c("Ensembl", "GRCm38", "gtf"))
mm10 <- mm10[length(mm10)]
gtf <- mcols(mm10)$sourceurl

3.1.2 Differential analysis only for RPFs

library(edgeR)  ## install edgeR by BiocManager::install("edgeR")
gp <- c("KD", "KD", "CTL", "CTL")
y <- DGEList(counts = cnts$RPFs, group = gp)
y <- calcNormFactors(y)
design <- model.matrix(~0+gp)
colnames(design) <- sub("gp", "", colnames(design))
y <- estimateDisp(y, design)
## To perform quasi-likelihood F-tests:
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit)
topTags(qlf, n=3)
## Coefficient:  KD 
##                        logFC   logCPM        F       PValue          FDR
## ENSDARG00000103054 -11.16762 8.682141 86767.21 6.128820e-16 2.261534e-13
## ENSDARG00000074275 -10.96103 8.689056 63046.43 1.931310e-15 3.563267e-13
## ENSDARG00000043247 -11.66550 8.621404 54103.55 3.346689e-15 4.032526e-13
## To perform likelihood ratio tests:
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt, n=3)
## Coefficient:  KD 
##                        logFC   logCPM        LR PValue FDR
## ENSDARG00000027355 -18.73631 9.551459 12085.481      0   0
## ENSDARG00000053222 -14.92366 8.172169  6682.324      0   0
## ENSDARG00000037748 -14.36672 7.796084  1603.511      0   0

3.1.3 Alternative splicing, translation initiation and polyadenylation

coverage <- coverageDepth(RPFs[grepl("KD1|WT", RPFs)], 
                          gtf=txdb, 
                          level="gene",
                          region="feature with extension")
group1 <- c("RPF.KD1.1", "RPF.KD1.2")
group2 <- c("RPF.WT.1", "RPF.WT.2")
## subset the data
coverage <- lapply(coverage, function(.ele){# do not run this step for real data
  .ele$coverage <- lapply(.ele$coverage, `[`, i=seq.int(50))
  .ele$granges <- .ele$granges[seq.int(50)]
  .ele
})
se <- spliceEvent(coverage, group1, group2)
table(se$type)
## 
## aSE 
## 115
plotSpliceEvent(se, se$feature[1], coverage, group1, group2)

3.2 RPFs and RNA-seq

3.2.1 By counts

3.2.1.1 Count for RPFs and RNA-seq

The countReads function can be used to count multiple files of ribo-seq and RNA-seq data.

path <- system.file("extdata", package="ribosomeProfilingQC")
RPFs <- dir(path, "RPF.*?\\.[12].bam$", full.names=TRUE)
RNAs <- dir(path, "mRNA.*?\\.[12].bam$", full.names=TRUE)
gtf <- file.path(path, "Danio_rerio.GRCz10.91.chr1.gtf.gz")
## make sure that the order of RPFs corresponds to the order of RNAs.
cnts <- countReads(RPFs, RNAs, gtf, level="tx")

3.2.1.2 Translational Efficiency (TE)

The absolute level of ribosome occupancy is strongly correlated with RNA levels for both coding and noncoding transcripts. Translational efficiency (TE) was introduced [8] to normalize ribosome occupancy by mRNA abundance. TE is the ratio of normalized ribosome footprint abundance to mRNA density. A common normalization method is Fragments Per Kilobase of transcript per Million mapped reads (FPKM).

fpkm <- getFPKM(cnts)
TE <- translationalEfficiency(fpkm)
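
The definition above can be written out in a few lines of base R. This is a sketch with made-up counts: the helper fpkm_from_counts is hypothetical, and details such as pseudocounts or the exact library-size definition used by getFPKM and translationalEfficiency are not modeled here.

```r
## Hypothetical helper: FPKM from raw counts, feature lengths (nt)
## and library size
fpkm_from_counts <- function(counts, len, lib_size = sum(counts)) {
  counts / (len / 1e3) / (lib_size / 1e6)
}

len  <- c(txA = 1000, txB = 2000)  ## made-up transcript lengths
rpf  <- c(txA = 200,  txB = 100)   ## made-up ribosome footprint counts
mrna <- c(txA = 400,  txB = 800)   ## made-up RNA-seq counts

## TE = normalized footprint abundance / normalized mRNA abundance
te <- fpkm_from_counts(rpf, len) / fpkm_from_counts(mrna, len)
te  ## txA: 2, txB: 0.5
```

Because the same feature length is used in the numerator and the denominator, it cancels out; the ratio is driven by the counts and the two library sizes.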

3.2.1.3 Differential analysis for TE

We assume that the log2-transformed translational efficiency, calculated as the ratio of RPFs to mRNAs, is linearly correlated with the real translational efficiency. We then use the limma package to test for differential translational efficiency.

library(limma)
gp <- c("KD", "KD", "CTL", "CTL")
TE.log2 <- log2(TE$TE + 1)
#plot(TE.log2[, 1], TE.log2[, 3], 
#     xlab=colnames(TE.log2)[1], ylab=colnames(TE.log2)[3],
#     main="Translational Efficiency", pch=16, cex=.5)
design <- model.matrix(~0+gp)
colnames(design) <- sub("gp", "", colnames(design))
fit <- lmFit(TE.log2, design)
fit2 <- eBayes(fit)
topTable(fit2, number=3)
##                         CTL KD  AveExpr        F      P.Value   adj.P.Val
## ENSDART00000128557 8.858792  1 4.929396 5022.263 1.556921e-05 0.003796782
## ENSDART00000170832 8.649233  1 4.824617 4790.420 1.659151e-05 0.003796782
## ENSDART00000167632 8.449658  1 4.724829 4574.781 1.765232e-05 0.003796782

3.2.2 By coverage

3.2.2.1 Maximum N-mer translational efficiency

If we plot mRNA or RPF levels against the translational efficiency calculated from all counts within a transcript, we find that TE is not well normalized: it is higher for lowly expressed transcripts and lower for highly expressed transcripts.

plotTE(TE, sample=2, xaxis="mRNA", log2=TRUE, pch=16, cex=.5)

#plotTE(TE, sample=2, xaxis="RPFs", log2=TRUE, pch=16, cex=.5)

This issue can be fixed by calculating the maximum value (TE max) in the most highly ribosome-occupied 90-nt window within a feature [8]. Please note that the normalization method for TE max is no longer FPKM.

cvgs <- coverageDepth(RPFs, RNAs, txdb)
TE90 <- translationalEfficiency(cvgs, window = 90, normByLibSize=TRUE)
plotTE(TE90, sample=2, xaxis="mRNA", log2=TRUE, pch=16, cex=.5)

#plotTE(TE90, sample=2, xaxis="RPFs", log2=TRUE, pch=16, cex=.5)

The examples above compute TE90 for the CDS region. The following code shows how to calculate TE90 for 3’UTR regions.

cvgs.utr3 <- coverageDepth(RPFs, RNAs, txdb, region="utr3")
TE90.utr3 <- translationalEfficiency(cvgs.utr3, window = 90)
#plotTE(TE90.utr3, sample=2, xaxis="mRNA", log2=TRUE, pch=16, cex=.5)
#plotTE(TE90.utr3, sample=2, xaxis="RPFs", log2=TRUE, pch=16, cex=.5)
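
The maximum-window idea used in this section can be sketched in base R as follows: slide a fixed-width window along the feature, find the window with the highest footprint coverage, and take the RPF-to-mRNA ratio in that window. The helper te_max_window is hypothetical and uses made-up coverage; library-size normalization and the handling of windows without RNA coverage are left out, so it should not be read as the implementation behind translationalEfficiency.

```r
## Hypothetical helper: TE in the window with the highest RPF coverage
te_max_window <- function(rpf_cvg, rna_cvg, window = 90) {
  stopifnot(length(rpf_cvg) == length(rna_cvg), length(rpf_cvg) >= window)
  ## rolling sums over all windows, computed from cumulative sums
  roll <- function(x) {
    cs <- c(0, cumsum(x))
    cs[(window + 1):length(cs)] - cs[seq_len(length(x) - window + 1)]
  }
  rpf_sum <- roll(rpf_cvg)
  rna_sum <- roll(rna_cvg)
  best <- which.max(rpf_sum)  ## first window with maximal RPF coverage
  c(start = best, TE = rpf_sum[best] / rna_sum[best])
}

## made-up per-base coverage for a 300-nt CDS
rpf_cvg <- c(rep(1, 100), rep(4, 90), rep(1, 110))
rna_cvg <- rep(2, 300)
res <- te_max_window(rpf_cvg, rna_cvg, window = 90)
res  ## window starts at position 101, TE = 2
```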

3.2.2.2 Ribosome Release Score (RRS)

RRS is calculated as the ratio of RPFs (normalized by RNA-seq reads) in the CDS to RPFs in the 3’UTR. Because it is hard to define the CDS region for non-coding RNAs, the RRS of non-coding RNAs cannot be calculated by the ribosomeReleaseScore function.

RRS <- ribosomeReleaseScore(TE90, TE90.utr3, log2 = TRUE)
#plot(RRS[, 1], RRS[, 3],
#     xlab="log2 transformed RRS of KD1", 
#     ylab="log2 transformed RRS of WT1")
#plot(RRS[, 1], log2(TE90$TE[rownames(RRS), 1]),
#     xlab="log2 transformed RRS of KD1", 
#     ylab="log2 transformed TE of KD1")
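
Written out, the score is (RPF in CDS / RNA in CDS) divided by (RPF in 3’UTR / RNA in 3’UTR). The base-R sketch below applies this to made-up numbers. The helper rrs and its argument names are hypothetical, and it assumes that the 3’UTR footprints are normalized by RNA-seq reads in the same way as the CDS footprints.

```r
## Hypothetical helper: RRS from RPF and RNA-seq signal in the CDS and 3'UTR
rrs <- function(rpf_cds, rna_cds, rpf_utr3, rna_utr3, log2_transform = TRUE) {
  ratio <- (rpf_cds / rna_cds) / (rpf_utr3 / rna_utr3)
  if (log2_transform) log2(ratio) else ratio
}

## made-up values: footprints drop sharply after the stop codon
coding <- rrs(rpf_cds = 800, rna_cds = 400, rpf_utr3 = 10, rna_utr3 = 160)
## made-up values: footprints continue into the 3'UTR at the same density
flat <- rrs(rpf_cds = 100, rna_cds = 400, rpf_utr3 = 40, rna_utr3 = 160)
c(coding = coding, flat = flat)  ## 5 and 0 on the log2 scale
```

A high score reflects ribosome release at the stop codon, which is the behavior expected for a translated coding sequence.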

3.2.2.3 Metagene analysis plot

Plot metagene coverage for CDS, 5’UTR and 3’UTR.

cvgs.utr5 <- coverageDepth(RPFs, RNAs, txdb, region="utr5")
#metaPlot(cvgs.utr5, cvgs, cvgs.utr3, sample=2, xaxis = "RPFs")
metaPlot(cvgs.utr5, cvgs, cvgs.utr3, sample=2, xaxis = "mRNA")

4 Fragment Length Organization Similarity Score (FLOSS) [1]

FLOSS can be used to compare the distribution of read lengths to a background such as a cluster of genes. The gene cluster can be extracted from GTF/GFF files downloaded from Ensembl.

## documentation: https://useast.ensembl.org/Help/Faq?id=468
library(rtracklayer) ## provides import()
gtf <- import("Danio_rerio.GRCz10.91.gtf.gz")

The GTF files can also be downloaded via AnnotationHub.

BiocManager::install("AnnotationHub")
library(AnnotationHub)
ah = AnnotationHub()
## for human hg38
hg38 <- query(ah, c("Ensembl", "GRCh38", "gtf"))
hg38 <- hg38[[length(hg38)]]
## for mouse mm10
mm10 <- query(ah, c("Ensembl", "GRCm38", "gtf"))
mm10 <- mm10[[length(mm10)]]
## because the gene ids in TxDb.Mmusculus.UCSC.mm10.knownGene and
## TxDb.Hsapiens.UCSC.hg38.knownGene
## are Entrez IDs, the gene_id in mm10 or hg38 needs to be changed to Entrez IDs.
library(ChIPpeakAnno)
library(org.Mm.eg.db)
mm10$gene_id <- ChIPpeakAnno::xget(mm10$gene_id, org.Mm.egENSEMBL2EG)
library(org.Hs.eg.db)
hg38$gene_id <- ChIPpeakAnno::xget(hg38$gene_id, org.Hs.egENSEMBL2EG)
gtf <- gtf[!is.na(gtf$gene_id)]
gtf <- gtf[gtf$gene_id!=""]
## protein coding
protein <- 
  gtf$gene_id[gtf$transcript_biotype %in% 
                  c("IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_LV_gene", 
                    "IG_M_gene", "IG_V_gene", "IG_Z_gene", 
                    "nonsense_mediated_decay", "nontranslating_CDS", 
                    "non_stop_decay", 
                    "protein_coding", "TR_C_gene", "TR_D_gene", "TR_gene", 
                    "TR_J_gene", "TR_V_gene")]
## mitochondrial genes
mito <- gtf$gene_id[grepl("^mt\\-", gtf$gene_name) | 
                        gtf$transcript_biotype %in% c("Mt_tRNA", "Mt_rRNA")]
## long noncoding
lincRNA <- 
  gtf$gene_id[gtf$transcript_biotype %in% 
                  c("3prime_overlapping_ncrna", "lincRNA", 
                    "ncrna_host", "non_coding")]
## short noncoding
sncRNA <- 
  gtf$gene_id[gtf$transcript_biotype %in% 
                  c("miRNA", "miRNA_pseudogene", "misc_RNA", 
                    "misc_RNA_pseudogene", "Mt_rRNA", "Mt_tRNA", 
                    "Mt_tRNA_pseudogene", "ncRNA", "pre_miRNA", 
                    "RNase_MRP_RNA", "RNase_P_RNA", "rRNA", "rRNA_pseudogene", 
                    "scRNA_pseudogene", "snlRNA", "snoRNA", 
                    "snRNA_pseudogene", "SRP_RNA", "tmRNA", "tRNA",
                    "tRNA_pseudogene", "ribozyme", "scaRNA", "sRNA")]
## pseudogene
pseudogene <- 
  gtf$gene_id[gtf$transcript_biotype %in% 
                  c("disrupted_domain", "IG_C_pseudogene", "IG_J_pseudogene", 
                    "IG_pseudogene", "IG_V_pseudogene", "processed_pseudogene", 
                    "pseudogene", "transcribed_processed_pseudogene",
                    "transcribed_unprocessed_pseudogene", 
                    "translated_processed_pseudogene", 
                    "translated_unprocessed_pseudogene", "TR_J_pseudogene", 
                    "TR_V_pseudogene", "unitary_pseudogene", 
                    "unprocessed_pseudogene")]
danrer10.annotations <- list(protein=unique(protein), 
                             mito=unique(mito),
                             lincRNA=unique(lincRNA),
                             sncRNA=unique(sncRNA),
                             pseudogene=unique(pseudogene))
library(ribosomeProfilingQC)
library(GenomicFeatures)
## prepare CDS annotation
txdb <- makeTxDbFromGFF(system.file("extdata",
                                    "Danio_rerio.GRCz10.91.chr1.gtf.gz",
                                    package="ribosomeProfilingQC"),
                        organism = "Danio rerio",
                        chrominfo = seqinfo(Drerio)["chr1"],
                        taxonomyId = 7955)
CDS <- prepareCDS(txdb)

library(Rsamtools)
## input the bamFile from the ribosomeProfilingQC package 
bamfilename <- system.file("extdata", "RPF.WT.1.bam",
                           package="ribosomeProfilingQC")
## For your own data, please set bamfilename as your file path.
yieldSize <- 10000000
bamfile <- BamFile(bamfilename, yieldSize = yieldSize)

pc <- getPsiteCoordinates(bamfile, bestpsite = 13)
readsLengths <- 20:34
fl <- FLOSS(pc, ref = danrer10.annotations$protein, 
            CDS = CDS, readLengths=readsLengths, level="gene", draw = FALSE)
fl.max <- t(fl[c(1, which.max(fl$cooks.distance)), as.character(readsLengths)])
matplot(fl.max, type = "l", x=readsLengths, 
        xlab="Fragment Length", ylab="Fraction of Reads", 
        col = c("gray", "red"), lwd = 2, lty = 1)
legend("topright",  legend = c("ref", "selected gene"), 
       col = c("gray", "red"), lwd = 2, lty = 1, cex=.5)
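
For intuition, the score itself, as defined by Ingolia et al. (reference 1), is half the sum of the absolute differences between the read-length fractions of a gene and those of the reference. The base-R sketch below computes it on made-up histograms. The helper floss_score is hypothetical and covers only the score; the outlier detection reported by FLOSS (the cooks.distance column used above) is not reproduced.

```r
## Hypothetical helper: FLOSS of one read-length histogram against a reference
floss_score <- function(counts, ref_counts) {
  f    <- counts / sum(counts)          ## fraction of reads per length
  fref <- ref_counts / sum(ref_counts)  ## reference fractions
  0.5 * sum(abs(f - fref))
}

## made-up read counts for lengths 26 to 30
ref   <- c(5, 20, 50, 20, 5)  ## reference (e.g. protein-coding genes)
geneA <- c(1, 4, 10, 4, 1)    ## same shape as the reference
geneB <- c(10, 5, 0, 5, 0)    ## shifted toward short fragments

score_A <- floss_score(geneA, ref)  ## 0
score_B <- floss_score(geneB, ref)  ## 0.55
```

The score ranges from 0 (identical length distributions) to 1 (no overlap), so genes whose footprints do not look like those of the reference set stand out with high values.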

References

1. Ingolia, N. T. et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Reports 8, 1365–1379 (2014).

2. Bazzini, A. A. et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. The EMBO Journal 33, 981–993 (2014).

3. Mohammad, F., Green, R. & Buskirk, A. R. A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. eLife 8, e42591 (2019).

4. Chung, B. Y. et al. The use of duplex-specific nuclease in ribosome profiling and a user-friendly software package for ribo-seq data analysis. RNA 21, 1731–1745 (2015).

5. Popa, A. et al. RiboProfiling: A Bioconductor package for standard ribo-seq pipeline processing. F1000Research 5, (2016).

6. Tjeldnes, H. An atlas of the human uORFome and its regulation across tissues. (The University of Bergen, 2018).

7. Lauria, F. et al. RiboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data. PLoS Computational Biology 14, e1006169 (2018).

8. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).