glad {GLAD}    R Documentation

Analysis of array CGH data

Description

This function detects breakpoints in genomic profiles obtained by array CGH technology and assigns a status (gain, normal or loss) to each clone.

Usage

glad.profileCGH(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=2,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE, sigma,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8,
                verbose=FALSE, ...)

Arguments

profileCGH Object of class profileCGH
mediancenter If TRUE, LogRatio values are centered on their median.
smoothfunc Type of algorithm used to smooth LogRatio by a piecewise constant function. Choose either lawsglad, aws or laws.
bandwidth Set the maximal bandwidth hmax in the aws or laws function. For example, if bandwidth=10 then the hmax value is set to 10*X_N where X_N is the position of the last clone.
round The smoothing results are rounded or not depending on the round argument. The round value is passed to the argument digits of the round function.
model Determines the distribution type of the LogRatio. Always keep the model as "Gaussian" (see aws or laws).
lkern Determines the location kernel to be used (see aws or laws).
qlambda Determines the scale parameter for the stochastic penalty (see aws or laws).
base If TRUE, the position of a clone is its physical position on the chromosome; otherwise the rank position is used.
sigma Value to be passed to either the sigma2 argument of the aws function or the shape argument of laws. If NULL, sigma is calculated from the data.
lambdabreak Penalty term (λ') used during the Optimization of the number of breakpoints step.
lambdacluster Penalty term (λ*) used during the MSHR clustering by chromosome step.
lambdaclusterGen Penalty term (λ*) used during the HCSR clustering throughout the genome step.
type Type of kernel function used in the penalty term during the Optimization of the number of breakpoints step, the MSHR clustering by chromosome step and the HCSR clustering throughout the genome step.
param Parameter of kernel used in the penalty term.
alpha Risk alpha used for the Outlier detection step.
msize MAD outliers are calculated only on regions with a cardinality greater than or equal to msize.
method The agglomeration method to be used during the MSHR clustering by chromosome and the HCSR clustering throughout the genome steps.
nmax Maximum number of clusters (N*max) allowed during the MSHR clustering by chromosome and the HCSR clustering throughout the genome steps.
verbose If TRUE, some information is printed.
...
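
As a rough illustration of what mediancenter, base, bandwidth and round describe, the following base R sketch applies those descriptions to a toy table. The data values and the variable names (clones, centered, hmax_rank, hmax_base) are made up for the example; this is not the code that glad runs internally.

```r
# Toy clone table (hypothetical values, not taken from any data set).
clones <- data.frame(
  PosOrder = 1:5,                                # rank position of each clone
  PosBase  = c(1000, 5000, 9000, 20000, 45000),  # physical position
  LogRatio = c(0.10, -0.05, 0.02, 0.60, 0.55)
)

# mediancenter=TRUE: subtract the median LogRatio from every clone.
centered <- clones$LogRatio - median(clones$LogRatio)

# base=FALSE uses the rank position; base=TRUE uses the physical position.
pos_rank <- clones$PosOrder
pos_base <- clones$PosBase

# bandwidth=10: hmax is 10 * X_N, with X_N the position of the last clone.
bandwidth <- 10
hmax_rank <- bandwidth * pos_rank[length(pos_rank)]
hmax_base <- bandwidth * pos_base[length(pos_base)]

# round=2: the value is handed to the digits argument of round().
rounded <- round(centered, digits = 2)
```

Note how strongly hmax depends on the choice of base: with physical positions the same bandwidth yields a much larger hmax than with rank positions.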

Details

The function glad implements the methodology described in the article: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions (Hupé et al., Bioinformatics 2004 20(18):3413-3422).

The principle of the GLAD algorithm: First, the detection of breakpoints is based on the estimation of a piecewise constant function with the Adaptive Weights Smoothing (AWS) procedure (Polzehl and Spokoiny, 2002). Then, a procedure based on penalized maximum likelihood optimizes the number of breakpoints, which allows undesirable breakpoints to be removed. Finally, based on the regions previously identified, a two-step unsupervised classification (MSHR clustering by chromosome and HCSR clustering throughout the genome) with model selection criteria allows a status to be assigned to each region (gain, loss or normal).
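
The idea behind the second step (removing breakpoints with a penalized likelihood) can be sketched in base R on a toy profile. This is a generic illustration only: it assumes a Gaussian model with known sigma and a fixed penalty lambda per breakpoint, whereas GLAD's penalty term also involves a kernel function (see type and param). The helper names rss and criterion are hypothetical and are not part of the GLAD package.

```r
# Residual sum of squares of a piecewise-constant fit, given region labels.
rss <- function(y, region) sum((y - ave(y, region))^2)

# Penalized criterion: Gaussian -2 log-likelihood (up to constants, sigma known)
# plus a fixed penalty lambda for every breakpoint. Lower is better.
criterion <- function(y, region, sigma, lambda) {
  n_bkp <- length(unique(region)) - 1
  rss(y, region) / sigma^2 + lambda * n_bkp
}

# Toy LogRatio profile: a tiny shift after clone 4, a clear gain after clone 6.
y <- c(0.02, -0.01, 0.03, 0.00, 0.05, 0.04, 0.61, 0.58, 0.60, 0.63)

with_weak    <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3)  # keeps both breakpoints
without_weak <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)  # drops the weak breakpoint
no_bkp       <- rep(1, length(y))                # drops every breakpoint

sigma  <- 0.05
lambda <- 8

crit_with    <- criterion(y, with_weak, sigma, lambda)
crit_without <- criterion(y, without_weak, sigma, lambda)
crit_none    <- criterion(y, no_bkp, sigma, lambda)

# The segmentation with the smallest criterion is retained.
best <- which.min(c(with_weak = crit_with,
                    without_weak = crit_without,
                    no_bkp = crit_none))
print(names(best))
```

With these toy values the weak breakpoint does not improve the fit enough to pay for its penalty, while the breakpoint at the gain does, so the two-region segmentation is retained.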

Main parameters to be tuned:
qlambda If you want the smoothing to fit very local effects, choose a smaller qlambda.
bandwidth Choose a bandwidth that is not too small; otherwise you will get many small discontinuities.
lambdabreak The higher the parameter, the higher the number of undesirable breakpoints.
lambdacluster The higher the parameter, the more the regions within a chromosome are assumed to belong to the same cluster.
lambdaclusterGen The higher the parameter, the more the regions over the whole genome are assumed to belong to the same cluster.
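
One way to get a feel for these parameters is to rerun glad over a small grid of values and compare the results. The sketch below varies lambdabreak only and counts the rows of BkpInfo. The grid values are arbitrary, lambda_grid and n_bkp are names chosen for the example, and the calls are guarded so that the block does nothing beyond printing NA values when the GLAD package is not installed.

```r
# Arbitrary grid of penalty values to compare (chosen for illustration only).
lambda_grid <- c(4, 8, 16)
n_bkp <- setNames(rep(NA_integer_, length(lambda_grid)), lambda_grid)

if (requireNamespace("GLAD", quietly = TRUE)) {
  library(GLAD)
  data(snijders)
  profileCGH <- as.profileCGH(gm13330)

  for (i in seq_along(lambda_grid)) {
    # All other arguments keep the defaults listed under Usage.
    res <- glad(profileCGH, lambdabreak = lambda_grid[i], verbose = FALSE)
    # BkpInfo holds one row per breakpoint.
    n_bkp[i] <- nrow(res$BkpInfo)
  }
}

# Number of breakpoints found for each lambdabreak value.
print(n_bkp)
```

The same loop can be reused for qlambda, lambdacluster or lambdaclusterGen by changing the argument that is varied.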

Value

An object of class "profileCGH" with the following attributes:
profileValues: a data.frame with the following added information:

    Smoothing
    The smoothing values correspond to the median of each MSHR (i.e. Region).

    Breakpoints
    The last position of a region with an identical amount of DNA is flagged by 1; otherwise the flag is 0. Note that during the "Optimization of the number of breakpoints" step, removed breakpoints are flagged by -1.

    Region
    All positions between two breakpoints are labelled the same way, with an integer value starting from one. The label is incremented by one when a new breakpoint occurs or when moving to the next chromosome. The variable Region is what we call MSHR.

    Level
    All positions with an equal smoothing value are labelled the same way, with an integer value starting from one. The label is incremented by one when a new level occurs or when moving to the next chromosome.

    OutliersAws
    Each AWS outlier is flagged by -1 (if it is in the α/2 lower tail of the distribution) or 1 (if it is in the α/2 upper tail of the distribution); otherwise the flag is 0.

    OutliersMad
    Each MAD outlier is flagged by -1 (if it is in the α/2 lower tail of the distribution) or 1 (if it is in the α/2 upper tail of the distribution); otherwise the flag is 0.

    OutliersTot
    OutliersAws + OutliersMad.

    ZoneChr
    Clusters identified after MSHR (i.e. Region) clustering by chromosome.

    ZoneGen
    Clusters identified after HCSR clustering throughout the genome.

    ZoneGNL
    Status of each clone: Gain is coded by 1, Loss by -1 and Normal by 0.

BkpInfo: a data.frame that gives the list of breakpoints:
    PosOrder
    The rank position of each clone on the genome.
    PosBase
    The base position of each clone on the genome.
    Chromosome
    Chromosome name.
SigmaC: a data.frame that gives the estimated standard deviation of LogRatio for each chromosome:
    Chromosome
    Chromosome name.
    Value
    The estimation is based on the Inter Quartile Range.
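
To make the relationship between some of these columns concrete, the base R sketch below rebuilds Region from Breakpoints and Chromosome, and OutliersTot from the two outlier columns, on a toy data.frame. The values are invented and the reconstruction simply follows the descriptions above; it is not the code GLAD uses to fill these columns.

```r
# Toy profileValues-like table (hypothetical values).
toy <- data.frame(
  Chromosome  = c(1, 1, 1, 1, 1, 2, 2, 2),
  Breakpoints = c(0, 0, 1, 0, 0, 0, 1, 0),  # 1 marks the last clone of a region
  OutliersAws = c(0, 0, 0, 0, 1, 0, 0, 0),
  OutliersMad = c(0, -1, 0, 0, 0, 0, 0, 0)
)
n <- nrow(toy)

# A new region starts right after a clone flagged 1 ...
prev_bkp <- c(0, toy$Breakpoints[-n]) == 1
# ... or at the first clone of a new chromosome.
new_chr <- c(FALSE, toy$Chromosome[-1] != toy$Chromosome[-n])

# Region labels start from one and grow by one at each new region.
# Removed breakpoints (flag -1) are ignored because only flag 1 is tested.
toy$Region <- 1 + cumsum(prev_bkp | new_chr)

# OutliersTot is the sum of the two outlier flags.
toy$OutliersTot <- toy$OutliersAws + toy$OutliersMad

print(toy)
```

Grouping by Region (for example with split or tapply) is then a convenient way to summarise a result region by region.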

Note

People interested in tools for array CGH analysis can visit our web page http://bioinfo.curie.fr.

Author(s)

Philippe Hupé, glad@curie.fr.

See Also

profileCGH, as.profileCGH, plotProfile.

Examples


data(snijders)

### Creation of "profileCGH" object
profileCGH <- as.profileCGH(gm13330)


###########################################################
###
###  glad function as described in Hupé et al. (2004)
###
###########################################################

res <- glad(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=2,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8,
                verbose=FALSE)

### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing")

### Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1, Smoothing="Smoothing")

### The standard deviations of LogRatio are:
res$SigmaC

### The list of breakpoints is:
res$BkpInfo


[Package GLAD version 1.0.4 Index]