nsFilter {genefilter}    R Documentation

Non-Specific Filtering of an ExpressionSet

Description

This function removes unwanted probe sets from an ExpressionSet without using phenotype variables in the filtering process. Hence the filter is non-specific with respect to the phenotypes in the data.

Usage

nsFilter(eset, require.entrez = TRUE, require.symbol = TRUE,
         require.GOBP = FALSE, require.GOCC = FALSE, require.GOMF = FALSE,
         remove.dupEntrez = TRUE, var.func = IQR, var.cutoff = 0.5,
         var.filter = TRUE)

Arguments

eset: an ExpressionSet object
require.entrez: If TRUE, require that all probe sets have an Entrez Gene ID annotation. Probe sets without such an annotation will be filtered out.
require.symbol: If TRUE, require that all probe sets have a gene symbol annotation. Probe sets without such an annotation will be filtered out.
require.GOBP: If TRUE, require that all probe sets have an annotation to at least one GO ID in the BP ontology. Probe sets without such an annotation will be filtered out.
require.GOCC: If TRUE, require that all probe sets have an annotation to at least one GO ID in the CC ontology. Probe sets without such an annotation will be filtered out.
require.GOMF: If TRUE, require that all probe sets have an annotation to at least one GO ID in the MF ontology. Probe sets without such an annotation will be filtered out.
remove.dupEntrez: If TRUE and there are multiple probe sets mapping to the same Entrez Gene ID, then the probe set with the largest value of var.func will be retained and the others removed.
var.func: a function that will be used to assess the variance of a probe set across all samples. This function should return a numeric vector of length one when given a numeric vector as input. Probe sets with a var.func value less than var.cutoff will be removed. The default is IQR.
var.cutoff: a numeric value to use in filtering out probe sets with small variance across samples. See the var.func argument and the details section below.
var.filter: a logical indicating whether or not to perform variance based filtering. The default is TRUE.
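
The duplicate handling described for remove.dupEntrez can be pictured with a small base R sketch. This is not the package's implementation; the probe names, Entrez Gene IDs, and var.func values below are made up for illustration.

```r
# Illustrative sketch only; not genefilter's own code.
# Hypothetical probe-to-Entrez mapping and per-probe var.func values.
entrez <- c(p1 = "100", p2 = "100", p3 = "200", p4 = "300", p5 = "300")
spread <- c(p1 = 0.8, p2 = 1.4, p3 = 0.9, p4 = 2.0, p5 = 0.6)

# Group the var.func values by Entrez Gene ID (names are preserved by split).
groups <- split(spread, entrez[names(spread)])

# Within each group, keep the probe set with the largest value.
best <- vapply(groups, function(v) names(v)[which.max(v)], character(1))
kept <- unname(best)
kept
```

Here p2 is kept over p1, and p4 over p5, because each has the larger value within its Entrez Gene ID group; p3 has no competitor and is kept as is.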

Details

A first step in many microarray analysis procedures is to carry out non-specific filtering. The goal is to remove uninteresting probe sets without regard to the phenotype data and reduce the number of probe sets that will be included in further analysis.

Annotation Based Filtering: The arguments require.entrez, require.symbol, require.GOBP, require.GOCC, and require.GOMF each turn on a filter based on available annotation data. The annotation package is determined by calling annotation(eset).
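
As a rough illustration of what an annotation based filter amounts to, the base R sketch below drops probe sets whose lookup is missing. The lookup vectors are hypothetical stand-ins for what an annotation package would supply, and the logic is an assumption about the general idea rather than the function's actual code.

```r
# Illustrative sketch only; not genefilter's own code.
# Hypothetical annotation lookups; NA marks a probe set with no annotation.
entrez <- c(p1 = "100", p2 = NA, p3 = "200", p4 = "300")
symbol <- c(p1 = "GENE1", p2 = "GENE2", p3 = NA, p4 = "GENE4")

require.entrez <- TRUE
require.symbol <- TRUE

# Start by keeping every probe set, then apply each requested requirement.
keep <- rep(TRUE, length(entrez))
names(keep) <- names(entrez)
if (require.entrez) keep <- keep & !is.na(entrez)
if (require.symbol) keep <- keep & !is.na(symbol[names(entrez)])

kept <- names(keep)[keep]
kept
```

Only p1 and p4 survive, since p2 lacks an Entrez Gene ID and p3 lacks a gene symbol.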

Variance Based Filtering: The var.func and var.cutoff arguments control the variance based filtering. The intention is to remove probe sets with little variation across samples. The default var.func is IQR, selected because it is robust to outliers. The default var.cutoff is 0.5, motivated by the common case where the platform is a genome-wide expression array and by the rule of thumb that in any given tissue only 40% of genes are expressed.
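
The variance filter can be sketched in base R on a plain matrix. This follows the rule stated in the Arguments section (probe sets whose var.func value is less than var.cutoff are removed) and is not taken from the package source; the matrix and its row names are invented for the example.

```r
# Illustrative sketch only; not genefilter's own code.
# Hypothetical expression matrix: 4 probe sets (rows) by 6 samples (columns).
x <- rbind(
  flat1 = c(7.0, 7.1, 7.0, 7.1, 7.0, 7.1),
  flat2 = c(5.0, 5.0, 5.2, 5.1, 5.0, 5.1),
  wide1 = c(4.0, 9.0, 5.0, 8.0, 6.0, 7.0),
  wide2 = c(2.0, 2.5, 6.0, 6.5, 2.2, 6.1)
)

var.func <- IQR
var.cutoff <- 0.5

# One spread value per probe set, computed across samples.
spread <- apply(x, 1, var.func)

# Keep probe sets whose spread is not below the cutoff.
filtered <- x[spread >= var.cutoff, , drop = FALSE]
rownames(filtered)
```

The two nearly constant rows fall below the cutoff and are dropped, while the two rows that vary across samples are retained. Any function that maps a numeric vector to a single number, such as sd or mad, could be substituted for IQR in this sketch.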

Value

A list consisting of:

eset: the filtered ExpressionSet
filter.log: a list giving details of how many probe sets were removed for each filtering step performed.

Author(s)

Seth Falcon

Examples

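library("genefilter")
library("Biobase")     # provides the sample.ExpressionSet data set
library("hgu95av2")    # annotation package for sample.ExpressionSet
data(sample.ExpressionSet)
ans <- nsFilter(sample.ExpressionSet)
ans$eset               # the filtered ExpressionSet
ans$filter.log         # probe sets removed at each filtering step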

[Package genefilter version 1.14.1 Index]