read.madata {maanova}    R Documentation

Read microarray data from a TAB delimited text file

Description

This function reads microarray experiment data from a TAB delimited plain text file.

Usage

read.madata(datafile, designfile="design.txt", header=TRUE, spotflag=TRUE,
            metarow, metacol, row, col, pmt, ...)

Arguments

datafile The name of the data file, including its path, as a string.
designfile The name of the design file, including its path, as a string.
header A logical value indicating whether the data file contains column headers.
spotflag A logical value indicating whether the data file contains a flag column for bad spots.
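metarow The number of the data file column that holds the meta row. If omitted, the meta row of every spot is set to 1.
metacol The number of the data file column that holds the meta column. If omitted, the meta column of every spot is set to 1.
row The number of the data file column that holds the spot row. Default value is NA.
col The number of the data file column that holds the spot column. Default value is NA.
pmt The number of the first column of pmt data.
... Column numbers of other gene information in the data file, given as named arguments (for example, Name=5, ID=6).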

Value

An object of class rawdata, which is a list with the following components:

n.array Number of arrays in the experiment.
n.dye Number of dyes.
data Two channel experiment data.
flag A matrix of spot flags, with one element per spot. 0 means a normal spot; any other value means a bad spot.
metarow Meta row for each spot.
metacol Meta column for each spot.
row Row for each spot.
col Column for each spot.
ArrayName A list of strings giving the names of the pmt data. There are two names per array.
design An object representing the experimental design.
Others Other experiment information listed in the data file and specified by the user.

Preparing data file

Before using the package, you need to prepare the input data file. The data file is a TAB delimited text file in which each row corresponds to a gene. The leading columns can hold gene specific information (e.g., the Clone ID, GenBank ID, etc.) and the grid location of the spot. Most importantly, the pmt data must come after these columns. Most microarray gridding software generates one file per slide, so you need to combine those files manually into a single data file. You also need to decide which data to use in the analysis, e.g., mean versus median, background subtracted or not, etc.

For an N-dye array, the pmt data must have N columns for each array, and these N columns must be adjacent to each other. You can put the spot flag in a column after the pmt data for each array. (Note that if you have flags, each array occupies N+1 columns.) If you have replicates, replicated measurements of the same clone on the same array should appear in adjacent rows.
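The column layout described above can be computed rather than counted by hand. The following is a minimal sketch in base R; the helper name pmt.columns and its arguments are hypothetical and are not part of maanova.

```r
# Hypothetical helper (not part of maanova): list the data file columns
# that belong to each array, given the first pmt column, the number of
# arrays, the number of dyes, and whether a flag column follows each array.
pmt.columns <- function(pmt, n.array, n.dye = 2, spotflag = TRUE) {
  width <- n.dye + as.integer(spotflag)   # columns occupied by one array
  lapply(seq_len(n.array), function(k) {
    start <- pmt + (k - 1) * width        # first intensity column of array k
    list(intensity = seq(start, length.out = n.dye),
         flag = if (spotflag) start + n.dye else NA_integer_)
  })
}

# Four 2-dye arrays with flags, pmt data starting at column 7
layout <- pmt.columns(pmt = 7, n.array = 4)
layout[[2]]$intensity   # columns 10 and 11
layout[[2]]$flag        # column 12
```

Each array occupies N+1 columns when flags are present and N columns otherwise, so the start of array k is offset from pmt by (k-1) times that width.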

For example, suppose you have a 2-dye cDNA experiment with four slides scanned by GenePix, which gives you four files. First open your favorite spreadsheet editor, e.g., MS Excel, and copy your Clone ID and Cluster ID into the first 2 columns. Then open one of the files generated by GenePix and copy the grid location into the next 4 columns (you only need to do this once because the grid location is the same for all four slides). Then, for each of the four files, copy the two columns of foreground median values (if that is what you want to use) and one column of flags into the file in the order Cy5, Cy3, flag. Finally, select the whole file, sort the rows by Clone ID, and save the file as TAB delimited text.
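The same assembly can be scripted instead of done in a spreadsheet. The sketch below uses base R only and synthetic numbers in place of real scanner output; every column name, the number of spots, and the output file name are assumptions made for illustration.

```r
# Synthetic stand-ins for four per-slide exports; in practice each of these
# would be read from a scanner output file. All names here are hypothetical.
set.seed(1)
n.spot <- 6
annot <- data.frame(CloneID   = sprintf("C%02d", c(3, 1, 2, 1, 3, 2)),
                    ClusterID = sprintf("K%02d", c(3, 1, 2, 1, 3, 2)),
                    metarow = 1, metacol = 1,
                    row = rep(1:2, each = 3), col = rep(1:3, times = 2))
slides <- lapply(1:4, function(k) {
  s <- data.frame(Cy5 = round(runif(n.spot, 100, 5000)),
                  Cy3 = round(runif(n.spot, 100, 5000)),
                  flag = 0)
  names(s) <- paste0(names(s), ".", k)   # e.g. Cy5.1, Cy3.1, flag.1
  s
})

# Annotation first, then Cy5, Cy3, flag for each array in turn
combined <- cbind(annot, do.call(cbind, slides))

# Sort by clone ID so replicate spots of a clone sit in adjacent rows
combined <- combined[order(as.character(combined$CloneID)), ]

# Write a TAB delimited file with a header line and no row names or quotes
outfile <- file.path(tempdir(), "combined.txt")
write.table(combined, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

With six annotation columns in front, the pmt data in this synthetic file starts at column 7, which is the value that would be passed as pmt.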

The data file must be "full", that is, all rows must have the same number of fields. Depending on the operating system, leading and trailing TABs in the text file can sometimes cause problems, so check the file for them.
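One way to check this before reading the file is to count the fields on every line. The following is a minimal sketch using count.fields from the utils package; the file name and its contents are made up for illustration.

```r
# Write a small example file in which the last line is one field short
bad <- file.path(tempdir(), "ragged.txt")
writeLines(c("ID\tCy5\tCy3\tflag",
             "a\t100\t200\t0",
             "b\t150\t250"), bad)

# Number of TAB separated fields on every line; quote and comment
# handling are switched off so the text is counted exactly as written
n.fields <- utils::count.fields(bad, sep = "\t", quote = "", comment.char = "")

# Lines whose field count differs from that of the header line
ragged <- which(n.fields != n.fields[1])
ragged
```

An empty result from which() means every line has as many fields as the header line.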

Preparing design file

The design file is another TAB delimited text file. It has one row for each array and dye combination, so the number of rows equals the number of arrays times N (the number of dyes). The number of columns depends on the experimental design; for example, you can have "Strain", "Diet", "Sex", etc. The design file *MUST* contain a column named "Sample" (case sensitive) holding integers that identify the biological individuals. Reference samples must have Sample number zero (0). A reference sample is always treated as a fixed factor in the mixed model and is not involved in any test. The design file must also contain "Array" and "Dye" columns. It must NOT contain columns named "Spot" or "Label", because these names are reserved for spotting and labelling effects.
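A design file that follows these rules can be written from R. The sketch below is only an illustration: the experiment (four 2-dye arrays with a reference sample), the "Strain" values, the coding of Dye as 1 and 2, and the file name are all assumptions.

```r
# Hypothetical experiment: 4 two-dye arrays, each pairing one individual
# with a reference sample
n.array <- 4
n.dye   <- 2
design <- data.frame(
  Array  = rep(1:n.array, each = n.dye),
  Dye    = rep(1:n.dye, times = n.array),   # assumed coding: 1 and 2
  Strain = c("A", "ref", "ref", "A", "B", "ref", "ref", "B"),
  Sample = c(1, 0, 0, 2, 3, 0, 0, 4)        # 0 marks the reference sample
)

# Checks that mirror the rules stated above
stopifnot(nrow(design) == n.array * n.dye,
          all(c("Array", "Dye", "Sample") %in% names(design)),
          !any(c("Spot", "Label") %in% names(design)),
          all(design$Sample == round(design$Sample)))

# Write the design as TAB delimited text with a header line
designfile <- file.path(tempdir(), "design.txt")
write.table(design, designfile, sep = "\t", quote = FALSE, row.names = FALSE)
```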

Note that you do not have to *USE* every factor in the design file. When makeModel builds the model object, the experimental design is determined by the design together with a formula, so you can list all factors in the design file and turn them on or off in the formula.

Read pre-transformed data

You can use other software to transform the data and then read the pre-transformed data into R/maanova. In that case, skip the log2 transformation in createData by setting log.trans=FALSE. You should also run riplot and arrayview on the result of createData rather than on the rawdata object, because both functions assume that the data in a rawdata object is on the raw scale.

Author(s)

Hao Wu

Examples

# Note that the data files are not distributed with the package.
# Read in a file with spot flags
## Not run: 
kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt", 
        metarow=1, metacol=2, col=3, row=4, Name=5, ID=6,
        pmt=7, spotflag=TRUE)
## End(Not run)
# Read in a file without spot flags
## Not run: 
rawdata <- read.madata("paigen.txt", designfile="design.txt", cloneid=1,
        metarow=2, metacol=3, row=4, col=5, pmt=6, spotflag=FALSE)
## End(Not run)

[Package maanova version 1.2.1 Index]