read.madata {maanova} | R Documentation |
Reads microarray experiment data from a TAB-delimited plain text file.
read.madata(datafile, designfile="design.txt", header=TRUE, spotflag=TRUE, metarow, metacol, row, col, pmt, ...)
datafile |
The data file name with path name as a string. |
designfile |
The design file name with path as a string. |
header |
A logical value indicating whether the data file contains the column headers. |
spotflag |
A flag indicating whether the input file contains a flag column for bad spots. |
metarow |
The column number for meta row. Default values are 1s. |
metacol |
The column number for meta column. Default values are 1s. |
row |
The column number for row. Default value is NA. |
col |
The column number for column. Default value is NA. |
pmt |
The start column number for pmt data. |
... |
Other gene information in the data file, given as name=column-number pairs (for example, Name=5, ID=6). |
An object of class rawdata, which is a list of the following components:
n.array |
Number of arrays in the experiment. |
n.dye |
Number of dyes. |
data |
Two channel experiment data. |
flag |
A matrix of spot flags, with one element per spot. 0 means a normal spot; any other value means a bad spot. |
metarow |
Meta row for each spot. |
metacol |
Meta column for each spot. |
row |
Row for each spot. |
col |
Column for each spot. |
ArrayName |
A list of strings to represent the names of the pmt data. There are two names per array. |
design |
An object to represent the experimental design. |
Others |
Other experiment information listed in the data file and specified by user. |
Before using the package, you need to prepare the input data file. The data file is a TAB-delimited text file in which each row corresponds to a gene. The leading columns can hold gene-specific information (e.g., the Clone ID, GenBank ID, etc.) and the grid location of the spot. Most importantly, the pmt data must follow these columns. Most microarray gridding software generates one file per slide, so you need to combine these files manually into a single data file. You also need to decide which data to use in the analysis, e.g., mean versus median, background-subtracted or not, etc. For an N-dye array, the pmt data should have N columns for each array, and these N columns must be adjacent to each other. You can put the spot flag as a column after the pmt data for each array. (Note that with a flag you will have N+1 columns of data for each array.) If you have replicates, replicated measurements of the same clone on the same array should appear in adjacent rows.
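The column arithmetic described above can be sketched in base R. The helper pmt_layout below is hypothetical and not part of maanova; it assumes that the arrays occupy consecutive blocks of columns starting at the pmt column, with the optional flag column placed last in each block.

```r
# Hypothetical helper (not a maanova function): for each array, list the
# column numbers that hold its pmt data and, if present, its spot flag.
# Assumes arrays are stored in consecutive blocks starting at column `pmt`.
pmt_layout <- function(pmt, n.array, n.dye = 2, spotflag = TRUE) {
  width  <- n.dye + as.integer(spotflag)            # N or N+1 columns per array
  starts <- pmt + (seq_len(n.array) - 1L) * width   # first column of each block
  lapply(starts, function(s) {
    list(pmt  = s + seq_len(n.dye) - 1L,
         flag = if (spotflag) s + n.dye else NA_integer_)
  })
}

layout <- pmt_layout(pmt = 7, n.array = 4, n.dye = 2, spotflag = TRUE)
layout[[1]]$pmt    # columns 7 and 8
layout[[1]]$flag   # column 9
layout[[4]]$pmt    # columns 16 and 17
```

With six leading annotation columns, four 2-dye arrays and a flag per array, the file in this sketch would have 18 columns in total.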
For example, suppose you have a 2-dye cDNA array with four slides scanned by GenePix, giving four files. First, open your favorite spreadsheet editor, e.g., MS Excel. Copy your Clone ID and Cluster ID into the first 2 columns. Then open one of the files generated by GenePix and copy the grid location into the next 4 columns (you only need to do this once because the grid locations are the same for all four slides). Then, for each of the four files, copy the two columns of foreground median values (if that is what you want to use) and one column of flags into the file in the order Cy5, Cy3, flag. Then select the whole file and sort the rows by Clone ID. Save the file as a TAB-delimited text file and you are done.
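The same assembly can be done in R instead of a spreadsheet. The following is a minimal sketch using base R only; every value, column name and file name in it is made up for illustration, and in practice the intensity and flag columns would be taken from the scanner output files.

```r
# Made-up annotation and grid-location columns for six spots
# (three clones, each spotted twice).
set.seed(1)
n.spot <- 6
genes <- data.frame(
  CloneID   = c("c3", "c1", "c2", "c1", "c3", "c2"),
  ClusterID = c("k3", "k1", "k2", "k1", "k3", "k2"),
  metarow   = 1,
  metacol   = 1,
  row       = rep(1:2, each = 3),
  col       = rep(1:3, times = 2)
)

# One made-up table per slide, in the order Cy5, Cy3, flag.
slides <- lapply(1:4, function(i) {
  s <- data.frame(Cy5  = round(runif(n.spot, 100, 5000)),
                  Cy3  = round(runif(n.spot, 100, 5000)),
                  flag = 0L)
  names(s) <- paste0(names(s), ".", i)   # keep column names unique per slide
  s
})

# Annotation first, then the per-slide blocks side by side.
combined <- cbind(genes, do.call(cbind, slides))

# Sort by Clone ID so replicated clones end up in adjacent rows.
combined <- combined[order(combined$CloneID), ]

datafile <- file.path(tempdir(), "example_data.txt")
write.table(combined, datafile, sep = "\t", quote = FALSE, row.names = FALSE)

# A matching call might then look like this (not run; needs maanova and a
# design file):
# read.madata(datafile, designfile = "design.txt", CloneID = 1, ClusterID = 2,
#             metarow = 3, metacol = 4, row = 5, col = 6, pmt = 7,
#             spotflag = TRUE)
```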
The data file must be "full", that is, all rows must have the same number of fields. Leading and trailing TABs in the text file can sometimes cause problems, depending on the operating system, so be careful about them.
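One way to check this before reading the file is to count the TAB-separated fields on every line with count.fields from base R. The helper check_full below is a hypothetical convenience wrapper, not a maanova function, and the two small files are made up for the demonstration.

```r
# Hypothetical helper: report whether every line of a TAB-delimited file
# has the same number of fields, and return the per-line counts.
check_full <- function(path) {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  list(full = length(unique(n)) == 1L, fields = n)
}

# A well-formed file: three fields on every line.
good <- tempfile(fileext = ".txt")
writeLines(c("ID\tCy5\tCy3", "g1\t10\t20", "g2\t30\t40"), good)

# The same file with a stray trailing TAB on the second line.
bad <- tempfile(fileext = ".txt")
writeLines(c("ID\tCy5\tCy3", "g1\t10\t20\t", "g2\t30\t40"), bad)

check_full(good)$full    # TRUE
check_full(bad)$full     # FALSE
check_full(bad)$fields   # per-line counts show which line is off
```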
The design file is another TAB-delimited text file. The number of rows in this file equals the number of arrays times N (the number of dyes). The number of columns depends on the experimental design. For example, you can have "Strain", "Diet", "Sex", etc. in your design file. You *MUST* have a column named "Sample" (case sensitive) in the design file. Its values should be integers that identify the biological individuals. Reference samples should have a Sample number of zero (0). A reference sample is always treated as a fixed factor in the mixed model and is not involved in any test. You must also have "Array" and "Dye" columns in the design file. You must NOT have "Spot" or "Label" columns; these names are reserved for spotting and labelling effects.
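These rules can be checked in base R before the design file is written. The sketch below builds a made-up design for four 2-dye arrays with a reference sample on every array; the helper check_design is hypothetical and not part of maanova, and the integer coding of Dye and the Strain levels are arbitrary choices for illustration.

```r
# Made-up design: 4 arrays x 2 dyes = 8 rows. Sample 0 marks the reference.
design <- data.frame(
  Array  = rep(1:4, each = 2),
  Dye    = rep(1:2, times = 4),                      # arbitrary dye coding
  Strain = c("A", "ref", "B", "ref", "A", "ref", "B", "ref"),
  Sample = c(1, 0, 2, 0, 3, 0, 4, 0)
)

# Hypothetical helper: return a character vector of rule violations
# (empty if the design passes all checks described above).
check_design <- function(design, n.array, n.dye) {
  problems <- character(0)
  missing_cols <- setdiff(c("Array", "Dye", "Sample"), names(design))
  if (length(missing_cols))
    problems <- c(problems, paste("missing column:", missing_cols))
  reserved <- intersect(c("Spot", "Label"), names(design))
  if (length(reserved))
    problems <- c(problems, paste("reserved column:", reserved))
  if (nrow(design) != n.array * n.dye)
    problems <- c(problems, "row count is not n.array * n.dye")
  if ("Sample" %in% names(design)) {
    s <- design$Sample
    if (!is.numeric(s) || anyNA(s) || any(s != round(s)))
      problems <- c(problems, "Sample must contain integers")
  }
  problems
}

check_design(design, n.array = 4, n.dye = 2)   # character(0): no problems

designfile <- file.path(tempdir(), "example_design.txt")
write.table(design, designfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

Column names are matched exactly here, so a column called "sample" would be reported as a missing "Sample" column.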
Note that you don't have to *USE* all factors in the design file. When making the model object in makeModel, the experimental design is determined by the design and a formula. You can put all factors in the design file but turn them on/off in the formula.
You can use other software to do the data transformation and read the pre-transformed data into R/maanova. In that case, you should skip the log2 transformation in createData by setting log.trans=FALSE. You should also run riplot and arrayview on the result of createData, because riplot and arrayview assume that the data in a rawdata object are on the raw scale.
Hao Wu
# note that data files are not distributed with the package

# read in a file with spot flag
## Not run: 
kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt",
    metarow=1, metacol=2, col=3, row=4, Name=5, ID=6,
    pmt=7, spotflag=TRUE)
## End(Not run)

# read in a file without spot flag
## Not run: 
rawdata <- read.madata("paigen.txt", designfile="design.txt", cloneid=1,
    metarow=2, metacol=3, row=4, col=5, pmt=6, spotflag=FALSE)
## End(Not run)