reduceDataFrame {QFeatures}    R Documentation
Reduce and expand a DataFrame
A long DataFrame can be reduced by merging certain rows into a
single one. The resulting new variables are constructed as a
SimpleList containing all the original values. Invariant columns,
i.e. columns that have the same value along all the rows that need
to be merged, can instead be shrunk into a new variable containing
that invariant value (rather than a list column). The grouping of
rows, i.e. which rows are shrunk together into one, is defined by
a vector.
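The reduction can be illustrated with a minimal base-R sketch, assuming a plain data.frame in place of a DataFrame and ordinary list columns in place of SimpleList columns. The helper reduce_sketch() is hypothetical and only mirrors the behaviour described above; it is not the QFeatures implementation of reduceDataFrame().

```r
## Hypothetical helper: a base-R sketch of the reduction idea,
## not the QFeatures implementation of reduceDataFrame().
reduce_sketch <- function(x, k, simplify = TRUE) {
    ## row indices belonging to each group defined by k
    groups <- split(seq_len(nrow(x)), k)
    ## start from an empty data.frame with one row per group
    res <- data.frame(row.names = names(groups))
    for (col in names(x)) {
        ## list column: one element per group, holding all its original values
        vals <- lapply(groups, function(i) x[[col]][i])
        ## a column is invariant if every group holds a single distinct value
        invariant <- all(vapply(vals, function(v) length(unique(v)) == 1L,
                                logical(1)))
        if (simplify && invariant) {
            ## shrink to one value per group instead of a list
            res[[col]] <- unlist(lapply(vals, `[`, 1L), use.names = FALSE)
        } else {
            res[[col]] <- vals
        }
    }
    res
}

d <- data.frame(k = rep(1:3, each = 3),
                x = letters[1:9],
                y = rep(letters[1:3], each = 3),
                stringsAsFactors = FALSE)
r <- reduce_sketch(d, d$k)
r$y  # invariant column, simplified to a character vector
r$x  # non-invariant column, kept as a list
```

Here y is identical within each group of k and is simplified to one value per group, while x keeps all three values of each group in a list column.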
The opposite operation is expandDataFrame(). Note, however, that
for a reduced DataFrame to be expanded back, it must not have been
simplified, i.e. it must have been reduced with simplify = FALSE.
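Expansion can likewise be sketched in base R by concatenating the values stored in each list column. The helper expand_sketch() and the small reduced table are hypothetical illustrations, not the QFeatures expandDataFrame(); the sketch refuses simplified (non-list) columns, since a single value per group no longer records how many rows each group originally had.

```r
## Hypothetical helper: a base-R sketch of expansion,
## not the QFeatures implementation of expandDataFrame().
expand_sketch <- function(x) {
    is_list_col <- vapply(x, is.list, logical(1))
    if (!all(is_list_col))
        stop("simplified (non-list) columns cannot be expanded back")
    ## concatenate each group's values, giving one row per original row
    cols <- lapply(x, function(col) unlist(col, use.names = FALSE))
    as.data.frame(cols, stringsAsFactors = FALSE)
}

## an unsimplified reduced table: two groups, every column a list
reduced <- data.frame(row.names = c("1", "2"))
reduced$k <- list(c(1L, 1L), 2L)
reduced$x <- list(c("a", "b"), "c")
expand_sketch(reduced)
```

In this sketch the expanded rows come out ordered by group, which is not necessarily the order of the rows before reduction.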
reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE)

expandDataFrame(x, k = NULL)
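x | The DataFrame to be reduced or expanded.
k | A ‘vector’ with one element per row of x, defining the grouping of the rows to be merged.
count | A logical(1). If TRUE, a column with the tally of the number of rows merged into each group is added. Default is FALSE.
simplify | A logical(1). If TRUE (default), invariant columns are shrunk into a single value per group rather than stored as list columns.
drop | A logical(1). If TRUE, all non-invariant columns are dropped. Default is FALSE.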
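What count and drop compute can be sketched in base R: the tally is the number of rows in each group, and dropping keeps only the columns that hold a single value within every group. The helper is_invariant() and the data below are hypothetical illustrations, not QFeatures code.

```r
## Hypothetical helper: TRUE if column v holds a single distinct value
## within every group defined by k.
is_invariant <- function(v, k) {
    all(tapply(v, k, function(z) length(unique(z)) == 1L))
}

d <- data.frame(k = rep(1:3, times = c(2, 3, 1)),
                x = c(1.2, 0.4, 3.1, 3.1, 2.0, 5.5),
                y = c("a", "a", "b", "b", "b", "c"),
                stringsAsFactors = FALSE)

## what count = TRUE tallies: the number of rows merged into each group
n <- lengths(split(seq_len(nrow(d)), d$k))
n

## what drop = TRUE keeps: only the columns invariant in every group
keep <- vapply(d, is_invariant, logical(1), k = d$k)
names(d)[keep]
```

Column x varies within the first group, so only k and y would survive a reduction with drop = TRUE in this sketch.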
A reduced DataFrame for reduceDataFrame(), or an expanded DataFrame
for expandDataFrame().
Missing values have an important effect on reduceDataFrame().
Unless all the values to be reduced are missing, a missing value
makes the column non-invariant: it is then not simplified, and it
is dropped with drop = TRUE. See the examples below.
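The effect of a missing value on invariance can be seen with base R's unique(), which treats NA as a value of its own. The per-group check group_is_invariant() below is a hypothetical sketch consistent with the behaviour described above, not QFeatures code.

```r
## Hypothetical invariance test for the values of one group:
## a single distinct value, where NA counts as a value of its own.
group_is_invariant <- function(z) length(unique(z)) == 1L

group_is_invariant(c("a", "a", "a"))  # TRUE: can be simplified
group_is_invariant(c(NA, "a", "a"))   # FALSE: NA differs from "a"
group_is_invariant(c(NA, NA, NA))     # TRUE: all values are missing
```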
The presence of missing values can have side effects in
higher-level functions that rely on the reduction of DataFrame
objects.
Laurent Gatto
library("QFeatures")
library("IRanges")

k <- sample(100, 1e3, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_len(length(k)),
                z = sample(LETTERS, length(k), replace = TRUE),
                ir = IRanges(seq_along(k), width = 10),
                r = Rle(sample(5, length(k), replace = TRUE)),
                invar = k + 1)
df

## Shrinks the DataFrame
df2 <- reduceDataFrame(df, df$k)
df2

## With a tally of the number of members in each group
reduceDataFrame(df, df$k, count = TRUE)

## Much faster, but more crowded result
df3 <- reduceDataFrame(df, df$k, simplify = FALSE)
df3

## Drop all non-invariant columns
reduceDataFrame(df, df$k, drop = TRUE)

## Missing values
d <- DataFrame(k = rep(1:3, each = 3),
               x = letters[1:9],
               y = rep(letters[1:3], each = 3),
               y2 = rep(letters[1:3], each = 3))
d

## y is invariant and can be simplified
reduceDataFrame(d, d$k)

## y is not dropped
reduceDataFrame(d, d$k, drop = TRUE)

## BUT with a missing value
d[1, "y"] <- NA
d

## y isn't invariant/simplified anymore
reduceDataFrame(d, d$k)

## y now gets dropped
reduceDataFrame(d, d$k, drop = TRUE)