
stats::tabulate -- statistics of duplicate rows

Introduction

stats::tabulate(s) eliminates duplicate rows in the sample s and appends a column containing the multiplicities.

stats::tabulate(s, c1, c2, ..., f) combines all rows that are identical except for their entries in the specified columns c1, c2 etc. The function f is applied to these columns, and its result replaces the values in these columns.

stats::tabulate(s, [c1, f1], [c2, f2], ...) combines all rows that are identical except for their entries in the columns c1, c2 etc. The functions f1, f2 etc. are applied to these columns, and their results replace the values in these columns.

Call(s)

stats::tabulate(s)
stats::tabulate(s, c1, c2... <, f>)
stats::tabulate(s, c1..c2, c3..c4... <, f>)
stats::tabulate(s, [c1, f1], [c2, f2]...)
stats::tabulate(s, [c1, c2..., f1], [c3, c4..., f2]...)

Parameters

s - a sample of domain type stats::sample
c1, c2, ... - integers representing column indices of the sample s
f, f1, f2, ... - procedures

Returns

a sample of domain type stats::sample.

Related Functions

stats::calc

Details

Example 1

We create a sample:

>> s := stats::sample([[a, A, 1], [a, A, 1], [a, A, 2],
                       [b, B, 5], [b, B, 10]])
   
                                a  A   1
                                a  A   1
                                a  A   2
                                b  B   5
                                b  B  10
      

Duplicate rows of the sample are counted. There are four unique rows, one occurring twice:

>> stats::tabulate(s)
   
                               a  A   1  2
                               a  A   2  1
                               b  B   5  1
                               b  B  10  1
      

In the following call, rows are regarded as duplicates if their entries in the first two columns coincide. We compute the mean value of the third entry of the duplicates:

>> stats::tabulate(s, 3, stats::mean)
   
                               a  A   4/3
                               b  B  15/2
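
The grouping behind this call can be sketched in Python. This is a model under stated assumptions, not the MuPAD implementation: the names tabulate_one and mean are hypothetical, column indices are 0-based (MuPAD column 3 is index 2 here), and exact fractions are used to mirror the rational results.

```python
from fractions import Fraction


def tabulate_one(rows, col, func):
    """Group rows by every column except `col` (0-based), reduce `col` with `func`."""
    groups = {}
    for row in rows:
        # The grouping key is the row with the reduced column left out.
        key = tuple(v for i, v in enumerate(row) if i != col)
        groups.setdefault(key, []).append(row[col])
    result = []
    for key, values in groups.items():
        new_row = list(key)
        new_row.insert(col, func(values))  # put the reduced value back in place
        result.append(new_row)
    return result


def mean(values):
    """Exact arithmetic mean of a list of integers."""
    return Fraction(sum(values), len(values))


sample = [["a", "A", 1], ["a", "A", 1], ["a", "A", 2],
          ["b", "B", 5], ["b", "B", 10]]
means = tabulate_one(sample, 2, mean)
for row in means:
    print(*row)
```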
      

We compute both the mean and the standard deviation of the data in the third column for the sub-samples labeled 'a A' and 'b B' by the first two columns:

>> stats::tabulate(s, [3, stats::mean], [3, stats::stdev])
   
                         a  A   4/3  1/3*2^(1/2)
                         b  B  15/2          5/2
      
>> delete s:
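
The standard deviations printed above agree with the population form (division by n) rather than the sample form (division by n - 1). The following Python check only compares numbers; it makes no claim about how stats::stdev is specified or which options it accepts.

```python
import math
import statistics

# Third-column values of the two groups from Example 1.
group_a = [1, 1, 2]
group_b = [5, 10]

pop_a = statistics.pstdev(group_a)  # population form: divides by n
pop_b = statistics.pstdev(group_b)
smp_a = statistics.stdev(group_a)   # sample form: divides by n - 1

# Compare with the printed values 1/3*2^(1/2) and 5/2.
matches_pop_a = math.isclose(pop_a, math.sqrt(2) / 3)
matches_pop_b = math.isclose(pop_b, 5 / 2)
matches_smp_a = math.isclose(smp_a, math.sqrt(2) / 3)
print(matches_pop_a, matches_pop_b, matches_smp_a)
```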
   

Example 2

We create a sample containing columns for "gender", "age" and "size":

>> s := stats::sample([["f", 25, 166], ["m", 30, 180], 
                       ["f", 54, 160], ["m", 40, 170],
                       ["f", 34, 170], ["m", 20, 172]])
   
                              "f"  25  166
                              "m"  30  180
                              "f"  54  160
                              "m"  40  170
                              "f"  34  170
                              "m"  20  172
      

We apply stats::mean, composed with float, to the second and third columns to calculate the average "age" and "size" for each gender:

>> stats::tabulate(s, 2..3, float@stats::mean)
   
                      "f"  37.66666667  165.3333333
                      "m"         30.0        174.0
      

With the next call, both the mean and the standard deviation of "age" and "size" for each gender are inserted into the sample:

>> stats::tabulate(s, 
     [2, float@stats::mean], [2, float@stats::stdev],
     [3, float@stats::mean], [3, float@stats::stdev])
         "f"  37.66666667  12.11977264  165.3333333  4.109609335
         "m"         30.0  8.164965809        174.0  4.320493799
      

We compute the Bravais-Pearson correlation coefficient between "age" and "size" for each gender:

>> stats::tabulate(s, [2, 3, float@stats::BPCorr])
                           "f"  -0.7540135992
                           "m"  -0.1889822365
>> delete s:
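
The correlation values can be cross-checked with a short Python sketch. The function pearson below is a hypothetical helper that implements the usual Bravais-Pearson formula directly; it is not the code of stats::BPCorr.

```python
import math


def pearson(xs, ys):
    """Bravais-Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rows = [["f", 25, 166], ["m", 30, 180],
        ["f", 54, 160], ["m", 40, 170],
        ["f", 34, 170], ["m", 20, 172]]

# Collect the "age" and "size" columns separately for each gender.
by_gender = {}
for gender, age, size in rows:
    ages, sizes = by_gender.setdefault(gender, ([], []))
    ages.append(age)
    sizes.append(size)

result = {g: pearson(ages, sizes) for g, (ages, sizes) in by_gender.items()}
for gender, r in result.items():
    print(gender, r)
```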

Example 3

We create a sample:

>> s := stats::sample([[a, x1, 1, 2], [b, x2, 2, 4], 
                       [b, x1, 2, 4], [e, x2, 3, 5.5]])
                              a  x1  1    2
                              b  x2  2    4
                              b  x1  2    4
                              e  x2  3  5.5

We regard rows with the same entry in the second column as "of the same kind". We tabulate the sample using different functions on the remaining columns:

>> stats::tabulate(s, [1, _plus], [3, _mult], [4, stats::mean])
                           a + b  x1  2     3
                           b + e  x2  6  4.75

One can apply customized procedures. In the following, we define the procedure plusmult, which sums the elements of each of two lists (representing columns) and then multiplies the two sums.

>> plusmult := proc(x, y) begin _plus(op(x))*_plus(op(y)) end_proc:

This procedure is then used to combine the first and the third column. Simultaneously, the mean and the standard deviation of the fourth column are inserted into the sample.

>> stats::tabulate(s, [1, 3, plusmult], [4, stats::mean],
                   [4, stats::stdev])
                        3*a + 3*b  x1     3     1
                        5*b + 5*e  x2  4.75  0.75
>> delete plusmult, s:

Changes






Copyright © SciFace Software GmbH & Co. KG 2000