Nana Performance Summary

P.J.Maker

This document contains some measurements of the space and time costs of the nana library. Data provided includes:

These test results were generated using:

The following table contains a summary of the results:

Code                                 Size  Time    Options
assert(i >= 2);                      29    1ns     -O
TRAD_assert(i >= 2);                 63    1ns     -O
I(i >= 2);                           29    1ns     -O
DI(i >= 2);                          10    12.3us  -O
I(A(int i=0, i!=10, i++, a[i]>=0));  41    6ns     -O
d = now();                           11    31ns    -O
printf("helloworld\n");              10    90ns    -O
L("helloworld\n");                   27    86ns    -O
DL("helloworld\n");                  10    2.9us   -O

Note:

Note that the measurement code depends on GNU CC extensions and is not a thing of great beauty.

How was it measured?

See Makefile.am and measure.sh for the true story; a quick summary would be:

The variables and code fragments used are defined in prelude.c and postlude.c. All variables are declared volatile to prevent the compiler from optimising away accesses to them.
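To see why volatile matters, here is a minimal sketch of timing one fragment in a loop. It is not the code from prelude.c or measure.sh: the names i, sink and time_fragment_ns are made up for illustration, and it uses the standard clock() call, which may not be what the real harness uses. Without volatile an optimising compiler would be free to delete the repeated load and store, and the loop would measure nothing.

```c
#include <time.h>

/* Hypothetical stand-ins for the variables in prelude.c;
   the real names and types may differ. */
volatile int i = 5;
volatile int sink;

/* Run the fragment `sink = (i >= 2)` n times and return the mean cost
   per iteration in nanoseconds, based on processor time from clock().
   The figure includes the loop overhead. */
double time_fragment_ns(long n)
{
    clock_t start = clock();
    for (long k = 0; k < n; k++) {
        sink = (i >= 2);    /* volatile forces a real load and store */
    }
    clock_t end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9 / (double)n;
}
```

Calling time_fragment_ns(100000000L) gives a rough per-iteration cost. Timing an empty loop the same way and subtracting it would remove the loop overhead.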

In addition all programs are compiled with the following options:

Detailed results

This section contains some more detailed results.

Assert

Code                   Size  Time    Options
assert(i >= 10);       28    1ns     -O0
assert(i >= 10);       29    1ns     -O1
assert(i >= 10);       13    0ns     -O3
BSD_assert(i >= 10);   28    1ns     -O0
BSD_assert(i >= 10);   4     0ns     -O1
BSD_assert(i >= 10);   4     0ns     -O3
TRAD_assert(i >= 10);  59    1ns     -O0
TRAD_assert(i >= 10);  63    1ns     -O1
TRAD_assert(i >= 10);  13    0ns     -O3
I(i >= 10);            28    1ns     -O0
I(i >= 10);            29    1ns     -O1
I(i >= 10);            13    0ns     -O3
DI(i >= 10);           10    13.3us  -O0
DI(i >= 10);           10    13.3us  -O1
DI(i >= 10);           10    13.1us  -O3

Quantifiers

Code                                                                     Size  Time   Options
I(A(char *p = str, *p != '\0', p++, islower(*p)));                       130   58ns   -O0
I(A(char *p = str, *p != '\0', p++, islower(*p)));                       50    12ns   -O1
I(A(char *p = str, *p != '\0', p++, islower(*p)));                       69    10ns   -O3
I(A(int i = 0, i < 10, i++, E1(int j = 0, j < 10, j++, a[i] == a[j])));  174   416ns  -O0
I(A(int i = 0, i < 10, i++, E1(int j = 0, j < 10, j++, a[i] == a[j])));  84    140ns  -O1
I(A(int i = 0, i < 10, i++, E1(int j = 0, j < 10, j++, a[i] == a[j])));  220   63ns   -O3
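For readers who have not met the quantifier macros, the fragments in the table above can be read as ordinary loops. The sketch below is plain C and does not use nana itself. It assumes the usual reading of A as "for all" and E1 as "exactly one", and the function names are invented here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reading of A(int i = 0, i < n, i++, a[i] >= 0):
   true iff the predicate holds for every element. */
bool all_nonnegative(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!(a[i] >= 0))
            return false;
    return true;
}

/* Reading of E1(int j = 0, j < n, j++, a[j] == x):
   true iff the predicate holds for exactly one element. */
bool exactly_one_equal(const int *a, size_t n, int x)
{
    size_t count = 0;
    for (size_t j = 0; j < n; j++)
        if (a[j] == x)
            count++;
    return count == 1;
}

/* Reading of A(i ..., E1(j ..., a[i] == a[j])):
   every value occurs exactly once, i.e. the elements are distinct. */
bool all_distinct(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!exactly_one_equal(a, n, a[i]))
            return false;
    return true;
}
```

For a 10 element array the nested form makes up to 100 comparisons, so it is no surprise that it costs more than a single pass.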

Log

Code                                      Size   Time   Options
printf("helloworld\n");                   10     91ns   -O0
printf("helloworld\n");                   10     90ns   -O1
printf("helloworld\n");                   -2737  93ns   -O3
L("helloworld\n");                        30     90ns   -O0
L("helloworld\n");                        27     92ns   -O1
L("helloworld\n");                        -7089  93ns   -O3
DL("helloworld\n");                       10     2.9us  -O0
DL("helloworld\n");                       10     2.9us  -O1
DL("helloworld\n");                       10     2.9us  -O3
gi = 0; LG(gi & 0x10, "helloworld\n");    53     1ns    -O0
gi = 0; LG(gi & 0x10, "helloworld\n");    47     1ns    -O1
gi = 0; LG(gi & 0x10, "helloworld\n");    24     0ns    -O3
gi = ~0; LG(gi & 0x10, "helloworld\n");   53     86ns   -O0
gi = ~0; LG(gi & 0x10, "helloworld\n");   47     95ns   -O1
gi = ~0; LG(gi & 0x10, "helloworld\n");   24     90ns   -O3
LHP(fprintf,log,"helloworld\n");          27     27ns   -O0
LHP(fprintf,log,"helloworld\n");          23     33ns   -O1
LHP(fprintf,log,"helloworld\n");          -6065  26ns   -O3
LHP(L_buffer_printf,buf,"helloworld\n");  22     79ns   -O0
LHP(L_buffer_printf,buf,"helloworld\n");  18     84ns   -O1
LHP(L_buffer_printf,buf,"helloworld\n");  -4014  71ns   -O3
LHP(syslog,LOG_USER,"helloworld\n");      20     3.8us  -O0
LHP(syslog,LOG_USER,"helloworld\n");      25     3.4us  -O1
LHP(syslog,LOG_USER,"helloworld\n");      -5809  5.3us  -O3
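The two LG rows in the table above differ only in the guard: with gi = 0 the guard is false and the call costs about 1ns, while with gi = ~0 it is true and the cost matches a normal log call. A rough sketch of that behaviour is given below. It is not the nana macro; log_if is a hypothetical helper written for this note, and it writes to stderr as a stand-in for wherever the real log goes.

```c
#include <stdio.h>

/* Hypothetical helper, not part of nana: write msg only when guard is
   nonzero. Returns 1 if the message was written, 0 if it was skipped. */
int log_if(int guard, const char *msg)
{
    if (!guard)
        return 0;           /* cheap path: one test, no I/O */
    fputs(msg, stderr);     /* expensive path: the actual write */
    return 1;
}
```

With gi = 0 the call log_if(gi & 0x10, "helloworld\n") takes the cheap path, and with gi = ~0 it performs the write.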

Nop

Code                          Size  Time  Options
asm("");                      0     0ns   -O0
asm("");                      0     0ns   -O1
asm("");                      0     0ns   -O3
asm("nop");                   1     0ns   -O0
asm("nop");                   1     0ns   -O1
asm("nop");                   1     0ns   -O3
asm("nop;nop;");              2     0ns   -O0
asm("nop;nop;");              2     0ns   -O1
asm("nop;nop;");              2     0ns   -O3
asm("nop;nop;nop;");          3     0ns   -O0
asm("nop;nop;nop;");          3     0ns   -O1
asm("nop;nop;nop;");          3     0ns   -O3
asm("nop;nop;nop;nop;");      4     0ns   -O0
asm("nop;nop;nop;nop;");      4     0ns   -O1
asm("nop;nop;nop;nop;");      4     0ns   -O3
asm("nop;nop;nop;nop;nop;");  5     0ns   -O0
asm("nop;nop;nop;nop;nop;");  5     0ns   -O1
asm("nop;nop;nop;nop;nop;");  5     0ns   -O3

C Operations

Code        Size  Time  Options
i = 4;      7     0ns   -O0
i = 4;      8     0ns   -O1
i = 4;      8     0ns   -O3
gi = 11;    10    0ns   -O0
gi = 11;    10    0ns   -O1
gi = 11;    10    0ns   -O3
f = 12.0;   9     0ns   -O0
f = 12.0;   14    0ns   -O1
f = 12.0;   8     0ns   -O3
gf = 12.0;  12    0ns   -O0
gf = 12.0;  16    0ns   -O1
gf = 12.0;  10    0ns   -O3
i++;        9     2ns   -O0
i++;        11    2ns   -O1
i++;        11    3ns   -O3
gi++;       15    2ns   -O0
gi++;       15    2ns   -O1
gi++;       15    2ns   -O3
j = a[i];   15    0ns   -O0
j = a[i];   17    1ns   -O1
j = a[i];   16    0ns   -O3

Data cache testing

These are just some tests using a large array, which should hopefully exceed the size of the D-cache on your machine.

Code                                           Size  Time     Options
I(A(int i=0, i < 1*1024, i++, za[i] == 0));    102   3.3us    -DNT=16
I(A(int i=0, i < 2*1024, i++, za[i] == 0));    102   5.0us    -DNT=16
I(A(int i=0, i < 4*1024, i++, za[i] == 0));    102   9.8us    -DNT=16
I(A(int i=0, i < 8*1024, i++, za[i] == 0));    102   20.4us   -DNT=16
I(A(int i=0, i < 16*1024, i++, za[i] == 0));   102   39.4us   -DNT=16
I(A(int i=0, i < 32*1024, i++, za[i] == 0));   102   77.4us   -DNT=16
I(A(int i=0, i < 64*1024, i++, za[i] == 0));   102   155.0us  -DNT=16
I(A(int i=0, i < 128*1024, i++, za[i] == 0));  102   311.5us  -DNT=16
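The rows above are easier to compare once the total time is divided by the number of elements scanned. The helper below does that arithmetic. It is only a convenience for reading the table, and the name ns_per_element is invented here.

```c
/* Mean cost in nanoseconds of checking one array element, given one row
   of the table: the total time in microseconds and the element count in
   units of 1024 (so the 128*1024 row is passed as 128). */
double ns_per_element(double time_us, long kilo_elements)
{
    return (time_us * 1000.0) / ((double)kilo_elements * 1024.0);
}
```

For example ns_per_element(3.3, 1) comes to about 3.2ns and ns_per_element(311.5, 128) to about 2.4ns. A sharp rise in this figure at some array size would suggest the point where the array no longer fits in the cache.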

Code

This section contains a listing of all the generated code fragments.

Conclusion

Finally, if you have used this package on an interesting (or uninteresting) architecture, please mail me a copy of the results for the nana home page.