\documentclass[12pt]{article}
\textwidth= 6.5in \textheight= 9.0in \topmargin = -20pt
\evensidemargin=0pt \oddsidemargin=0pt \headsep=25pt
\parskip=10pt
\font\smallit=cmti10 \font\smalltt=cmtt10 \font\smallrm=cmr9

\def\snn{\hbox{\font\dubl=msbm10 scaled 1000 {\dubl N}}}
\def\nn{\hbox{\font\dubl=msbm10 scaled 1200 {\dubl N}}}
\def\zz{\hbox{\font\dubl=msbm10 scaled 1200 {\dubl Z}}}
\def\qq{\hbox{\font\dubl=msbm10 scaled 1200 {\dubl Q}}}
\def\rr{\hbox{\font\dubl=msbm10 scaled 1200 {\dubl R}}}


\begin{document}
\vspace*{-40pt} 
\centerline{\smalltt INTEGERS: 
 \smallrm ELECTRONIC JOURNAL OF COMBINATORIAL NUMBER THEORY \smalltt 6 
(2006), \#A04} 
\vskip 40pt 
\begin{center}
{\bf SIMPSON'S PARADOX IN THE FAREY SEQUENCE} \vskip 20pt
{\bf Rasa \v Sle\v zevi\v cien\. e-Steuding}\\
{\smallit Departamento de Matem\'aticas, Universidad Aut\'onoma de
Madrid, C. Universitaria de Cantoblanco, 28\,049 Madrid, Spain}\\
{\tt rasa.steuding@uam.es}\\ \vskip 10pt {\bf J\"orn Steuding}\\
{\smallit Departamento de Matem\'aticas, Universidad Aut\'onoma de
Madrid, C. Universitaria de Cantoblanco, 28\,049 Madrid, Spain}\\
{\tt jorn.steuding@uam.es}\\
\end{center}
\vskip 30pt \centerline{\smallit Received: 6/6/05,
Revised: 12/2/05, Accepted: 2/4/06,
Published: 2/14/06} \vskip 30pt

\centerline{\bf Abstract}

\noindent We investigate the appearance of Simpson's paradox in
the Farey sequence of reduced fractions in the unit interval.

\pagestyle{myheadings} \markright{\smalltt INTEGERS: \smallrm
ELECTRONIC JOURNAL OF COMBINATORIAL NUMBER THEORY \smalltt 6
(2006), \#A04\hfill}

\thispagestyle{empty} \baselineskip=15pt \vskip 30pt


\section*{\normalsize 1. Introduction and statement of results}

In statistics it frequently occurs that the data seems to
contradict our intuition. For instance, Cohen \& Nagel \cite{ce}
cited actual death rates from tuberculosis in Richmond (Virginia)
and New York from 1910 that verified the following propositions:
\begin{itemize}
\item the death rate for African Americans was lower in Richmond
than in New York, \item the death rate for Caucasians was lower in
Richmond than in New York, \item the death rate for the total
combined population of African Americans and Caucasians was higher
in Richmond than in New York.
\end{itemize}
{\it How can it be?} We may illustrate this with another numerical
example, namely
$$
{3\over 5}<{8\over 13}\qquad\mbox{and}\qquad{7\over 10}<{5\over
7},\qquad \mbox{but}\qquad {3+7\over 5+10}={2\over 3}>{13\over
20}={8+5\over 13+7}.
$$
This phenomenon is called Simpson's paradox after E.H. Simpson
\cite{simps}, who published an influential paper on this topic in
1951. Of course, Simpson's paradox is not a paradox but just a
simple fact about fractions. Nevertheless, it has a variety of
surprising applications arising from links between proportions,
probabilities, and their representations as fractions. A short
historical overview on Simpson's paradox can be found in Mittal
\cite{mittal}.
\par\medskip
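The mechanism behind the numerical example above can be made explicit; the following identity is an added illustration and is not taken from the cited sources. The mediant is a weighted average of the two fractions, with weights proportional to the denominators:
$$
{a+c\over b+d}={b\over b+d}\cdot{a\over b}+{d\over b+d}\cdot{c\over d}.
$$
In the example,
$$
{10\over 15}={1\over 3}\cdot{3\over 5}+{2\over 3}\cdot{7\over 10}
\qquad\mbox{and}\qquad {13\over 20}={13\over 20}\cdot{8\over
13}+{7\over 20}\cdot{5\over 7},
$$
so the left-hand mediant puts weight ${2\over 3}$ on its larger fraction ${7\over 10}$, whereas the right-hand mediant puts weight only ${7\over 20}$ on its larger fraction ${5\over 7}$. This imbalance of the weights reverses the inequality.
\par\medskip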

Usually, statistical data is given as some subset of the set of
rational numbers $\qq$. In this note we investigate the appearance
of Simpson's paradox within $\qq$. For an enumeration of $\qq$ we
shall use the Farey sequence of reduced fractions in the unit
interval.
\par\medskip

For $n\in\nn$ the Farey sequence ${\mathcal F}_n$ is the ordered
list of all reduced fractions in the unit interval having
denominators less than or equal to $n$, i.e.,
$$
{\mathcal F}_n:=\left\{{a\over b}\in\qq\,:\,0\leq a\leq b\leq n,\,
{\sf gcd}(a,b)=1\right\},
$$
where, as usual, ${\sf gcd}(a,b)$ denotes the greatest common
divisor of the integers $a$ and $b$. Consecutive Farey fractions
${a\over b}<{c\over d}$ satisfy $bc-ad=1$, or equivalently
$$ {c\over d}-{a\over b}={1\over bd}.$$
Thus, ${\mathcal F}_n$ is not equidistantly distributed; this
property makes Farey fractions useful tools in the theory of
Diophantine approximations (see the classical paper of Ford
\cite{ford}). The Farey sequence can be built from ${\mathcal
F}_1=\left\{{0\over 1},{1\over 1}\right\}$ by successively inserting
mediants of consecutive fractions. For
${a\over b},{c\over d}\in{\mathcal F}_n$ their mediant is defined
by ${a+c\over b+d}$. If ${a\over b}\leq{c\over d}$, then it is
easily seen that
$$
{a\over b}\leq{a+c\over b+d}\leq{c\over d};
$$
equality holds if and only if ${a\over b}={c\over d}$. For
consecutive elements ${a\over b},{c\over d}\in{\mathcal F}_n$,
their mediant is an element of ${\mathcal F}_{b+d}$. The limit of
the Farey sequence, $\cup_{n\in\snn}{\mathcal F}_n$, consists
exactly of the reduced fractions in the interval $[0,1]$.
\par
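For illustration (a small worked case, added as a check of the properties just listed), take $n=5$:
$$
{\mathcal F}_5=\left\{{0\over 1},{1\over 5},{1\over 4},{1\over
3},{2\over 5},{1\over 2},{3\over 5},{2\over 3},{3\over 4},{4\over
5},{1\over 1}\right\}.
$$
The consecutive fractions ${1\over 3}<{2\over 5}$ satisfy $bc-ad=3\cdot 2-1\cdot 5=1$ and ${2\over 5}-{1\over 3}={1\over 15}$. In ${\mathcal F}_3$ the fractions ${1\over 3}<{1\over 2}$ are consecutive, and their mediant ${1+1\over 3+2}={2\over 5}$ first appears in ${\mathcal F}_5={\mathcal F}_{3+2}$.
\par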

It is a natural question to ask {\it how often} Simpson's paradox
occurs in the Farey sequence. So we are interested in the system
of Diophantine inequalities
\begin{equation}\label{farey}
0\leq {a\over b}<{A\over B},\quad 0\leq {c\over d}<{C\over
D},\qquad\mbox{and}\qquad {a+c\over b+d}>{A+C\over B+D}
\end{equation}
for given integers $A,B,C,D$. It is easily seen that a certain
asymmetry is necessary for this phenomenon to happen. Assume that
$BC=AD$, i.e., ${A\over B}={C\over D}$. Then, obviously, the
mediant ${A+C\over B+D}$ coincides with both, so that
${a\over b}$ and ${c\over d}$ are both less than ${A+C\over B+D}$;
hence so is their mediant ${a+c\over b+d}$, and (\ref{farey})
cannot hold. Further, the case $A=0$ is uninteresting, since then
no fraction ${a\over b}$ with $0\leq{a\over b}<{A\over B}$ exists.
Therefore, in the sequel we may assume w.l.o.g. that
$0<{A\over B}<{C\over D}\leq 1$.
\par

First of all, we note that there are infinitely many examples of
Simpson's paradox everywhere in the Farey sequence. As a matter of
fact, it is elementary to check the following: if $n$ is a positive
integer with ${\sf gcd}(n,B)=1$ and ${\sf gcd}(n,D)=1$, and if the
reduced fractions ${a\over b}$ and ${c\over d}$ are defined by
$$
0<{a\over b}={A\over B}-{1\over n}<{A\over
B}\qquad\mbox{and}\qquad 0<{c\over d}={C\over D}-{1\over
n^2}<{C\over D},
$$
then
$$
{a+c\over b+d}>{A+C\over B+D}\qquad\mbox{for}\quad n>{1\over 2}
\left(1+\sqrt{1+4{(B+D)^2\over BC-AD}}\right).
$$
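\par
The threshold can be traced as follows; this is a sketch of the elementary verification, written out under the assumption that $n$ is coprime to $B$ and $D$, so that the reduced forms are $a=An-B$, $b=Bn$, $c=Cn^2-D$ and $d=Dn^2$. Cross-multiplication gives
\begin{eqnarray*}
(a+c)(B+D)-(A+C)(b+d)&=&(An-B+Cn^2-D)(B+D)-(A+C)(Bn+Dn^2)\\
&=&(BC-AD)(n^2-n)-(B+D)^2,
\end{eqnarray*}
which is positive if and only if $n^2-n>{(B+D)^2\over BC-AD}$, and this is the bound for $n$ given above. For instance, for ${A\over B}={1\over 2}$ and ${C\over D}={1\over 1}$ the bound reads $n>{1\over 2}(1+\sqrt{37})=3.54\ldots$. The smallest admissible odd value $n=5$ yields
$$
{3\over 10}<{1\over 2},\qquad {24\over 25}<{1\over
1},\qquad\mbox{but}\qquad {3+24\over 10+25}={27\over 35}>{2\over
3}={1+1\over 2+1},
$$
whereas $n=3$ gives the mediant ${1+8\over 6+9}={3\over 5}<{2\over 3}$.
\par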

The last example gives a slight indication as to whether or not
Simpson's paradox is a rare event. We shall prove that, given two
fractions $0<{A\over B}< {C\over D}$ in the Farey sequence
$\mathcal{F}_n$, pairs $({a\over b},{c\over d})\in\mathcal{F}^2_n$
satisfying (\ref{farey}) occur with positive probability (in the
sense of a Laplace experiment), as $n\to\infty$.
\par\bigskip

\noindent {\bf Theorem.} {\it For two fractions $0<{A\over
B}<{C\over D}\leq 1$ in the Farey sequence ${\mathcal F}_n$, we
have}
$$
\lim_{n\to\infty}{ \sharp\left\{\left({a\over b},{c\over
d}\right)\in{\mathcal F}_n^2\,:\, {a\over b}<{A\over B}, {c\over
d}<{C\over D}\quad\mbox{and}\quad {a+c\over b+d}>{A+C\over
B+D}\right\} \over\sharp\left\{\left({a\over b},{c\over
d}\right)\in{\mathcal F}_n^2\,:\, {a\over b}<{A\over B}, {c\over
d}<{C\over D}\right\}}=\delta>0
$$
{\it with}
\begin{eqnarray*}
\delta:=\delta\left({A\over B},{C\over D}\right)&:=&{1\over
9}{BD\over AC}\Bigg\{{1\over 2}\Delta^2\Big(D^2
\Psi(1)-2BD\Psi(0)+B^2\Psi(-1)\Big)+\\
&&\qquad\qquad+{A\over
B}\Big(B\Delta\Upsilon(0)-\left(D\Delta+{1\over 2} {A\over
B}\right)\Upsilon(1)\Big)\Bigg\},
\end{eqnarray*}
{\it where}
\begin{eqnarray}
\Upsilon(\ell)&:=&{36\over
4-\ell^2}\min\left\{1,\nabla^{-2-\ell}\right\}-{9 \over
2-\ell}\min\left\{\nabla^{2-\ell},
\nabla^{-2-\ell}\right\},\label{gringo1}\\
\Psi(\ell)&:=&{9\over
2-\ell}\Big(\min\left\{\nabla^{2-\ell},\nabla^{-2-\ell}\right\}-
\min\left\{\left({D\over B}\right)^{2-\ell},\left({D\over
B}\right)^{-2-\ell}\right\}\Big)+\label{gringo2}\\
&&+{36\over 4-\ell^2} \Big(\min\left\{1,\left({D\over
B}\right)^{-2-\ell}\right\}
-\min\left\{1,\nabla^{-2-\ell}\right\}\Big),\nonumber
\end{eqnarray}
\begin{equation}
\Delta:={BC-AD\over BD(B+D)}\qquad\mbox{and}\qquad
\nabla:={D(A+C)\over BC-AD}.\label{joder}
\end{equation}
\par\bigskip

\noindent The expression $\delta$ looks rather complicated at
first sight, but it is simply a piecewise rational function of
$A,B,C,D$; the minima in (\ref{gringo1}) and (\ref{gringo2}) select
the rational pieces.
\par

Note that there also exist examples with equality between the
mediants. For instance,
$$
{1\over 5}<{2\over 5},\quad {14\over 25}<{3\over 5}, \qquad
\mbox{and}\qquad {1+14\over 5+25}={1\over 2}={2+3\over 5+5}.
$$
As the proof of the theorem will show, the set of such examples has
density zero.

\vskip 30pt

\section*{\normalsize 2. Proof of the theorem}


There is an interesting geometrical interpretation of the Farey
sequence. The Farey fractions ${a\over b}\in{\mathcal F}_n$
correspond to the points with integer coordinates $(a,b)$ situated
in the triangle given by $\{(x,y)\in\rr^2\,:\,0\leq x\leq y\leq
n\}$, which {\it can be seen from the origin}, i.e., for which $a$
and $b$ are coprime. As we shall see below, for the proof of the
theorem we have to count lattice points under this and further
conditions.
\par

We start with an asymptotic formula for the number of Farey fractions
${a\over b}\in{\mathcal F}_n$ not exceeding a given bound. Let
$\xi\in(0,1]$ be fixed; then
$$
\sharp\left\{{a\over b}\in{\mathcal F}_n\,:\,{a\over b} \leq
\xi\right\}=1+\sum_{1\leq b\leq n}\sum_{1\leq a\leq b\xi\atop {\sf
gcd}(a,b)=1}1 =\Sigma(n;\xi),
$$
say. Following the argument which gives an asymptotic formula for
the cardinality of ${\mathcal F}_n$ (see \cite{hardy}), we find
\begin{equation}\label{sigm}
\Sigma(n;\xi)={3\xi\over\pi^2}n^2+O(n\log n),
\end{equation}
where the implicit constant in the error term is absolute. Thus
\begin{eqnarray}\label{all}
\sharp\left\{\left({a\over b},{c\over d}\right)\in{\mathcal
F}_n^2\,:\, {a\over b}<{A\over B}, {c\over
d}<{C\over D}\right\}&=&\Sigma\left(n;{A\over B}\right)\Sigma\left(n;{C\over D}\right)\nonumber\\
&=&{9\over\pi^4}{AC\over BD}n^4+O(n^3\log n).
\end{eqnarray}
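\par
Formula (\ref{sigm}) can be made plausible as follows; this is only a sketch of the standard argument, and the intermediate error bound is written out here rather than quoted from \cite{hardy}. For fixed $b$, detecting coprimality by means of the M\"obius function $\mu$ gives
$$
\sum_{1\leq a\leq b\xi\atop {\sf
gcd}(a,b)=1}1=\sum_{\alpha\vert
b}\mu(\alpha)\left\lfloor{b\xi\over\alpha}\right\rfloor
=\xi\,\phi(b)+O\Big(\sum_{\alpha\vert
b}\vert\mu(\alpha)\vert\Big),
$$
where $\phi(b)=b\sum_{\alpha\vert b}\mu(\alpha)/\alpha$ is Euler's function. Summing over $b\leq n$ and using $\sum_{b\leq n}\phi(b)={3\over\pi^2}n^2+O(n\log n)$ together with $\sum_{b\leq n}\sum_{\alpha\vert b}\vert\mu(\alpha)\vert\ll n\log n$ gives the main term ${3\xi\over\pi^2}n^2$ with an error of order $n\log n$.
\par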
In order to estimate the proportion in question we have to do a
similar computation under the additional restriction (\ref{farey})
for the related mediants. Let
$$
\sharp\left\{\left({a\over b},{c\over d}\right)\in{\mathcal
F}_n^2\,:\, {a\over b}<{A\over B}, {c\over d}<{C\over
D}\quad\mbox{and}\quad {a+c\over b+d}>{A+C\over B+D}\right\}=
\Sigma(n)+\Sigma_0(n),
$$
where $\Sigma_0(n)$ counts the number of those tuples $({a\over
b},{c\over d})$ for which $ac=0$. It follows from (\ref{sigm})
that $\Sigma_0(n)\ll n^2$ and so their contribution is negligible.
We have
\begin{equation}\label{abel}
\Sigma(n)=\sum_{1\leq b\leq n}\sum_{1\leq d\leq n}\sum_{1\leq
a<b{A\over B}, {\sf gcd}(a,b)=1}\sum_{1\leq c<d{C\over D}, {\sf
gcd}(c,d)=1\atop a+c>(b+d){A+C\over B+D}}1.
\end{equation}
Denote by $\mu(n)$ the M\"obius $\mu$-function, i.e., $\mu(1)=1$,
$\mu(n)=(-1)^\nu$ if $n$ is the product of $\nu$ distinct primes,
and $\mu(n)=0$ otherwise (if $n$ has some square divisor). In view
of the well-known formula
$$
\sum_{d\vert m}\mu(d)=\left\{\begin{array}{c@{\quad}c}1 &
\mbox{if} \quad m=1, \\ 0 & \mbox{otherwise}, \end{array}\right.
$$
we can rewrite (\ref{abel}) as
\begin{eqnarray*}
\Sigma(n)&=&\sum_{1\leq b\leq n}\sum_{1\leq d\leq n}\sum_{1\leq
a<b{A\over B}}\sum_{1\leq c<d{C\over D}\atop a+c>(b+d){A+C\over
B+D}}\sum_{\alpha\vert {\sf gcd}(a,b)}\mu(\alpha)
\sum_{\gamma\vert {\sf gcd}(c,d)}\mu(\gamma)\\
&=&\sum_{1\leq b\leq n}\sum_{1\leq d\leq n}\sum_{\alpha\vert
b}\mu(\alpha) \sum_{\gamma\vert d}\mu(\gamma)\sum_{1\leq
a<b{A\over B}\atop \alpha\vert a} \sum_{1\leq c<d{C\over D}\atop
\gamma\vert c, a+c>(b+d){A+C\over B+D}}1.
\end{eqnarray*}
First we shall consider the inner two sums, which count the
lattice points $(x,y)\in\zz^2$ with $x\equiv 0\bmod\,\alpha,
y\equiv 0 \bmod\,\gamma$ lying above the straight line
$x+y=(b+d){A+C\over B+D}$ inside the rectangle $1\leq x<b{A\over
B},1\leq y<d{C\over D}$. Since these are exactly the lattice
points of the sublattice $\alpha\zz\times\gamma\zz$ in the convex
region
$$
{\mathcal R}(b,d):=\left\{(x,y)\in\rr^2:1\leq x<b{A\over B},1\leq
y<d{C\over D}, x+y>(b+d){A+C\over B+D}\right\},
$$
the number of these lattice points equals asymptotically the
volume of ${\mathcal R}(b,d)$ divided by the volume of a
fundamental parallelepiped. It is not difficult to see that the
error term is of the order of the boundary (see Lemma 2.1.1 in
Huxley \cite{hux}). Thus,
\begin{equation}\label{uno}
\sum_{1\leq a<b{A\over B}\atop \alpha\vert a} \sum_{1\leq
c<d{C\over D}\atop\gamma\vert c, a+c>(b+d){A+C\over B+D}}1
={{\sf{vol}}({\mathcal R}(b,d))\over
\alpha\gamma}+O\left({{\sf{length}}({\mathcal R}(b,d))\over
\alpha+\gamma}\right).
\end{equation}
Note that the volume of ${\mathcal R}(b,d)$ does not depend on
$\alpha$ and $\gamma$. In the next step we shall compute this
volume.
\par

By simple geometric arguments it follows that the region
${\mathcal R}(b,d)$ is
\begin{itemize}
\item a single point or empty if and only if
$$
b{A\over B}+d{C\over D}\leq (b+d){A+C\over B+D};
$$
\item a triangle of volume
$$
{1\over 2}\Delta^2(dB-bD)^2
$$
(where $\Delta$ is defined by (\ref{joder})) if and only if
$$
d{C\over D}\leq(b+d){A+C\over B+D}<b{A\over B}+d{C\over D};
$$
\item a trapezoid of volume
$$
{A\over B}\left(bdB\Delta-b^2\left(D\Delta+{1\over 2}{A\over
B}\right)\right)
$$
if and only if
$$
(b+d){A+C\over B+D}<d{C\over D}.
$$
\end{itemize}
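The two volumes can be verified directly. The following computation is a sketch in which the lower bounds $1\leq x$, $1\leq y$ are replaced by $0\leq x$, $0\leq y$, an assumption that affects only terms of lower order. Write $s:=(b+d){A+C\over B+D}$. A short computation gives
$$
b{A\over B}+d{C\over D}-s={(BC-AD)(dB-bD)\over
BD(B+D)}=\Delta(dB-bD).
$$
In the triangle case the line $x+y=s$ cuts off the upper right corner $(b{A\over B},d{C\over D})$ of the rectangle, leaving a right-angled isosceles triangle with legs of length $\Delta(dB-bD)$, hence of volume ${1\over 2}\Delta^2(dB-bD)^2$. In the trapezoid case the line meets the left side of the rectangle, and the volume is
\begin{eqnarray*}
\int_0^{bA/B}\left(d{C\over D}-s+x\right){\rm d}x&=&b{A\over
B}\left(\Delta(dB-bD)-b{A\over B}\right)+{1\over 2}\left(b{A\over
B}\right)^2\\
&=&{A\over B}\left(bdB\Delta-b^2\left(D\Delta+{1\over 2}{A\over
B}\right)\right).
\end{eqnarray*}
\par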
Other cases of intersections of a half-plane with a rectangle
cannot occur; to see this note that $b<b+d$ and ${A\over
B}<{A+C\over B+D}$, and thus
$$
b{A\over B}<(b+d){A+C\over B+D}.
$$
Let us remark that all of the above listed cases occur for any
given $0<{A\over B}<{C\over D}$ as $n\to\infty$. In fact, this
will imply, as we shall see below, the existence and the
positivity of the proportion $\delta=\delta({A\over B},{C\over
D})$.
\par

Inserting (\ref{uno}) into the above expression for $\Sigma(n)$,
we may write
\begin{equation}\label{labe}
\Sigma(n)=\Sigma_{\Delta}(n)+\Sigma_{\diamondsuit}(n)+{\sf
error}(n),
\end{equation}
where $\Sigma_\Delta(n)$ and $\Sigma_\diamondsuit(n)$ are the sums
of the main terms for which ${\mathcal R}(b,d)$ is a triangle or a
trapezoid, respectively, i.e.,
\begin{eqnarray*}
\Sigma_\Delta(n)&:=&\sum_{1\leq b\leq n}\sum_{1\leq d\leq n\atop
d{C\over D}\leq (b+d){A+C\over B+D}<b{A\over B}+d{C\over
D}}\sum_{\alpha\vert b}{\mu(\alpha)\over \alpha}\sum_{\gamma\vert
d}{\mu(\gamma)\over\gamma}\cdot{1\over 2}\Delta^2(bD-dB)^2,\\
\Sigma_\diamondsuit(n)&:=&\sum_{1\leq b\leq n}\sum_{1\leq d\leq
n\atop (b+d){A+C\over B+D}<d{C\over D}}\sum_{\alpha\vert
b}{\mu(\alpha)\over \alpha}\sum_{\gamma\vert
d}{\mu(\gamma)\over\gamma}\cdot{A\over B}\left(bdB\Delta-b^2
\left(D\Delta+{1\over 2}{A\over B}\right)\right),
\end{eqnarray*}
and ${\sf error}(n)$ is the error term with respect to
(\ref{uno}). We may rewrite the condition of summation for the
triangle and for the trapezoid by
$$
b{D\over B}<d\leq b{D(A+C)\over
BC-AD}=b\nabla\qquad\mbox{and}\qquad b\nabla=b{D(A+C)\over
BC-AD}<d,
$$
respectively. Recall that Euler's $\phi$-function $\phi(b)$ is for
$b\in\nn$ defined as the number of positive integers $a\leq b$
that are coprime to $b$. It satisfies the identity
$$
\phi(b)=b\sum_{d\vert b}{\mu(d)\over d}.
$$
Taking this into account, we find
\begin{eqnarray}
\Sigma_\Delta(n)&=&{1\over 2}\Delta^2\sum_{1\leq b\leq
n}{\phi(b)\over b}\sum_{b{D\over B}<d\leq \min\{n,b\nabla\}}
{\phi(d)\over d}(bD-dB)^2,\label{una}\\
\Sigma_\diamondsuit(n)&=&{A\over B}\sum_{1\leq b\leq
n}{\phi(b)\over b}\sum_{b\nabla<d\leq n}{\phi(d)\over
d}\left(bdB\Delta-b^2 \left(D\Delta+{1\over 2}{A\over
B}\right)\right)\label{dos}.
\end{eqnarray}
In order to evaluate these double sums, we have to compute the
asymptotics for certain sums involving Euler's $\phi$-function.
\par
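The summation ranges in (\ref{una}) and (\ref{dos}), and the type of sums needed, can be spelled out as follows (a worked reformulation, added for convenience). Since $BC-AD>0$, the trapezoid condition is equivalent to
$$
(b+d)(A+C)D<dC(B+D)\quad\Longleftrightarrow\quad
bD(A+C)<d(BC-AD)\quad\Longleftrightarrow\quad d>b\nabla,
$$
and the triangle is non-degenerate exactly when $dB-bD>0$, i.e., $d>b{D\over B}$. Moreover,
\begin{eqnarray*}
{(bD-dB)^2\over bd}&=&D^2\,{b\over d}-2BD+B^2\,{d\over b},\\
{1\over bd}\left(bdB\Delta-b^2\left(D\Delta+{1\over 2}{A\over
B}\right)\right)&=&B\Delta-\left(D\Delta+{1\over 2}{A\over
B}\right){b\over d},
\end{eqnarray*}
so that both double sums are linear combinations of sums of the form $\sum_b\phi(b)b^\ell\sum_d\phi(d)d^{-\ell}$ with $\ell\in\{0,\pm 1\}$.
\par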

It is well-known that
$$
\sum_{1\leq b\leq n}\phi(b)={3\over \pi^2}n^2+O(n\log n).
$$
By partial summation, it follows that
$$
\sum_{N\leq k\leq
M}\phi(k)k^\ell={6\over(2+\ell)\pi^2}\left(M^{2+\ell}-N^{2+\ell}\right)
+O\left(M^{\ell+1}\log M\right)
$$
for $\ell\geq -1$. Then, for $\ell\in\{0,\pm 1\}$, we have
\begin{eqnarray*}
\lefteqn{\sum_{1\leq b\leq n}\phi(b)b^\ell \sum_{xb<d\leq n}
\phi(d)d^{-\ell}}\\
&=&{6\over (2-\ell)\pi^2}\Bigg(n^{2-\ell}\sum_{1\leq b\leq
\min\{n,n/x\}}\phi(b)b^\ell-x^{2-\ell}\sum_{1\leq b\leq
\min\{n,n/x\}}\phi(b)b^2\Bigg)+\\
&&+O\Big(n^{1-\ell}\log n\sum_{1\leq b\leq n}\phi(b)b^\ell\Big)\\
&=&{n^4\over \pi^4}\Upsilon(x;\ell)+O\Big(n^3(\log n)^2\Big),
\end{eqnarray*}
where
$$
\Upsilon(x;\ell):={36\over 4-\ell^2}\min\{1,x^{-2-\ell}\}- {9\over
2-\ell}\min\{x^{2-\ell},x^{-2-\ell}\}.
$$
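\par
The constants in $\Upsilon(x;\ell)$ arise as follows; this is a sketch that tracks main terms only and uses the partial summation formula above with the exponents $\ell$ and $2$. Put $m:=\min\{n,n/x\}$. Then
$$
\sum_{1\leq b\leq
m}\phi(b)b^\ell\sim{6\over(2+\ell)\pi^2}m^{2+\ell}\qquad\mbox{and}\qquad
\sum_{1\leq b\leq m}\phi(b)b^2\sim{3\over 2\pi^2}m^4,
$$
so that the main term equals
\begin{eqnarray*}
\lefteqn{{6\over(2-\ell)\pi^2}\left({6\over(2+\ell)\pi^2}n^{2-\ell}m^{2+\ell}
-{3\over 2\pi^2}x^{2-\ell}m^4\right)}\\
&=&{n^4\over\pi^4}\left({36\over
4-\ell^2}\min\{1,x^{-2-\ell}\}-{9\over
2-\ell}\min\{x^{2-\ell},x^{-2-\ell}\}\right),
\end{eqnarray*}
since $m^{2+\ell}=n^{2+\ell}\min\{1,x^{-2-\ell}\}$ and $x^{2-\ell}m^4=n^4\min\{x^{2-\ell},x^{-2-\ell}\}$.
\par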
Similarly,
\begin{eqnarray*}
\lefteqn{\sum_{1\leq b\leq n}\phi(b)b^\ell \sum_{1\leq d\leq
n\atop xb<d\leq yb} \phi(d)d^{-\ell}}\\
&=&\sum_{1\leq b\leq \min\{n,n/y\}}\phi(b)b^\ell \sum_{xb<d\leq
yb} \phi(d)d^{-\ell}+\\
&&+\sum_{\min\{n,n/y\}< b\leq \min\{n,n/x\}}\phi(b)b^\ell
\sum_{xb<d\leq n} \phi(d)d^{-\ell}\\
&=&{6(y^{2-\ell}-x^{2-\ell})\over (2-\ell)\pi^2}\sum_{1\leq b\leq
\min\{n,n/y\}}\phi(b)b^2+\\
&&+{6\over(2-\ell)\pi^2}\Bigg(n^{2-\ell}\sum_{\min\{n,n/y\}<b\leq
\min\{n,n/x\}}\phi(b)b^\ell+\\
&&-x^{2-\ell}\sum_{\min\{n,n/y\}<b\leq
\min\{n,n/x\}}\phi(b)b^2\Bigg)+O\Big(n^{1-\ell}\log n\sum_{1\leq
b\leq
n}\phi(b)b^\ell\Big)\\
&=& {n^4\over \pi^4}\Psi(x,y;\ell)+O\Big(n^3(\log n)^2\Big),
\end{eqnarray*}
where
\begin{eqnarray*}
\Psi(x,y;\ell)&:=&{9\over
2-\ell}\Big(\min\{y^{2-\ell},y^{-2-\ell}\}-
\min\{x^{2-\ell},x^{-2-\ell}\}\Big)+\nonumber\\
&&+{36\over
4-\ell^2}\Big(\min\{1,x^{-2-\ell}\}-\min\{1,y^{-2-\ell}\}\Big),
\end{eqnarray*}
valid also for $\ell\in\{0,\pm 1\}$. Using this in (\ref{una}) and
(\ref{dos}) yields
\begin{eqnarray*}
\Sigma_\Delta(n)&=&{n^4\over \pi^4}{1\over 2}\Delta^2\Big(D^2
\Psi(1)-2BD\Psi(0)+B^2\Psi(-1)\Big)+O\Big(n^3(\log n)^2\Big),\\
\Sigma_\diamondsuit(n)&=&{n^4\over \pi^4}{A\over
B}\Big(B\Delta\Upsilon(0)-\left(D\Delta+{1\over 2} {A\over
B}\right)\Upsilon(1)\Big)+O\Big(n^3(\log n)^2\Big),
\end{eqnarray*}
where $\Upsilon(\ell)=\Upsilon(\nabla;\ell)$ is given by
(\ref{gringo1}) and $\Psi(\ell)=\Psi\Big({D\over
B},\nabla;\ell\Big)$ by (\ref{gringo2}). For the error term in
(\ref{uno}) we note that in all cases
$$
{\sf{length}}({\mathcal R}(b,d))\ll b{A\over B}+d{C\over D}
$$
and this leads to the estimate ${\sf error}(n)\ll n^3$ in
(\ref{labe}). Hence, we finally obtain
\begin{eqnarray*}
\Sigma(n)&=&{n^4\over \pi^4}\Bigg\{{1\over 2}\Delta^2\Big(D^2
\Psi(1)-2BD\Psi(0)+B^2\Psi(-1)\Big)+\\
&&\qquad+{A\over B}\Big(B\Delta\Upsilon(0)-\left(D\Delta+{1\over
2} {A\over B}\right)\Upsilon(1)\Big)\Bigg\}+\\
&&+O\Big(n^3(\log n)^2\Big).
\end{eqnarray*}
Dividing this by the quantity in (\ref{all}), we get the required
proportion, and conclude the proof of the theorem.
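\par
As a consistency check on the two auxiliary functions (an observation added here, read off from the definitions above), note that for $x<y$ the range $xb<d\leq\min\{n,yb\}$ is the range $xb<d\leq n$ with the range $yb<d\leq n$ removed. Accordingly,
$$
\Psi(x,y;\ell)=\Upsilon(x;\ell)-\Upsilon(y;\ell),
$$
as one verifies term by term. In the notation of the theorem this reads $\Psi(\ell)=\Upsilon\left({D\over B};\ell\right)-\Upsilon(\nabla;\ell)$; here ${D\over B}<\nabla$ holds because $B(A+C)>BC-AD$.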

\vskip 30pt

\section*{\normalsize 3. Concluding remarks}


The rather complicated expression for $\delta=\delta({A\over
B},{C\over D})$ was checked by computer experiments. It is
interesting to compare the values of $\delta$ for different data
$A,B,C,D$. For instance,
\begin{eqnarray*}
\delta\left({1\over 2},{1\over 1}\right)=&\displaystyle{11\over 216}&=0.05092\ldots\ ,\\
\delta\left({1\over 3},{2\over 3}\right)=&\displaystyle{1\over 72}&=0.01388\ldots\ ,\\
\delta\left({1\over 7},{1\over 2}\right)=&\displaystyle{1597\over
6615}&=0.24142\ldots\, .
\end{eqnarray*}
In each of these three cases, computing the corresponding
proportions $\Sigma(n)/(\Sigma(n;{A\over B})\Sigma(n;{C\over D}))$
for $n=200$, we have obtained values that differ from the
corresponding proportion $\delta$ by less than $7\cdot 10^{-4}$.
\par
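The first of these values can be reproduced by hand; the following evaluation is a worked check of the formula in the theorem. For ${A\over B}={1\over 2}$ and ${C\over D}={1\over 1}$ we have $BC-AD=1$, hence $\Delta={1\over 6}$, $\nabla=2$ and ${D\over B}={1\over 2}$. From (\ref{gringo1}) and (\ref{gringo2}),
$$
\Upsilon(0)={9\over 8},\quad\Upsilon(1)={3\over
8},\quad\Psi(1)={57\over 8},\quad\Psi(0)={27\over
4},\quad\Psi(-1)={57\over 8}.
$$
Therefore
\begin{eqnarray*}
{1\over 2}\Delta^2\Big(D^2
\Psi(1)-2BD\Psi(0)+B^2\Psi(-1)\Big)&=&{1\over 72}\left({57\over
8}-27+{57\over 2}\right)={23\over 192},\\
{A\over B}\Big(B\Delta\Upsilon(0)-\left(D\Delta+{1\over 2} {A\over
B}\right)\Upsilon(1)\Big)&=&{1\over 2}\left({3\over 8}-{5\over
12}\cdot{3\over 8}\right)={7\over 64},
\end{eqnarray*}
and $\delta={1\over 9}\cdot{2\over 1}\cdot\left({23\over 192}+{7\over 64}\right)={2\over 9}\cdot{11\over 48}={11\over 216}$.
\par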

It is also interesting to ask for a {\it global} proportion, i.e.,
when $0<{A\over B}<{C\over D}\leq 1$ are chosen randomly. However,
for this question our approach appears to become rather
technical. We have computed the corresponding quantity
$$
\lambda(n):={\sharp\left\{\left({a\over b},{A\over B}, {c\over d},
{C\over D}\right)\in{\mathcal F}_n^4\,:\, {a\over b}<{A\over B},
{c\over d}<{C\over D}\quad\mbox{and}\quad {a+c\over b+d}>{A+C\over
B+D}\right\} \over\sharp\left\{\left({a\over b},{A\over B},
{c\over d}, {C\over D}\right)\in{\mathcal F}_n^4\,:\, {a\over
b}<{A\over B}, {c\over d}<{C\over D}\right\}}
$$
for several values of $n$ and obtained the following values:
\par\medskip
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline $n$          & 10 & 20 & 30 \\
\hline $\lambda(n)$ & 0.03755\ldots & 0.03750\ldots &
0.03847\ldots \\
\hline
\end{tabular}
\end{center}
\par\medskip
We conjecture that the limit of $\lambda(n)$ for $n\to\infty$
exists and is positive, probably around the value $0.04$; however,
we did not succeed in proving that and leave it as an open
problem.

\vskip 30pt

\section*{\normalsize Acknowledgments} The authors are grateful to the
anonymous referee for his or her valuable remarks and corrections to a
first version of this article.

\vskip 30pt

\begin{thebibliography}{99}

\bibitem{ce} {\sc M.R. Cohen, E. Nagel}, {\it An Introduction to Logic and
Scientific Method}, Harcourt, Brace and Co., New York 1934.

\bibitem{ford} {\sc L.R. Ford}, Fractions, {\it Amer. Math. Monthly} {\bf
45} (1938), 586--601.

\bibitem{hardy} {\sc G.H. Hardy, E.M. Wright}, {\it An Introduction to the Theory of
Numbers}, 5th ed., Oxford University Press 1979.

\bibitem{hux} {\sc M.N. Huxley}, {\it Area, Lattice Points, and Exponential Sums},
The Clarendon Press, Oxford University Press, New York 1996.

\bibitem{mittal} {\sc Y. Mittal}, Homogeneity of subpopulations and
Simpson's paradox, {\it J. Amer. Stat. Assoc.} {\bf 86}, No. 413
(1991), 167--172.

\bibitem{simps} {\sc E.H. Simpson}, The interpretation of interaction in
contingency tables, {\it Journal of the Royal Statistical Society,
Series B} {\bf 13} (1951), 238--241.

\end{thebibliography}

\end{document}
