TCP
Section: Linux Programmer's Manual (7)
Updated: 25 Apr 1999
NAME
tcp - TCP protocol.
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
DESCRIPTION
This is an implementation of the TCP protocol
defined in RFC793, RFC1122
and RFC2001 with the NewReno and SACK extensions.
It provides a reliable, stream-oriented, full-duplex connection between
two sockets on top of
ip(7).
TCP guarantees that the data arrives in order and retransmits lost
packets. It generates and checks a per-packet checksum to catch
transmission errors. TCP does not preserve record boundaries.
A fresh TCP socket has no remote or local address and is not fully specified.
To create an outgoing TCP connection use
connect(2)
to establish a connection to another TCP socket.
To receive new incoming connections,
bind(2)
the socket first to a local address and port and then call
listen(2)
to put the socket into listening state. After that a new
socket for each incoming connection can be accepted
using
accept(2).
A socket which has had
accept
or
connect
successfully called on it is fully specified and may transmit data.
Data cannot be transmitted on listening or not yet connected sockets.
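The sequence above can be sketched as follows. This is only an illustration, not part of the TCP interface: the function name tcp_loopback_demo is hypothetical, and the use of the loopback address with port 0 (so that the kernel picks a free port) is an assumption made to keep the example self-contained in one process.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Pass one short message over a loopback TCP connection.
 * Returns the number of bytes the accepted socket received, or -1 on error. */
int tcp_loopback_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    int n = -1;
    int lfd = socket(PF_INET, SOCK_STREAM, 0);   /* listening socket */
    int cfd = socket(PF_INET, SOCK_STREAM, 0);   /* connecting socket */
    int afd = -1;                                /* accepted socket */

    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;               /* assumption: let the kernel choose */

    /* Passive side: bind to a local address, then enter listening state. */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Active side: connect to the address the listener ended up with. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* accept() returns a new, fully specified socket for the connection. */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    if (send(cfd, "hello", 5, 0) == 5)
        n = (int)recv(afd, buf, sizeof(buf), 0);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return n;
}
```

Note that data is exchanged on the connected and accepted sockets only; the listening socket never carries data itself.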
Linux 2.2 supports the RFC1323 TCP high performance extensions.
This includes large TCP windows to support links with high latency
or bandwidth.
In order to make use of them, the send and receive buffer sizes must be
increased. They can be set globally with the
net.core.wmem_default
and
net.core.rmem_default
sysctls, or on individual sockets by using the
SO_SNDBUF
and
SO_RCVBUF
socket options. The maximum sizes for socket buffers are limited by the global
net.core.rmem_max
and
net.core.wmem_max
sysctls. See
socket(7)
for more information.
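A per-socket buffer request might look like the sketch below. The helper name set_rcvbuf is hypothetical. The value read back may differ from the value requested, since the kernel can adjust the request and applies the global maximum, so the example returns what the kernel reports rather than assuming the request was honoured exactly.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Request a receive buffer of `bytes` on socket `fd` and return the size
 * the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

The same pattern applies to SO_SNDBUF for the send buffer.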
TCP supports urgent data. Urgent data is used to signal the receiver
that some important message is part of the data stream and that it should
be processed as soon as possible.
To send urgent data specify the
MSG_OOB
option to
send(2).
When urgent data is received, the kernel sends a
SIGURG
signal to the
reading process or the process or process group that has been set for
the socket using the
SIOCSPGRP
or
FIOSETOWN
ioctls. When the
SO_OOBINLINE
socket option is enabled, urgent data is put into the normal data stream
(and can be tested for by the
SIOCATMARK
ioctl),
otherwise it can only be received when the
MSG_OOB
flag is set for
recvmsg(2).
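A minimal sketch of sending and receiving one urgent byte is given below. It is an illustration under stated assumptions: the function name urgent_byte_demo is hypothetical, both ends live in one process on the loopback interface, and the receiver waits with poll(2) and POLLPRI instead of installing a SIGURG handler, which is a simplification not taken from the text above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one urgent byte over a loopback connection and read it back with
 * MSG_OOB. Returns the byte received, or -1 on error. */
int urgent_byte_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    char c = 0;
    int result = -1;
    int lfd = socket(PF_INET, SOCK_STREAM, 0);
    int cfd = socket(PF_INET, SOCK_STREAM, 0);
    int afd = -1;

    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: any free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* Sender: mark a single byte as urgent. */
    if (send(cfd, "!", 1, MSG_OOB) != 1)
        goto out;

    /* Receiver: wait up to two seconds until urgent data is pending. */
    pfd.fd = afd;
    pfd.events = POLLPRI;
    if (poll(&pfd, 1, 2000) != 1 || !(pfd.revents & POLLPRI))
        goto out;

    /* SO_OOBINLINE is not set, so the urgent byte is read with MSG_OOB. */
    if (recv(afd, &c, 1, MSG_OOB) == 1)
        result = (unsigned char)c;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```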
ADDRESS FORMATS
TCP is built on top of IP (see
ip(7)).
The address formats defined by
ip(7)
apply to TCP. TCP only supports
point-to-point communication; broadcasting and multicasting are not supported.
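As an illustration of the address format, the sketch below fills in a struct sockaddr_in for use with bind(2) or connect(2). The helper name make_inet_addr is hypothetical, and the use of inet_pton(3) to parse a dotted-quad string is a choice made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill `out` with the IPv4 address `ip` (dotted-quad text) and `port`
 * (host byte order). Returns 0 on success, -1 if `ip` cannot be parsed. */
int make_inet_addr(const char *ip, unsigned short port,
                   struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);      /* ports are stored in network order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```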
SYSCTLS
These sysctls can be accessed by the
/proc/sys/net/ipv4/*
files or with the
sysctl(2)
interface. In addition, most IP sysctls also apply to TCP; see
ip(7).
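Reading one of these values through the file interface could be sketched as below. The helper name read_sysctl_int is hypothetical, and the sketch assumes the file holds a single decimal integer, which is the case for the flag-like and count-like sysctls listed in this section.

```c
#include <stdio.h>

/* Read one integer value from a sysctl file such as
 * /proc/sys/net/ipv4/tcp_sack.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int read_sysctl_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass a path such as "/proc/sys/net/ipv4/tcp_sack" and check the return value, since the file may be absent on kernels built without the corresponding feature.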
- tcp_window_scaling
-
Enable RFC1323 TCP window scaling.
- tcp_sack
-
Enable RFC2018 TCP Selective Acknowledgements.
- tcp_timestamps
-
Enable RFC1323 TCP timestamps.
- tcp_fin_timeout
-
How many seconds to wait for a final FIN packet before the socket is
forcibly closed. This is strictly a violation of the TCP specification,
but required to prevent denial-of-service attacks.
- tcp_keepalive_probes
-
Maximum TCP keep-alive probes to send before giving up. Keep-alives are only
sent when the
SO_KEEPALIVE
socket option is enabled.
- tcp_keepalive_time
-
The number of seconds after no data has been transmitted before a keep-alive
will be sent on a connection. The default is 10800 seconds (3 hours).
- tcp_max_ka_probes
-
How many keep-alive probes are sent per slow timer run. To prevent
bursts, this value should not be set too high.
- tcp_stdurg
-
Enable the strict RFC793 interpretation of the TCP urgent-pointer field.
The default is to use the BSD-compatible interpretation of the urgent-pointer,
pointing to the first byte after the urgent data. The RFC793 interpretation
is to have it point to the last byte of urgent data. Enabling this option
may lead to interoperability problems.
- tcp_syncookies
-
Enable TCP syncookies. The kernel must be compiled with
CONFIG_SYN_COOKIES.
Syncookies protect a socket from overload when too many connection
attempts arrive. When syncookies are enabled, client machines may no longer
be able to detect an overloaded machine by using a short timeout.
- tcp_max_syn_backlog
-
Length of the per-socket backlog queue. As of Linux 2.2, the backlog specified
in
listen(2)
only specifies the length of the backlog queue of already established sockets.
The maximum queue of sockets not yet established (in
SYN_RECV
state)
per listen socket is set by this sysctl. When more connection requests arrive,
Linux starts to drop packets. When syncookies are enabled, the packets are still
answered and this value is effectively ignored.
- tcp_retries1
-
Defines how many times an answer to a TCP connection request is
retransmitted before giving up.
- tcp_retries2
-
Defines how many times a TCP packet is retransmitted in established state
before giving up.
- tcp_syn_retries
-
Defines how many times to try to send an initial SYN packet to a remote
host before giving up and returning an error. Must be below 255.
This is only the timeout for outgoing connections; for incoming
connections the number of retransmits is defined by
tcp_retries1.
- tcp_retrans_collapse
-
Try to send full-sized packets during retransmit. This is used to work around
TCP bugs in some stacks.
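The keep-alive sysctls above only take effect on sockets that have SO_KEEPALIVE enabled. A minimal sketch of enabling it follows; the helper name enable_keepalive is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Enable keep-alive probes on socket `fd`.
 * Returns 1 if the option reads back as enabled, 0 if not, -1 on error. */
int enable_keepalive(int fd)
{
    int on = 1;
    int val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```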
SOCKET OPTIONS
To set or get a TCP socket option, call
getsockopt(2)
to read or
setsockopt(2)
to write the option, with the option level argument set to
SOL_TCP.
In addition,
most
SOL_IP
socket options are valid on TCP sockets. For more information see
ip(7).
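One possible shape for such a call is sketched below, using the TCP_NODELAY option from the list that follows. The helper name set_nodelay is hypothetical. The sketch passes IPPROTO_TCP as the level, which has the same numeric value as SOL_TCP on Linux and is declared by <netinet/in.h>.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set TCP_NODELAY on `fd` to `on` (0 or 1).
 * Returns the value read back (0 or 1), or -1 on error. */
int set_nodelay(int fd, int on)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```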
- TCP_NODELAY
-
Turn the Nagle algorithm off. This means that packets are always sent as soon
as possible and no unnecessary delays are introduced, at the cost of more
packets in the network. Expects an integer boolean flag.
- TCP_MAXSEG
-
Set or get the maximum segment size for outgoing TCP packets. If this
option is set before connection establishment, it also changes the MSS value
announced to the other end in the initial packet. Values greater than
the interface MTU are ignored and have no effect.
- TCP_CORK
-
If enabled, do not send out partial frames.
All queued partial frames are sent when the option is cleared again.
This is useful for prepending headers
before calling
sendfile(2),
or for throughput optimization. This option cannot be combined with
TCP_NODELAY.
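The header-prepending use of TCP_CORK might be sketched as follows. The helper names set_cork and send_header_and_body are hypothetical, fd is assumed to be a connected TCP socket, and plain send(2) stands in for sendfile(2) to keep the example short. A short send is treated as an error here for simplicity.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Enable (on = 1) or clear (on = 0) TCP_CORK on `fd`. Returns 0 or -1. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Queue a header and a body on the connected socket `fd` while corked,
 * then uncork. Returns 0 on success, -1 on error. */
int send_header_and_body(int fd, const void *hdr, size_t hlen,
                         const void *body, size_t blen)
{
    int rc = -1;

    if (set_cork(fd, 1) < 0)
        return -1;
    if (send(fd, hdr, hlen, 0) == (ssize_t)hlen &&
        send(fd, body, blen, 0) == (ssize_t)blen)
        rc = 0;
    /* Clearing the option sends any partial frame that is still queued. */
    if (set_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```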
IOCTLS
These ioctls can be accessed using
ioctl(2).
The correct syntax is:
-
int value;
error = ioctl(tcp_socket, ioctl_type, &value);
- FIONREAD
-
Returns the amount of queued unread data in the receive buffer. Argument
is a pointer to an integer.
- SIOCATMARK
-
Returns true when all urgent data has already been received by the user
program.
This is used together with
SO_OOBINLINE.
Argument is a pointer to an integer for the test result.
- TIOCOUTQ
-
Returns the amount of unsent data in the socket send queue in the passed
integer value pointer.
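The two queue-size ioctls can be wrapped as in the sketch below. The helper names unread_bytes and unsent_bytes are hypothetical; the calls themselves follow the syntax shown at the start of this section.

```c
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Number of unread bytes queued in the receive buffer of `fd`,
 * or -1 on error. */
int unread_bytes(int fd)
{
    int value = 0;

    if (ioctl(fd, FIONREAD, &value) < 0)
        return -1;
    return value;
}

/* Number of unsent bytes in the send queue of `fd`, or -1 on error. */
int unsent_bytes(int fd)
{
    int value = 0;

    if (ioctl(fd, TIOCOUTQ, &value) < 0)
        return -1;
    return value;
}
```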
ERROR HANDLING
When a network error occurs, TCP tries to resend the packet. If it doesn't
succeed after some time, either
ETIMEDOUT
or the last received error
on this connection is reported.
Some applications require a quicker error notification.
This can be enabled with the
SOL_IP
level
IP_RECVERR
socket option. When this
option is enabled, all incoming errors are immediately passed to the user program.
Use this option with care: it makes TCP less tolerant of routing changes
and other normal network conditions.
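Enabling the option could look like the sketch below. The helper name enable_recverr is hypothetical; how the application then reacts to the reported errors is outside the scope of this example.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for immediate error reporting on the TCP socket `fd`.
 * Returns 0 on success, -1 on error. */
int enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_IP, IP_RECVERR, &on, sizeof(on));
}
```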
NOTES
When an error occurs during a connection setup that takes place in a socket write,
SIGPIPE
is only raised when the
SO_KEEPOPEN
socket option is set.
TCP has no real out-of-band data; it has urgent data. In Linux this means
that if the other end sends newer out-of-band data, the older urgent data is
inserted as normal data into the stream (even when
SO_OOBINLINE
is not set). This differs from BSD based stacks.
Linux uses the BSD compatible interpretation
of the urgent pointer field by default. This violates RFC1122, but is
required for interoperability with other stacks. It can be changed by
the
tcp_stdurg
sysctl.
ERRORS
- EPIPE
-
The other end closed the socket unexpectedly or a write is executed on
a shut down socket.
- ETIMEDOUT
-
The other end didn't acknowledge retransmitted data after some time.
- EAFNOTSUPPORT
-
Passed socket address type in
sin_family
was not
AF_INET.
Any errors defined for
ip(7)
or the generic socket layer may also be returned for TCP.
BUGS
Not all errors are documented.
IPv6 is not described.
Transparent proxy options are not described.
VERSIONS
The sysctls are new in Linux 2.2.
IP_RECVERR
is a new
feature in Linux 2.2.
TCP_CORK
is new in 2.2.
SEE ALSO
socket(7), socket(2), ip(7), sendmsg(2), recvmsg(2)
RFC793 for the TCP specification.
RFC1122 for the TCP requirements and a description of the Nagle
algorithm.
RFC2001 for some TCP algorithms.