
1. Introduction

1.1 ChangeLog

The ChangeLog for the LVS-HOWTO is posted to the LVS website http://www.linuxvirtualserver.org

1.2 Purpose of this HOWTO

To enable you to understand how a Linux Virtual Server (LVS) works. Another document, the LVS-mini-HOWTO, tells you how to set up and install an LVS without understanding how the LVS works. The mini-HOWTO was extracted from this HOWTO. Eventually the redundant material will be dropped from this HOWTO.

The material here covers directors and realservers with 2.2 and 2.4 kernels. The code for 2.0.x kernels still works fine and was used on production systems when 2.0.x kernels were current, but is not being developed further. For 2.2 kernels, the networking code was rewritten, producing the arp problem. This changed the installation of an LVS from a simple process that could be done by almost anyone, to a thought-provoking, head-scratching exercise which requires a detailed understanding of the workings of LVS. For 2.0 and 2.2 kernels, LVS is stand-alone code based on ip_masquerading and doesn't integrate well with other uses of ip_masquerading. For 2.4 kernels, LVS was rewritten as a netfilter module, allowing it to fit into and be visible to other netfilter modules. Unfortunately the fit isn't perfect, but cooperation with netfilter does work in most cases. Being a netfilter module, the 2.4 code has slightly worse latency and throughput than the 2.2 code. However, with modern CPUs running at 800MHz, the bottleneck now is network throughput rather than LVS throughput (you only need a small number of realservers to saturate 100Mbps ethernet).

In general ipvsadm commands and services have not changed between kernels.

1.3 Nomenclature/Abbreviations

If you use these terms when you mail us, we'll know what you're talking about.

Preferred names and synonyms

Please use the first term in these lines. The other words are valid but less precise (or are redundant).

backend (multi-tier) servers

The realservers sometimes are frontends to other backend servers. The client does not connect to these backend servers and they are not in the ipvsadm table. These backend servers are set up separately from the LVS.

the term "the server" is ambiguous

People sometimes call the director or the realservers "the server". Since the LVS appears as a single server to the client, and since the realservers are also serving services, the term "server" is ambiguous. Do not use the term "the server" when talking about LVS. Most often you are referring to the "director" or the "realservers". Sometimes (e.g. when talking about throughput) you are talking about the whole LVS.

I use "realserver" as I despair of finding a reference to a "real server" in a webpage using the search keys "real" and "server". Horms and I (for reasons that neither of us can remember) have been pushing the term "real-server" for about a year, on the mailing list, and no-one has adopted it. We're going back to "realserver".

names of IPs in an LVS

client IP     = CIP
virtual IP    = VIP (the IP on the director that the client connects to)
director IP   = DIP (the IP on the director in the realserver's network)
realserver IP = RIP (and RIP1, RIP2...) (the IP on the realserver)
director GW   = DGW (or director gw)
realserver GW = RGW, SGW (or server gw)

1.4 What is an LVS?

A Linux Virtual Server (LVS) is a cluster of servers which appears to be one server to an outside client. This apparent single server is called here a "virtual server". The individual servers (realservers) are under the control of a director (or load balancer), which runs a Linux kernel patched to include the ipvs code. The ipvs code running on the director is the essential feature of LVS, although other user-level code can be (and usually is) used to manage the LVS.

What is a VIP?

The director presents an IP called the Virtual IP (VIP) to clients. (When using fwmarks, VIPs are aggregated into groups of IPs, but the same principles apply as for a single IP.) When a client connects to the VIP, the director chooses one particular realserver and forwards the client's packets to it for the duration of the client's connection to the LVS. The realservers serve services (eg ftp, http, dns, telnet, nntp, smtp) such as are found in /etc/services or inetd.conf.

Peter Martin p.martin@ies.uk.com and John Cronin jsc3@havoc.gtf.org 05 Jul 2001

The VIP is the address which you want to load balance, i.e. the address of your website. The VIP is usually an alias (e.g. eth0:1) so that the VIP can be swapped between two directors if a fault is detected on one.
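As an illustration only (the addresses and device names here are examples, not prescriptions), bringing up the VIP as an alias on the director looks like:

    # on the director: bring up the VIP as an alias of eth0
    ifconfig eth0:1 192.168.1.110 netmask 255.255.255.255 up

A failover package would run much the same command on the backup director when it takes over the VIP.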

The VIP is the IP address of the "service", not the IP address of any of the particular systems used in providing the service (ie the director and the realservers).

The VIP can be moved from one director to a backup director if a fault is detected (typically this is done using mon and heartbeat, or something similar). The director can have multiple VIPs. Each VIP can have one or more services associated with it, e.g. you could have HTTP/HTTPS balanced using one VIP, and FTP service (or whatever) balanced using another VIP, and calls to these VIPs can be answered by the same or different realservers.
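A minimal ipvsadm sketch of such an arrangement (the IPs are examples; -g selects direct routing, one of the forwarding methods described below):

    # VIP1 balances http across two realservers
    ipvsadm -A -t 192.168.1.110:80 -s wlc
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.11:80 -g
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.12:80 -g
    # VIP2 balances ftp, answered by a different realserver
    ipvsadm -A -t 192.168.1.111:21 -s wlc
    ipvsadm -a -t 192.168.1.111:21 -r 192.168.1.13:21 -g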

Groups of VIPs and/or ports can be set up with fwmark.

The realservers have to be configured to work with the VIPs on the director (this includes handling the arp problem).

There can be persistence issues if you are using cookies or https, or anything else that expects the realserver fulfilling the requests to have some connection state information. This is addressed on the LVS persistence page.

Where do you use an LVS?

Client/Server relationship is preserved in an LVS

Basic LVS topology



                        ________
                       |        |
                       | client | (local or on internet)
                       |________|
                           |
                        (router)
                           |
--                         |
L                      Virtual IP
i                      ____|_____
n                     |          | (director can have 1 or 2 NICs)
u                     | director |
x                     |__________|
                           |
V                          |
i                          |
r         ----------------------------------
t         |                |               |
u         |                |               |
a         |                |               |
l    _____________   _____________   _____________
    |             | |             | |             |
S   | realserver1 | | realserver2 | | realserver3 |
e   |_____________| |_____________| |_____________|
r
v
e
r
---

LVS director is an L4 switch

In the computer bestiary, the director is a layer 4 (L4) switch. The director makes decisions at the IP layer and just sees a stream of packets going between the client and the realservers. In particular, an L4 switch makes decisions based on the IP information in the headers of the packets.

Here's a description of an L4 switch from the Super Sparrow Global Load Balancer documentation:

" Layer 4 Switching: Determining the path of packets based on information available at layer 4 of the OSI 7 layer protocol stack. In the context of the Internet, this implies that the IP address and port are available as is the underlying protocol, TCP/IP or UCP/IP. This is used to effect load balancing by keeping an affinity for a client to a particular server for the duration of a connection. "

This is all fine except

Nevo Hed nevo@aviancommunications.com 13 Jun 2001

The IP layer is L3.

Alright, I lied. TCPIP is a 4 layer protocol and these layers do not map well onto the 7 layers of the OSI model. (As far as I can tell the 7 layer OSI model is only used to torture students in classes.) It seems that everyone has agreed to pretend that tcpip uses the OSI model and that tcpip devices like the LVS director should therefore be named according to the OSI model. Because of this, the name "L4 switch" really isn't correct, but we all use it anyhow.

The director does not inspect the content of the packets and cannot make decisions based on the content of the packets (e.g. if the packet contains a cookie, the director doesn't know about it and doesn't care). The director doesn't know anything about the application generating the packets or what the application is doing. Because the director does not inspect the content of the packets (layer 7, L7) it is not capable of session management or providing service based on packet content. L7 capability would be a useful feature for LVS and perhaps this will be developed in the future (preliminary code is out - May 2001 - ktcpvs).

The director is basically a router, with routing tables set up for the LVS function. These tables allow the director to forward packets to realservers for services that are being LVS'ed. If http (port 80) is a service that is being LVS'ed then the director will forward those packets. The director does not have a socket listener on VIP:80 (i.e. netstat won't see a listener).
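For example, on a director forwarding port 80 for an example VIP of 192.168.1.110, you would see something like this (an illustration only):

    director:~ # ipvsadm -L -n
    IP Virtual Server version 1.0.2 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  192.168.1.110:80 wlc
      -> 192.168.1.11:80             Route   1      0          0
    director:~ # netstat -an | grep 192.168.1.110:80
    director:~ #                     (no listener on VIP:80)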

John Cronin jsc3@havoc.gtf.org (19 Oct 2000) calls these types of servers (i.e. lots of little boxes appearing to be one machine) "RAILS" (Redundant Arrays of Inexpensive Linux|Little|Lightweight|L* Servers). Lorn Kay lorn_kay@hotmail.com calls them RAICs (C=computer), pronounced "rake".

LVS forwards packets to realservers

The director uses 3 different methods of forwarding: LVS-NAT (masquerading), LVS-DR (direct routing) and LVS-Tun (IP tunneling).
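In ipvsadm the forwarding method is chosen per realserver. A minimal sketch with example IPs:

    # -m = masquerading (LVS-NAT), -g = gatewaying (LVS-DR), -i = ipip (LVS-Tun)
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.11:80 -m
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.12:80 -g
    ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.13:80 -i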

Some modification of the realserver's ifconfig and routing tables will be needed for LVS-DR and LVS-Tun forwarding. For LVS-NAT the realservers only need a functioning tcpip stack (i.e. the realserver can be a networked printer).
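For LVS-DR, a minimal sketch of that modification on a 2.2.x realserver (example VIP; assumes Julian's hidden arp patch is applied - see the arp problem):

    # on the realserver: accept packets for the VIP without replying to arp
    echo 1 > /proc/sys/net/ipv4/conf/all/hidden
    echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
    ifconfig lo:0 192.168.1.110 netmask 255.255.255.255 up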

LVS works with all services tested so far (single and 2 port services), except that LVS-DR and LVS-Tun cannot work with services that initiate connections from the realservers (so far: identd and rsh).

The realservers can be identical, presenting the same service (eg http, ftp), working off file systems which are kept in sync for content. This type of LVS increases the number of clients able to be served. Or the realservers can be different, presenting a range of services from machines with different services or operating systems, enabling the virtual server to present a total set of services not available on any one server. The realservers can be local/remote, running Linux (any kernel) or other OS's. Some methods for setting up an LVS have fast packet handling (eg LVS-DR, which is good for http and ftp) while others are easier to set up (eg transparent proxy) but have slower packet throughput. In the latter case, if the service is CPU or I/O bound, the slower packet throughput may not be a problem.

For any one service (eg httpd at port 80) all the realservers must present identical content since the client could be connected to any one of them and over many connections/reconnections, will cycle through the realservers. Thus if the LVS is providing access to a farm of web, database, file or mail servers, all realservers must have identical files/content. You cannot split up a database amongst the realservers and access pieces of it with LVS.
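Keeping the content identical is outside the scope of LVS itself. One common low-tech approach (a sketch only; the paths and hostnames are hypothetical) is to push a master copy to each realserver with rsync:

    # push the master copy of the webfarm content to each realserver
    for rs in realserver1 realserver2 realserver3; do
        rsync -a --delete /var/www/ $rs:/var/www/
    done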

The simplest LVS to set up involves clients doing read-only fetches (e.g. a webfarm). If the client is allowed to write to the LVS (e.g. a database or mail farm), then some method is required so that data written on one realserver is transferred to the other realservers before the client disconnects and reconnects again. This need not be all that fast (you can tell your users that their mail won't be updated for 10 mins), but the simplest (and most expensive) approach is for the mail farm to have a common file system for all servers. For a database, the realservers can run database clients which connect to a single backend database, or else run independent database daemons which replicate their data.

LVS runs on any Linux platform

An LVS requires a Linux director (Intel and Alpha versions are known to work). The LVS code doesn't have any Intel-specific instructions and is expected to work on any machine that Linux runs on.

Code for LVS is different for 2.4.x and 2.2.x kernels

There are differences in the coding for LVS for the 2.0.x, 2.2.x and 2.4.x kernels. Development of LVS on 2.0.36 kernels has stopped (May 99). Code for 2.2.x kernels is production level and this HOWTO is up to date for 2.2.19 kernels. Code for 2.4.x kernels is relatively new and the HOWTO is less complete for the 2.4.x material (check on the mailing list). (Jun 2001: we're pretty much up to date now.)

The 2.0.x and 2.2.x code is based on the masquerading code. Even if you don't explicitly use ipchains (eg with LVS-DR or LVS-Tun), you will see masquerading entries with `ipchains -M -L` (or `netstat -M`).

Code for 2.4.x kernels was rewritten to be compatible with the netfilter code (i.e. its entries will show up in netfilter tables). It is now production level code. Because of incompatibilities with netfilter, the LVS-NAT code for 2.4.x was in development mode until about Jan 2001.

kernels from 2.4.x series are SMP for kernel code

2.4.x kernels are SMP for kernel code as well as user space code, while 2.2.x kernels are only SMP for user space code. LVS is all kernel code. A dual CPU director running a 2.4.x kernel should be able to push packets at twice the rate of the same machine running a 2.2 kernel (if other resources on the director don't become limiting).

OS for realservers

You can have almost any OS on the realservers (all are expected to work, but we haven't tried them all yet). The realservers only need a tcpip stack - a networked printer can be a realserver.

LVS works on ethernet

LVS works on ethernet. There are some limitations on using ATM.

LVS is continually being developed

LVS is continually being developed and usually only the more recent kernel and kernel patches are supported. Usually development is incremental, but with the 2.2.14 kernels the entries in the /proc file system changed and all subsequent 2.2.x versions were incompatible with previous versions.

Other documentation

For more documentation, look at the LVS web site (eg a talk I gave on how LVS works on 2.0.36 kernel directors)

1.5 LVS Failure

The LVS itself does not provide high availability. Other software (eg mon, ldirectord, or the Linux HA code) is used in conjunction with LVS to provide high availability (i.e. to switch out a failed realserver/service or a failed director).
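As an illustration, ldirectord periodically fetches a test page from each realserver and removes a realserver from the ipvsadm table when the fetch fails. A minimal sketch of an ldirectord.cf entry (example IPs and a hypothetical test page):

    # health-check the realservers behind VIP:80 (LVS-DR, "gate")
    virtual=192.168.1.110:80
            real=192.168.1.11:80 gate
            real=192.168.1.12:80 gate
            service=http
            request="index.html"
            receive="Test Page"
            scheduler=wlc
            protocol=tcp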

Another package, keepalived, is designed to work with LVS, watching the health of services, though it hasn't seen much discussion on the mailing list. Julian has written Netparse, which is suffering the same fate.

There are two types of failures with an LVS: failure of the director, and failure of a service on a realserver.

In both cases (failure of director, or failure of a service), the client's session with the realserver will be lost (as would happen in the case of a single server). With failover however, the client will be presented with a new connection when they initiate a reconnect.

1.6 Thanks

Contributions to this HOWTO came from the mailing list and are attributed to the poster (with e-mail address). Postings may have been edited to fit them into the flow of the HOWTO.

The LVS logo (Tux with 3 lighter shaded penguins behind him representing a director and 3 realservers) is by Mike Douglas spike@bayside.net

LVS homepage is running on a machine donated by Voxel.

LVS mailing list is hosted by Lars lmb@suse.de in Germany.

1.7 HOWTO source is sgml

The HOWTO is written in sgml. The char '&' found in C source code has to be written as &amp; in sgml. If you swipe patches from the sgml source rather than, say, the html rendering of it, you will get code which needs to be edited to fix the &amp;.
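For example, a conditional in C (first line) appears in the sgml source as the second line:

    if (a && b) return;              /* the code as it should read */
    if (a &amp;&amp; b) return;      /* the same line in the sgml source */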

1.8 Mailing list, subscribing, unsubscribing and searchable archives

mailing list details

Thanks to Hank Leininger for the mailing list archive which is searchable not only by subject and author, but by strings in the body. Hank's resource has been of great help in assembling this HOWTO.

1.9 Minimal knowledge required, getting technical help

Ratz ratz@tac.ch

To be able to set up/maintain an LVS, you need basic Linux administration and networking skills: patching and compiling kernels, assigning IPs and netmasks, setting up routing, and reading the output of tools like tcpdump and ifconfig.

The mailing list and HOWTOs cover information specific to LVS. The rest you have to handle yourself. All of us knew nothing about computers when we first started; we learnt it and you can too. If you can't set up a simple LVS from the mini-HOWTO without getting into a major sweat (or being able to tell us what's wrong with the instructions), then you need to do some more homework.

It's hard to believe but we do get postings like

recompiling the kernel is hard (or I don't read HOWTOs), can't you guys cut me some slack and just tell me what to do?

The answer is: NO WE WON'T

The people on the mailing list answer questions for free, and have important things to do, like keeping up with /. and checking our e-mail. When we're at home, there is beer to drink and Gilligan's Island re-runs to watch. Reading to you does not rate. I expect the people who post these statements don't read the HOWTO, so I may be wasting my time here. Still, there are people who think that their time is more important than ours.

can anybody tell me how to setup a windows realserver? thank you very much! I'm in a hurry.
robert.gehr@web2cad.de

I can't think of anyone who has set up lvs in a hurry :-)

To get technical help: see the next section on posting problems/questions to the mailing list.

1.10 Posting problems/questions to the mailing list

There's always new ideas and questions being posted on the mailing list. We don't expect this to stop.

If you have a problem with your LVS not working, before you come up on the mailing list, please first read the relevant sections of the HOWTO and search the mailing list archives.

If you don't understand your problem well, here's a suggested submission format from Roberto Nibali ratz@tac.ch

  1. System information, such as kernel, tools and their versions. Example:
            hog:~ # uname -a
            Linux hog 2.2.18 #2 Sun Dec 24 15:27:49 CET 2000 i686 unknown
    
            hog:~ # ipvsadm -L -n | head -1
            IP Virtual Server version 1.0.2 (size=4096)
    
            hog:~ # ipvsadm -h | head -1
            ipvsadm v1.13 2000/12/17 (compiled with popt and IPVS v1.0.2)
            hog:~ # 
            
    
  2. Short description and maybe sketch of what you intended to setup. Example (LVS-DR):
            o Using LVS-DR, gatewaying method.
            o Load balancing port 80 (http) non-persistent.
            o Network Setup:
    
                            ________
                           |        |
                           | client |
                           |________|
                               | CIP
                               |
                               |
                               |
                            (router)
                               |
                               |
                               |
                               | GEP
                     (packetfilter, firewall)
                               | GIP
                               |
                               |       __________
                               |  DIP |          |
                               +------+ director |
                               |  VIP |__________|
                               |
                               |
                               |
             +-----------------+----------------+
             |                 |                |
             |                 |                |
         RIP1, VIP         RIP2, VIP        RIP3, VIP
        ____________      ____________    ____________
       |            |    |            |  |            |
       |realserver1 |    |realserver2 |  |realserver3 |
       |____________|    |____________|  |____________|
    
    
            CIP  = 212.23.34.83
            GEP  = 81.23.10.2       (external gateway, eth0)
            GIP  = 192.168.1.1      (internal gateway, eth1, masq or NAT)
            DIP  = 192.168.1.2      (eth0:1, or eth1:1)
            VIP1 = 192.168.1.110    (director: eth0:110, realserver: lo0:110)
            RIP1 = 192.168.1.11     
            RIP2 = 192.168.1.12     
            RIP3 = 192.168.1.13     
            DGW  = 192.168.1.1      (GIP for all realservers)
    
            o ipvsadm -L -n
    
    hog:~ # ipvsadm -L -n
    IP Virtual Server version 1.0.2 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  192.168.1.110:80 wlc
      -> 192.168.1.13:80             Route   0      0          0         
      -> 192.168.1.12:80             Route   0      0          0         
      -> 192.168.1.11:80             Route   0      0          0         
    hog:~ # 
    
    The output from ifconfig from all machines (abbreviated, just need the
    IP, netmask etc), and the output from netstat -rn.
    
  3. What doesn't work. Show some output like tcpdump, ipchains, ipvsadm and kernlog. Later we may ask you for a more detailed configuration like routing table, OS-version or interface setup on some machines used in your setup. Tell us what you expected. Example:
            o ipchains -L -M -n (2.2.x) or cat /proc/net/ip_conntrack (2.4.x)
            o echo 9 > /proc/sys/net/ipv4/vs/debug_level && tail -f /var/log/kernlog
            o tcpdump -n -e -i eth0 tcp port 80
            o route -n
            o netstat -an
            o ifconfig -a
            
    

1.11 ToDo List

for the HOWTO

Combining HA and LVS (e.g. Ultramonkey).

I realise that information in here isn't all that easy to locate yet (there's no index and you'll have to search with your editor) and that the ordering of sections could be improved.

I'll work on it as I have time.

for LVS

Does anyone want to write a MIB for LVS?

Nov 2001. We have a MIB written by Romeo Benzoni rb@ssn.tp!

1.12 Other load balancing solutions

From lvs@spiderhosting.com, a list of load balancers.

1.13 Software/Information useful/related to LVS

Ultra Monkey is LVS and HA combined.

From lvs@spiderhosting.com, Super Sparrow, Global Load Balancing using BGP routing information.

From ratz, there's a write-up on load imbalance with persistence and sticky bits from our friends at M$.

From ratz, Zero copy patches to the kernel to speed up network throughput: Dave Miller's patches, Rik van Riel's vm-patches and more of Rik van Riel's patches. The Zero copy patches may not work with LVS and may not work with netfilter either (from Kate john@antefacto.com).

From Michael Brown michael_e_brown@dell.com, the TUX kernel level webserver.

From Lars lmb@suse.de mod_backhand, a method of balancing apache httpd servers that looks like ICP for web caches.

procstatd, a lightweight and simple web-based cluster monitoring tool designed for beowulfs; the latest version was 1.3.4 (you'll have to look around on this page).

From Putchong Uthayopas pu@ku.ac.th, KCAP, a heavyweight (lots of bells and whistles) cluster monitoring tool.

