I’m finally back. Here is a translation of Igor Sysoev’s talk given at the RIT conference. Igor Sysoev is the creator of nginx, one of the most widely used lightweight HTTP servers in Russia and the world.
I also use nginx as a reverse proxy and load balancer in my project.
FreeBSD stores network data in mbuf clusters of 2 KB each, but only about 1500 bytes of each cluster are used (the size of an Ethernet packet).
Each mbuf cluster needs an “mbuf” structure, which is 256 bytes in size and is used to organize mbuf clusters into chains. An mbuf can also hold about 100 bytes of useful data itself, but this is not always used.
If the server has 1 GB of RAM or more, 25 thousand mbuf clusters are created by default, but this is not enough in some cases.
When no free mbuf clusters are available, FreeBSD enters the zonelimit state and stops answering network requests. You can see it as the `zoneli` state in the output of the `top` command.
The only way to fix this is to log in through the local console and reboot the system; a process in the `zoneli` state cannot be killed. The same problem exists in Linux 2.6.x, but there even the local console stops working in this state.
There is a patch that fixes the problem: it returns an ENOBUFS error, which signals that the `zoneli` condition has been reached, and the program can close some connections when it receives the error. Unfortunately, this patch has not been merged into FreeBSD yet.
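To make the idea of reacting to ENOBUFS concrete, here is a minimal sketch of what the application side could look like. It is not taken from the patch or from nginx; the function name `send_or_shed` and the `idle_connections` list are hypothetical and only illustrate the "close some connections on ENOBUFS" behaviour described above.

```python
import errno


def send_or_shed(sock, data, idle_connections):
    """Try to send data; if the kernel reports ENOBUFS, shed idle connections.

    `idle_connections` is a hypothetical list of socket-like objects that the
    application considers safe to drop when kernel buffers run out.
    Returns the number of bytes sent, or 0 if load was shed instead.
    """
    try:
        return sock.send(data)
    except OSError as exc:
        if exc.errno != errno.ENOBUFS:
            raise  # unrelated error: let the caller handle it
        # The kernel is out of buffer space: free some by closing idle peers.
        for conn in idle_connections:
            conn.close()
        idle_connections.clear()
        return 0
```

The caller would typically retry the send later, once the closed connections have released their buffers.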
The current mbuf cluster usage can be checked with the following command:
> netstat -m
1/1421/1425 mbufs in use (current/cache/total)
0/614/614/25600 mbufs clusters in use (current/cache/total/max)
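For monitoring, the cluster line of this output can be parsed and compared against the maximum. The following is only a sketch: the function name `cluster_usage` is made up for illustration, and the regular expression assumes the line format shown in the sample above.

```python
import re

# Matches e.g. "0/614/614/25600 mbufs clusters in use (current/cache/total/max)"
CLUSTER_LINE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbufs? clusters in use")


def cluster_usage(netstat_output):
    """Return (total, maximum, fraction_allocated) or None if no line matches."""
    match = CLUSTER_LINE.search(netstat_output)
    if match is None:
        return None
    current, cache, total, maximum = map(int, match.groups())
    return total, maximum, total / maximum


sample = """1/1421/1425 mbufs in use (current/cache/total)
0/614/614/25600 mbufs clusters in use (current/cache/total/max)"""

total, maximum, fraction = cluster_usage(sample)
print(f"{total} of {maximum} clusters allocated ({fraction:.1%})")
```

A fraction that keeps approaching 1.0 is the signal to raise the cluster limit before the server hits the `zoneli` state.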
You can increase the number of mbuf clusters through the kern.ipc.nmbclusters parameter:
> sysctl kern.ipc.nmbclusters=65536
In earlier versions of FreeBSD, mbuf clusters can be configured only at boot time:
25000 mbuf clusters take about 50 MB of memory, 32000 take 74 MB, and 65000 take 144 MB (the allocation grows in powers of 2). 65000 is the boundary value, and I can’t recommend exceeding it without first increasing the kernel address space.
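These figures can be roughly reproduced from the sizes given earlier: 2 KB per cluster plus 256 bytes for the accompanying mbuf. The sketch below is only an estimate under that assumption (the function name is hypothetical, and the kernel's real accounting is more involved), but for 65536 clusters it gives exactly the 144 MB quoted above.

```python
CLUSTER_BYTES = 2048  # one mbuf cluster
MBUF_BYTES = 256      # the mbuf structure that accompanies each cluster


def mbuf_cluster_memory_mib(nmbclusters):
    """Estimate memory in MiB for the given number of clusters plus mbufs."""
    return nmbclusters * (CLUSTER_BYTES + MBUF_BYTES) / (1024 * 1024)


for n in (25600, 32768, 65536):
    print(f"{n} clusters: about {mbuf_cluster_memory_mib(n):.0f} MiB")
```

The smaller values come out slightly different from the numbers in the talk, so treat the result as an order-of-magnitude check rather than an exact prediction.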
Increasing the amount of memory available to the kernel
The default kernel address space is 1 GB on the i386 architecture. To set it to 2 GB, specify the following line in the kernel configuration file:
On amd64 the KVA is always 2 GB, and there is no way to increase it yet.
In addition to increasing the address space, you can raise the limit on physical memory available to the kernel (320 MB by default). Let’s increase it to 1 GB:
And reserve 275 MB of that space for mbuf clusters:
Establishing the connection. syncache and syncookies
Approximately 100 bytes are used for each unfinished connection in the syncache.
Information about roughly 15000 such connections can be kept in memory.
Syncache parameters can be viewed with the “sysctl net.inet.tcp.syncache” command (they are read-only).
Syncache parameters can be changed only at boot time:
When a new connection does not fit into a full syncache, FreeBSD switches to `syncookies` mode (TCP SYN cookies). This feature is enabled with:
The syncache usage and the syncookies statistics can be seen with the `netstat -s -p tcp` command.
When the connection is accepted, it goes into the “listen socket queue”.
Its statistics can be seen with the `netstat -Lan` command.
The queue can be increased with the `sysctl kern.ipc.somaxconn=4096` command.
When the connection is accepted, FreeBSD creates the socket structures.
To increase the limit on open sockets:
In earlier versions:
The current state can be seen with the following command:
> vmstat -z
If the server handles several tens of thousands of connections, the tcb hash makes it possible to quickly find the target connection for each incoming TCP packet.
The tcb hash has 512 entries by default.
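To see why the default can be too small, assume the value 512 is the number of hash buckets and that connections spread evenly across them. The sketch below is an illustration only: `bucket_for` uses Python's built-in `hash()`, not the hash function the FreeBSD kernel actually uses, and both function names are hypothetical.

```python
def bucket_for(local_addr, local_port, remote_addr, remote_port, hash_buckets):
    """Pick a bucket for a connection from its address/port 4-tuple."""
    return hash((local_addr, local_port, remote_addr, remote_port)) % hash_buckets


def average_chain_length(connections, hash_buckets):
    """Average number of connections to scan per bucket, assuming even spread."""
    return connections / hash_buckets


for buckets in (512, 4096, 16384):
    length = average_chain_length(50000, buckets)
    print(f"{buckets} buckets: about {length:.1f} connections per bucket")
```

With 50000 connections and 512 buckets, each lookup has to walk a chain of nearly a hundred entries on average, which is the cost a larger hash avoids.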
The current size can be seen with:
It can be changed at boot time:
Applications work not with sockets but with files, so a file descriptor is needed for each socket. To increase the limits:
These options can be changed on a live system, but they will not affect already running processes. nginx can change its open files limit on the fly:
Buffers for incoming data. 64 KB by default; if there are no large uploads, this can be decreased to 8 KB (which reduces the probability of overflow during a DDoS attack):
listen 80 default rcvbuf=8k;
Buffers for outgoing data. 32 KB by default. If responses are usually small or there is a shortage of mbuf clusters, this can be decreased:
listen 80 default sndbuf=16k;
If the server has written data to a socket but the client does not read it, the data stays in the kernel for several minutes even after the connection has been closed by timeout. nginx has an option to discard all such data after the timeout:
Another way to save mbuf clusters is sendfile. It uses the kernel’s file buffer memory to send data to the network interface without any intermediate buffers.
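As an illustration of the mechanism outside nginx, here is a small sketch in Python. It relies on the standard library's `socket.sendfile()`, which uses the operating system's sendfile call where available and falls back to ordinary `send()` otherwise. The helper names `send_file` and `demo` are made up for this example, and the demo sends a temporary file through a local socket pair rather than to a real client.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send the whole file at `path` over a connected stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as source:
        return sock.sendfile(source)


def demo():
    """Send a small temporary file through a local socket pair."""
    payload = b"hello from sendfile\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file(sender, tmp.name)
        received = receiver.recv(1024)
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return sent, received
```

The application never copies the file contents into its own buffer; it only hands the kernel a file and a socket.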
To enable sendfile in nginx:
(you should explicitly switch it off if you’re sending files from the partition mounted via smbfs or cifs – ReRePi)
On the i386 platform with 1 GB of memory or more, 6656 sendfile buffers are allocated, which is usually enough. On the amd64 platform a more efficient implementation is used and sendfile buffers are not needed at all.
When the sendfile buffers overflow, the process gets stuck in the `sfbufa` state, but things return to normal once the number of buffers is increased:
After the connection is closed, the socket enters the TIME_WAIT state, where it can stay for 60 seconds by default. This time can be changed with sysctl (the value is the MSL in milliseconds, which is half of the TIME_WAIT duration: 2 × 30000 ms = 60 seconds):
Outgoing connections are bound to ports from the range 49152 to 65535 (16 thousand ports). It is better to widen it (1024 to 65535):
To use ports in sequential order instead of random order (so that the same port cannot be used for a second connection before TIME_WAIT expires):
FreeBSD 6.2 added the ability to skip the TIME_WAIT state for localhost connections: