Linux NAT Using Conntrack and IPtables

Performing Network Address Translation (NAT) inside the Linux kernel improves performance. The mechanism consists of two parts:

The Connection Tracking/Conntrack Modules

Connection tracking determines how each packet passing through the system relates to a connection. It does NOT manipulate packets, and it works independently of the NAT module. A conntrack entry looks like this:

udp 17 170 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 src=192.168.1.5 dst=192.168.1.2 sport=1025 dport=137 [ASSURED] use=1

A conntrack entry is stored as two separate tuples: one for the original direction (the first src/dst/sport/dport group in the entry above) and one for the reply direction (the second group). The two tuples can belong to different linked lists (buckets) in the conntrack hash table. The connection tracking modules are responsible for creating and removing the tuples.
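The two tuples can be pulled apart from the text form of an entry. The following is a minimal sketch, assuming the entry has the TCP/UDP layout shown above; the function name split_tuples is made up for this illustration and is not part of any conntrack tool.

```shell
# split_tuples LINE: print the original and reply tuples of one conntrack entry.
# Assumes a TCP/UDP-style entry with src=, dst=, sport= and dport= fields.
split_tuples() {
    printf '%s\n' "$1" | awk '{
        n = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) n++    # each "src=" field starts a new tuple
            if (n > 0 && $i ~ /^(src|dst|sport|dport)=/)
                t[n] = t[n] (t[n] == "" ? "" : " ") $i
        }
        print "original: " t[1]
        print "reply:    " t[2]
    }'
}

entry='udp 17 170 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 src=192.168.1.5 dst=192.168.1.2 sport=1025 dport=137 [ASSURED] use=1'
split_tuples "$entry"
```

For the sample entry, the reply tuple is the original tuple with source and destination swapped, which is what you expect when no NAT has been applied.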

Note: The tracking of the connections is ALSO used by iptables to do packet matching based on the connection state.

The NAT Modules

The NAT modules do the NATing itself. They use the tuples and modify them based on the NAT rules. In this way the tuples in the connection tracking table remain in a consistent state.

If the packet belongs to an existing connection, a conntrack entry (two tuples) already exists in the conntrack table. The NAT module detects this by checking a field in the tuple created for the newly arrived packet. The packet is then manipulated according to the conntrack entry (the manipulation was determined earlier, when the connection was set up).

If the received packet starts a new connection (first packet), the NAT module looks for a matching rule in the “nat” table. If a rule is found, the NAT manipulation is applied according to the rule and the tuples in the conntrack table are changed. For SNAT (Source NAT), conntrack creates the tuples at the local output hook point, before NAT runs, so they need to be updated after NAT is applied to the first packet.

Assume packets leave for the internet on network interface “eth1” (-o means “output”) and the interface “eth0” is connected to the local network. To change the source address to 1.2.3.4 and the source ports to the range 1-1023, add this rule:

# iptables -t nat -A POSTROUTING -p tcp -o eth1 -j SNAT --to-source 1.2.3.4:1-1023

You can specify a range of IP addresses as well (SNAT --to-source 1.2.3.4-1.2.3.6).

You can also use MASQUERADE, where the sender’s address is replaced by the address of the router’s outgoing interface.

# iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

Note: Here I am doing SNAT (Source NAT). You can also do Destination NAT (DNAT), which hooks into the prerouting hook point. To write DNAT rules, use the PREROUTING chain and the DNAT target.
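As an illustration of the DNAT case, here is a small sketch that only prints a rule so it can be reviewed before being run as root. The helper name dnat_rule, the port numbers and the private address 192.168.1.10 are assumptions chosen for the example; they do not come from the setup described above.

```shell
# dnat_rule IN_IFACE PROTO PUBLIC_PORT PRIVATE_IP:PORT
# Prints (does not run) an iptables DNAT rule for the PREROUTING chain.
dnat_rule() {
    printf 'iptables -t nat -A PREROUTING -i %s -p %s --dport %s -j DNAT --to-destination %s\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical values: TCP port 80 arriving on eth1 goes to 192.168.1.10:8080.
dnat_rule eth1 tcp 80 192.168.1.10:8080
```

Printing first keeps the example safe to try on a machine that is not a router; the printed line can be pasted at a root prompt once it looks right.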

NAT Settings

  • Load the “nf_conntrack” module: # modprobe nf_conntrack
  • Start the iptables service: # systemctl start iptables
  • Enable IP forwarding:
    • Temporarily: # echo 1 > /proc/sys/net/ipv4/ip_forward
    • Permanently: Write net.ipv4.ip_forward = 1 in the file “/etc/sysctl.conf” and reload (# sysctl -p).
  • Then set the NAT rules as mentioned above.
  • Add forwarding rules to forward packets from one interface to the other in both directions:

From public (eth1) to private (eth0):

# iptables -A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT

From private (eth0) to public (eth1):

# iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT

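  • Finally, save the iptables rules so they persist. Running iptables-save on its own only prints the rules, so redirect its output to the file your distribution restores at boot, for example /etc/sysconfig/iptables with the iptables service on RHEL-family systems: # iptables-save > /etc/sysconfig/iptables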
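The steps above can be collected into one script. The sketch below is a dry run by default: it prints each command instead of executing it. The names run, setup_nat and NAT_APPLY are made up for this example, and the script leaves out starting the iptables service and saving the rules, which depend on the distribution.

```shell
# Dry run by default: every command is printed, not executed.
# Set NAT_APPLY=1 (as root) to actually run the commands.
run() {
    if [ "${NAT_APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# setup_nat PUBLIC_IFACE PRIVATE_IFACE
setup_nat() {
    wan=$1 lan=$2
    run modprobe nf_conntrack
    run sysctl -w net.ipv4.ip_forward=1
    run iptables -t nat -A POSTROUTING -o "$wan" -j MASQUERADE
    # Replies to established connections may come back in...
    run iptables -A FORWARD -i "$wan" -o "$lan" -m state --state RELATED,ESTABLISHED -j ACCEPT
    # ...and anything from the private side may go out.
    run iptables -A FORWARD -i "$lan" -o "$wan" -j ACCEPT
}

setup_nat eth1 eth0
```

MASQUERADE is used here rather than SNAT only because it needs no fixed public address; swap in the SNAT rule from earlier if the address is static.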

Note

  • If you get the error “nf_conntrack: table full, dropping packet” and you have enough free memory, you can expand the conntrack table as described in “Tuning The Linux Connection Tracking System”.


Tuning The Linux Connection Tracking System

Introduction

The connection tracking entry (conntrack entry) looks like this:

udp 17 170 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 src=192.168.1.5 dst=192.168.1.2 sport=1025 dport=137 [ASSURED] use=1

It contains two elements: the original direction (the first src/dst/sport/dport group) and the reply direction (the second group).

To display all of the table’s entries, read “/proc/net/nf_conntrack”.

The conntrack entry is stored as two separate nodes (one for each direction) in different linked lists. Each linked list is called a bucket, and each bucket is an element of a hash table. The hash value is calculated from the received packet and used as an index into the hash table. The linked list at that index is then traversed to find the wanted node.

Long lists are not recommended (iteration cost): the cost depends on the length of the list and the position of the wanted conntrack node.

A large hash table is recommended (constant time): the cost is the hash calculation.

Linked List Size (Bucket Size)= Maximum Number of nodes / Hash Table Size (Number of Buckets)

The hash table is stored in kernel memory. We can tune the hash table size (the number of buckets) and the maximum number of nodes. The required memory = conntrack node’s memory size × 2 × the number of simultaneous connections your system aims to handle. Example: at 304 bytes per conntrack node, 1M connections require 304 × 2 MB = 608 MB.

Setting such large values is not recommended if you have less than 1 GB of RAM.
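The two formulas above are easy to check with shell arithmetic. This is a small sketch with function names invented for the example; the 304-byte node size is the figure used in this article, not a value read from the running kernel.

```shell
# avg_bucket_len MAX_NODES BUCKETS: average linked-list length per bucket.
avg_bucket_len() { echo $(( $1 / $2 )); }

# conntrack_mem_bytes CONNECTIONS NODE_BYTES: memory needed under the formula
# above (two nodes per connection). NODE_BYTES is an assumption; 304 is the
# example figure from the text.
conntrack_mem_bytes() { echo $(( $1 * 2 * $2 )); }

avg_bucket_len 65536 16384          # prints 4
conntrack_mem_bytes 1000000 304     # prints 608000000 (about 608 MB)
```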

Tuning the Values

If your server has a lot of connections to be handled and the conntrack table is full, you will get this error “nf_conntrack: table full, dropping packet“. This will limit the number of simultaneous connections your system can handle.
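Before the table fills up, it helps to see how close it is to the limit. The sketch below compares the current entry count with the maximum; the function name conntrack_usage_pct is made up here, and the procfs paths are the usual locations, which exist only while the nf_conntrack module is loaded.

```shell
# conntrack_usage_pct COUNT MAX: integer percentage of the table in use.
conntrack_usage_pct() { echo $(( $1 * 100 / $2 )); }

# Assumed procfs location of the counters; the check is skipped when the
# files are not readable (for example, module not loaded).
ct_dir=/proc/sys/net/netfilter
if [ -r "$ct_dir/nf_conntrack_count" ] && [ -r "$ct_dir/nf_conntrack_max" ]; then
    ct_count=$(cat "$ct_dir/nf_conntrack_count")
    ct_max=$(cat "$ct_dir/nf_conntrack_max")
    if [ "$ct_max" -gt 0 ]; then
        echo "conntrack table: $ct_count / $ct_max ($(conntrack_usage_pct "$ct_count" "$ct_max")% used)"
    fi
fi
```

A value that stays near 100 under normal load suggests raising nf_conntrack_max, or shortening the timeouts, before packets start being dropped.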

To get the maximum number of nodes:

# /sbin/sysctl -a|grep -i nf_conntrack_max
net.nf_conntrack_max = 65536

To get the hash table size (number of buckets):

# /sbin/sysctl -a|grep -i nf_conntrack_buckets
net.netfilter.nf_conntrack_buckets = 16384

The average bucket size (linked list length) = 65536 / 16384 = 4.

To temporarily change the value of hash table size to 2*16384=32768:

# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

To permanently change the value:

# echo "net.netfilter.nf_conntrack_buckets = 32768" >> /etc/sysctl.conf
# /sbin/sysctl -p

The same way for “nf_conntrack_max”:

Temporarily: # echo 131072 > /proc/sys/net/nf_conntrack_max

Permanently:

# echo "net.netfilter.nf_conntrack_max = 131072" >> /etc/sysctl.conf
# /sbin/sysctl -p

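This requires 38 MB of memory (131072 entries × 304 bytes); counting two nodes per connection, as in the formula in the introduction, doubles that to about 76 MB.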
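One caveat with the ">>" appends above: each run adds another copy of the line to /etc/sysctl.conf. A possible alternative is to replace the existing line when there is one. The helper below is a sketch with a made-up name (set_sysctl_line), demonstrated on a scratch file rather than the real configuration file.

```shell
# set_sysctl_line FILE KEY VALUE
# Replace any existing "KEY = ..." line in FILE, or append one if missing.
set_sysctl_line() {
    ssl_file=$1 ssl_key=$2 ssl_value=$3
    [ -f "$ssl_file" ] || : > "$ssl_file"
    ssl_tmp=$(mktemp) || return 1
    # Keep every line whose name (text before "=", blanks removed) is not KEY.
    awk -F= -v k="$ssl_key" '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' \
        "$ssl_file" > "$ssl_tmp"
    printf '%s = %s\n' "$ssl_key" "$ssl_value" >> "$ssl_tmp"
    cat "$ssl_tmp" > "$ssl_file"
    rm -f "$ssl_tmp"
}

# Demo on a scratch file (a stand-in for /etc/sysctl.conf).
demo_conf=$(mktemp)
echo "net.netfilter.nf_conntrack_max = 65536" > "$demo_conf"
set_sysctl_line "$demo_conf" net.netfilter.nf_conntrack_max 131072
set_sysctl_line "$demo_conf" net.netfilter.nf_conntrack_max 131072
cat "$demo_conf"    # a single nf_conntrack_max line, set to 131072
rm -f "$demo_conf"
```

Commented-out lines are left alone because their name starts with "#" and so never equals the key. After editing the real file you would still run sysctl -p to load the new value.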

Other “nf_conntrack” Values

You can change the other values in the same way (nf_conntrack_count is a read-only counter of the current entries):

# /sbin/sysctl -a|grep -i nf_conntrack
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 16384
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 817
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 256
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 1
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 65536

Last word: Disable “nf_conntrack” if it is not necessary.