The Virtual Ethernet Bridge: docker0


Docker creates the virtual interface “docker0”, which is a virtual Ethernet bridge. This bridge forwards packets between the containers themselves and between the host and the containers.

You can see this interface when you execute “ip addr list docker0” or “ifconfig docker0”.

Output of “ifconfig docker0”:

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
ether 02:42:63:8f:85:df  txqueuelen 0  (Ethernet)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The IP address of this interface is a private address chosen from a range that is not already in use on the host. Here it is 172.17.0.1 with a 16-bit netmask (255.255.0.0), and the MAC address is 02:42:63:8f:85:df.

To display the virtual networks created by Docker, execute:

# docker network ls

Output:

NETWORK ID          NAME                DRIVER
dfc34cd75029        bridge              bridge 
3453b5653383        none                null
e9a375f4a8a2        host                host

The figure below shows how the virtual interfaces are connected to the bridge docker0. All veth* interfaces are attached to this bridge, and each veth* interface has a peer (virtual interface) inside a specific container.

[Figure: containers’ eth0 interfaces paired with veth* interfaces attached to the docker0 bridge]

Some Notes:

  • The eth0 interface in the container and the corresponding veth* interface on the host form a pair connected like a pipe: packets that enter one end come out of the other, and vice versa.
  • The container’s eth0 gets a private IP address from the same range as the bridge docker0’s address, and the bridge acts as the container’s default gateway. The MAC address is generated from the IP address.
  • The MAC address of the host-side veth* interface is set automatically as well.
  • The bridge docker0 connects the veth* interfaces together while exposing a single IP address; you can list the attached interfaces as shown below.
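
To see which veth* interfaces are currently attached to the bridge (the interface names differ per host and per container), you can run:

# ip link show master docker0

or, if the bridge-utils package is installed:

# brctl show docker0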

To inspect the bridge network, execute:

# docker network inspect bridge

Output:

[
    {
        "Name": "bridge",
        "Id": "dfc34cd75029786a53356e554b34a14c290ef4969e38ecb1e4ae9c34598e93d7",
        "Scope": "local",
        "Driver": "bridge",
        "IPAM": {
            "Driver": "default",
            "Config": [
                {
                    "Subnet": "172.17.0.0/16"
                }
            ]
        },
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        }
    }
]

Because no container has been created yet, the “Containers” object is empty (“Containers”: {}). A bridge network can also be created manually, as shown below.
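
For example, a user-defined bridge network can be created like this (the network name and subnet here are only illustrations):

# docker network create --driver bridge --subnet 172.25.0.0/16 my-bridge-net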

To run a container in a specific network, execute:

# docker run --net=<NETWORK> …
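
For instance, to attach a container to the user-defined network created above (the network and image names are illustrative):

# docker run --net=my-bridge-net -it busybox sh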

Note: Instead of using a bridge, we can use an iptables NAT configuration.
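
As an illustration, the masquerading rule that gives the containers outbound NAT looks like this (Docker installs a similar rule itself when ip_masquerade is enabled):

# iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE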



TURN In A Few Words


TURN is an abbreviation for Traversal Using Relays around NAT. It is a control protocol that allows a host behind a NAT to exchange packets with its peers through a relay. It is specified in RFC 5766. The following are a few notes about this protocol:

  • TURN is part of ICE (Interactive Connectivity Establishment), but it can also be used without ICE.
  • TURN is designed to solve the communication problem that arises when both the client and its peer are behind their respective NATs and hole-punching techniques (discovering a direct communication path) may fail. In other words, TURN is used when a direct communication path between the client and its peer cannot be found.
  • A public TURN server sits between the two hosts that are behind NATs and relays the packets between them.
  • TURN is a client-server protocol. The client is called the TURN client and the server is called the TURN server.
  • The TURN client obtains from the TURN server, using the TURN protocol’s Allocate transaction, what is called a relayed transport address (an IP address and port). A minimal Allocate request is sketched after these notes.
  • The client sends a CreatePermission request to the TURN server to install permissions, which the server uses to validate the peers it will relay traffic from.
  • The TURN server sees the messages coming from the client as coming from a transport address on the NAT. This address is called the client’s server-reflexive transport address.
  • The NAT forwards packets arriving at the client’s server-reflexive transport address to the client’s host transport address (the private address).
  • The TURN server receives the application data from the client, sets the relayed transport address as the source of the packets, and relays them to the peer in UDP datagrams.
  • The peer sends its application data in UDP packets to the client’s relayed transport address on the relay server. The server checks the permissions and, if they pass, relays the data to the client.
  • A way to communicate the relayed transport address and the peers’ addresses (their server-reflexive transport addresses) is needed; this is out of scope of the TURN protocol.

[Figure: TURN message flow between the client, the NAT, the TURN server, and the peer]

  • In VoIP, if TURN is used with ICE, the client puts its obtained relayed transport address as an ICE candidate (among other candidates) in the SDP carried by a rendezvous protocol such as SIP. When the other peers receive the SIP request, they know how to reach the client.
  • The TURN messages (which encapsulate the application data) contain an indication of the peer the client is communicating with, so the client can use a single relayed transport address to communicate with multiple peers. This matters when the rendezvous protocol (e.g. SIP) supports forking.
  • Using TURN is expensive, so when TURN is used with ICE, ICE first tries hole-punching techniques to discover a direct path; only if no direct path is found is TURN used.
  • Using TURN means the communication is no longer peer-to-peer.
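
TURN messages use the STUN wire format, so the first step of an allocation can be made concrete. Below is a minimal C sketch, not a complete client: it builds an unauthenticated Allocate request (a STUN header carrying the Allocate method, plus a REQUESTED-TRANSPORT attribute asking for UDP) and sends it to a TURN server whose IP is given on the command line. A real server will normally answer with an Allocate error response (401) carrying the realm and nonce needed to retry with authentication; the server address and port 3478 are the only inputs assumed here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <turn-server-ip>\n", argv[0]);
        return 1;
    }

    unsigned char msg[28];
    /* STUN header: message type 0x0003 (Allocate request), message
       length 8 (attributes only), magic cookie, 96-bit transaction ID. */
    msg[0] = 0x00; msg[1] = 0x03;
    msg[2] = 0x00; msg[3] = 0x08;
    msg[4] = 0x21; msg[5] = 0x12; msg[6] = 0xA4; msg[7] = 0x42;
    srand((unsigned)time(NULL));
    for (int i = 8; i < 20; i++)
        msg[i] = (unsigned char)(rand() & 0xFF);
    /* REQUESTED-TRANSPORT attribute: type 0x0019, length 4, protocol 17 (UDP). */
    msg[20] = 0x00; msg[21] = 0x19;
    msg[22] = 0x00; msg[23] = 0x04;
    msg[24] = 17; msg[25] = 0; msg[26] = 0; msg[27] = 0;

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(3478);               /* default TURN port */
    inet_pton(AF_INET, argv[1], &srv.sin_addr);

    sendto(s, msg, sizeof msg, 0, (struct sockaddr *)&srv, sizeof srv);

    unsigned char resp[512];
    ssize_t n = recv(s, resp, sizeof resp, 0);
    if (n >= 2)
        /* 0x0103 = Allocate success response,
           0x0113 = Allocate error response (usually 401 Unauthorized). */
        printf("response type: 0x%02x%02x (%zd bytes)\n", resp[0], resp[1], n);
    close(s);
    return 0;
}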


Network Namespaces In Linux Kernel

 

Introduction

Each kernel network namespace has its own network devices (even the loopback interface), IP addresses, firewall rules, “/proc/net” and “/sys/class/net” directory trees, sockets, IP routing tables, port numbers, etc.

clone() is a system call used to create a child process. If the CLONE_NEWNET flag is set, the child process is created in a new network namespace. Executing system(“ip link”) in the child process and in the parent shows the difference, as in the sketch below.
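
A minimal sketch of that experiment (assuming root privileges, since creating a network namespace requires CAP_SYS_ADMIN; compile with gcc):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Runs inside the new network namespace: only an (initially DOWN)
   loopback device is listed here. */
static int child_fn(void *arg)
{
    (void)arg;
    printf("--- child namespace ---\n");
    system("ip link");
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) { perror("malloc"); exit(1); }

    /* CLONE_NEWNET creates the child in a new network namespace.
       The stack grows downwards, so we pass a pointer to its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_NEWNET | SIGCHLD, NULL);
    if (pid == -1) { perror("clone"); exit(1); }
    waitpid(pid, NULL, 0);

    /* The parent still sees all of the host's interfaces. */
    printf("--- parent namespace ---\n");
    system("ip link");
    free(stack);
    return 0;
}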

The unshare() system call creates a new namespace and moves the calling process into it.

The setns() system call is used to join an existing namespace.
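
The “unshare” and “nsenter” command-line tools (from util-linux) wrap these two system calls. For example (the namespace path is illustrative):

# unshare --net bash

# nsenter --net=/var/run/netns/BinanNameSpace bash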

You can define a virtual network device (veth) in a namespace, and you can create a tunnel between two virtual network devices in different namespaces; the implementation is like creating a pipe. To connect a namespace to the Internet, a bridge needs to be created in the root namespace, and the virtual device (veth) in the child namespace is linked to that bridge, as shown below. A physical network device belongs to only one namespace at a time (by default, the root namespace). Instead of creating a bridge, you can use IP forwarding with NAT rules in the root namespace.
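
For example, the bridge and the link to it can be set up like this (br0 and veth0 are illustrative names; veth0 is the host-side end of a veth pair, as created later in this article):

# ip link add br0 type bridge

# ip link set br0 up

# ip link set veth0 master br0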

A namespace is addressed by its name or by the PID of a process inside it.

If a service in one namespace is compromised, services in other namespaces are not affected. This is due to the isolation property.

Add New Network Namespace

We can use the “ip” networking configuration tool to play with namespaces. A namespace can persist even if it has no processes running in it. To add a new, empty namespace:

# ip netns add BinanNameSpace

“BinanNameSpace” is the name of the newly created namespace. A bind mount for it is created under “/var/run/netns”.

Get The Current Namespaces

To get the current namespaces, execute “ip netns” or “ls /var/run/netns”:

BinanNameSpace
qdhcp-ae4d3669-d1ab-4133-8ea6-059611dc524e
qrouter-f96c719a-56e9-4b52-b2cb-da326fc1a429

List The Interfaces In The Namespace

To list the interfaces inside the namespace, execute:

# ip netns exec BinanNameSpace ip link list

1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Note that we have only the loopback interface, and it is down, so we cannot ping it:

# ip netns exec BinanNameSpace ping 127.0.0.1

connect: Network is unreachable

To bring the loopback interface up: # ip netns exec BinanNameSpace ip link set dev lo up

# ip netns exec BinanNameSpace ping 127.0.0.1

PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.044 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.035 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.034 ms
^C
— 127.0.0.1 ping statistics —
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.034/0.037/0.044/0.008 ms

Create Virtual Network Device In The Namespace

To create a virtual network device (veth1) in “BinanNameSpace” and make it a peer of veth0 in the root namespace:

# ip link add veth0 type veth peer name veth1

# ip link set veth1 netns BinanNameSpace
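
After the second command, veth1 disappears from the root namespace’s “ip link” output and shows up inside the namespace instead:

# ip netns exec BinanNameSpace ip link list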

Set IP Addresses To The Virtual Network Devices

For the veth0 in the root namespace: # ifconfig veth0 11.0.0.2/24 up

For the veth1 in the “BinanNameSpace”: # ip netns exec BinanNameSpace ifconfig veth1 11.0.0.1/24 up

Now to check the interface veth0:

# ifconfig veth0

veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 11.0.0.2  netmask 255.255.255.0  broadcast 11.0.0.255
…..

To check the interface veth1:

# ip netns exec BinanNameSpace ifconfig veth1
veth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 11.0.0.1  netmask 255.255.255.0  broadcast 11.0.0.255
……

Connection Test

To test the connection between the root namespace and “BinanNameSpace”, we ping in both directions:

From the root namespace (veth0: 11.0.0.2) to BinanNameSpace (veth1: 11.0.0.1):

# ping 11.0.0.1
PING 11.0.0.1 (11.0.0.1) 56(84) bytes of data.
64 bytes from 11.0.0.1: icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from 11.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 11.0.0.1: icmp_seq=3 ttl=64 time=0.044 ms
^C
— 11.0.0.1 ping statistics —
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.036/0.040/0.044/0.008 ms

From BinanNameSpace (veth1: 11.0.0.1) to the root namespace (veth0: 11.0.0.2):

# ip netns exec BinanNameSpace ping 11.0.0.2

PING 11.0.0.2 (11.0.0.2) 56(84) bytes of data.
64 bytes from 11.0.0.2: icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from 11.0.0.2: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 11.0.0.2: icmp_seq=3 ttl=64 time=0.038 ms
^C
— 11.0.0.2 ping statistics —
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.034/0.038/0.042/0.003 ms

Delete A Namespace

To delete the namespace, execute: # ip netns delete BinanNameSpace

