Search This Blog

Tuesday, December 25, 2012

KVM hosted virtual servers using bridging: theory and practice

If you are a systems or networks administrator that:
  • works in enterprise data centers or 
  • someone that wants to deploy virtual servers in a newly acquired multi-core server using RHEL 6 and nothing more than the Linux KVM and RedHat's basic virt-manager application and/or 
  • you wish to gain an understanding of KVM's virtual networking architecture
then this article/technical walkthrough is for you. Most of these techniques will work on other Linux distributions besides RHEL 6. Admittedly, there are more user friendly, free and commercial tools that allow you to deploy virtual machines. The usual suspects include VMware, RedHat, Oracle, Parallels that provide industrial strength solutions with intuitive point-and-click interfaces that make the setup of virtual machines an easy task.

However, I like to keep my production server software stack as simple as possible. Those of you that had to troubleshoot VM performance or other problems and faced the 'ping-pong' between the virtualization and the OS vendors will know what I mean. Thus, I use KVM/qemu and virt-manager to cater for my VM needs. The downside is that these tools are less intuitive to use for the newcomer, but with a little bit of good documentation and practice, they can be effective. I draw this conclusion after looking around in various technical support threads and after browsing RedHat's documentation on the subject. The threads seem to confuse the various virtual switching modes and techniques when things could be done more easily with interface bridging. The same can be said for Redhat's Virtualization Administration Guide, which does a fairly good job detailing the Routed, NAT and isolated virtual networking modes (Chapter 18), however it fails to mention how bridging could be used for hosting virtual servers. I am going to spend the rest of the article to explain this in detail.

The Theory

Let's be more specific now and explain what I mean when I say I need to deploy a fully networked virtual server. When you use the virt-manager application, it's easy to deploy a network enabled guest OS by means of using Network Address Translation (NAT). In fact, NAT (IP Masquerading, a specific mode of NAT) is the default guest OS virtual networking mode, using the IP address of the physical host server.  

 Figure 1

The figure above displays the networking data path traversal from the VM guests, all the way to the physical network/VLAN, when using the default virtual networking mode (NAT). Starting at the bottom of the figure, each guest has been assigned to a virtual network interface (vnetx). This is essentially a software implementation of an interface which is part of a virtual switch. At the other end of the virtual switch, a virtual bridge interface (virbr0) merges the traffic from the VMs and interfaces to the IPTABLES module which performs the actual NAT. At the end, you have the eth0 physical interface which carries the packets to the actual wire.  

In this scenario, your guest OS will have outbound network connectivity. Should you wish to enable inbound network connectivity, you will fail. It is possible to perform other tricks and enable port forwarding/SNAT/DNAT to enable inbound connections. However, this is cumbersome. As a result, my definition of deploying a proper virtual server resembles the following aspects of a true physical server:
  • You have a physical MAC address tied to a network/VLAN broadcast domain
  • You can deal with that MAC address in any way you would deal with a true physical NIC: ARP, assign a static IP, (static) DHCP, etc.
  • You can have unrestricted outbound and inbound network access within that network/VLAN broadcast domain, a must requirement for a server system.
In order to achieve this, we need to employ the technique of interface bridging. For references on bridges, you can consult a variety of sources such as:
i)The IEEE 802.1D standard
ii)The older (out of date but still useful) Ethernet Bridge + netfilter HOW TO from TDLP.
iii)A copy of A. S. Tanenbaum's  Computer Networks classic textbook.
However, prior explaining how this works, let's throw in a realistic production environment scenario.

Figure 2

Figure 2 displays the network topology of a production VM server scenario.  There are two networks. One Class C internal (192.168.14.24), where hosts may or may not have outbound connectivity. Inbound connectivity to this network is prohibited by the top server which offers FTP, DMZ, FIREWALL, DHCP, and DNS services on the INTERNAL net. The other network is a world routable Class B (129.230/16).  

The VM host server needs to serve a number of virtual servers that have different network access criteria:

  • Guest_01: Linux server to run an LAMP stack, exposed on the internal network.
  • Guest_02: Development Windows 7 box, which needs to be accessible via non standard port ranges on the internal network, but also needs Internet access.
  • Guest_03: Legacy SCADA Windows XP based system which needs to be accessible only via the internal network.
Clearly, Guest_01 is the least restricted system, so it makes sense to place it on the INTERNET/EXTERNAL Class B net. Guest_02 needs some protection so the outside folks cannot reach it, only it should reach the outside world by means of IP Masquerading, by using the Public routable IP of the FTP/DMZ/FIREWALL/DHCP/DNS server (129.230.135.131). Thus, it's a candidate for the INTERNAL Class C net. The same goes for Guest_03, which is the most isolated environment we need to protect, accessible only by INTERNAL network hosts.
At this point, it is useful to modify Figure 1 to illustrate the virtual network data path of our new scenario.  
  Figure 3

Figure 3 above illustrates the virtual network data path of our production scenario (Figure 2). In this case, instead of the virbr0 we have bridging modules bound to physical interfaces. Each physical interface is connected to the proper network/VLAN and has a bridge bound to it (we will illustrate how this is done). The role of the bridge is to create a data channel and forward traffic between the vnetx interfaces of the virtual switch and the physical interfaces. The objective is to enable the MAC address of the Guest_X machines to connect to the actual physical network/VLAN, as stated earlier. As a result, via bridge br3, we enable the virtual  servers Guest_02 and Guest_03 for the internal network and via br4, we connect Guest_01 to the external world. 

The practice

The previous section presented the theory. It's time now for the hands-on practical part. First of all, if you are dealing with a fresh installation, make sure you yum install the following groups, in order to have the full range of virtualization utilities and install your guests.

yum groupinstall Virtualization "Virtualization Client" "Virtualization Platform" "Virtualization Tools"

You should also install the bridge utilities, as they are needed:

yum install bridge-utils

The next thing you should ensure is that you have enough physical network interfaces on your VM host server. In order to implement our production scenario, Figure 2 indicates clearly that we need four Ethernet NIC ports: Two of them (eth2, eth3) are used to enable the server to have IP connectivity and routing on both networks. In contrast, eth4 and eth5 will be dedicated to carry the virtual server traffic.

We will not need IP addresses for interfaces eth4 and eth5. They will be brought up only to carry the bridged VM traffic. Make sure you identify the NIC ports properly and connect them to the proper network/VLAN Ethernet switch ports. To do that, you can remove their network cables and use the ethtool command to blink the NIC lights on the server side by doing a:
ethtool -p eth4

and
ethtool -p eth5 

to respectively identify the proper NIC ports. The next step is to connect them to the proper switch ports. In principle, once you identify the NIC port side with ethtool you should be OK. In practice, it is easy to make mistakes in messy/unlabelled network panels. Thus, after connecting the cables to the switch ports, one easy check is to bring the interface to promiscuous mode and watch for traffic indicating you are indeed on the right network/VLAN, by doing things like:
tcpdump -i eth4

and amongst the rest of the traffic, you would get something like the ARP or UDP broadcasts below confirming that eth4 is indeed on the internal network (Figures 2 and 3):

tcpdump: WARNING: eth4: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth4, link-type EN10MB (Ethernet), capture size 65535 bytes
16:51:47.089529 ARP, Request who-has intfn1.internal.net tell esxfarm.internal.net, length 46
16:51:47.407363 STP 802.1d, Config, Flags [none], bridge-id 8005.00:1e:14:e6:48:80.800a, length 43
16:51:49.936209 IP winsys01.internal.net.17500 > 255.255.255.255.17500: UDP, length 119
16:51:49.936588 IP winsys02.internal.net.17500 > 192.168.14.255.17500: UDP, length 119


Now that the cables are connected properly we can start configuring the Ethernet bridges. A bridge is just another interface and the best way to configure this on a RHEL 6 system is by getting your hands dirty. Go right under the /etc/sysconfig/network-scripts directory and use your favourite text editor (vim, nano, Emacs) to make two files, one for each bridge interface device

ifcfg-br3 with the following contents:
DEVICE=br3
BOOTPROTO=none
TYPE=Bridge
ONBOOT=yes
DELAY=0


ifcfg-br4 with the following contents:
DEVICE=br4
BOOTPROTO=none
TYPE=Bridge
ONBOOT=yes
DELAY=0



This takes care of the bridge interface declaration. What's left is to associate the newly defined bridges with the right physical interface. Thus, under the same directory (/etc/sysconfig/network-scripts), we create two more files:

ifcfg-eth4 with the following contents:
DEVICE=eth4
HWADDR=00:10:18:31:5A:5B
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=br3


ifcfg-eth5 with the following contents:
DEVICE=eth5
HWADDR=00:10:18:19:4F:5C
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=br4


In short, with these four files we ensure that we have a persistent config where all interfaces (bridges and physical ones) are up on boot and we associate br3 to eth4 and br4 to eth5 (Figure 3). Fans of the brctl utility could also achieve the same result by doing a:

brctl addbr br3
brctl addif br3 eth4
brctl addbr br4
brctl addif br4 eth5


At that point, it is good to issue a:

service network stop; service network start

and check that the bridges and physical interfaces are up and available by issuing an ifconfig command. If all is well, you should see output like the one below (I have excluded some of the non relevant output for length reduction purposes):

br3       Link encap:Ethernet  HWaddr 00:10:18:31:5A:5B 
          inet6 addr: fe80::210:18ff:fe31:5a4b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:386265 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46672357 (44.5 MiB)  TX bytes:578 (578.0 b)

br4       Link encap:Ethernet  HWaddr
00:10:18:19:4F:5C         

          inet6 addr: fe80::210:18ff:fe19:4f33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:616409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:58946648 (56.2 MiB)  TX bytes:578 (578.0 b)
...

eth4      Link encap:Ethernet  HWaddr
00:10:18:31:5A:5B 
          inet6 addr: fe80::210:18ff:fe31:5a4b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:600933 errors:0 dropped:0 overruns:0 frame:0
          TX packets:128158 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:270119283 (257.6 MiB)  TX bytes:10497306 (10.0 MiB)
          Interrupt:16

eth5      Link encap:Ethernet  HWaddr
00:10:18:19:4F:5C 
          inet6 addr: fe80::210:18ff:fe19:4f33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:708614 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9547 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:96954226 (92.4 MiB)  TX bytes:986694 (963.5 KiB)
          Interrupt:16

...

Note that all relevant interfaces are up and do not have an IP address . The second thing you should note is that the each bridge interface has the same MAC address as the physical interface it is associated with.

If you have reached that point, you are almost done. What you need to do now is to build your virtual machines. I assume you are familiar with how to build VMs on virt-manager. If not, I have written a quick summary of the procedures. Alternatively, if you have already existing VMs, you could reconfigure their networking to use the bridge interfaces.

Figure 4

Figure 4 above illustrates the network config for Guest_02. Make sure that the 'Source device' is one the available vnet interfaces that connects to br3 and apply the changes. You can do the same for the rest of the virtual server VMs. When you are done, you can now check with the brctl utility the final configuration by doing a:

brctl show

and you should get output similar to the one below:

Figure 5

Note the interfaces column which should correctly list all the physical and vnet interfaces associated to each bridge.  When you fire up any of the virtual servers, you should be able to see it with its vnet's interface MAC address on the virtual network. Let's take Guest_02 as an example.  From our VM host server console, we type:

[root@vmserver ~]# ping win01
PING win01.internal.net (192.168.14.23) 56(84) bytes of data.
64 bytes from win01.internal.net (192.168.14.23): icmp_seq=1 ttl=128 time=2.13 ms
64 bytes from win01.internal.net (192.168.14.23): icmp_seq=2 ttl=128 time=0.518 ms
^C
--- win01.internal.net ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1360ms
rtt min/avg/max/mdev = 0.518/1.324/2.131/0.807 ms
[root@vmserver ~]# arp -a | grep win01
win01.internal.net (192.168.14.23) at 52:54:00:28:23:af [ether] on eth2


Note Guest_02's MAC address from Figure 4. That's the one replying and bridged into the internal network. This means that for all intents and purposes, Guest_02 is just another server on the internal network. Mission accomplished.

Happy KVM sponsored virtual server hosting!

10 comments:

  1. Good post. I might be viewing this simplistically, but it seems that your criticism of the RedHat docs can be summed up simply by saying: RedHat should not be referring to a software bridge as a "virtual switch".

    I may be completely missing some critical knowledge in regards to the inner workings of the libvirt system, but in my experience, they are mere bridges. I would even take exception to anyone referring to one as a "switch"--more like a hub. ebtables can be used to restrict cross-talk at layer-2, but that has nothing to do with the bridge module and its associated tools.

    On top of that there is software technology that actually does satisfy the definition of a "switch". openvswitch (http://openvswitch.org) is a good example.

    I believe if RH were to remove the buzz words from their docs, it would alleviate some of the confusion.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. Hi Stephen,

      Thanks for your comments. My Redhat documentation 'rant' boils down to the fact that there should be an easier way to instruct folks to setup a virtual server. I agree with you that some buzzwords should be removed. In essence, there should be more concrete documentation about the concept of 'virtual networking'. At the moment, emphasis is given only on how to setup and tune a VM.

      GM

      Delete
  3. Thanks for sharing useful information,regarding virtual servers.virtual servers europe

    ReplyDelete
  4. managed virtual servers
    The most limiting factor to how many virtual machines can be run on a Virtual Server is the amount of physical memory in the hardware. The host operating system and each running virtual machine all require adequate memory. To calculate the total memory needed, you must allocate enough memory to each virtual machine to run its operating system (and its applications) in addition to the memory required by the host operating system.

    ReplyDelete
  5. i had a doubt on networking in kvm for a long time , thank a ton for such a clear explanation .

    ReplyDelete
  6. Yes very nice, thanks!!!

    ReplyDelete