Thursday, September 1, 2016

Ceph Monitors Deadlock

Introduction
Part of my role is the general maintenance of the QE department's Ceph cluster. It usually doesn’t consume a lot of my time, apart from general monitoring, providing keyrings and pools, etc. - you know, general management.
‘My’ cluster has 3 servers with 9 disks each:
  • 2 for the OS, RHEL 7 (RAID 1)
  • 1 SSD for journaling
  • 6 disks as OSDs
And it works pretty well.

The Problem

A colleague asked me a question first thing in the morning about the relationship between Ceph and OpenStack. As a huge believer in teaching by example, I logged in to one of the servers and ran the rbd command to show the list of images in the pool:
$ sudo rbd -p <pool name> --id <client> ls
The client failed to connect to the monitors, all 3 of them:
2016-09-01 14:00:04.946448 7f2a2d2cc700  0 -- <IP address>:6789/0 >> <IP address>:6789/0 pipe(0x4cee000 sd=13 :0 s=1 pgs=0 cs=0 l=0 c=0x4967080).fault

Troubleshooting

What I had to go on

First of all, when the RBD client fails to connect, it probably means that the ceph client will not work either. Thus there is no reason, IMO, to check the cluster health with
$ sudo ceph health
because the reply will be the same.
The first thing on my mind was to check the monitor daemon status on all servers in the cluster:
$ sudo service ceph status mon
The result was
=== mon.ceph1 ===
mon.ceph1: not running.
OK, so the daemon is down; let us bring it back up
$ sudo service ceph start mon
No joy - the daemon stayed down.
After that, I went to the Ceph monitor log, /var/log/ceph/ceph-mon.ceph1.log, which showed me the following entries. Two messages stood out, in my eyes, starting with:
2016-09-01 09:23:49.490950 7efd5a7137c0 -1 WARNING: 'mon addr' config option 10.35.65.98:0/0 does not match monmap file continuing with monmap configuration
With this line as the punchline:
2016-09-01 09:23:49.762012 7efd5021c700  0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed:-8190
So the problem was either with the monitors' keyring, meaning authentication failed, or with the monitor map (monmap) configuration.

Dead ends (but should be checked)

  • The keyrings of the monitors were identical, so no authentication problem (though it could still be a permission issue, with the daemon failing to read the file)
  • The NTP service was up and running, and all the clocks were in sync (both checks are sketched below)
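For reference, here is roughly how these two dead ends can be verified. This is a minimal sketch, assuming the hosts are named ceph1-ceph3, are reachable over SSH, use the default monitor data path, and run ntpd:
# identical md5sums across hosts = identical monitor keyrings
$ for h in 1 2 3; do ssh ceph$h "md5sum /var/lib/ceph/mon/ceph-ceph$h/keyring"; done
# confirm NTP is active and the peers are in sync
$ for h in 1 2 3; do ssh ceph$h "systemctl is-active ntpd; ntpq -p"; done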

The Solution

Fixing this issue required the monmaptool command.
Though Sébastien Han recommends not doing this on a live cluster, I did it anyhow, accepting the minor risk of data loss in a staging environment.
I got the cluster FSID from /etc/ceph/ceph.conf and created a new monmap with monmaptool
$ sudo monmaptool --create --add ceph1 <IP address>:6789 --add ceph2 <IP address>:6789 --add ceph3 <IP address>:6789 --fsid <Ceph’s cluster FSID> --clobber monmap
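If you prefer to pull the FSID out of the config non-interactively, something like this works (assuming the usual "fsid = <uuid>" line in /etc/ceph/ceph.conf):
# capture the FSID into a variable instead of pasting it by hand
$ FSID=$(grep fsid /etc/ceph/ceph.conf | awk '{print $NF}')
Then pass $FSID to monmaptool in place of the literal value.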
Once the file was ready, I copied it to all the servers in the cluster and stopped all the Ceph daemons
$ sudo service ceph stop
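The copy step itself, sketched as a loop (assuming the map was built on ceph1 and the other servers are ceph2 and ceph3):
# push the new monmap to the other servers
$ for h in ceph2 ceph3; do scp monmap $h:; done
# then stop the Ceph daemons everywhere
$ for h in ceph1 ceph2 ceph3; do ssh $h "sudo service ceph stop"; done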


Now that the cluster was down and out, I could inject the newly created map into the monitors, running this on each server with its own monitor ID in place of <X>
$ sudo ceph-mon -i ceph<X> --inject-monmap monmap
Timidly, I started the monitor daemons together (as simultaneously as I could) and behold!
=== mon.ceph1 ===
mon.ceph1: running {"version":"0.94.5-9.el7cp"}
Afterwards I started the rest of the Ceph daemons
$ sudo service ceph start
and the cluster status returned to HEALTH_OK.

Thursday, March 31, 2016

Connect VMs running on KVM to VLAN with OpenVswitch


Background

In the course of my work I had to PXE boot virtual machines running on 2 different physical servers. The limitations of the lab I work in would not allow me to run DHCP and PXE on the lab’s network, but I did have a VLAN trunk connected to the physical servers at my disposal. So I searched for a solution that would connect all the VMs to the same network, where I could raise a DHCP and PXE server and deploy the OS onto them.

Why I Chose OpenVswitch

Due to my familiarity with OpenStack, I knew that OVS had the abilities I needed. I presumed it would be fairly easy to make it work; I was mistaken. Documentation about OVS and virtual machines is available, but there was nothing that suited my exact needs, which were pretty basic.

OpenVswitch Basic Commands

Presuming that OVS is installed, there are some basic commands one should be familiar with.
ovs-vsctl show - display the details of bridges and ports
ovs-vsctl add-br <bridge name> - create bridge
ovs-vsctl del-br <bridge name> - delete bridge
ovs-vsctl add-port <bridge> <device> - add port to bridge
ovs-vsctl del-port <bridge> <device> - delete port in bridge
I know these commands seem trivial, but new users will find this helpful.
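For instance, a quick round trip to confirm OVS is up and responding (br-test is just a scratch name for this example):
ovs-vsctl add-br br-test
ovs-vsctl show
ovs-vsctl del-br br-test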

Disclaimers and Assumptions

I work on RHEL 7, and I use Libvirt. If you’re using another Linux distribution, please make the necessary adjustments.


I don’t like NetworkManager; I’m not used to it, and it only interfered with what I did. So please, please, please stop and disable the NetworkManager service.
systemctl stop NetworkManager
systemctl disable NetworkManager

I do assume that Libvirt and OpenVswitch are installed and running on the machine.
yum install libvirt openvswitch -y
systemctl enable libvirtd
systemctl enable openvswitch
systemctl start libvirtd
systemctl start openvswitch

I’ll name all the OVS bridges after the tag of their VLAN; for example, the VLAN with tag 101 will get the bridge br-101.

Preparing the Physical Machines

As I said, I had 2 physical servers, each with a dual-port NIC: one port connected to the lab network, the other to the VLAN trunk. The trunk’s range in this example is 1-10; I’ll use just two VLANs here, to simplify matters.

Creating Devices for the VLANs

When we start, the network configuration of the physical machines is as follows
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever

2: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
   link/ether 00:8c:fa:02:91:0a brd ff:ff:ff:ff:ff:ff
   inet 192.168.0.124/24 brd 192.168.0.255 scope global dynamic enp2s0f0
      valid_lft 35723sec preferred_lft 35723sec
   inet6 2620:52:0:23a0:28c:faff:fe02:910a/64 scope global mngtmpaddr dynamic
      valid_lft 2591974sec preferred_lft 604774sec
   inet6 fe80::28c:faff:fe02:910a/64 scope link
      valid_lft forever preferred_lft forever
3: enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
   link/ether 00:8c:fa:02:91:0b brd ff:ff:ff:ff:ff:ff
   inet6 fe80::28c:faff:fe02:910b/64 scope link
      valid_lft forever preferred_lft forever


The trunk is connected to NIC enp2s0f1. This NIC’s network script, /etc/sysconfig/network-scripts/ifcfg-enp2s0f1, should be configured as:
DEVICE=enp2s0f1
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet


For each VLAN, create a network script /etc/sysconfig/network-scripts/ifcfg-enp2s0f1.<vlan tag>; for example, with VLAN tag 1:
DEVICE=enp2s0f1.1
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
VLAN=yes
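The script for the second VLAN, tag 2, is identical apart from the device name:
DEVICE=enp2s0f1.2
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
VLAN=yes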


Restart the network service
systemctl restart network
The end result is
ip a
...
3: enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
   link/ether 00:8c:fa:02:91:0b brd ff:ff:ff:ff:ff:ff
   inet6 fe80::28c:faff:fe02:910b/64 scope link
      valid_lft forever preferred_lft forever
4: enp2s0f1.1@enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP
   link/ether 00:8c:fa:02:91:0b brd ff:ff:ff:ff:ff:ff
   inet6 fe80::28c:faff:fe02:910b/64 scope link
      valid_lft forever preferred_lft forever
5: enp2s0f1.2@enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP
   link/ether 00:8c:fa:02:91:0b brd ff:ff:ff:ff:ff:ff
   inet6 fe80::28c:faff:fe02:910b/64 scope link
      valid_lft forever preferred_lft forever

Create OpenVswitch Bridge

Create a bridge for VLAN 1
ovs-vsctl add-br br-1
Create a port that connects the bridge to the VLAN device. Note that the port uses the plain device name enp2s0f1.1; the @enp2s0f1 suffix is only how ip a displays the parent link.
ovs-vsctl add-port br-1 enp2s0f1.1
After creating the bridges and ports, additional interfaces will be visible.
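The same pair of commands, repeated for the second VLAN, plus a look at the result:
ovs-vsctl add-br br-2
ovs-vsctl add-port br-2 enp2s0f1.2
ovs-vsctl show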

Create Network in Libvirt

Creating a network in Libvirt requires the following XML file to define it.

<network>
 <name>br-1</name>
 <forward mode='bridge'/>
 <bridge name='br-1'/>
 <virtualport type='openvswitch'/>
</network>
Save this XML file under a simple name, for example br-1.xml.

Define the network in Libvirt
virsh net-define br-1.xml
Start the network
virsh net-start br-1
Set the network to autostart (otherwise the network will not be available the next time you restart the Libvirt service)
virsh net-autostart br-1

Now Libvirt has an available network connected to the VLAN, and the virtual machines don’t need NAT anymore.
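To actually put a VM on that network, the guest’s domain XML needs an interface backed by the new network. A minimal sketch (the virtio model is just my example choice; any model your guests support will do):

<interface type='network'>
 <source network='br-1'/>
 <model type='virtio'/>
</interface>

Add it with virsh edit <domain>, or save it to a file and attach it with virsh attach-device <domain> <file> --persistent.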