Percona Cluster on Ubuntu 12.10 (Quantal Quetzal)

Recently it's been a task of mine to setup a new SQL database cluster. My favourite database engine of late has been Percona's XtraDB, and so naturally I was quick to choose Percona Cluster as a solution.

This installation was a Cluster 5.5.28 install performed on a RackSpace public cloud server using a clean Ubuntu 12.10 image- your mileage with other platforms, OSes and Percona Server versions may vary. All commands are run as the root user, so you should sudo everything if you're running via a non-privileged account.

Unfortunately, documentation for Percona is scarcely distributed; by which I mean it's out there, but the information you'll need is spread over several websites. There are a couple of quick tutorials on their website, but those are laughably brief on details. So far as I can tell, Percona Cluster is actually just Percona Server (read: their XtraDB storage engine) with Galera running separately to handle the replication. I may be wrong, but the separation runs as deep as replication configuration being a case of linking directly to the Galera *.so file in my.cnf. Consequently, one needs to have a grasp of Galera as well as Percona in order to be able to set this up - luckily I have had some exposure to Galera before.

You can use the index below to jump around, or skip straight to the config file templates.

Secure the network

We'll assume you're using iptables here - if not, the port numbers 3306, 4444, 4567, and 4568 should be all you need to enable on your internal network.

If you've not yet secured your external interface, let's run through a default configuration quickly. Most likely, you'll just want a subset of the following ports open:

$ iptables -F INPUT                             # clear out all current input rules and start fresh
$ iptables -A INPUT -i lo -j ACCEPT             # accept all connections on loopback interface
$ iptables -A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT  # accept and continue all established connections
$ iptables -A INPUT -p tcp --dport ssh -j ACCEPT
$ iptables -A INPUT -p tcp --dport smtp -j ACCEPT   # NOTE: mailservers only
$ iptables -A INPUT -p tcp --dport www -j ACCEPT    # NOTE: webservers only
$ iptables -A INPUT -p tcp --dport https -j ACCEPT  # NOTE: webservers only
$ iptables -P INPUT DROP                        # set default policy to DROP. IMPORTANT: do this LAST or you'll lock yourself out!

Of course, if you're spinning up boxes to be used only as database servers then you might not want any of these ports open at all. But I digress: let's move on to the internal interface, which we should firewall as well (remember, RackSpace ServiceNet is public to all cluster customers)..

$ iptables -A INPUT -i eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT  # accept and continue all established connections
$ iptables -A INPUT -i eth1 -p tcp --dport 3306 -j ACCEPT
$ iptables -A INPUT -i eth1 -p tcp --dport 4444 -j ACCEPT
$ iptables -A INPUT -i eth1 -p tcp --dport 4567 -j ACCEPT
$ iptables -A INPUT -i eth1 -p tcp --dport 4568 -j ACCEPT

If you wish, you can also add --source with a comma-delimited list of the IP addresses of every host in your cluster to ensure complete network security. Obviously this is best practise, but the downside is this becomes difficult to maintain when your cluster becomes large and you're adding and removing nodes all the time. It's up to your needs how you configure this, really.

When done, save your firewall configuration: iptables-save > /etc/iptables.rules

If this is the first time you've done this, then you'll be dismayed to know that these settings will only have an effect on the currently running session. Once your server reboots, all your firewall settings will be lost! Time to setup a script to automatically load them for you at boot time. Create a new file at /etc/network/if-pre-up.d/iptablesload, and add:

#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0

You should also chmod +x /etc/network/if-pre-up.d/iptablesload. Now your server will load your firewall settings whenever it connects to a network interface, and from now on running iptables-save > /etc/iptables.rules will update these settings for boot time as well.

Installation

Since in most cases you'll be setting up multiple servers for different purposes, I've broken down the install steps into 'All nodes' / 'Seed node' / 'Subsequent nodes'. Since the first node you boot up won't be connected to anything except itself, its configuration will be a little less complicated than the others. If you're planning on deploying a large cluster, you'll probably image the second node you create after configuring it and getting it to connect and then deploy further ones from there.

Following on from standard setups, I've also included instructions on a few other things you'll possibly want to do - get several nodes setup on a single box quickly for testing, and setup an arbiter node on a two-server cluster using garbd.

All nodes

Install packages

The first step is going to be getting the needed software on your servers, which luckily is pretty easy since Percona provide apt packages for you. You'll just need to sign their keys...

$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
$ gpg -a --export CD2EFD2A | sudo apt-key add -

...and then add the package sources to the end of your configuration in /etc/apt/sources.list:

# Percona Repositories
deb http://repo.percona.com/apt quantal main
deb-src http://repo.percona.com/apt quantal main

Once done, get a fresh package list with apt-get update and then install the required packages:

$ apt-get install percona-xtradb-cluster-client-5.5 percona-xtradb-cluster-server-5.5 xtrabackup

The xtrabackup SST module is optional, but really a requirement in any production deployment as it allows crashed nodes to 'catch up' data without locking the entire cluster, as can happen when using rsync or mysqldump. Some people report problems with xtrabackup, but it has performed fine for me without any configuration.

Bind to all interfaces

Once installed, your first edit to /etc/mysql/my.cnf should be to set bind_address = 0.0.0.0 so that the server is available on all interfaces (your firewall should take care of all access control anyway).

Seed node

Set root password

My experience was that the apt-get process fails to set the root password after installing Percona. If this happens to you, set one now with mysqladmin -uroot password "YOUR_PASS". Ensure a secure password as your SQL ports will be open to the internal network (unless you've set an explicit source IP whitelist in your iptables config, of course), and this will be the root password used by the entire cluster. The reasons for this are obvious when you think about it: the core MySQL users table will be replicated across the cluster along with all other data.

Configuration

While making the usual configuration edits to /etc/mysql/my.cnf, you should pay attention to the following for the first node you initialise:

  • wsrep_cluster_name = [YOUR_CLUSTER_NAME]
    The first node will define the name of your cluster (you may later use this identifier when connecting multiple clusters). This must be set the same in the config files of each subsequent node you create.
  • wsrep_node_name = [YOUR_NODE_NAME]
    Each node needs a unique name in the cluster to be identified by.
  • wsrep_node_address = [YOUR_INTERNAL_IP]
    Unless this is set, your nodes will probably broadcast their external IP address to other nodes as the address to connect with. I believe this is something to do with the internal DNS resolution order of the system, and the fact that Galera only performs a single DNS lookup on each hostname or IP before transmitting. Either way, it's best to set this here.
  • wsrep_cluster_address = gcomm:// should be set, exactly like this. This tells the node to only connect to itself as the first cluster member.

Subsequent nodes

Copy deb-sys-maint password

The first problem you'll encounter when booting a second node is an 'access denied' type error while checking its tables, so let's pre-emptively fix this. This is because the second node will be inheriting the database from the first, which will have a different autogenerated deb-sys-maint user password. You will have to inspect /etc/mysql/debian.conf on your seed node, make a note of its password and then set the same one in the corresponding file on your new node. I am pretty sure this will cover it - but if you still encounter the error when booting a second node for the first time, you can always login to the MySQL console and run SET PASSWORD FOR 'debian-sys-maint'@'localhost' = PASSWORD('XXXX'); to force them the same (yes, you can login even though it says the service failed to start - it didn't).

Configuration

Again, make your usual edits, but this time you should set:

  • wsrep_cluster_name to the same name as you've already defined on your seed node.
  • wsrep_node_name to a new unique identifier for this node.
  • wsrep_node_address, and
  • wsrep_cluster_address = gcomm://[ANY_OTHER_IP]
    This simply needs to be set to the IP of any other host in the cluster, and the node will connect there to exchange information and autodiscover all other members. You can also use wsrep_urls with a comma-separated list of the above (under the [mysqld_safe] section of my.cnf only) if you wish to specify a pool of IPs to attempt connection with.

Booting

Once your config file is ready, you should restart MySQL as it will already be running in standard (non-replicated) mode. Stopping the running service can be interesting - if you see a red 'fail' when running service mysql stop then you may have to ps aux | grep mysql and kill off all the running server processes first. After that, it's service mysql start and you should be running! You should probably be tailing /var/log/mysql/error.log by this point, that window will be invaluable to you as you continue through your next nodes.

Other stuff

Running multiple instances on a single server

Running several nodes on one machine is just a case of running multiple instances of the mysql daemon. For each node you wish to create, make a separate .cnf file somewhere (but NOT in /etc/mysql/conf.d/ or it'll be included in my.cnf) with different values for: port, socket, pid-file, datadir and (if you wish) log_error.

There will be a couple of extra configuration variables you need to set to enable the node to run on the same host without conflicting ports:

  • wsrep_sst_receive_address = [OUR_IP]:[DIFFERENT_PORT]
  • wsrep_provider_options = "gmcast.listen_addr=tcp://[OUR_IP]:[DIFFERENT_PORT];"

This tells the new node to listen on a different port for replication and receiving SST transfers. The ports for each setting should not be the same.

You may also copy the /etc/init.d/mysql script to another file if you wish, and add a --defaults-file argument to the underlying command it executes. This will allow you to use service [NEW_FILENAME] stop/start/reset etc as with mysql.

A note on using wsrep_sst_method = xtrabackup: you cannot use this method easily when running a cluster on a single machine, I had it conflict a lot with itself and generally fail to run. Since all your nodes are on the same server, however, it's perfectly fine to use rsync as your SST method. Besides, you're only testing stuff :p

Running arbiter nodes

Arbiter processes are executed by the command garbd. All parameters are passed as arguments to the command, there is no configuration file required. A brief list of the options you'll need:

  • -a: cluster connection address (like wsrep_cluster_address). This is also where you specify extra options to the Galera daemon.
  • -g: the name of the cluster to join, which matches wsrep_cluster_name in the *.cnf files.
  • -l: sets the location for logfile to write to.
  • -d: if this flag is present, the command will fork to spawn a daemon instead of running inline.

When run on the same machine as a database node, the most important thing to note is that the arbiter needs to listen on a different port to the mysqld process. You can use the -a switch to specify these ports, using a Galera options string: gcomm://[OUR_INTERNAL_IP]:[DB_NODE_LISTEN_PORT]?gmcast.listen_addr=tcp://0.0.0.0:[ARB_NODE_LISTEN_PORT]. The DB listen port is probably 4567 if set as the default.

my.cnf template

All these entries go under the [mysqld] section of the config file.

# General replication options, same on all nodes:

wsrep_provider = /usr/lib/libgalera_smm.so  # your path *may* differ, but shouldn't
wsrep_slave_threads = 8                 # to taste...
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = 1
wsrep_sst_method = xtrabackup   

# Cluster options:

wsrep_cluster_name = MY_CLUSTER
wsrep_cluster_address = gcomm://[ANY_OTHER_NODE_IP]

# Node options:

wsrep_node_name = MY_UNIQUE_NAME
wsrep_node_address = MY_INTERNAL_IP