How To Run a Multi-Node Cluster Database with Cassandra on Ubuntu 14.04
Apache Cassandra is a scalable, open source database system designed to deliver high availability and performance in multi-node setups.
In the previous tutorial, we showed you how to run a single-node Cassandra cluster. In this one, you will learn how to configure Cassandra to run as a multi-node cluster on Ubuntu 14.04.
Since you are going to build a multi-node Cassandra cluster, you first need to determine how many servers you would like in your cluster and configure each of them. It is recommended, though not required, that they have the same or similar specifications.
In order to use this tutorial, you will require the following things:
- At least two Ubuntu 14.04 servers configured with our initial server setup guide.
- Each server must be secured with a firewall.
- Each server must have Cassandra installed using our Cassandra installation guide.
Deleting Default Data
Servers in a Cassandra cluster are generally referred to as nodes. As of now, you will have a single node on each server. In this part of the tutorial, you’ll set up the nodes to work as a multi-node Cassandra cluster.
Every command in this and subsequent steps has to be repeated on every node in the cluster, so be sure to have as many terminals open as you have nodes in the cluster.
First, stop the Cassandra daemon on each node.
sudo service cassandra stop
After that is done, delete the default data set.
sudo rm -rf /var/lib/cassandra/data/system/*
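Since every step in this section must run on every node, you can script the repetition over SSH instead of juggling terminals. The following is a minimal sketch, assuming passwordless SSH to a sudo-capable user; the node addresses in NODES are placeholders, and DRY_RUN=1 only prints each command instead of executing it.

```shell
# Run the stop-and-wipe step on every node over SSH.
# NODES holds placeholder addresses -- replace them with your own.
NODES="10.0.0.1 10.0.0.2"
DRY_RUN=1   # set to 0 to actually execute over SSH
for host in $NODES; do
    cmd="ssh $host 'sudo service cassandra stop && sudo rm -rf /var/lib/cassandra/data/system/*'"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"    # dry run: show what would be executed
    else
        eval "$cmd"
    fi
done
```

With DRY_RUN=1 the loop only prints the commands, so you can review them before running the loop for real.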
Configuring the Cluster
Cassandra’s configuration file, ‘cassandra.yaml’, is located in the ‘/etc/cassandra’ directory. It contains many directives and is very well commented; you will edit this file to set up the cluster.
The following directives have to be edited in order to set up a multi-node Cassandra cluster:
- cluster_name: This is going to be the name of your cluster.
- -seeds: This is a comma-delimited list of the IP addresses of every node in the cluster.
- listen_address: This is the IP address that other nodes in the cluster will use to connect to this one. It defaults to ‘localhost’ and has to be changed to the IP address of the node.
- rpc_address: This is the IP address for remote procedure calls. It defaults to ‘localhost’. If the server’s hostname is configured correctly, leave this as is. Otherwise, change it to the server’s IP address or the loopback address (127.0.0.1).
- endpoint_snitch: Name of the snitch, which is what tells Cassandra what its network looks like. This defaults to ‘SimpleSnitch’, which is used for networks in one datacenter. In our case, we’ll change it to ‘GossipingPropertyFileSnitch’, which is preferred for production setups.
- auto_bootstrap: This directive is not in the configuration file, so it has to be added and set to ‘false’. It makes new nodes use the right data immediately. It is optional if you’re adding nodes to an existing cluster, but required when initializing a fresh cluster, that is, one with no data.
Open the configuration file with nano or any other text editor you would like.
sudo nano /etc/cassandra/cassandra.yaml
Look in the file for the directives mentioned above and edit them as shown below so that they match your cluster. Replace ‘your_server_ip’ with the IP address of the server you are currently working on. The ‘- seeds:’ list has to be the same on every server and will contain each server’s IP address separated by commas.
File: /etc/cassandra/cassandra.yaml

. . .
cluster_name: 'CassandraDOCluster'
. . .
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "your_server_ip,your_server_ip_2,...your_server_ip_n"
. . .
listen_address: your_server_ip
. . .
rpc_address: your_server_ip
. . .
endpoint_snitch: GossipingPropertyFileSnitch
. . .
At the bottom of the file, add the ‘auto_bootstrap’ directive by pasting in the line below.

auto_bootstrap: false
Once you are done editing this file, save and exit.
Afterwards, repeat this step for every server you want to have in the cluster.
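Because a typo in any one of these directives can keep a node out of the cluster, it helps to grep the edited values back out and eyeball them. The sketch below checks a sample fragment written to /tmp, with placeholder IPs; on a real node, point the final grep at /etc/cassandra/cassandra.yaml instead.

```shell
# Write a sample cassandra.yaml fragment (placeholder values) and grep
# back the directives edited above to confirm they are all present.
cat > /tmp/cassandra-sample.yaml <<'EOF'
cluster_name: 'CassandraDOCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
EOF
# Expect five matching lines; fewer means a directive is missing.
grep -cE '^(cluster_name|listen_address|rpc_address|endpoint_snitch|auto_bootstrap):' /tmp/cassandra-sample.yaml
```

Running this on each node after editing gives a quick way to spot a forgotten directive before restarting Cassandra.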
Configuring the Firewall
The cluster is now configured, but the nodes are not yet communicating. Next, you must configure the firewall to allow Cassandra traffic.
To begin, restart the Cassandra daemon on each server.
sudo service cassandra start
If you check the status of the cluster, you will see that only the local node is listed, because it is not yet able to communicate with the other nodes.
sudo nodetool status
Output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  192.168.1.3  147.48 KB  256     ?     f50799ee-8589-4eb8-a0c8-241cd254e424  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
To allow the nodes to communicate, you will have to open the following network ports on every node:
- 7000, the TCP port for commands and data between nodes.
- 9042, the TCP port for the native transport server. cqlsh, the Cassandra command line utility, connects to the cluster through this port.
To edit the firewall rules, open the rules file for IPv4.
sudo nano /etc/iptables/rules.v4
Copy and paste the following line inside the INPUT chain; it allows traffic on the two ports mentioned above.
The IP address after the ‘-s’ flag has to be the IP address of another node in the cluster. If you have two nodes with IP addresses 188.8.131.52 and 184.108.40.206, the rule on the node with address 188.8.131.52 should use 184.108.40.206 after the ‘-s’ flag, and vice versa.
New firewall rule:
-A INPUT -p tcp -s your_other_server_ip -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
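On a cluster with more than two nodes, you need one such rule per peer. A small loop can generate them all; in this sketch, NODES and MY_IP are placeholder addresses, and the rules are only printed, not loaded into iptables.

```shell
# Generate one INPUT rule per peer node, skipping this machine's own IP.
# NODES and MY_IP are placeholders -- substitute your real addresses.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
MY_IP="10.0.0.1"
RULES=""
for ip in $NODES; do
    if [ "$ip" != "$MY_IP" ]; then
        RULES="$RULES-A INPUT -p tcp -s $ip -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
"
    fi
done
printf '%s' "$RULES"   # paste these lines into /etc/iptables/rules.v4
```

Run this once per node with MY_IP set to that node's address, then paste the printed lines into the INPUT chain of its rules file.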
Once you’ve added the rule, save and exit the file before restarting IPTables.
sudo service iptables-persistent restart
Check the Cluster Status
You have now finished all the steps required to turn the nodes into a multi-node cluster. You may confirm that they are all communicating by looking at their status.
sudo nodetool status
Output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  192.168.1.4  147.48 KB  256     ?     f50799ee-8589-4eb8-a0c8-241cd254e424  rack1
UN  192.168.1.6  139.04 KB  256     ?     54b16af1-ad0a-4288-b34e-cacab39caeec  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless.
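If you have many nodes, counting the UN (Up/Normal) lines is quicker than reading the whole table. The sketch below parses a saved sample of the output above; on a live node you could pipe `sudo nodetool status` straight into the same grep instead.

```shell
# Count nodes reporting Up/Normal (lines starting with "UN") in a saved
# sample of the nodetool status output.
cat > /tmp/nodetool-sample.txt <<'EOF'
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  192.168.1.4  147.48 KB  256     ?     f50799ee-8589-4eb8-a0c8-241cd254e424  rack1
UN  192.168.1.6  139.04 KB  256     ?     54b16af1-ad0a-4288-b34e-cacab39caeec  rack1
EOF
grep -c '^UN ' /tmp/nodetool-sample.txt   # prints the number of healthy nodes
```

If the printed count matches the number of servers you configured, every node has joined the cluster.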
If you happen to see every node you have configured, this means you have successfully set up a multi-node Cassandra cluster.
You can also verify that you are able to connect to the cluster with ‘cqlsh’, the Cassandra command line client.
Note that you can specify the IP address of any node in the cluster in the command below.
cqlsh your_server_ip 9042
You will then see it connect.
Connected to CassandraDOCluster at 192.168.1.6:9042.
[cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
Afterwards, you may quit the CQL shell by typing the command below.

exit
You should now have a multi-node Cassandra cluster running on Ubuntu 14.04.
If you are looking for more information about Cassandra, visit the project’s website. If you need to troubleshoot the cluster, the first place to look for clues is the log files located in the ‘/var/log/cassandra’ directory.