
How-To Setup GlusterFS Thin Arbiter at HomeLab

In this article you will find a way to get a functional, highly available GlusterFS replicated volume with only two regular bricks, saving resources and money...

For a highly available HomeLab, the quorum requirements dictate a minimum of three nodes. But if the HomeLab is on a budget, a low-power, low-resource node is often used as the cluster Arbiter. This Arbiter handles quorum management and nothing else.

TL;DR

I have only two regular cluster nodes running Proxmox, and Proxmox uses Corosync for cluster node management. Corosync has a well-known option to define one node as an Arbiter only, via the corosync-qnetd service. A rough sketch of that setup follows.
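
For context only, this Corosync side is typically wired up roughly as follows. Treat it as a sketch: package names and availability depend on your distributions, and <arbiter-ip> is a placeholder for your arbiter's address.

# On the arbiter VM: install the QDevice network daemon
sudo dnf install corosync-qnetd    # on a Debian-based arbiter: sudo apt install corosync-qnetd

# On every Proxmox node: install the qdevice client, then register the arbiter from one node
sudo apt install corosync-qdevice
sudo pvecm qdevice setup <arbiter-ip>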

But Corosync is only one piece of the HA cluster puzzle. The next one is the replicated filesystem. For a small, budget-sensitive HomeLab, GlusterFS is a nice choice, and according to the GlusterFS documentation a Thin Arbiter is available there as well. But when I started looking for a way to get a GlusterFS Thin Arbiter up and running, I was faced with very few, and unclear, resources. But “a person grows with their tasks”, so I started some research and finally I can share this guide with you.

Thin Arbiter Node

Important Notice

I’m using the AlmaLinux distribution for my VM. I run the Thin Arbiter as a VM on my HomeLab NAS, so this guide is focused on AlmaLinux (or other CentOS-based) distributions. But I believe it can be used on other Linux flavors without any complications, too.

Enable GlusterFS Repository on AlmaLinux

The GlusterFS binaries are part of the CentOS SIG (Special Interest Group) repository, so we have to enable this repository first. Additionally, Proxmox 8 (which I’m currently using) ships GlusterFS version 10, so I’m using the same version on my Thin Arbiter node:

sudo dnf install centos-release-gluster10

Now we are ready for the Thin Arbiter “magic”. If you take a look into the GlusterFS Thin Arbiter documentation, you will discover this cryptic command:

/usr/local/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007 --xlator-option ta-vol-server.transport.socket.listen-port=24007

which doesn’t work. For example, it expects a volume file, but this file doesn’t exist, and if you try to find a recipe for how to create the file or where to get one, you don’t get a clear answer. I found a clue in a GlusterFS issue on GitHub, where a setup-thin-arbiter.sh script was mentioned. I then found it in the /extras/ sub-folder of the glusterfs repository, where the requested thin-arbiter.vol file resides as well.

Finally, I found a FOSDEM presentation related to the Thin Arbiter, and this pushed me in the right direction. So, here is the recipe:

Install and Configure the Thin Arbiter

The Thin Arbiter has its own package, so we can install it like any other Linux package:

sudo dnf install glusterfs-thin-arbiter

Now we can use the setup-thin-arbiter.sh script. You can run it without parameters to get some additional info, but we can go directly to the setup process:

sudo /usr/share/glusterfs/scripts/setup-thin-arbiter.sh -s

******************************************************
User will be required to enter a path/folder for arbiter volume.
Please note that this path will be used for ALL VOLUMES using this
node to host thin-arbiter. After setting, if a volume
has been created using this host and path then path for
thin-arbiter can not be changed
******************************************************

Enter brick path for thin arbiter volumes:

You can specify any path you want here, but based on the GlusterFS Brick Naming Conventions I’m using this path: /data/glusterfs/brick_ta


Entered brick path : /data/glusterfs/brick_ta
Please note that this brick path will be used for ALL
VOLUMES using this node to host thin-arbiter brick
Want to continue? (y/N):

Type Y here

y

Directory path to be used for thin-arbiter volume is: /data/glusterfs/brick_ta

========================================================
Starting thin-arbiter process
Created symlink /etc/systemd/system/multi-user.target.wants/gluster-ta-volume.service → /usr/lib/systemd/system/gluster-ta-volume.service.
thin-arbiter process has been setup and running

Voilà! The Thin Arbiter is up and running as a Linux service. You can verify this:

sudo service gluster-ta-volume status

Redirecting to /bin/systemctl status gluster-ta-volume.service
● gluster-ta-volume.service - GlusterFS, Thin-arbiter process to maintain quorum for replica volume
     Loaded: loaded (/usr/lib/systemd/system/gluster-ta-volume.service; enabled; preset: disabled)
     Active: active (running) since Fri 2023-08-25 06:58:37 UTC; 8min ago
   Main PID: 3287 (glusterfsd)
      Tasks: 14 (limit: 12082)
     Memory: 16.8M
        CPU: 51ms
     CGroup: /system.slice/gluster-ta-volume.service
             └─3287 /usr/sbin/glusterfsd -N --volfile-id ta -f /var/lib/glusterd/thin-arbiter/thin-arbiter.vol --b>

Aug 25 06:58:37 bbb systemd[1]: Started GlusterFS, Thin-arbiter process to maintain quorum for replica volume.

Now the Thin Arbiter node is ready to accept connections from regular GlusterFS nodes for your volume(s). Yes, you can configure more than one GlusterFS volume with this Thin Arbiter and only this one brick, as you can see from the setup script messages. We just need to allow the Thin Arbiter communication through the firewall. The Thin Arbiter doesn’t need the same ports opened as a regular GlusterFS node; it is enough to allow TCP port 24007 only. You can also restrict this rule to the IP addresses of the regular nodes (an optional example follows the commands below), but it’s not necessary in a protected cluster environment.

sudo firewall-cmd --permanent --zone=public --add-port=24007/tcp
sudo firewall-cmd --reload
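
If you do want to restrict access to the regular nodes only, a hedged sketch using firewalld rich rules instead of the plain port rule could look like this (192.168.10.11 is a placeholder address; repeat for each regular node):

sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.10.11" port port="24007" protocol="tcp" accept'
sudo firewall-cmd --reload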

Let’s continue with setting up the regular GlusterFS data nodes, in my case Proxmox instances.

Regular Nodes

First, the HDD or SSD drives on the regular nodes must be configured and mounted so that GlusterFS volume bricks can be created on them. This is a well-documented and straightforward process, but I provide quick how-to steps here:

Install the GlusterFS server on each Proxmox node, if not already installed:

sudo apt-get install glusterfs-server

List the hard drives available on the system to identify the disk for the GlusterFS brick(s):

sudo lsblk -o NAME,SERIAL,MODEL,TRAN,TYPE,SIZE | grep disk

In my case, the disk designated for GlusterFS is /dev/sdb.

I can use the whole disk for GlusterFS, so I overwrite the whole disk partition table and create a new partition spanning 100% of the disk. The new partition table must be of GPT type. I’m a fan of getting things done in as few steps as possible, so here is a command that does all of this in one step:

printf 'label: gpt\n;\n' | sudo sfdisk /dev/sdb

WARNING!! The command doesn’t prompt for anything, so be sure that you specify the correct disk! If you are unsure, or want to use a different partitioning scheme, I recommend the classical fdisk way. You can find many tutorials on how to do that on the Internet.

Format the new partition with the XFS filesystem:

sudo mkfs.xfs -f -i size=512 -L glusterfs_brick1 /dev/sdb1

You can change the partition label “glusterfs_brick1” to whatever you want to fit your disk management scheme.

For the next step we must identify the partition UUID so that we can mount it correctly:

sudo lsblk -o NAME,UUID | grep sdb1
|
--/dev/sdb1 0fc63daf-8483-4772-8e79-3d69d8477de4

Because GlusterFS is a user-space filesystem, we need to mount the partition now. The GlusterFS documentation recommends creating mount points under /data sub-folders; see the GlusterFS Brick Naming Conventions. Based on this, I have created this mount point for my GlusterFS pve volume:

sudo mkdir -p /data/glusterfs/pve/

Then add this line to /etc/fstab to mount the previously created partition to this mount point. The UUID must be the one discovered by the lsblk command above, in my case 0fc63daf….

UUID=0fc63daf-8483-4772-8e79-3d69d8477de4 /data/glusterfs/pve/ xfs defaults 0 2

Now, verify the result of the previous operations:

sudo mount /data/glusterfs/pve/
sudo systemctl daemon-reload
sudo lsblk -o NAME,FSTYPE,MOUNTPOINT | grep sdb1
|
─sdb1       xfs         /data/glusterfs/pve

We can see that the sdb1 XFS partition is mounted at /data/glusterfs/pve.

Finally, make sure the GlusterFS volume locations do not carry stale GlusterFS extended attributes (left over from a previous volume):

sudo setfattr -x trusted.glusterfs.volume-id /data/glusterfs/pve
sudo setfattr -x trusted.gfid /data/glusterfs/pve

If you skip this and stale attributes are present, GlusterFS will complain during the brick creation process.

!!! Do not create the brick sub-folders for your volume here. This will be done later, during the volume bring-up process !!!

Now we must allow GlusterFS communication through the Proxmox firewall. Because Proxmox 8 uses GlusterFS version 10, there is one very important change in networking behavior:

From Gluster-10 onwards, the brick ports will be randomized. A port is randomly selected within the range of base-port to max-port as defined in the glusterd.vol file and then assigned to the brick. For example: if you have five bricks, you need to have at least 5 ports open within the given range of base-port and max-port. To reduce the number of open ports (for best security practices), one can lower the max-port value in the glusterd.vol file and restart glusterd to get it into effect.

Because I don’t like too much randomness in my network environment, I have restricted this new random port range a little bit by modifying the /etc/glusterfs/glusterd.vol file. GlusterFS uses one port per brick, so the size of this port range must be equal to or greater than the brick count:

....
#   option transport.address-family inet6
    option base-port 49152
    option max-port  49155
end-volume

Finally, create the appropriate firewall rules to allow the GlusterFS server to communicate with the other bricks. Please allow the ports listed below; adjust the brick port range to fit your glusterd.vol modifications, or use the default range 49152-60999 if you did not modify the glusterd.vol file. An example rule set follows the list.

Required ports:

  • 24007 - GlusterD
  • 24008 - GlusterD RDMA port management
  • 49152~49155 - Brick ports
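
As an illustration only, assuming you manage these rules through the Proxmox firewall with a datacenter-wide rule set in /etc/pve/firewall/cluster.fw (adapt the file, zone and exact syntax to your setup as described in the Proxmox firewall documentation), the rules could look roughly like this:

[RULES]
# GlusterD
IN ACCEPT -p tcp -dport 24007
# GlusterD RDMA port management
IN ACCEPT -p tcp -dport 24008
# Brick port range as configured in glusterd.vol
IN ACCEPT -p tcp -dport 49152:49155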

Enable and start the GlusterFS service:

sudo systemctl enable glusterd
sudo systemctl start glusterd

And verify that the service is running correctly:

sudo systemctl status glusterd

● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2023-08-11 13:07:45 CEST; 1 week 6 days ago
       Docs: man:glusterd(8)
    Process: 396193 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS>
   Main PID: 396195 (glusterd)
      Tasks: 37 (limit: 38343)
    ......
Aug 11 13:07:44 pve1 systemd[1]: Starting glusterd.service - GlusterFS, a clustered file-system server...
Aug 11 13:07:45 pve1 systemd[1]: Started glusterd.service - GlusterFS, a clustered file-system server.

Now we have all the bricks for our GlusterFS wall ready. Let’s join them together.

Bringing Up the Thin Arbiter GlusterFS Volume

For the purposes of this task, assume these nodes and their corresponding DNS or /etc/hosts records (an example /etc/hosts snippet follows the list):

  • pve1-storage
  • pve2-storage
  • thin-arbiter
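
For illustration, with hypothetical IP addresses (adjust them to your storage network), the /etc/hosts entries on each node could look like this:

# Example /etc/hosts entries - the IP addresses are placeholders
192.168.10.11  pve1-storage
192.168.10.12  pve2-storage
192.168.10.13  thin-arbiter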

Important notice:

These steps must be performed on one of the real GlusterFS nodes, not on the Thin Arbiter node!! I’ll perform the full setup on the pve1-storage node.

First, inform the nodes about each other and establish trust between them:

sudo gluster peer probe pve2-storage
sudo gluster peer probe thin-arbiter

Now we can create the GlusterFS volume. !!The Thin Arbiter brick must be specified as the last brick on the command line!! You can see that I’m not using the force parameter, which is very common in other GlusterFS recipes. That’s because we did not create the brick sub-folders and we cleaned up the volume folder xattrs in the previous steps. I think that letting things go through without warnings or errors is the best way of getting things done:

gluster volume create pve transport tcp replica 2 thin-arbiter 1 pve1-storage:/data/glusterfs/pve/brick1 pve2-storage:/data/glusterfs/pve/brick2  thin-arbiter:/data/glusterfs/brick_ta

Finally, start the volume and verify its status:

sudo gluster volume start pve
sudo  gluster volume status pve
Status of volume: pve
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1-storage:/data/glusterfs/pve/bric
k1                                          49154     0          Y       396231
Brick pve2-storage:/data/glusterfs/pve/bric
k2                                          49152     0          Y       367767
Self-heal Daemon on localhost               N/A       N/A        Y       396248
Self-heal Daemon on pve2-storage            N/A       N/A        Y       367784

Task Status of Volume pve
------------------------------------------------------------------------------
There are no active volume tasks

You can see that the Thin Arbiter is not mentioned here as a node, and the same applies when you show the peer status:

sudo gluster peer status
Number of Peers: 1

Hostname: pve2-storage
Uuid: a6226ade-97a5-4d28-a004-b593fdb40949
State: Peer in Cluster (Connected)

How can we verify that the Thin Arbiter is working correctly, then? Of course, you can verify the quorum by shutting down nodes and writing to the volume, but the easiest way is to check that a special Thin Arbiter volume file is present at its brick location. So log in to the thin-arbiter node and execute:

cd /data/glusterfs/brick_ta/
ls -l
-rw-rw-r--. 2 root root 0 Aug 11 12:52 trusted.afr.pve-ta-2.aabb0110-fe6a-4497-aadd-19223322fee5

If you can see this special file, whose name begins with trusted.afr.<volume_name>-ta-2 followed by a GUID, then the Thin Arbiter is functional, because this file is used to maintain the volume arbiter status.

If you are curious, you can verify that everything works by mounting the volume with a GlusterFS client and writing to it while shutting down one of the nodes. With the Thin Arbiter, writing is still possible even when the second node is down, and when you bring that node back up, it will be healed and both volume bricks will hold exactly the same data. A sketch of such a client mount follows.
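
As a minimal sketch, assuming the glusterfs-client package is installed on the client and /mnt/pve is a hypothetical mount point, the volume could be mounted like this:

sudo mkdir -p /mnt/pve
sudo mount -t glusterfs -o backup-volfile-servers=pve2-storage pve1-storage:/pve /mnt/pve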

So, that’s all about GlusterFS and the Thin Arbiter. I believe you will find this article useful.
