Serving NFS on Amazon EC2
UPDATE: Check out Amazon Elastic File System, which in many cases may be a better alternative to managing your own NFS setup.
What is NFS?
You probably know what NFS is, but for the details you can check out the Network File System article on Wikipedia. If you want to go further, the NFS v4 specs are here. If you like history, the 1989 NFS specs are here.
Goal
Let’s say you want to run a highly available NFS service, but you don’t want to keep two instances up all the time because the service has an SLA that allows a 5-minute window to fail over. So you decide to run a single instance and, when there is a problem, fail over to another instance as quickly as possible within that 5-minute window.
Challenges of Running HA NFS on AWS
1- There is no shared block storage on Amazon Web Services. In a traditional environment you could export a block device from a highly available storage array to two different servers and configure the NFS service on both of them. If the primary server failed, the secondary would take over after making sure the primary had released the block device, typically by powering the primary off through an API on the power switch or by having the SAN disconnect the storage from it. The secondary NFS server would then mount the block device, run file system checks, move the virtual IP address to its network interface and start the NFS service.
On AWS we have Elastic Block Storage, which can be attached to only one EC2 instance at a time. Also, the EC2 instance and the EBS volume have to be in the same Availability Zone.
2- On AWS, an EC2 instance can’t simply assign an IP address to one of its network interfaces with an OS command like ifconfig. It can, however, call the EC2 service API to ask for an IP address to be assigned to one of its network interfaces (if it has more than one). This requires a mechanism on the instance beyond native Linux commands; we will accomplish it easily with EC2 roles and the AWS CLI.
3- There is no support for multicast or broadcast in AWS. Communication between the instances that will fail over has to be unicast.
Designing The Solution
There are a few situations we have to take care of.
1- Instance fails in the zone: This is easy to overcome, as an auto scaling group can check the health of the instance and start a new one when there is a problem. We just have to set the minimum and maximum instance counts to 1.
2- Instance is OK but NFS is not running: In this case we can try to restart the NFS service from a job that periodically checks it. But what if NFS doesn’t start after a few tries? We can also report a custom health metric to CloudWatch; the auto scaling group can use this metric to trigger a scaling operation (a sketch of such a check follows this list).
3- There is a problem with the Availability Zone: In this case we will have to fail over to a different zone. The challenge here is that we won’t be able to keep the same IP address, so we will use Route 53 to change the IP address behind the hostname that points clients to our NFS service.
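As an illustration of the custom health metric in item 2, a periodic check along these lines could report NFS health to CloudWatch. This is only a sketch; the namespace and metric name (Custom/NFS, NFSHealthy) are made up for this example and are not part of the setup described later.
#!/bin/bash
# Report 1 to CloudWatch if the local NFS server answers RPC, 0 otherwise.
# Run this from cron on the NFS instance; namespace/metric names are arbitrary examples.
if rpcinfo -t localhost nfs > /dev/null 2>&1; then
  STATUS=1
else
  STATUS=0
fi
aws cloudwatch put-metric-data \
  --namespace "Custom/NFS" \
  --metric-name "NFSHealthy" \
  --value $STATUS \
  --region us-west-2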
Steps for Building NFS Service
Most of the steps below will be pretty basic for some of us, but I still wanted to go through them one by one to give a complete overview of everything needed to run an NFS service with failover capability.
Create VPC
I will create a VPC in us-west with a CIDR block of 10.10.10.0/24. This will support 256 IP addresses, and we will break it into multiple subnets.
Open the Amazon VPC console and click Create VPC. We will name the VPC nfs-test-vpc.
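If you prefer the CLI to the console, a rough equivalent of this step is the sketch below; only the CIDR block and the Name tag come from the steps above.
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.10.10.0/24 --region us-west-2 \
  --query Vpc.VpcId --output text)
aws ec2 create-tags --resources $VPC_ID --tags Key=Name,Value=nfs-test-vpc --region us-west-2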
Create Security Group
We will also need a security group for the launch configuration later on.
On the Amazon VPC Console, select Security Groups under Security and click Create Security Group. Give nfs-security-group as the name, add a description and click Yes, Create.
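The same step from the CLI might look like the sketch below. The ingress rule allowing NFS (TCP 2049) from the VPC CIDR is my assumption; the post doesn’t list specific rules.
SG_ID=$(aws ec2 create-security-group --group-name nfs-security-group \
  --description "Security group for the NFS service" --vpc-id $VPC_ID \
  --query GroupId --output text --region us-west-2)
# Assumed rule: allow NFS (TCP 2049) from inside the VPC
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 2049 --cidr 10.10.10.0/24 --region us-west-2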
Create Subnets
We will break 10.10.10.0/24 into four subnets to have private and public subnets in two different Availability Zones. It is important to remember that for each subnet, AWS reserves the first four IP addresses and the last IP address.
We need four subnets, so we will use a /26 netmask. This gives us the networks below:
- 10.10.10.0/26 -> private subnet 1
- 10.10.10.64/26 -> private subnet 2
- 10.10.10.128/26 -> public subnet 1
- 10.10.10.192/26 -> public subnet 2
As a reminder, public and private subnets are not really different subnet types in AWS; through configuration we will make some of them reachable from the internet and call those public subnets, while the private subnets won’t be reachable from the internet.
Taking 10.10.10.0/26 as an example:
- 10.10.10.0: Network address.
- 10.10.10.1: Reserved by AWS for the VPC router.
- 10.10.10.2: Reserved by AWS for mapping to the Amazon-provided DNS.
- 10.10.10.3: Reserved by AWS for future use.
- 10.10.10.63: Network broadcast address. AWS does not support broadcast in a VPC, therefore we reserve this address.
On Amazon VPC console, click Subnets / Create Subnet
Let’s create private subnet 1.
Let’s create private subnet 2.
Let’s create public subnet 1.
Let’s create public subnet 2.
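The four subnets can also be created from the CLI as sketched below; the Availability Zone names us-west-2a and us-west-2b are assumptions, pick whichever two zones you use.
aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.10.10.0/26   --availability-zone us-west-2a --region us-west-2   # private subnet 1
aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.10.10.64/26  --availability-zone us-west-2b --region us-west-2   # private subnet 2
aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.10.10.128/26 --availability-zone us-west-2a --region us-west-2   # public subnet 1
aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.10.10.192/26 --availability-zone us-west-2b --region us-west-2   # public subnet 2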
Create EBS Volume
We will use an EBS volume to store the files served by the NFS service. If the instance running the NFS service becomes unavailable for any reason, the auto scaling group will create another instance, which will attach and mount this EBS volume.
Go to the Amazon EC2 Console, select Volumes on the navigation pane and then click Create Volume. For the sake of this document I left the default values there, but you may choose a different volume type or size depending on your needs. Click Create.
While the EBS volume is being created, change the name of the volume to nfs-master-volume.
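A CLI sketch of the same step; the 100 GiB size matches the df output later in the post, and gp2 is an assumption about the console default at the time.
VOLUME_ID=$(aws ec2 create-volume --availability-zone us-west-2a --size 100 \
  --volume-type gp2 --region us-west-2 --query VolumeId --output text)
aws ec2 create-tags --resources $VOLUME_ID --tags Key=Name,Value=nfs-master-volume --region us-west-2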
When you see available as the state of this volume, we can attach it to an instance, but it won’t have a file system on it yet. So before we can use this volume on our NFS server, we have to attach it to a temporary instance, format it, mount it, and then unmount and detach it.
I won’t go over how to create an instance, but be sure to create it in the same Availability Zone as the EBS volume.
Once the instance is up, go back to Elastic Block Storage / Volumes and attach nfs-master-volume to the temporary instance you just created.
Verify device is there
[ec2-user@ip-172-31-41-87 ~]$ ls -1 /dev/sdf
/dev/sdf
Format the device
[ec2-user@ip-172-31-41-87 ~]$ sudo mkfs.ext4 /dev/sdf
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: b4983031-4fbd-423e-9c3b-0089ceeed476
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Create a folder and mount /dev/sdf
[ec2-user@ip-172-31-41-87 ~]$ sudo mkdir /mnt/nfs-master
[ec2-user@ip-172-31-41-87 ~]$ sudo mount /dev/sdf /mnt/nfs-master/
Check disk space
[ec2-user@ip-172-31-41-87 ~]$ df -h /mnt/nfs-master/
Filesystem Size Used Avail Use% Mounted on
/dev/xvdf 99G 60M 94G 1% /mnt/nfs-master
Create a signature file .NFSMASTER
[root@ip-172-31-41-87 mnt]# sudo touch /mnt/nfs-master/.NFSMASTER
[root@ip-172-31-41-87 mnt]# ls -la /mnt/nfs-master/
total 24
drwxr-xr-x 3 root root 4096 Feb 24 19:17 .
drwxr-xr-x 3 root root 4096 Feb 24 19:12 ..
drwx------ 2 root root 16384 Feb 24 19:13 lost+found
-rw-r--r-- 1 root root 0 Feb 24 19:17 .NFSMASTER
Unmount the volume
[ec2-user@ip-172-31-41-87 ~]$ sudo umount /mnt/nfs-master/
Finally detach the volume using Amazon EC2 Console.
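Or, from the CLI, using the volume ID that appears in the user data script below:
aws ec2 detach-volume --volume-id vol-073769c7 --region us-west-2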
Our volume is ready to be mounted automatically.
Create User Data Script
Below is the user data script that will attach and mount the EBS volume, start the NFS service and update the Route 53 record when the instance boots.
#!/bin/bash
REGION=us-west-2
NFSVOLUME=vol-073769c7
DEVICE=/dev/sdf
DIRECTORY=/mnt/nfs
INSTANCEID=$(curl -s "http://169.254.169.254/latest/meta-data/instance-id")
# wait until the volume is available
n=0
until [ $n -ge 5 ]
do
  aws ec2 describe-volumes --volume-ids $NFSVOLUME --region $REGION | grep -q available && break
  echo "waiting for $NFSVOLUME to be available"
  ((n += 1))
  sleep 2
done
# if the volume is available
if aws ec2 describe-volumes --volume-ids $NFSVOLUME --region $REGION | grep -qi available; then
  # attach the volume
  aws ec2 attach-volume --volume-id $NFSVOLUME --device $DEVICE --instance-id "$INSTANCEID" --region $REGION
  # wait until the volume is attached
  n=0
  until [ $n -ge 5 ]
  do
    aws ec2 describe-volumes --volume-ids $NFSVOLUME --region $REGION | grep -q attached && break
    echo "waiting for $NFSVOLUME to be attached"
    ((n += 1))
    sleep 2
  done
  # if the volume is attached
  if aws ec2 describe-volumes --volume-ids $NFSVOLUME --region $REGION | grep -q attached; then
    # create the mount directory if it does not exist yet, then mount
    if [ ! -d "$DIRECTORY" ]; then
      sudo mkdir $DIRECTORY
    fi
    sudo mount $DEVICE $DIRECTORY
    # check if the .NFSMASTER signature file is present
    if [ -f "$DIRECTORY/.NFSMASTER" ]; then
      echo "$DEVICE is attached and mounted at $DIRECTORY"
      df -h $DIRECTORY
    else
      echo "A problem occurred, $DEVICE is not attached and/or couldn't be mounted"
    fi
  else
    echo "A problem occurred, $NFSVOLUME is not attached"
  fi
else
  echo "A problem occurred, $NFSVOLUME is not available"
fi
# start the NFS service
echo "/mnt/nfs *(rw,sync)" > /etc/exports
service nfs restart
# Route53 update
LOCALIP=$(curl -s "http://169.254.169.254/latest/meta-data/local-ipv4")
DOMAIN=nfs.helloawsworld.com
HOSTEDZONEID=Z1G9ZB2OW5990Z
cat > /tmp/route53-record.txt <<- EOF
{
  "Comment": "A new record set for the zone.",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$DOMAIN",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "$LOCALIP"
          }
        ]
      }
    }
  ]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONEID \
  --change-batch file:///tmp/route53-record.txt
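Once this script has run and the Route 53 record points at the active instance, clients can mount the export by name, roughly as below; the client-side mount point is arbitrary, and the 60-second TTL above bounds how long clients may keep resolving the old IP address after a failover.
sudo mkdir -p /mnt/nfs-client
sudo mount -t nfs nfs.helloawsworld.com:/mnt/nfs /mnt/nfs-client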
Create an IAM Role
We want the NFS server to be able to attach an EBS volume, which requires the instance to have a role that grants access to the relevant API calls. So we will create a role and attach a policy to it.
Create Policy
Open Amazon IAM Console. Click Policies / Create Policy.
Select Policy Generator
Select Amazon EC2 in AWS Services and select the Attach Volume and Detach Volume actions.
Provide * for the ARN and click Add Statement. Click Next Step.
Change the name of the policy to ec2-nfs-policy and click Create Policy.
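If you prefer the CLI, a roughly equivalent policy could be created as sketched below. Note that the user data script also calls DescribeVolumes and Route 53 ChangeResourceRecordSets, so I have included those actions as an assumption beyond the two actions picked in the policy generator.
cat > /tmp/ec2-nfs-policy.json <<- EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:DescribeVolumes"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ChangeResourceRecordSets",
      "Resource": "*"
    }
  ]
}
EOF
aws iam create-policy --policy-name ec2-nfs-policy --policy-document file:///tmp/ec2-nfs-policy.json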
Create Role using the Policy
Open Amazon IAM Console. Click Roles / Create New Role.
Name the role ec2-nfs-role and click Next Step.
Select Amazon EC2 as the role type and click Select.
Select ec2-nfs-policy we created in the previous step and click Next Step.
Review and click Create Role.
Finally we have the role that we will use with the launch configuration.
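From the CLI, the role would be created roughly like the sketch below; the account ID in the policy ARN is a placeholder. The console creates the matching instance profile automatically, so the last two commands are only needed when you script it yourself.
cat > /tmp/ec2-trust-policy.json <<- EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name ec2-nfs-role --assume-role-policy-document file:///tmp/ec2-trust-policy.json
# <account-id> is a placeholder for your AWS account ID
aws iam attach-role-policy --role-name ec2-nfs-role --policy-arn arn:aws:iam::<account-id>:policy/ec2-nfs-policy
aws iam create-instance-profile --instance-profile-name ec2-nfs-role
aws iam add-role-to-instance-profile --instance-profile-name ec2-nfs-role --role-name ec2-nfs-role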
Configure AutoScaling
Auto scaling has a few components; we will now create them in order.
Create Launch Configuration
Open Amazon EC2 Console.
On the navigation pane, under Auto Scaling, click on Launch configuration.
On the next page, click Create Auto Scaling group.
Click Create Launch Configuration.
Click Select next to 64-bit Amazon Linux AMI listed at the top.
Select an instance type. Here I chose m4.large, which has Moderate network performance. Click Next: Configure details.
Enter nfs-launch-config into the Name box and choose ec2-nfs-role as the IAM Role. This is also where you can paste the user data script we created earlier, under Advanced Details, so that it runs when the instance boots. Then click Next: Add Storage.
Leave storage section as it is for now. Later we will attach an EBS volume. Click Create launch configuration.
Choose the key for the region.
Enter nfs-autoscaling-group for the group name and choose the VPC we created earlier. For subnets, choose the two private subnets we created earlier. Finally click Next: Configure scaling policies.
Don’t do anything on this screen and click Next: Configure Notifications.
Click New topic, enter nfs-autoscale and the email addresses of the recipients. Click Next: Configure Tags.
You can put anything here. I just added a Service:NFS key/value pair, which may help later when searching for resources related to the NFS service. Click Review.
If all looks ok, click Create Auto Scaling Group.
Hopefully, you will see the notification about success. Click Close.
Going back to the auto scaling screen, hit the refresh icon at the top right and you will see the instance up and running in a few minutes.
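For completeness, the launch configuration and auto scaling group can also be created from the CLI, roughly as below; the AMI ID, key name, security group ID and subnet IDs are placeholders to substitute with your own values.
aws autoscaling create-launch-configuration \
  --launch-configuration-name nfs-launch-config \
  --image-id <amazon-linux-ami-id> \
  --instance-type m4.large \
  --iam-instance-profile ec2-nfs-role \
  --security-groups <nfs-security-group-id> \
  --key-name <your-key-name> \
  --user-data file://nfs-userdata.sh \
  --region us-west-2
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name nfs-autoscaling-group \
  --launch-configuration-name nfs-launch-config \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --vpc-zone-identifier "<private-subnet-1-id>,<private-subnet-2-id>" \
  --region us-west-2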
Test What We Have
Let’s terminate the instance created by the auto scaling group to see whether another instance will be created automatically.
On the Amazon EC2 Console, select the instance and terminate it.
If you go back to the auto scaling screen, in a few minutes you will see a new instance being created.
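You can watch the replacement from the CLI as well; the instance ID below is a placeholder for whatever the group reports.
# See which instance the group is currently running
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names nfs-autoscaling-group --region us-west-2
# Terminate it and let the group replace it
aws ec2 terminate-instances --instance-ids <instance-id> --region us-west-2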
At this point we have a very basic auto scaling group. To make this setup useful for an NFS service, we will need to create an EBS volume and attach it to the instance during boot.
Create EBS Volume
a big TODO here…