Tuning Linux for vSphere 6.0 NFS v4.1 Datastore Performance

*Preliminary document - subject to change*. Much of this has been ported from NFSv3 and may change over the next few weeks. Constructive comments welcome. The original document is http://veerapen.blogspot.com/2011/09/tuning-redhat-enterprise-linux-rhel-54.html


NFSv4.1 Linux Server Support

NFSv4.1 is supported as of Red Hat Enterprise Linux 7.0.

Mixing NFS v3 and NFS v4.1 Words of Caution

Do not present the same NFS volume as NFS v3 to one ESXi host and as NFS v4.1 to another. Mounting the same NFS datastore on different hosts using different NFS versions will lead to data corruption, because the two locking mechanisms are incompatible: NFS v3 uses client-side co-operative locking, while NFS v4.1 uses server-side locking.


Configuring the Linux Kernel for Hardware RAID

On my system, the 2.7 TB datastore was on a RAID 5 array served by an HP P822 SAS controller. Initially, the average I/O speed did not exceed 15 MB/s. Network speed was 1 Gbit/s, without NIC bonding.

The first point to note here is that the RAID is hardware based. There is no need for the Linux kernel to reorder or queue disk requests itself. Let the RAID controller handle all of the disk I/O.

Changing the Linux I/O scheduler from the default [cfq] to noop improves I/O. On my system, this change alone with NFSv3 increased average I/O speeds to 54 MB/s, with a maximum read speed of 95 MB/s.

echo "noop" > /sys/block/cciss\!c0d0/queue/scheduler
echo "noop" > /sys/block/cciss\!c0d1/queue/scheduler
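To confirm that the change took effect, read the same sysfs file back; the kernel marks the active scheduler with square brackets. The helper below is a minimal sketch (the function name `active_scheduler` is made up for illustration) that extracts the bracketed name from that file's contents.

```shell
#!/bin/sh
# Print the active I/O scheduler given the contents of a queue/scheduler file.
# The kernel lists all available schedulers and brackets the active one,
# for example: "noop deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a live system (device name taken from the commands above;
# single quotes avoid the need to escape the "!"):
#   active_scheduler "$(cat '/sys/block/cciss!c0d0/queue/scheduler')"
```

Note that a value written through /sys does not survive a reboot, so the echo commands need to be reapplied at boot by whatever mechanism your distribution provides.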

The RAID controller and network cards must not share interrupts.

To check for shared interrupts:
cat /proc/interrupts
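When several devices share an IRQ, /proc/interrupts lists them comma-separated at the end of that IRQ's line. The filter below is a rough sketch that relies on that layout (the function name `shared_irqs` is hypothetical); it prints only the numbered IRQ lines with more than one device attached.

```shell
#!/bin/sh
# Read /proc/interrupts-style text on standard input and print the
# numbered IRQ lines whose device column names more than one device.
shared_irqs() {
    grep -E '^ *[0-9]+:.*, '
}

# Usage on a live system:
#   shared_irqs < /proc/interrupts
```

If the RAID controller or a network card shows up in the output, consider moving the card to another slot or checking the BIOS settings.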

Stop all unnecessary services as shown in the example below.

chkconfig --list
chkconfig --level 3 xfs off
chkconfig --level 3 sendmail off
chkconfig --level 3 gpm off

Increasing Linux NFSD Threads for ESXi Performance

For heavy workloads you will need to increase the number of NFS server threads. Use 128 threads as a starting point. 

/etc/sysconfig/nfs
RPCNFSDCOUNT=128

The thread count can also be changed at runtime:
echo 128 > /proc/fs/nfsd/threads
ps ax | grep nfs

Monitoring Linux NFS Server Thread Performance

Now you will need to monitor the performance of those 128 threads.

cat /proc/net/rpc/nfsd

Of all those lines, we are interested in the one that starts with "th".

For example:
th 128 133007 2233.461 1127.896 3895.966 462.992 [...keeps on going]

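128 = number of NFS server threads
133007 = number of times that all 128 threads were busy at once

The numbers that follow form a histogram: each is the time, in seconds, during which a given fraction of the threads was busy, in 10% steps. The last three, which cover usage of up to 80%, 90%, and 100%, must be low. If they are high, your system needs additional NFS threads to cope with the load.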
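The check described above can be scripted. This is a minimal sketch, assuming the "th" line has the layout shown in the example (thread count, all-busy count, then the histogram); the function name `th_summary` is made up for illustration.

```shell
#!/bin/sh
# Summarise the "th" line of /proc/net/rpc/nfsd read from standard input.
# Assumed layout: th <threads> <times-all-busy> <histogram buckets...>
# Prints the thread count, the all-busy counter, and the last three buckets.
th_summary() {
    awk '$1 == "th" {
        printf "threads=%s all_busy=%s top3=%s,%s,%s\n", $2, $3, $(NF-2), $(NF-1), $NF
    }'
}

# Usage on a live system:
#   th_summary < /proc/net/rpc/nfsd
```

If every bucket reads zero on your kernel, the histogram is not being maintained and this check tells you nothing; the all-busy counter is then the only useful figure on this line.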

Configuring RAID Controllers

Using the nfsstat command, we saw 80% reads and 20% writes on /mnt/nfs. Since the workload is read-heavy, set the RAID controller cache ratio to 75% read and 25% write.
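To work out the read/write split for your own server, take the read and write operation counts from the nfsstat output and convert them to percentages. The helper below only does the arithmetic; the function name `rw_ratio` and the sample counts are hypothetical.

```shell
#!/bin/sh
# Print the read/write split as percentages.
# $1 = number of read operations, $2 = number of write operations
# (both copied by hand from the nfsstat output).
rw_ratio() {
    awk -v r="$1" -v w="$2" 'BEGIN {
        total = r + w
        if (total == 0) { print "no operations recorded"; exit }
        printf "read %.0f%% write %.0f%%\n", 100 * r / total, 100 * w / total
    }'
}

# Example with made-up counts:
#   rw_ratio 8000 2000
```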

Configuring Linux Filesystems

For faster writes to RAID 5, use the ext4 data=journal option, and disable updates to file access times with noatime, since each update causes additional data to be written to the disk.

/etc/fstab
/dev/cciss/c0d1 /mnt/nfs ext4 defaults,noatime,data=journal
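After remounting, it is worth checking that the options are actually in effect. The sketch below tests a comma-separated option string, such as the fourth field of a /proc/mounts entry, for a set of required options; the function name `has_mount_opts` is made up for illustration.

```shell
#!/bin/sh
# Succeed only if the comma-separated option string in $1 contains
# every option named in the remaining arguments.
has_mount_opts() {
    opts=",$1,"
    shift
    for want in "$@"; do
        case "$opts" in
            *",$want,"*) ;;          # option present, check the next one
            *) return 1 ;;           # option missing
        esac
    done
    return 0
}

# Usage on a live system (mount point taken from the entry above):
#   has_mount_opts "$(awk '$2 == "/mnt/nfs" {print $4}' /proc/mounts)" \
#       noatime data=journal && echo "options active"
```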

Configuring Linux NFS Exports

Configure the NFS threads to write without delay (no_wdelay) and let the RAID controller handle the job. Although "async" might produce better results, I prefer the "sync" option for data integrity.

/etc/exports
/mnt/nfs *(rw,fsid=0,insecure,all_squash,sync,no_wdelay)
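Because "async" trades data integrity for speed, it can be useful to check that an export entry asks for "sync" explicitly. The following is a minimal sketch; the function name `export_has_explicit_sync` is hypothetical, and it inspects only the option list you pass in, not the running server.

```shell
#!/bin/sh
# Succeed only if the parenthesised option list from an /etc/exports
# entry names "sync" explicitly and does not name "async".
export_has_explicit_sync() {
    opts=$(printf '%s' "$1" | tr -d '()')
    case ",$opts," in
        *,async,*) return 1 ;;
        *,sync,*)  return 0 ;;
        *)         return 1 ;;   # neither option given explicitly
    esac
}

# After editing /etc/exports on a live system, re-export and list
# the active exports with their options:
#   exportfs -ra
#   exportfs -v
```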


NFS Version 4 Caching Features

The data and metadata caching behavior of NFS version 4 clients is similar to that of earlier versions. However, NFS version 4 adds two features that improve cache behavior: change attributes and file delegation.

The change attribute is a new part of NFS file and directory metadata which tracks data changes. It replaces the use of a file's modification and change time stamps as a way for clients to validate the content of their caches. Unlike time stamps, change attributes are independent of the time stamp resolution on either the server or client.

A file delegation is a contract between an NFS version 4 client and server that allows the client to treat a file temporarily as if no other client is accessing it. The server promises to notify the client (via a callback request) if another client attempts to access that file. Once a file has been delegated to a client, the client can cache that file's data and metadata aggressively without contacting the server.

File delegations come in two flavors: read and write. A read delegation means that the server notifies the client about any other clients that want to write to the file. A write delegation means that the client gets notified about either read or write access.

Servers grant file delegations when a file is opened, and can recall delegations at any time when another client wants access to the file that conflicts with any delegations already granted. Delegations on directories are not supported.

In order to support delegation callback, the server checks the network return path to the client during the client's initial contact with the server. If contact with the client cannot be established, the server simply does not grant any delegations to that client.