I wrote this report after witnessing a number of write-latency bugs in recent versions of ONTAP 7-mode. It’s a modified version of an internal report that I wrote up in some free time, and I’m summarising my findings here in case others are seeing similar issues.

Symptoms

  • High write latency alarms are being received
  • Some write latencies reach 180ms (yes, 180 milliseconds)
  • CPU usage is not excessive

Diagnostics

  • High write latency is sometimes associated with high CPU usage, but in this case CPU usage is <100% across all cores (check with priv set diag; sysstat -M)
  • The IOPS going through the system are not abnormal (as in, the system has handled a similar number of IOPS, or more, in the past)
  • Network throughput on the system is not maxed out
  • NetApp perfstat analysis (at first level support) comes back as “system is being pushed to its limit, it’s probably time to upgrade” or “everything seems fine”. If you get this response, ask them to check the signature against the bugs below.
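
As a rough sketch, the checks above boil down to "write latency is high, but nothing obvious explains it". The Python below encodes that signature. The field names and the threshold values (20ms latency alarm, 90% CPU, 90% network) are hypothetical and chosen for illustration only; substitute whatever your own monitoring uses.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample. All field names are hypothetical."""
    write_latency_ms: float      # observed write latency
    max_core_cpu_pct: float      # busiest core, e.g. read off sysstat -M
    iops: float                  # current IOPS
    historical_peak_iops: float  # highest IOPS the system has handled fine before
    network_util_pct: float      # network throughput as a percentage of capacity


def matches_bug_signature(
    s: Sample,
    latency_alarm_ms: float = 20.0,   # assumed alarm threshold
    cpu_busy_pct: float = 90.0,       # assumed "CPU is the bottleneck" threshold
    network_busy_pct: float = 90.0,   # assumed "network is maxed out" threshold
) -> bool:
    """True when write latency is high but CPU, IOPS and network look normal."""
    high_latency = s.write_latency_ms >= latency_alarm_ms
    cpu_ok = s.max_core_cpu_pct < cpu_busy_pct
    iops_ok = s.iops <= s.historical_peak_iops
    network_ok = s.network_util_pct < network_busy_pct
    return high_latency and cpu_ok and iops_ok and network_ok


suspicious = Sample(180.0, 60.0, 8000, 12000, 40.0)
cpu_bound = Sample(180.0, 99.0, 8000, 12000, 40.0)
print(matches_bug_signature(suspicious))  # high latency with no obvious cause
print(matches_bug_signature(cpu_bound))   # high latency explained by busy CPU
```

If a sample matches, that is only a hint that one of the bugs below may be involved; it is not a diagnosis.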

Known bugs

In my experience, a number of known bugs in ONTAP 7-mode can cause these symptoms. I’ve tried to summarise them below.

ONTAP 8.2.2P2

Bug 855574: Sequential appends to user file results in excessive write latency

Details:

  • Present in ONTAP 8.2.2P2
  • Introduced in 8.2 codebase (8.1 is immune)
  • Fixed in ONTAP 8.2.3P3 onwards

Recommendation:

  • Upgrade to ONTAP 8.2.4P6

ONTAP 8.2.3P3, possibly earlier versions too

Bug 647449: Use of default quota rules can impact I/O latency and throughput

Details:

  • Present in ONTAP 8.2.3P3
  • Fixed in ONTAP 8.2.3P4

Recommendation:

  • Upgrade to ONTAP 8.2.4P6

ONTAP 8.2.3P3, ONTAP 8.2.3P4, ONTAP 8.2.3P6

Bug 928593: Write operations are not performed resulting in severe write latencies

Details:

  • First introduced in ONTAP 8.2.3P3. Remains in subsequent 8.2.3 P-releases
  • Mostly fixed in ONTAP 8.2.4. Waiting on P1 or P2 for a more complete fix.
  • This bug is closely related to bug 855574, so if your workloads were “tickling” that bug, you may see this one too.

Recommendation:

  • Stay on current version for now. While this bug is fixed in the recently-released 8.2.4, NetApp have asked us to hold off until 8.2.4P1 is released in Q1 2016, as they haven’t ironed out all the write performance bugs.
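
To make the version ranges above easier to check, here is a minimal Python sketch that maps an ONTAP version string to the bugs listed in this post. It is not a NetApp tool, and it rests on several assumptions: the version format is taken to be "major.minor[.maintenance][P<patch>]"; bug 647449 is given no lower bound because I only know it is present in 8.2.3P3 and possibly earlier; and bug 928593 is treated as fixed from 8.2.4, even though the fix there is only partial. The names parse_version, KNOWN_BUGS and bugs_for are made up for this example.

```python
import re


def parse_version(text):
    """Turn '8.2.3P3' into (8, 2, 3, 3); missing parts count as 0."""
    m = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:P(\d+))?", text)
    if m is None:
        raise ValueError(f"unrecognised ONTAP version: {text!r}")
    return tuple(int(g) if g else 0 for g in m.groups())


# (bug id, first affected version or None if unknown, first fixed version)
KNOWN_BUGS = [
    ("855574", "8.2", "8.2.3P3"),
    ("647449", None, "8.2.3P4"),     # lower bound unknown: "possibly earlier"
    ("928593", "8.2.3P3", "8.2.4"),  # only "mostly" fixed in 8.2.4
]


def bugs_for(version):
    """Return the bug ids from this post that may affect the given version."""
    v = parse_version(version)
    hits = []
    for bug_id, first_affected, first_fixed in KNOWN_BUGS:
        after_start = first_affected is None or v >= parse_version(first_affected)
        before_fix = v < parse_version(first_fixed)
        if after_start and before_fix:
            hits.append(bug_id)
    return hits


for release in ("8.2.2P2", "8.2.3P3", "8.2.3P6", "8.2.4P6"):
    print(release, bugs_for(release))
```

An empty list only means none of the three bugs in this post apply to that version; it says nothing about other bugs.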

Written by Phil Wiffen

Phil is an IT Professional working in Cambridge, England. He generally blogs about useful solutions that he comes across in his work/play.

4 Comments

Phil Wiffen

Hi Fab, I re-read correspondence with NetApp and they alluded to the bugs being “internal”. Not sure what that means, but it’s interesting that there are no performance related fixes in 8.2.4P1 (besides the ones fixed in 8.2.4). I’d recommend you get in touch with NetApp to get some clarification. I’ve also asked, and if I hear back, will update.

Fab

Hi again,

8.2.4P2 has been out since February 18th. Again, nothing about performance issues…
Thanks for the feedback. 🙂
