This report was written because of a number of write-latency bugs I’ve witnessed in recent versions of ONTAP 7-mode. It’s a modified version of an internal report I wrote up in my free time, and I thought I’d summarise my findings in case others are seeing similar issues.

Symptoms

  • High write latency alarms being received
  • Some write latencies of up to 180 ms (yes, 180 milliseconds)
  • CPU usage is not excessive

Diagnostics

  • High write latency is sometimes associated with high CPU usage, but in this case no core is saturated (check with priv set diag; sysstat -M)
  • IOPS through the system are not abnormal (the system has handled a similar or higher number of IOPS in the past)
  • Network throughput on the system is not maxed out
  • NetApp perfstat analysis (by first-level support) comes back as “system is being pushed to its limit, it’s probably time to upgrade” or “everything seems fine”. If you get this response, ask them to check the signature against the bugs below.
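The diagnostic list above is essentially a process of elimination: once CPU, IOPS and network have been ruled out as causes, a software bug becomes the prime suspect. A minimal sketch of that reasoning in Python follows. The `Observation` fields, the function names and every threshold are hypothetical illustrations, not NetApp tooling or NetApp-recommended values; the metric values are assumed to be read off by hand (e.g. from sysstat -M).

```python
from dataclasses import dataclass

# Hypothetical thresholds: tune them to your own baseline.
# They are NOT NetApp-recommended values.
CPU_BUSY_LIMIT_PCT = 90.0   # per-core busy % treated as "saturated"
NETWORK_LIMIT_PCT = 90.0    # link utilisation treated as "maxed out"


@dataclass
class Observation:
    """One sample of the metrics checked in the Diagnostics list (hypothetical)."""
    write_latency_ms: float
    max_core_busy_pct: float     # busiest core, e.g. read off sysstat -M
    iops: float
    historical_peak_iops: float  # IOPS handled in the past without problems
    network_util_pct: float


def explanations(obs: Observation) -> list[str]:
    """Return the load-related causes that could explain high write latency."""
    reasons = []
    if obs.max_core_busy_pct >= CPU_BUSY_LIMIT_PCT:
        reasons.append("cpu")
    if obs.iops > obs.historical_peak_iops:
        reasons.append("iops")
    if obs.network_util_pct >= NETWORK_LIMIT_PCT:
        reasons.append("network")
    return reasons


def suspect_software_bug(obs: Observation, latency_alarm_ms: float = 20.0) -> bool:
    """True when latency is high but no load-related cause applies.

    latency_alarm_ms is a hypothetical alarm threshold.
    """
    return obs.write_latency_ms >= latency_alarm_ms and not explanations(obs)


if __name__ == "__main__":
    obs = Observation(write_latency_ms=180, max_core_busy_pct=65,
                      iops=40000, historical_peak_iops=55000,
                      network_util_pct=40)
    print(suspect_software_bug(obs))  # True
```

If `suspect_software_bug` comes back true, that is the point at which it is worth asking support to compare the perfstat signature against the known bugs.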
Known bugs

    Based on my experience, a number of known bugs in ONTAP 7-mode can cause this behaviour. I’ve tried to summarise them below.

    ONTAP 8.2.2P2

    Bug 855574: Sequential appends to user file results in excessive write latency

    Details:

    • Present in ONTAP 8.2.2P2
    • Introduced in 8.2 codebase (8.1 is immune)
    • Fixed in ONTAP 8.2.3P3 and later

    Recommendation:

    • Upgrade to ONTAP 8.2.4P6

    ONTAP 8.2.3P3 (possibly earlier versions too)

    Bug 647449: Use of default quota rules can impact I/O latency and throughput

    Details:

    • Present in ONTAP 8.2.3P3
    • Fixed in ONTAP 8.2.3P4

    Recommendation:

    • Upgrade to ONTAP 8.2.4P6

    ONTAP 8.2.3P3, ONTAP 8.2.3P4, ONTAP 8.2.3P6

    Bug 928593: Write operations are not performed resulting in severe write latencies

    Details:

    • First introduced in ONTAP 8.2.3P3; remains in subsequent 8.2.3 P-releases.
    • Mostly fixed in ONTAP 8.2.4. A more complete fix is expected in P1 or P2.
    • This bug is directly related to bug 855574, so if your workloads were “tickling” that bug, you may see this one too.

    Recommendation:

    • Stay on your current version for now. Although this bug is fixed in the recently released 8.2.4, NetApp have asked us to hold off until 8.2.4P1 is released in Q1 2016, as they haven’t yet ironed out all of the write performance bugs.
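To check a given release against the bugs above, the version ranges can be encoded as data. The following is a minimal sketch, assuming ONTAP release strings of the form 8.2.3P3 (RC and D releases are not handled); `parse_version`, `KnownBug` and `bugs_for` are hypothetical helpers, and the ranges are only as accurate as this report. As a simplification, bug 928593 is treated as fixed in 8.2.4 even though the report says it is only mostly fixed there.

```python
import re
from dataclasses import dataclass

# Matches e.g. "8.2", "8.2.4", "8.2.3P3" (hypothetical helper, not a NetApp tool).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:P(\d+))?$")


def parse_version(text: str) -> tuple[int, int, int, int]:
    """Turn '8.2.3P3' into (8, 2, 3, 3); a missing part counts as 0."""
    m = _VERSION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognised ONTAP version: {text!r}")
    major, minor, maint, patch = m.groups()
    return (int(major), int(minor), int(maint or 0), int(patch or 0))


@dataclass(frozen=True)
class KnownBug:
    bug_id: int
    first_seen: str                 # earliest release this report places the bug in
    fixed_in: str                   # first release this report lists as (mostly) fixed
    earlier_possible: bool = False  # report says earlier releases may also be hit


# Ranges taken from the report above.
KNOWN_BUGS = [
    KnownBug(855574, "8.2", "8.2.3P3"),            # 8.1 is immune
    KnownBug(647449, "8.2.3P3", "8.2.3P4", earlier_possible=True),
    KnownBug(928593, "8.2.3P3", "8.2.4"),          # only mostly fixed in 8.2.4
]


def bugs_for(version: str) -> list[tuple[int, str]]:
    """List (bug_id, status) pairs for the known bugs that may hit `version`."""
    v = parse_version(version)
    hits = []
    for bug in KNOWN_BUGS:
        if v >= parse_version(bug.fixed_in):
            continue  # release is at or past the fix
        if v >= parse_version(bug.first_seen):
            hits.append((bug.bug_id, "affected"))
        elif bug.earlier_possible:
            hits.append((bug.bug_id, "possibly affected"))
    return hits


if __name__ == "__main__":
    print(bugs_for("8.2.3P3"))  # [(647449, 'affected'), (928593, 'affected')]
```

Encoding each release as a tuple lets Python's ordinary tuple comparison order releases correctly, so 8.2.4 sorts after 8.2.3P6.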