K 10
svn:author
V 7
git2svn
K 8
svn:date
V 27
2022-05-01T19:19:45.461288Z
K 7
svn:log
V 1535
lacp: short timeout erroneously declares link-flapping

Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.

On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.

On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.

The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.

Reported by: Greg Foster
Reviewed by: gallatin
Tested by: Greg Foster
MFC after: 3 days
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D35070

(cherry picked from commit 00a80538b4471b2978c5a1990f48189f2c692e24)

Git Hash: 3cbc8109a9855edfa24425e7ed7abafa2300148a
Git Author: gfoster@panasas.com
END
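
The timing problem described in the log can be illustrated with a small
simulation. This is only a sketch under stated assumptions, not the
FreeBSD code: the function name, the constants, and the model (a timer
routine that runs once per second and a countdown that is reset to three
on every received PDU) are hypothetical and chosen to mirror the
description above. Times are integer milliseconds to avoid
floating-point rounding.

```python
# Hypothetical model of the short-timeout countdown described in the log.
SHORT_TIMEOUT_TICKS = 3   # assumed: expiry after three 1-second periods
TICK_MS = 1000            # assumed: the timer routine runs once per second


def first_expiry_ms(pdu_times_ms, end_ms, require_full_second):
    """Return the tick time (ms) at which the countdown first reaches
    zero, or None if it never does before end_ms.

    require_full_second=False models the old behaviour (every tick
    decrements); True models the fix (a tick only decrements if at
    least one full second has passed since the last PDU arrived).
    """
    # Merge PDU arrivals (kind 0) and timer ticks (kind 1) in time order.
    events = sorted(
        [(t, 0) for t in pdu_times_ms]
        + [(t, 1) for t in range(TICK_MS, end_ms + 1, TICK_MS)]
    )
    counter = 0
    last_rx_ms = None
    for now_ms, kind in events:
        if kind == 0:
            # PDU received: restart the countdown and note the arrival time.
            counter = SHORT_TIMEOUT_TICKS
            last_rx_ms = now_ms
            continue
        if counter == 0:
            continue  # timer not armed
        if require_full_second and now_ms - last_rx_ms < TICK_MS:
            continue  # less than a full second since the PDU: skip
        counter -= 1
        if counter == 0:
            return now_ms
    return None


# A PDU arrives 100 ms before a tick, and the next one arrives 2.5 s later.
pdus = [900, 3400]
print(first_expiry_ms(pdus, 5000, require_full_second=False))  # 3000
print(first_expiry_ms(pdus, 5000, require_full_second=True))   # None
```

In this model the old logic expires at 3000 ms, only 2.1 seconds after
the PDU at 900 ms, while the guarded decrement survives the same gap. If
the peer really goes silent, the guarded version still expires, at the
first tick that is at least three seconds after the last PDU.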