K 10
svn:author
V 3
bde
K 8
svn:date
V 27
2007-02-15T10:33:49.000000Z
K 7
svn:log
V 1184
MFC (if_bge.c 1.108, etc., less some style bugs: eliminate one PCI
read per call to bge_start()).

In packet blasting tests using ttcp with tiny udp packets on an A64-3200
with a 64-bit 5701 on a 32-bit 33MHz PCI bus, this gives a speedup
from 347 kpps to 623 kpps.  sendto() has a lot of software overheads,
but even with these the single PCI read per call to bge_start() almost
doubled the per-packet time.  This is partly because the software
overheads are so large that the CPU can't keep up with a Gbps NIC that
can actually get anywhere near Gbps speed for tiny packets (347 kpps
for tiny packets is only about 21% of wire speed).  When the CPU can't
keep up, it gets further behind because it ends up calling bge_start()
at least once for every packet, so any overheads in bge_start() are
not amortized across multiple packets.  Thus the PCI read had an
especially high overhead.

For larger packets, the speedup is closer to the 1.8% claimed in rev.
1.108.

Rev.1.108 claims to eliminate a PCI write but actually eliminates a
PCI read.  The write of the tx product index is not so costly as its
read, and cannot be eliminated completely.  It could be coalesced in
some cases.

END