K 10 svn:author V 3 bde K 8 svn:date V 27 2007-02-15T10:33:49.000000Z K 7 svn:log V 1185 MFC (if_bge.c 1.108, etc., less some style bugs: eliminate one PCI read per call to bge_start()). In packet blasting tests using ttcp with tiny udp packets on an A64-3200 with a 64-bit 5701 on a 32-bit 33MHz PCI bus, this gives a speedup from 347 kpps to 623 kpps. sendto() has a lot of software overheads, but even with these the single PCI read per call to bge_start() almost doubled the per-packet time. This is partly because the software overheads are so large that the CPU can't keep up with a Gbps NIC that can actually get anywhere near Gbps speed for tiny packets (347 kpps for tiny packets is only about 21% of wire speed). When the CPU can't keep up, it gets further behind because it ends up calling bge_start() at least once for every packet, so any overheads in bge_start() are not amortized across multiple packets. Thus the PCI read had an especially high overhead. For larger packets, the speedup is closer to the 1.8% claimed in rev. 1.108. Rev. 1.108 claims to eliminate a PCI write but actually eliminates a PCI read. The write of the tx product index is not so costly as its read, and cannot be eliminated completely. It could be coalesced in some cases. END