K 10 svn:author V 6 scottl K 8 svn:date V 27 2006-09-11T06:48:53.000000Z K 7 svn:log V 888 The run_filter() procedure is a means of working around DMA engine bugs in old/broken hardware. Unfortunately, it adds cache pressure and possible mispredicted branches to the fast path of the bus_dmamap_load collection of functions. Since it's meant for slow path exception processing, de-inline it and allow its conditions to be pre-computed at tag_create time and thus short-circuited at runtime. While here, cut down on the size of _bus_dmamap_load_buffer() by pushing the bounce page logic into a non-inlined function. Again, this helps with cache pressure and mispredicted branches. According to the TSC, this shaves off a few cycles on average. Unfortunately, the data varies quite a bit due to interrupts and preemption, so it's hard to get a good measurement. Real world measurements of network PPS are welcomed. A merge to amd64 and other arches is pending more testing. END
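The approach described in the log message (decide once, at tag creation, whether the filter can ever matter, then keep the filter itself out of line) can be sketched roughly as follows. This is an illustrative sketch only, not the committed code: `struct dma_tag`, `TAG_COULD_BOUNCE`, `HOST_MAXADDR`, `tag_create()`, `run_filter_slow()`, `must_bounce()` and the `slow_path_calls` counter are all hypothetical names, and the filter conditions shown (an address limit and an alignment check) are assumed for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* noinline is a GCC/Clang extension; fall back to nothing elsewhere. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

#define HOST_MAXADDR     UINT64_MAX /* hypothetical: highest host address */
#define TAG_COULD_BOUNCE 0x1u       /* hypothetical: filter may ever fire */

struct dma_tag {                    /* hypothetical, simplified tag */
    uint64_t lowaddr;               /* highest address the device can reach */
    uint64_t alignment;             /* required alignment, power of two */
    uint32_t flags;                 /* conditions pre-computed at creation */
};

static unsigned slow_path_calls;    /* instrumentation for this sketch only */

/* Creation time: work out once whether the filter can ever be needed. */
static void
tag_create(struct dma_tag *tag, uint64_t lowaddr, uint64_t alignment)
{
    tag->lowaddr = lowaddr;
    tag->alignment = (alignment == 0) ? 1 : alignment;
    tag->flags = 0;
    if (tag->lowaddr < HOST_MAXADDR || tag->alignment > 1)
        tag->flags |= TAG_COULD_BOUNCE;
}

/* Slow path: kept out of line so it adds no code to the caller. */
static NOINLINE bool
run_filter_slow(const struct dma_tag *tag, uint64_t paddr)
{
    slow_path_calls++;
    if (paddr > tag->lowaddr)
        return true;                /* beyond the device's reach */
    if ((paddr & (tag->alignment - 1)) != 0)
        return true;                /* misaligned for this device */
    return false;
}

/* Fast path: a single flag test short-circuits the call for sane hardware. */
static inline bool
must_bounce(const struct dma_tag *tag, uint64_t paddr)
{
    return (tag->flags & TAG_COULD_BOUNCE) != 0 &&
        run_filter_slow(tag, paddr);
}
```

In this sketch, a tag created with `tag_create(&t, HOST_MAXADDR, 1)` never enters `run_filter_slow()`, so the per-buffer cost for unrestricted devices is one flag test and a well-predicted branch.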