ƒ¾W201627 86 164 127 259 193 238 226 5232 198 135 381 117 205 234 132 132 118 243 144 535 3383 248 239 147 152 3575 339 492 690 860 194 272 593 117 364 247 262 773 156 442 139 379 379 299 135 436 212 366 132 117 202 127 257 479 295 338 270 157 406 1373 1373 2897 170 360 292 2897 675 675 1080 183 1080 999 436 999 426 255 2602 679 2602 229 2773 2773 271 112 107 403 403 K 10 svn:author V 7 delphij K 8 svn:date V 27 2010-01-06T00:20:37.091140Z K 7 svn:log V 68 Fix build: getopt() returns int so use an integer to get the value. END K 10 svn:author V 5 kmacy K 8 svn:date V 27 2010-01-06T01:59:20.298972Z K 7 svn:log V 33 implement most of sleep / wakeup END K 10 svn:author V 3 imp K 8 svn:date V 27 2010-01-06T05:58:07.311178Z K 7 svn:log V 166 Merge from head at r201628. # This hasn't been tested, and there are at least three bad commits # that need to be backed out before the branch will be stable again. END K 10 svn:author V 8 kientzle K 8 svn:date V 27 2010-01-06T06:35:10.705706Z K 7 svn:log V 96 When restoring files, use the mode for the mode. Thanks to: Jun Kuriyama for pointing this out END K 10 svn:author V 4 neel K 8 svn:date V 27 2010-01-06T06:42:08.797405Z K 7 svn:log V 144 Remove all CFE-specific code from locore.S. The CFE entrypoint initialization is now done in platform-specific code. Approved by: imp (mentor) END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T07:50:27.496706Z K 7 svn:log V 134 Make OpenSSH and utmpx actually work. I wonder why I need these modifications. I would have expected OpenSSH to work out of the box. END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T08:18:49.306921Z K 7 svn:log V 5133 MFC several ZFS related commits: - taskq changes - fixes for race conditions - locking fixes - bug fixes - ... r185310: ---snip--- Remove unused variable. Found with: Coverity Prevent(tm) CID: 3669,3671 ---snip--- r185319: ---snip--- Fix locking (file descriptor table and Giant around VFS). Most submitted by: kib Reviewed by: kib ---snip--- r192689: ---snip--- Fix comment. ---snip--- r193110: ---snip--- work around snapshot shutdown race reported by Henri Hennebert ---snip--- r193440: ---snip--- Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr ---snip--- ATTENTION: this commit to releng7 does not allow shared vnode locks (there are some VFS changes needed before it can be enabled), it only provides the infrastructure and serves mostly as a diff reduction in the ZFS code. A comment has been added to the locking part to explain why no shared locks are used. r195627: ---snip--- In nvpair_native_embedded_array(), meaningless pointers are zeroed. The programmer was aware that alignment was not guaranteed in the packed structure and used bzero() to NULL out the pointers. However, on ia64, the compiler is quite agressive in finding ILP and calls to bzero() are often replaced by simple assignments (i.e. stores). Especially when the width or size in question corresponds with a store instruction (i.e. st1, st2, st4 or st8). The problem here is not a compiler bug. The address of the memory to zero-out was given by '&packed->nvl_priv' and given the type of the 'packed' pointer the compiler could assume proper alignment for the replacement of bzero() with an 8-byte wide store to be valid. The problem is with the programmer. The programmer knew that the address did not have the alignment guarantees needed for a regular assignment, but failed to inform the compiler of that fact. In fact, the programmer told the compiler the opposite: alignment is guaranteed. The fix is to avoid using a pointer of type "nvlist_t *" and instead use a "char *" pointer as the basis for calculating the address. This tells the compiler that only 1-byte alignment can be assumed and the compiler will either keep the bzero() call or instead replace it with a sequence of byte-wise stores. Both are valid. ---snip--- r195822: ---snip--- Fix extattr_list_file(2) on ZFS in case the attribute directory doesn't exist and user doesn't have write access to the file. Without this fix, it returns bogus value instead of 0. For some reason this didn't manifest on my kernel compiled with -O0. PR: kern/136601 Submitted by: Jaakko Heinonen ---snip--- r195909 ---snip--- We don't support ephemeral IDs in FreeBSD and without this fix ZFS can panic when in zfs_fuid_create_cred() when userid is negative. It is converted to unsigned value which makes IS_EPHEMERAL() macro to incorrectly report that this is ephemeral ID. The most reasonable solution for now is to always report that the given ID is not ephemeral. PR: kern/132337 Submitted by: Matthew West Tested by: Thomas Backman , Michael Reifenberger ---snip--- r196291: ---snip--- - Fix a race where /dev/zfs control device is created before ZFS is fully initialized. Also destroy /dev/zfs before doing other deinitializations. - Initialization through taskq is no longer needed and there is a race where one of the zpool/zfs command loads zfs.ko and tries to do some work immediately, but /dev/zfs is not there yet. Reported by: pav ---snip--- r196269: ---snip--- Fix misalignment in nvpair_native_embedded() caused by the compiler replacing the bzero(). See also revision 195627, which fixed the misalignment in nvpair_native_embedded_array(). ---snip--- r196295: ---snip--- Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. ---snip--- The taskqueue_member() function is different due to kproc/kthread changes in releng8 and head, the function was... Revieved by: jhb r196297: ---snip--- Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have 0 usecount. Reported by: Thomas Backman ---snip--- r196299: ---snip--- - We need to recycle vnode instead of freeing znode. Submitted by: avg - Add missing vnode interlock unlock. - Remove redundant znode locking. ---snip--- r196301: ---snip--- If z_buf is NULL, we should free znode immediately. Noticed by: avg ---snip--- r196307: ---snip--- Manage asynchronous vnode release just like Solaris. Discussed with: kmacy ---snip--- END K 10 svn:author V 2 ru K 8 svn:date V 27 2010-01-06T08:26:43.125686Z K 7 svn:log V 106 MFC r201290: Treat an empty argument as an error, instead of fetching the contents of the root directory. END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T09:09:30.970880Z K 7 svn:log V 38 Fix meta data, r196309 is not merged. END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T10:21:16.873403Z K 7 svn:log V 283 MFC several ZFS related commits: r196456: ---snip--- - Give minclsyspri and maxclsyspri real values (consulted with kmacy). - Honour 'pri' argument for thread_create(). ---snip--- r196457: ---snip--- Set priority of vdev_geom threads and zvol threads to PRIBIO. ---snip--- END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T10:29:53.957646Z K 7 svn:log V 26 Make utmpx actually work. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T10:41:27.467889Z K 7 svn:log V 113 Reintroduce getutxuser(). This function will be used by applications that crawl through the lastlogin database. END K 10 svn:author V 3 imp K 8 svn:date V 27 2010-01-06T12:15:10.226616Z K 7 svn:log V 141 Revert r200892, 200893 and 200894. There's companion changes elsewhere that aren't quite ready, and these break the world in the mean time. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T12:42:16.657013Z K 7 svn:log V 41 Make more tools use the utmpx interface. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T12:42:46.567346Z K 7 svn:log V 41 Remove the utmpx interface from libulog. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T12:43:19.706902Z K 7 svn:log V 27 Let pam_lastlog use utmpx. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T12:43:59.080506Z K 7 svn:log V 151 Remove the utmp manpage and add it to ObsoleteFiles. Also add the libulog manpages to ObsoleteFiles that were missing in one of the previous commits. END K 10 svn:author V 6 rpaulo K 8 svn:date V 27 2010-01-06T13:13:14.935899Z K 7 svn:log V 49 len must be int, not size_t Submitted by: novel END K 10 svn:author V 3 mav K 8 svn:date V 27 2010-01-06T13:14:37.327106Z K 7 svn:log V 442 Change the way in which zero stripesize is handled. Instead of reporting zero stripeoffset in such case (as if device has no stripes), report offset from the beginning of the media (as if device has single infinite stripe). This gives partitioning tools information, required to guess better partition alignment, in case if hardware doesn't report it's stripe size. For example, it should give disklabel info about odd offset made by fdisk. END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T13:56:49.683266Z K 7 svn:log V 3284 MFC several ZFS related commits: r196662: ---snip--- Add missing mountpoint vnode locking. This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when regular user tries to mount dataset owned by him. ---snip--- r196703: ---snip--- Backport the 'dirtying dbuf' panic fix from newer ZFS version. Reported by: Thomas Backman ---snip--- r196919: ---snip--- bzero() on-stack argument, so mutex_init() won't misinterpret that the lock is already initialized if we have some garbage on the stack. PR: kern/135480 Reported by: Emil Mikulic ---snip--- r196927: ---snip--- Changing provider size is not really supported by GEOM, but doing so when provider is closed should be ok. When administrator requests to change ZVOL size do it immediately if ZVOL is closed or do it on last ZVOL close. PR: kern/136942 Requested by: Bernard Buri ---snip--- r196943: ---snip--- - Avoid holding mutex around M_WAITOK allocations. - Add locking for mnt_opt field. ---snip--- r196944: ---snip--- Don't recheck ownership on update mount. This will eliminate LOR between vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway. Noticed by: kib Reviewed by: kib ---snip--- r196954: ---snip--- If we have to use avl_find(), optimize a bit and use avl_insert() instead of avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()). Fix similar case in the code that is currently commented out. ---snip--- Note: This also seems to fix a previous merge hickup. r196965: ---snip--- Fix reference count leak for a case where snapshot's mount point is updated. Such situation is not supported. This problem was triggered by something like this: # zpool create tank da0 # zfs snapshot tank@snap # cd /tank/.zfs/snapshot/snap (this will mount the snapshot) # cd # mount -u nosuid /tank/.zfs/snapshot/snap (refcount leak) # zpool export tank cannot export 'tank': pool is busy ---snip--- r196985: ---snip--- Only log successful commands! Without this fix we log even unsuccessful commands executed by unprivileged users. Action is not really taken, but it is logged to pool history, which might be confusing. Reported by: Denis Ahrens ---snip--- r197151: ---snip--- Be sure not to overflow struct fid. ---snip--- r197152: ---snip--- Extend scope of the z_teardown_lock lock for consistency and "just in case". ---snip--- r200124: ---snip--- Avoid using additional variable for storing an error if we are not going to do anything with it. ---snip--- r200158: ---snip--- We have to eventually look for provider without checking guid as this is need for attaching when there is no metadata yet. Before r200125 the order of looking for providers was wrong. It was: 1. Find provider by name. 2. Find provider by guid. 3. Find provider by name and guid. Where it should have been: 1. Find provider by name and guid. 2. Find provider by guid. 3. Find provider by name. ---snip--- Note: This was already there, but it was not recorded as merged. This commit fixes a mis-merge (reversed logic). END K 10 svn:author V 5 gavin K 8 svn:date V 27 2010-01-06T14:01:28.480034Z K 7 svn:log V 153 Print leading zeros in the UFS2 FSID. PR: bin/142155 Submitted by: Efstratios Karatzas gpf.kira gmail.com Approved by: ed (mentor) MFC after: 2 weeks END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T14:14:59.984784Z K 7 svn:log V 141 - Remove doublet creation of the zio_cache which was introduced in the zpool v13 commit. - Diff reduction to 8-stable (add an empty line). END K 10 svn:author V 3 rrs K 8 svn:date V 27 2010-01-06T16:04:56.479714Z K 7 svn:log V 55 adds back missing mtuexen line, LOST in CVS->SVN merge END K 10 svn:author V 3 rrs K 8 svn:date V 27 2010-01-06T16:05:33.074030Z K 7 svn:log V 60 Remove Michael From the list, he is now a full commitor :-) END K 10 svn:author V 8 netchild K 8 svn:date V 27 2010-01-06T16:09:58.919433Z K 7 svn:log V 3476 MFC several ZFS related commits: r196980: ---snip--- When we automatically mount snapshot we want to return vnode of the mount point from the lookup and not covered vnode. This is one of the fixes for using .zfs/ over NFS. ---snip--- r196982: ---snip--- We don't export individual snapshots, so mnt_export field in snapshot's mount point is NULL. That's why when we try to access snapshots over NFS use mnt_export field from the parent file system. ---snip--- r197131: ---snip--- Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain NULL, but also can point to dead vnode, take that into account. PR: kern/132068 Reported by: Edward Fisk" <7ogcg7g02@sneakemail.com>, kris Fix based on patch from: Jaakko Heinonen ---snip--- r197133: ---snip--- - Protect reclaim with z_teardown_inactive_lock. - Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if z_dbuf field is NULL - this might happen in case of rollback or forced unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete(). - On forced unmount wait for all znodes to be destroyed - destruction can be done asynchronously via zfs_reclaim_complete(). ---snip--- r197153: ---snip--- When zfs.ko is compiled with debug, make sure that znode and vnode point at each other. ---snip--- r197167: ---snip--- Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories by just returning EOPNOTSUPP. This will allow NFS server to fall back to regular READDIR. Note that converting inode number to snapshot's vnode is expensive operation. Snapshots are stored in AVL tree, but based on their names, not inode numbers, so to convert inode to snapshot vnode we have to interate over all snalshots. This is not a problem in OpenSolaris, because in their READDIRPLUS implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on d_fileno as we do. PR: kern/125149 Reported by: Weldon Godfrey Analysis by: Jaakko Heinonen ---snip--- r197177: ---snip--- Support both case: when snapshot is already mounted and when it is not yet mounted. ---snip--- r197201: ---snip--- - Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows ZFS route of not listing snapshots by default with 'zfs list' command. - Add UPDATING entry to note that ZFS snapshots are no longer visible in mount(8) and df(1) output by default. Reviewed by: kib ---snip--- Note: the MNT_IGNORE part is commented out in this commit and the UPDATING entry is not merged, as this would be a POLA violation on a stable branch. This revision is included here, as it also makes locking changes and makes sure that a snapshot is mounted RO. r197426: ---snip--- Restore BSD behaviour - when creating new directory entry use parent directory gid to set group ownership and not process gid. This was overlooked during v6 -> v13 switch. PR: kern/139076 Reported by: Sean Winn ---snip--- r197458: ---snip--- Close race in zfs_zget(). We have to increase usecount first and then check for VI_DOOMED flag. Before this change vnode could be reclaimed between checking for the flag and increasing usecount. ---snip--- END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:39:16.210182Z K 7 svn:log V 247 MFC 182846: Convert SYSCTL_INTs for tcp_mssdflt and tcp_v6mssdflt to SYSCTL_PROCs and check that the default mss for neither v4 nor v6 goes below the minimum MSS constant (216). This prevents people from shooting themselves in the foot. END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:43:09.989366Z K 7 svn:log V 400 MFC r182851: Split tcp_mss() in tcp_mss() and tcp_mss_update() where the former calls the latter. Merge tcp_mss_update() with code from tcp_mtudisc() basically doing the same thing. This gives us one central place where we calcuate and check mss values to update t_maxopd (maximum mss + options length) instead of two slightly different but almost equal implementations to maintain. END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:48:59.872897Z K 7 svn:log V 598 MFC r184720: Fix a bug introduced with r182851 (r201653 in stable/7) splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. In case we return early and got a metricptr to pass the hostcache info back to the caller we need to initialize the data to a defined state (zero it) as tcp_hc_get() would do if there was no hit. Without that the caller would check on random stack garbage which could lead to undefined results. This only affected tcp_mss() if there was no routing entry for the peer, tcp_mtudisc() was not affected. END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:51:57.929098Z K 7 svn:log V 768 MFC r184722: Fix a bug introduced with r182851 (r201653 in stable/7) splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially but if were are called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there, called into tcp_mtudisc() and tcp_mss_update() which then would reenable TSO on the tcpcb based on TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface it was turned back on, which lead to an endless loop between tcp_output() and tcp_mtudisc() until we overflew the stack. Reported by: kmacy END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:53:57.747418Z K 7 svn:log V 102 MFC r184731: Fix typo and while here another one. Reviewed by: keramida Reported by: keramida END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T16:56:31.804549Z K 7 svn:log V 180 MFC r184721: Adopt the comment for tcp_maxmtu(); we are returning a number not a pointer. While here update the rest of the comment to better match what we have these days. END K 10 svn:author V 3 mav K 8 svn:date V 27 2010-01-06T17:12:18.188782Z K 7 svn:log V 500 Increase default block size from 4K to 64K. It was reduces 6 yeard ago, when trees were big and FAST mode was enabled by default. So small block size doesn't benefits linear I/O operations in FAST and significantly slowdowns in ECONOMIC (default) mode. For single stream random I/Os so small block doesn't give much benefits, as access time is usually bigger then transfer time there. Same time it requires all heads to seek together for every single request, reducing performance on parallel load. END K 10 svn:author V 3 imp K 8 svn:date V 27 2010-01-06T18:21:22.374837Z K 7 svn:log V 25 Sync to r201658 on head. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T19:29:18.807180Z K 7 svn:log V 272 Remove silly bugs from getutxent() and pututxline(). - Unbreak getutxent() on UTXDB_LOG (wtmp) files by not always returning NULL instead of the proper entry. - Unbreak UTXDB_LOG writing of pututxline() of DEAD_PROCESS by properly breaking from the switch statement. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T19:57:38.646747Z K 7 svn:log V 155 Make ac(8) work with utmpx. For DEAD_PROCESS entries we should always just compare ut_id's. DEAD_PROCESS entries don't have a user name nor a TTY device. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T20:04:36.735955Z K 7 svn:log V 170 Add some checks to time stamps: - Prevent login sessions from ever getting a negative duration. - Don't update lastlogin when the timestamp is lower than the old value. END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T20:07:18.698460Z K 7 svn:log V 681 MFC r186948 (w/o the IPv6 parts): Make SIOCGIFADDR and related, jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. If there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Thanks to: Axel Scheepers (axel.scheepers nl.clara.net) END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T20:08:20.288582Z K 7 svn:log V 65 Remove a " * " from the comment, caused by vim(1) line wrapping. END K 10 svn:author V 4 jkim K 8 svn:date V 27 2010-01-06T20:28:47.880934Z K 7 svn:log V 348 MFC: r200251 - Try pre-allocating all FIBs upfront. Previously we tried pre-allocating 128 FIBs first and allocated more later if necessary. Remove now unused definitions from the header file[1]. - Force sequential bus scanning. It seems parallel scanning is in fact slower and causes more harm than good[1]. Adjust a comment to reflect that. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T20:39:57.206833Z K 7 svn:log V 48 Let libulog use ut_id properly, now we have it. END K 10 svn:author V 5 gavin K 8 svn:date V 27 2010-01-06T20:40:41.781567Z K 7 svn:log V 284 MFC r200820: Support the tablet in (at least) the Toshiba Portege M200 Tablet PC. This device only appears on the ACPI bus, so isn't caught by the current entry for it in the uart(4) ISA attachment. PR: kern/140172 Reviewed by: jhb, marcel Approved by: ed (mentor, implicit) END K 10 svn:author V 5 gavin K 8 svn:date V 27 2010-01-06T20:41:12.829020Z K 7 svn:log V 284 MFC r200820: Support the tablet in (at least) the Toshiba Portege M200 Tablet PC. This device only appears on the ACPI bus, so isn't caught by the current entry for it in the uart(4) ISA attachment. PR: kern/140172 Reviewed by: jhb, marcel Approved by: ed (mentor, implicit) END K 10 svn:author V 3 jhb K 8 svn:date V 27 2010-01-06T20:43:40.262150Z K 7 svn:log V 206 Use _pthread_once() rather than _once() for localtime() and gmtime(). These methods are only invoked when __isthreaded is true at which point it is safe to use _pthread_once() directly. MFC after: 1 week END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T20:46:05.643186Z K 7 svn:log V 44 We can also add the pid for logout entries. END K 10 svn:author V 4 jkim K 8 svn:date V 27 2010-01-06T20:51:04.542257Z K 7 svn:log V 342 MFC: r200251 - Try pre-allocating all FIBs upfront. Previously we tried pre-allocating 128 FIBs first and allocated more later if necessary. Remove now unused definitions from the header file. - Force sequential bus scanning. It seems parallel scanning is in fact slower and causes more harm than good. Adjust a comment to reflect that. END K 10 svn:author V 5 gavin K 8 svn:date V 27 2010-01-06T20:54:04.643639Z K 7 svn:log V 117 MFC r200819: Grammar and minor tweaks to powerd(8) man page. PR: docs/133186 Approved by: ed (mentor, implicit) END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T21:11:37.642235Z K 7 svn:log V 274 Treat ut_id as binary information, not a string. Looking at both the implementation from Solaris and NetBSD, ut_id isn't supposed to be a string. It makes sense, because I can imagine certain applications just write random binary data into this field to identify sessions. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T21:12:38.335677Z K 7 svn:log V 41 Also fix ac(8) now that ut_id is binary. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T21:13:28.654816Z K 7 svn:log V 26 Fix stupidity on my side. END K 10 svn:author V 5 luigi K 8 svn:date V 27 2010-01-06T21:22:29.828035Z K 7 svn:log V 107 probable fix for broken keepalives. These packets go to ip_output() so they need ip_len in network format. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T21:27:06.518427Z K 7 svn:log V 36 Respect the byte ordering of fu_tv. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T21:36:33.891308Z K 7 svn:log V 165 Calling ulog_logout() here makes no sense at all. Now that we have ut_id, we should walk through all sessions using getutxline() and terminate all of them by hand. END K 10 svn:author V 5 simon K 8 svn:date V 27 2010-01-06T21:45:30.661656Z K 7 svn:log V 384 Fix BIND named(8) cache poisoning with DNSSEC validation. [SA-10:01] Fix ntpd mode 7 denial of service. [SA-10:02] Fix ZFS ZIL playback with insecure permissions. [SA-10:03] Various FreeBSD 8.0-RELEASE improvements. [EN-10:01] Security: FreeBSD-SA-10:01.bind Security: FreeBSD-SA-10:02.ntpd Security: FreeBSD-SA-10:03.zfs Errata: FreeBSD-EN-10:01.freebsd Approved by: so (simon) END K 10 svn:author V 7 thompsa K 8 svn:date V 27 2010-01-06T21:46:08.670287Z K 7 svn:log V 198 scratch_size was incorrectly passed as language ID when retrieving the language ID table, this broke string retrieval on some devices. Submitted by: Hans Petter Selasky Reported by: Renato Botelho END K 10 svn:author V 7 thompsa K 8 svn:date V 27 2010-01-06T22:14:05.508052Z K 7 svn:log V 241 Improve u3g device ejecting by providing additional methods for the eject command in the usb_msctest routines, as well as a general tidyup. This now properly ejects the ZTE MF636, Option Gi0322 and Novatel MC950D devices I have on my desk. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T22:15:26.261435Z K 7 svn:log V 178 Improve hacks on OpenSSH: - Reduce the diff against config.h generated by openssh-portable. - Add some bits to make Last login: work. This should eventually be sent upstream. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T22:19:57.901582Z K 7 svn:log V 66 The size argument needs to be the size of the destination buffer. END K 10 svn:author V 3 pjd K 8 svn:date V 27 2010-01-06T22:39:40.298167Z K 7 svn:log V 313 Teach the (gpt)zfsboot and zfsloader raidz code to use its buffers more efficiently. Before this patch, in the worst case memory use would increase exponentially on the number of drives in the raidz vdev. Submitted by: Matt Reimer Sponsored by: VPOP Technologies, Inc. Silence from: dfr END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T22:45:49.518283Z K 7 svn:log V 1275 MFC r198923-198924,198927-198928 r198923: Use correct dma tag for jumbo buffer. r198924: Covert bge_newbuf_std to use bus_dmamap_load_mbuf_sg(9). Note, bge_newbuf_std still has a bug for handling dma map load failure under high network load. Just reusing mbuf is not enough as driver already unloaded the dma map of the mbuf. Graceful recovery needs more work. Ideally we can just update dma address part of a Rx descriptor because the controller never overwrite the Rx descriptor. This requires some Rx initialization code changes and it would be done later after fixing other incorrect bus_dma(9) usages. r198927: Remove common DMA tag used for TX/RX mbufs and create Tx DMA tag and Rx DMA tag separately. Previously it used a common mbuf DMA tag for both Tx and Rx path but Rx buffer(standard ring case) should have a single DMA segment and maximum buffer size of the segment should be less than or equal to MCLBYTES. This change also make it possible to add TSO with minor changes. r198928: Make bge_newbuf_std()/bge_newbuf_jumbo() returns actual error code for buffer allocation. If driver know we are out of Rx buffers let controller stop. This should fix panic when interface is run even if it had no configured Rx buffers. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T22:49:10.973779Z K 7 svn:log V 1275 MFC r198923-198924,198927-198928 r198923: Use correct dma tag for jumbo buffer. r198924: Covert bge_newbuf_std to use bus_dmamap_load_mbuf_sg(9). Note, bge_newbuf_std still has a bug for handling dma map load failure under high network load. Just reusing mbuf is not enough as driver already unloaded the dma map of the mbuf. Graceful recovery needs more work. Ideally we can just update dma address part of a Rx descriptor because the controller never overwrite the Rx descriptor. This requires some Rx initialization code changes and it would be done later after fixing other incorrect bus_dma(9) usages. r198927: Remove common DMA tag used for TX/RX mbufs and create Tx DMA tag and Rx DMA tag separately. Previously it used a common mbuf DMA tag for both Tx and Rx path but Rx buffer(standard ring case) should have a single DMA segment and maximum buffer size of the segment should be less than or equal to MCLBYTES. This change also make it possible to add TSO with minor changes. r198928: Make bge_newbuf_std()/bge_newbuf_jumbo() returns actual error code for buffer allocation. If driver know we are out of Rx buffers let controller stop. This should fix panic when interface is run even if it had no configured Rx buffers. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:02:35.746577Z K 7 svn:log V 2799 MFC r198967,199009-199011,199014,199020,199035-199036,199054 r198967: Correct MSI mode register bits. r199009: bge(4) already switched to use UMA backed page allocator and local memory allocator for jumbo frame was removed long time ago. Remove no more used macros. r199010: Do bus_dmamap_sync call only if frame size is greater than standard buffer size. If controller is not capable of handling jumbo frame, interface MTU couldn't be larger than standard MTU which in turn the received should be fit in standard buffer. This fixes bus_dmamap_sync call for jumbo ring is called even if interface is configured to use standard MTU. Also if total frame size could be fit into standard buffer don't use jumbo buffers. r199011: Reimplement Rx buffer allocation to handle dma map load failure. Introduce two spare dma maps for standard buffer and jumbo buffer respectively. If loading a dma map failed reuse previously loaded dma map. This should fix unloaded dma map is used in case of dma map load failure. Also don't blindly unload dma map and defer dma map sync and unloading operation until we know dma map for new buffer is successfully loaded. This change saves unnecessary dma load/unload operation. Previously bge(4) tried to reuse mbuf with unloaded dma map which is really bad thing in bus_dma(9) perspective. While I'm here update if_iqdrops if we can't allocate Rx buffers. r199014: Fix I mssied in r199011. Rx ring index also should be updated. If we fill Rx ring full instead of half we can simplify this logic but this requires more experimentation. r199020: Tell upper layer we support long frames. ether_ifattach() initializes it to ETHER_HDR_LEN so we have to override it after calling ether_ifattch(). While I'm here remove setting if_mtu value, it's initialized in ether_ifattach(). r199035: Don't count input errors twice, we always read input errors from MAC in bge_tick. Previously it used to show more number of input errors. I noticed actual input errors were less than 8% even for 64 bytes UDP frames generated by netperf. Since we always access BGE_RXLP_LOCSTAT_IFIN_DROPS register in bge_tick, remove useless code protected by #ifdef notyet. r199036: Count number of inbound packets which were chosen to be discarded as input errors. Also count out of receive BDs as input errors. r199054: Partially revert r199035. Revision 1.158 says only lower ten bits of BGE_RXLP_LOCSTAT_IFIN_DROPS register is valid. For BCM5761 case it seems the controller maintains 16bits value for the register. However 16bits are still too small to count all dropped packets happened in a second. To get a correct counter we have to read the register in bge_rxeof() which would be too expensive. END K 10 svn:author V 2 bz K 8 svn:date V 27 2010-01-06T23:05:00.722146Z K 7 svn:log V 79 Correct a typo. Submitted by: sn_ (sn_ gmx.net) on hackers@ MFC after: 3 days END K 10 svn:author V 7 delphij K 8 svn:date V 27 2010-01-06T23:09:23.765825Z K 7 svn:log V 263 Instead of assuming all vdevs are healthy, check the newest vdev label for each vdev's status. Booting from a degraded vdev should now be more robust. Submitted by: Matt Reimer Sponsored by: VPOP Technologies, Inc. MFC after: 2 weeks END K 10 svn:author V 7 delphij K 8 svn:date V 27 2010-01-06T23:11:56.076204Z K 7 svn:log V 195 Space cleanup for revision 201689 committed separately for easier review. This commit is purely space changes. Submitted by: Matt Reimer Sponsored by: VPOP Technologies, Inc. MFC after: 2 weeks END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:15:07.205597Z K 7 svn:log V 2799 MFC r198967,199009-199011,199014,199020,199035-199036,199054 r198967: Correct MSI mode register bits. r199009: bge(4) already switched to use UMA backed page allocator and local memory allocator for jumbo frame was removed long time ago. Remove no more used macros. r199010: Do bus_dmamap_sync call only if frame size is greater than standard buffer size. If controller is not capable of handling jumbo frame, interface MTU couldn't be larger than standard MTU which in turn the received should be fit in standard buffer. This fixes bus_dmamap_sync call for jumbo ring is called even if interface is configured to use standard MTU. Also if total frame size could be fit into standard buffer don't use jumbo buffers. r199011: Reimplement Rx buffer allocation to handle dma map load failure. Introduce two spare dma maps for standard buffer and jumbo buffer respectively. If loading a dma map failed reuse previously loaded dma map. This should fix unloaded dma map is used in case of dma map load failure. Also don't blindly unload dma map and defer dma map sync and unloading operation until we know dma map for new buffer is successfully loaded. This change saves unnecessary dma load/unload operation. Previously bge(4) tried to reuse mbuf with unloaded dma map which is really bad thing in bus_dma(9) perspective. While I'm here update if_iqdrops if we can't allocate Rx buffers. r199014: Fix I mssied in r199011. Rx ring index also should be updated. If we fill Rx ring full instead of half we can simplify this logic but this requires more experimentation. r199020: Tell upper layer we support long frames. ether_ifattach() initializes it to ETHER_HDR_LEN so we have to override it after calling ether_ifattch(). While I'm here remove setting if_mtu value, it's initialized in ether_ifattach(). r199035: Don't count input errors twice, we always read input errors from MAC in bge_tick. Previously it used to show more number of input errors. I noticed actual input errors were less than 8% even for 64 bytes UDP frames generated by netperf. Since we always access BGE_RXLP_LOCSTAT_IFIN_DROPS register in bge_tick, remove useless code protected by #ifdef notyet. r199036: Count number of inbound packets which were chosen to be discarded as input errors. Also count out of receive BDs as input errors. r199054: Partially revert r199035. Revision 1.158 says only lower ten bits of BGE_RXLP_LOCSTAT_IFIN_DROPS register is valid. For BCM5761 case it seems the controller maintains 16bits value for the register. However 16bits are still too small to count all dropped packets happened in a second. To get a correct counter we have to read the register in bge_rxeof() which would be too expensive. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:26:09.416778Z K 7 svn:log V 578 MFC r199065,199115-199116,199153,199661-199662 r199065: Correct disabling checksum offloading for BCM5700 B0. r199115: Add missing bus_dmamap_sync(9) before issuing kick command. r199116: Zero out Tx/Rx descriptors before using them. Also add missing bus_dmamap_sync(9) after Tx descriptor initialization. r199153: Controller does not update Tx descriptors(send BDs) after sending frames so remove unnecessary BUS_DMASYNC_PREREAD and BUS_DMASYNC_POSTREAD of bus_dmamap_sync(9). r199661: Remove extra white space. r199662: Fix typo introduced in r199011. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:28:39.366678Z K 7 svn:log V 578 MFC r199065,199115-199116,199153,199661-199662 r199065: Correct disabling checksum offloading for BCM5700 B0. r199115: Add missing bus_dmamap_sync(9) before issuing kick command. r199116: Zero out Tx/Rx descriptors before using them. Also add missing bus_dmamap_sync(9) after Tx descriptor initialization. r199153: Controller does not update Tx descriptors(send BDs) after sending frames so remove unnecessary BUS_DMASYNC_PREREAD and BUS_DMASYNC_POSTREAD of bus_dmamap_sync(9). r199661: Remove extra white space. r199662: Fix typo introduced in r199011. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:34:53.781071Z K 7 svn:log V 983 MFC 199663-199666 r199663: Due to newly added PCIe capabilities fallback code for finding the PCIe capability did not work right on recent controllers. Remove FreeBSD 6.x support code. r199664: Use capability pointer to access PCIe registers rather than directly access them at fixed address. While I'm here don't touch other bits of PCIe device control register except max payload size. r199665: Controller does not write Rx descriptors, remove BUS_DMASYNC_PREREAD. r199666: Rearrange bge_start_locked to see we can send more frames by checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we have less than 16 free send BDs set IFF_DRV_OACTIVE and try it later. Previously bge(4) used to reserve 16 free send BDs after loading dma maps but hardware just need one reserved send BD. If prouder index has the same value of consumer index it means the Tx queue is empty. While I'm here check IFQ_DRV_IS_EMPTY first to save one lock operation. END K 10 svn:author V 2 ed K 8 svn:date V 27 2010-01-06T23:36:14.903375Z K 7 svn:log V 92 Truncate utx.active when (re)booting. This makes sure utmp entries never survive a reboot. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:37:13.744706Z K 7 svn:log V 983 MFC 199663-199666 r199663: Due to newly added PCIe capabilities fallback code for finding the PCIe capability did not work right on recent controllers. Remove FreeBSD 6.x support code. r199664: Use capability pointer to access PCIe registers rather than directly access them at fixed address. While I'm here don't touch other bits of PCIe device control register except max payload size. r199665: Controller does not write Rx descriptors, remove BUS_DMASYNC_PREREAD. r199666: Rearrange bge_start_locked to see we can send more frames by checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we have less than 16 free send BDs set IFF_DRV_OACTIVE and try it later. Previously bge(4) used to reserve 16 free send BDs after loading dma maps but hardware just need one reserved send BD. If prouder index has the same value of consumer index it means the Tx queue is empty. While I'm here check IFQ_DRV_IS_EMPTY first to save one lock operation. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:42:15.593601Z K 7 svn:log V 902 MFC r199667-199668 r199667: Cache Rx producer/Tx consumer index as soon as we know status block update and then clear status block. Previously it used to access these index without synchronization which may cause problems when bounce buffers are used. Also add missing bus_dmamap_sync(9) in polling handler. Since we now update status block in driver, adjust bus_dmamap_sync(9) for status block. r199668: For MSI case, interrupt is not shared and we don't need to force PCI flush to get correct status block update. Add an optimized interrupt handler that is activated for MSI case. Actual interrupt handling is done by taskqueue such that the handler does not require driver lock for Rx path. The MSI capable bge(4) controllers automatically disables further interrupt once it enters interrupt state so we don't need PIO access to disable interrupt in interrupt handler. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-06T23:57:17.793547Z K 7 svn:log V 339 MFC r196370: - Do not try to reevaluate current RX production index on each loop iteration as it can be updated by the card while we process the RX ring forcing us to process RX descriptors for which DMA synchronisation operation has not been performed. This fixes the bug when bge(4) drops packets under high load. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-07T00:04:29.326496Z K 7 svn:log V 902 MFC r199667-199668 r199667: Cache Rx producer/Tx consumer index as soon as we know status block update and then clear status block. Previously it used to access these index without synchronization which may cause problems when bounce buffers are used. Also add missing bus_dmamap_sync(9) in polling handler. Since we now update status block in driver, adjust bus_dmamap_sync(9) for status block. r199668: For MSI case, interrupt is not shared and we don't need to force PCI flush to get correct status block update. Add an optimized interrupt handler that is activated for MSI case. Actual interrupt handling is done by taskqueue such that the handler does not require driver lock for Rx path. The MSI capable bge(4) controllers automatically disables further interrupt once it enters interrupt state so we don't need PIO access to disable interrupt in interrupt handler. END K 10 svn:author V 8 mckusick K 8 svn:date V 27 2010-01-07T00:17:36.674103Z K 7 svn:log V 328 This corrects a bug that manifested itself as identifying the last cylinder group of a UFS1 filesystem as bad. The error was in the check and not in the cylinder group itself. So even though fsck fixed the cylinder group correctly, it was still endlessly reported as bad. PR: 141992 MFC after: 2 weeks Reported by: Dan Strick END K 10 svn:author V 7 thompsa K 8 svn:date V 27 2010-01-07T00:30:59.342513Z K 7 svn:log V 158 Add new umass quirks for Western Digital MYBook and JMicron JM20337. PR: usb/142225, usb/142228 Submitted by: Thomas Ward, Yoshikazu GOTO MFC after: 1 week END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-07T00:44:54.264377Z K 7 svn:log V 2504 MFC r199670-199671,199674,199679,199761,199807-199808 r199670: Fix two long standing bugs on bge(4). Most pre BCM5755 controllers have a DMA bug when buffer address crosses a multiple of the 4GB boundary(e.g. 4GB, 8GB, 12GB etc). Limit DMA address to be within 4GB address for these controllers. The second DMA bug limits DMA address to be within 40bit address space. This bug applies to BCM5714 and BCM5715 and 5708(bce(4) controller). This is not actually a MAC controller bug but an issue with the embedded PCIe to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers also limit the DMA address to be within 40bit address space. Special thanks to davidch@ who gave me detailed errata information. I think this change will fix long standing bge(4) instability issues on systems with more than 4GB memory. r199671: Implement TSO for BCM5755 or newer controllers. Some controllers seem to require a special firmware to use TSO. But the firmware is not available to FreeBSD and Linux claims that the TSO performed by the firmware is slower than hardware based TSO. Moreover the firmware based TSO has one known bug which can't handle TSO if ethernet header + IP/TCP header is greater than 80 bytes. The workaround for the TSO bug exist but it seems it's too expensive than not using TSO at all. Some hardwares also have the TSO bug so limit the TSO to the controllers that are not affected TSO issues (e.g. 5755 or higher). While I'm here set VLAN tag bit to all descriptors that belengs to a frame instead of the first descriptor of a frame. The datasheet is not clear how to handle VLAN tag bit but it worked either way in my testing. This makes it simplify TSO configuration a little bit. Big thanks to davidch@ who sent me detailed TSO information. Without this I was not able to implement it. r199674: Add missing function prototype in r199671. r199679: Reduce status block size DMAed by controller. bge(4) uses single Tx/Rx/Rx return ring such that large part of status block was not used at all. All bge(4) controllers except BCM5700 AX/BX has a feature to control the size of status block. So use minimum status block size allowed in controller. This reduces number of DMAed status block size to 32 bytes from 80 bytes. r199761: BGE_FLAG_40BIT_BUG should be set before creating DMA tags. r199807: Make sure one shot MSI is enabled. r199808: Fix typo which inversed the logic which in turn disabled MSI. END K 10 svn:author V 7 attilio K 8 svn:date V 27 2010-01-07T00:47:50.212473Z K 7 svn:log V 582 Exclusive waiters sleeping with LK_SLEEPFAIL on and using interruptible sleeps/timeout may have left spourious lk_exslpfail counts on, so clean it up even when accessing a shared queue acquisition, giving to lk_exslpfail the value of 'upper limit'. In the worst case scenario, infact (mixed interruptible sleep / LK_SLEEPFAIL waiters) what may happen is that both queues are awaken even if that's not necessary, but still no harm. Reported by: Lucius Windschuh Reviewed by: kib Tested by: pho, Lucius Windschuh END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-07T00:48:10.084733Z K 7 svn:log V 2504 MFC r199670-199671,199674,199679,199761,199807-199808 r199670: Fix two long standing bugs on bge(4). Most pre BCM5755 controllers have a DMA bug when buffer address crosses a multiple of the 4GB boundary(e.g. 4GB, 8GB, 12GB etc). Limit DMA address to be within 4GB address for these controllers. The second DMA bug limits DMA address to be within 40bit address space. This bug applies to BCM5714 and BCM5715 and 5708(bce(4) controller). This is not actually a MAC controller bug but an issue with the embedded PCIe to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers also limit the DMA address to be within 40bit address space. Special thanks to davidch@ who gave me detailed errata information. I think this change will fix long standing bge(4) instability issues on systems with more than 4GB memory. r199671: Implement TSO for BCM5755 or newer controllers. Some controllers seem to require a special firmware to use TSO. But the firmware is not available to FreeBSD and Linux claims that the TSO performed by the firmware is slower than hardware based TSO. Moreover the firmware based TSO has one known bug which can't handle TSO if ethernet header + IP/TCP header is greater than 80 bytes. The workaround for the TSO bug exist but it seems it's too expensive than not using TSO at all. Some hardwares also have the TSO bug so limit the TSO to the controllers that are not affected TSO issues (e.g. 5755 or higher). While I'm here set VLAN tag bit to all descriptors that belengs to a frame instead of the first descriptor of a frame. The datasheet is not clear how to handle VLAN tag bit but it worked either way in my testing. This makes it simplify TSO configuration a little bit. Big thanks to davidch@ who sent me detailed TSO information. Without this I was not able to implement it. r199674: Add missing function prototype in r199671. r199679: Reduce status block size DMAed by controller. bge(4) uses single Tx/Rx/Rx return ring such that large part of status block was not used at all. All bge(4) controllers except BCM5700 AX/BX has a feature to control the size of status block. So use minimum status block size allowed in controller. This reduces number of DMAed status block size to 32 bytes from 80 bytes. r199761: BGE_FLAG_40BIT_BUG should be set before creating DMA tags. r199807: Make sure one shot MSI is enabled. r199808: Fix typo which inversed the logic which in turn disabled MSI. END K 10 svn:author V 7 thompsa K 8 svn:date V 27 2010-01-07T00:50:45.811613Z K 7 svn:log V 132 Sync to p4 - Add new quirks commands and the '-d' option optionally to specify the ugen device. Submitted by: Hans Petter Selasky END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-07T00:55:07.804994Z K 7 svn:log V 2675 MFC r200088,200227-200228,200246,200264,201446 r200088: Add workaround to overcome hardware limitation which allows only a single outstanding DMA read operation. Most controllers targeted to client with PCIe bus interface(e.g. BCM5761) may have this limitation. All controllers for servers does not have this limitation. Collapsing mbuf chains to reduce number of memory reads before transmitting was most effective way to workaround this. I got about 940Mbps from 850Mbps with mbuf collapsing on BCM5761. However it takes a lot of CPU cycles to collapse mbuf chains so add tunable to control the number of allowed TX buffers before collapsing. The default value is 0 which effectively disables the forced collapsing. For most cases 2 would yield best performance(about 930Mbps) without much sacrificing CPU cycles. Note the collapsing is only activated when the controller is on PCIe bus and the frame does not need TSO operation. TSO does not seem to suffer from the hardware limitation because the payload size is much bigger than normal IP datagram. Thanks to davidch@ who told me the limitation of client controllers and actually gave possible workarounds to mitigate the limitation. r200227: Remove PHY isolate/power down code in bge_stop(). The isolation handler in brgphy(4) does not exist and brgphy(4) just resets the PHY and returns EINVAL as it has no isolation handler. I also agree on Marius's opinion that stop handler of every NIC driver seems to be the wrong place for implementing PHY isolate/power down. If we need PHY isolate/power down it should be implemented in brgphy(4) and users should administratively down the PHY. r200228: Don't access jumbo frame related registers if controller lacks the feature. These registers are reserved on controllers that have no support for jumbo frame. Only BCM5700 has mini ring so do not poke mini ring related registers if controller is not BCM5700. r200246: Partially revert r200228. For mini RCB case, bge(4) still have to disable mini ring withtout regard to mini ring support. r200264: Create sysctl node(dev.bge.%d.focred_collapse) instead of hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4) controllers on system which may not desirable behavior of the sysctl node. Also allow the sysctl node could be modified at any time. r201446: Fix regression introduced in r198318. BCM5754/BCM5754M uses the same ASIC ID of BCM5758 such that r198318 incorecctly enabled TSO on BCM5754.BCM5754M controllers. BCM5754/BCM5754M needs a special firmware to enable TSO and bge(4) does not support firmware based TSO. END K 10 svn:author V 7 yongari K 8 svn:date V 27 2010-01-07T00:57:40.480606Z K 7 svn:log V 2675 MFC r200088,200227-200228,200246,200264,201446 r200088: Add workaround to overcome hardware limitation which allows only a single outstanding DMA read operation. Most controllers targeted to client with PCIe bus interface(e.g. BCM5761) may have this limitation. All controllers for servers does not have this limitation. Collapsing mbuf chains to reduce number of memory reads before transmitting was most effective way to workaround this. I got about 940Mbps from 850Mbps with mbuf collapsing on BCM5761. However it takes a lot of CPU cycles to collapse mbuf chains so add tunable to control the number of allowed TX buffers before collapsing. The default value is 0 which effectively disables the forced collapsing. For most cases 2 would yield best performance(about 930Mbps) without much sacrificing CPU cycles. Note the collapsing is only activated when the controller is on PCIe bus and the frame does not need TSO operation. TSO does not seem to suffer from the hardware limitation because the payload size is much bigger than normal IP datagram. Thanks to davidch@ who told me the limitation of client controllers and actually gave possible workarounds to mitigate the limitation. r200227: Remove PHY isolate/power down code in bge_stop(). The isolation handler in brgphy(4) does not exist and brgphy(4) just resets the PHY and returns EINVAL as it has no isolation handler. I also agree on Marius's opinion that stop handler of every NIC driver seems to be the wrong place for implementing PHY isolate/power down. If we need PHY isolate/power down it should be implemented in brgphy(4) and users should administratively down the PHY. r200228: Don't access jumbo frame related registers if controller lacks the feature. These registers are reserved on controllers that have no support for jumbo frame. Only BCM5700 has mini ring so do not poke mini ring related registers if controller is not BCM5700. r200246: Partially revert r200228. For mini RCB case, bge(4) still have to disable mini ring withtout regard to mini ring support. r200264: Create sysctl node(dev.bge.%d.focred_collapse) instead of hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4) controllers on system which may not desirable behavior of the sysctl node. Also allow the sysctl node could be modified at any time. r201446: Fix regression introduced in r198318. BCM5754/BCM5754M uses the same ASIC ID of BCM5758 such that r198318 incorecctly enabled TSO on BCM5754.BCM5754M controllers. BCM5754/BCM5754M needs a special firmware to enable TSO and bge(4) does not support firmware based TSO. END K 10 svn:author V 8 mckusick K 8 svn:date V 27 2010-01-07T01:10:49.546224Z K 7 svn:log V 173 Add some error messages suggested in PR bin/138043. The code to correct the problem was added in r176575 by delphij on 2008-02-25. PR: 138043 Reported by: Heikki Suonsivu END K 10 svn:author V 7 attilio K 8 svn:date V 27 2010-01-07T01:19:01.138640Z K 7 svn:log V 16 Tweak comments. END K 10 svn:author V 7 attilio K 8 svn:date V 27 2010-01-07T01:24:09.222342Z K 7 svn:log V 11 Fix typos. END K 10 svn:author V 7 delphij K 8 svn:date V 27 2010-01-07T01:55:34.409939Z K 7 svn:log V 306 MFC r176575: In pass1(), cap inosused to fs_ipg rather than allowing arbitrary number read from cylinder group. Chances that we read a smarshed cylinder group, and we can not 100% trust information it has supplied. fsck_ffs(8) will crash otherwise for some cases. PR: bin/138043 Reminded by: mckusick END K 10 svn:author V 7 delphij K 8 svn:date V 27 2010-01-07T01:56:35.968287Z K 7 svn:log V 306 MFC r176575: In pass1(), cap inosused to fs_ipg rather than allowing arbitrary number read from cylinder group. Chances that we read a smarshed cylinder group, and we can not 100% trust information it has supplied. fsck_ffs(8) will crash otherwise for some cases. PR: bin/138043 Reminded by: mckusick END