K 10
svn:author
V 7
delphij
K 8
svn:date
V 27
2015-04-17T01:22:43.152446Z
K 7
svn:log
V 8730
MFV r277428: Various improvements to the dmu buf user API.

Submitted by: Justin Gibbs
Submitted by: Will Andrews
Sponsored by: Spectra Logic Corporation

Collect dmu buf user API support data into a new structure, dmu_buf_user_t. Consumers of this interface must include a dmu_buf_user_t as a member of the user data structure that will be attached to a dmu buffer. This reduces the size of dmu_buf_impl_t by two pointers.

Queue dmu buf user eviction processing to a taskq. This prevents FreeBSD witness(4) lock-order reversal warnings, potential deadlocks, and reduces stack depth.

Convert objset eviction from a synchronous to an asynchronous process to accommodate the asynchronous invocation, via dmu buf user eviction, of dnode_buf_pageout().

Modify existing users of the dmu buf user API to never access the dbuf to which their data was attached after user eviction has occurred. Accessing the dbuf from the callback is no longer safe now that callbacks occur without the locks that used to protect them. Enforce this in ZFS_DEBUG kernel builds by clearing the user's pointer to the dbuf, if any, at the time of eviction. Callbacks have also been modified to clear their dbuf pointer so most errors are caught even on non ZFS_DEBUG kernels. However, this will not catch accesses from other contexts that occur between the time of eviction and the processing of the callback.

Clarify programmer intent and improve readability by providing specialized functions for the common user data update actions "remove" and "replace".

Provide code-comment documentation for each API call.

Perform runtime validation of proper API usage on ZFS_DEBUG kernels.

uts/common/fs/zfs/sys/dbuf.h:
uts/common/fs/zfs/dbuf.c:
    Add dbuf_verify_user() and call it from the dbuf user API and during dbuf eviction processing to verify dbuf user API state.

    Replace calls to dbuf_set_data(db, NULL) with more explicit db_clear_data().
    dbuf_set_data() now asserts that its buffer argument is never NULL.

    Implement new dmu buf API functions.

    Add the dmu_evict_taskq for processing dmu buf user evictions.

    Add dmu_buf_user_evict_wait() which allows spa, dsl pool, and dmu close/fini functions to drain pending user evictions.

    In dbuf_rele_and_unlock(), immediately evict dbufs with a zero refcount for an objset that is being evicted. This allows the indirect dbufs in a dnode to be evicted asynchronously after the zero refcount dbufs that reference them are cleared via dmu_objset_evict()->dnode_evict_dbufs().

uts/common/fs/zfs/sys/dmu_objset.h:
uts/common/fs/zfs/dmu_objset.c:
    End the practice of including special dnodes in os->os_dnodes. This allows os->os_dnodes to be managed completely by one eviction path: dnode_buf_pageout()->dnode_destroy().

    Split objset eviction processing into two pieces. The first marks the objset as evicting, evicts any dbufs that have a refcount of zero, and then queues up the objset for the second phase of eviction. Once os->os_dnodes has been cleared by dnode_buf_pageout()->dnode_destroy(), the second phase is executed. The second phase closes the special dnodes, dequeues the objset from the list of those undergoing eviction, and finally frees the objset.

    NOTE: Due to asynchronous eviction processing (invocation of dnode_buf_pageout()), it is possible for the meta dnode for the objset to have no holds even though os->os_dnodes is not empty.

uts/common/fs/zfs/sys/dnode.h:
uts/common/fs/zfs/dnode.c:
    Collapse the initialization of a dnode from dnode_hold_impl() into dnode_create(). Since we already grab os_lock, use it to provide mutual exclusion for dnh->dnh_dnode and to arbitrate the winner of the initial open race. The only way to unset the handle is to page out an entire set of dnodes, which uses the user eviction mechanism to arbitrate.

    In dnode_destroy(), invoke final stage of objset eviction if the destroyed dnode is the last regular dnode in the objset.
    Modify dnode_buf_pageout() so that it doesn't reference the evicted dbuf.

uts/common/fs/zfs/dnode_sync.c:
    In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs. This is possible now that objset eviction is asynchronously completed in a different context once dbuf eviction completes. In the case of objset eviction, any dbufs held by children will be evicted via dbuf_rele_and_unlock() once their refcounts go to zero. Even when objset eviction is not active, the ordering of the avl tree guarantees that children will be released before parents, allowing the parents' refcounts to naturally drop to zero before they are inspected in this single loop.

    In dnode_sync_free(), remove assertion that the dn_bonus of a dnode is NULL when the dnode is freed. This assertion likely wasn't true before this commit (due to races with zfs_obj_to_path()), and is not a requirement for the free to be successful. Now that user eviction is asynchronous it is easy to have evictions for destroyed dsl dirs and datasets still outstanding at the time their dnode is freed.

uts/common/fs/zfs/sys/spa.h:
uts/common/fs/zfs/sys/spa_impl.h:
uts/common/fs/zfs/spa.c:
uts/common/fs/zfs/spa_misc.c:
    Track evicting objsets in the spa. When recording or validating the min spa reference count, wait for objset eviction processing to complete so that the reference count is stable.

uts/common/fs/zfs/sys/spa.h:
uts/common/fs/zfs/sys/dsl_dir.h:
uts/common/fs/zfs/spa_misc.c:
uts/common/fs/zfs/dsl_dir.c:
    Add spa_async_close() and dsl_dir_async_rele(). These APIs are used during the async eviction process of dsl datasets and dirs to indicate that the normal rules for releasing a reference count on the spa do not apply. Async releases occur from a taskq without the namespace lock held and may be for objects contributing to spa_minref (e.g. during pool export). Thus these APIs do not enforce the namespace lock being held or the spa refcount being greater than spa_minref on entry.
uts/common/fs/zfs/dsl_deadlist.c:
uts/common/fs/zfs/dsl_dataset.c:
    Modify dsl_dataset_evict() so that it doesn't reference the evicted dbuf to determine if the deadlist needs to be closed. This is achieved by checking ds->ds_deadlist.dl_os instead, which is now properly cleared when a deadlist is closed prior to eviction.

uts/common/fs/zfs/dsl_pool.c:
    In dsl_pool_close(), flush the user evictions for any just released dsl dirs with a call to dmu_buf_user_evict_wait(). The dsl dirs have back references to the dsl_pool_t which are accessed during eviction, so these must complete before the dsl_pool_t is destroyed.

uts/common/fs/zfs/dsl_dir.c:
uts/common/fs/zfs/sys/dsl_dir.h:
uts/common/fs/zfs/dsl_dataset.c:
uts/common/fs/zfs/sys/dsl_dataset.h:
uts/common/fs/zfs/dsl_prop.c:
uts/common/fs/zfs/sa.c:
uts/common/fs/zfs/sys/sa_impl.h:
uts/common/fs/zfs/zap.c:
uts/common/fs/zfs/sys/zap_impl.h:
uts/common/fs/zfs/sys/zap_leaf.h:
uts/common/fs/zfs/zap_micro.c:
    Conform to new dbuf user API.

uts/common/fs/zfs/dsl_dir.c:
    In dsl_dir_hold(), rename a dsl_dir_t* variable so its type isn't confusing: child_ds -> child_dd.

uts/common/fs/zfs/sys/dsl_dataset.h:
uts/common/fs/zfs/dmu_objset.c:
uts/common/fs/zfs/dmu_send.c:
uts/common/fs/zfs/dmu_traverse.c:
uts/common/fs/zfs/dsl_bookmark.c:
uts/common/fs/zfs/dsl_dataset.c:
uts/common/fs/zfs/dsl_deleg.c:
uts/common/fs/zfs/dsl_destroy.c:
uts/common/fs/zfs/dsl_prop.c:
uts/common/fs/zfs/dsl_scan.c:
uts/common/fs/zfs/dsl_userhold.c:
uts/common/fs/zfs/zil.c:
    Record whether or not a dsl dataset is a snapshot upon its creation in the ds_is_snapshot field rather than relying on access to ds_num_children in the dsl_dataset_phys_t. This ensures that the snapshot trait can be determined during eviction processing when the dbuf holding the dsl_dataset_phys_t is no longer available. The conditional snapshot logic in dmu_objset_evict() is currently the only place where ds_is_snapshot is tested during eviction.
uts/common/fs/zfs/sys/dmu.h:
    Add prototypes, data structures, and code comment documentation for the dmu buf user API. Pull in ASSERT() via zfs_context.h.

uts/common/fs/zfs/zfs_sa.c:
    Use zfs_context.h instead of manual includes of sys/types.h and sys/params.h so this file is compatible with the use of zfs_context.h in sys/dmu.h.

Illumos issue:
    5056 ZFS deadlock on db_mtx and dn_holds
END