.. SPDX-License-Identifier: GPL-2.0

===============================================================
Inotify - A Powerful yet Simple File Change Notification System
===============================================================

Document started 15 Mar 2005 by Robert Love <rml@novell.com>

Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>

	- Deleted obsoleted interface, just refer to manpages for user interface.

(i) Rationale

Q:
   What is the design decision behind not tying the watch to the open fd of
   the watched object?

A:
   Watches are associated with an open inotify device, not an open file.
   This solves the primary problem with dnotify: keeping the file open pins
   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
   for use on a desktop system with removable media as the media cannot be
   unmounted.  Watching a file should not require that it be open.

Q:
   What is the design decision behind using an-fd-per-instance as opposed to
   an fd-per-watch?

A:
   An fd-per-watch quickly consumes more file descriptors than are allowed,
   more fd's than are feasible to manage, and more fd's than are optimally
   select()-able.  Yes, root can bump the per-process fd limit and yes, users
   can use epoll, but requiring both is a silly and extraneous requirement.
   A watch consumes less memory than an open file; separating the number
   spaces is thus sensible.  The current design is what user-space developers
   want: Users initialize inotify, once, and add n watches, requiring but one
   fd and no twiddling with fd limits.  Initializing an inotify instance two
   thousand times is silly.  If we can implement user-space's preferences
   cleanly--and we can, the idr layer makes stuff like this trivial--then we
   should.

   There are other good arguments.  With a single fd, there is a single
   item to block on, which is mapped to a single queue of events.  The single
   fd returns all watch events and also any potential out-of-band data.  If
   every fd were a separate watch,

   - There would be no way to get event ordering.  Events on file foo and
     file bar would pop poll() on both fd's, but there would be no way to tell
     which happened first.  A single queue trivially gives you ordering.  Such
     ordering is crucial to existing applications such as Beagle.  Imagine
     "mv a b ; mv b a" events without ordering.

   - We'd have to maintain n fd's and n internal queues with state,
     versus just one.  It is a lot messier in the kernel.  A single, linear
     queue is the data structure that makes sense.

   - User-space developers prefer the current API.  The Beagle guys, for
     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
     to manage and block on 1000 fd's via select?

   - No way to get out of band data.

   - 1024 is still too low.  ;-)

   When you talk about designing a file change notification system that
   scales to 1000s of directories, juggling 1000s of fd's just does not seem
   the right interface.  It is too heavy.

   Additionally, it _is_ possible to have more than one instance and
   juggle more than one queue and thus more than one associated fd.  There
   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
   process can easily want more than one queue.

Q:
   Why the system call approach?

A:
   The poor user-space interface is the second biggest problem with dnotify.
   Signals are a terrible, terrible interface for file notification.  Or for
   anything, for that matter.  The ideal solution, from all perspectives, is a
   file descriptor-based one that allows basic file I/O and poll/select.
   Obtaining the fd and managing the watches could have been done either via a
   device file or a family of new system calls.  We decided to implement a
   family of system calls because that is the preferred approach for new kernel
   interfaces.  The only real difference was whether we wanted to use open(2)
   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
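
The one-fd, many-watches design described above can be sketched from user
space.  This is a minimal illustration, not a reference implementation.  It
assumes a Linux system that provides the inotify system calls documented in
the man pages.  The helper names watch_paths() and drain_events(), the event
mask and the 4096-byte buffer are hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper: create ONE inotify instance and add one watch per
 * path.  wds[i] receives the watch descriptor for paths[i].
 * Returns the single inotify fd, or -1 on error.
 */
int watch_paths(const char *const *paths, int n, int *wds)
{
	assert(paths != NULL && wds != NULL);

	int fd = inotify_init1(IN_CLOEXEC);
	if (fd < 0)
		return -1;

	for (int i = 0; i < n; i++) {
		/* The mask is an illustrative choice, not a recommendation. */
		wds[i] = inotify_add_watch(fd, paths[i],
					   IN_CREATE | IN_DELETE |
					   IN_MOVED_FROM | IN_MOVED_TO);
		if (wds[i] < 0) {
			close(fd);
			return -1;
		}
	}
	return fd;
}

/*
 * Hypothetical helper: read one batch of events from the single queue and
 * print them in the order the kernel queued them.  Blocks until at least
 * one event is available.  Returns the number of events, or -1 on error.
 */
int drain_events(int fd)
{
	/* Buffer aligned for struct inotify_event; the size is arbitrary. */
	_Alignas(struct inotify_event) char buf[4096];

	ssize_t len = read(fd, buf, sizeof(buf));
	if (len < 0)
		return -1;

	int count = 0;
	for (char *p = buf; p < buf + len; ) {
		const struct inotify_event *ev =
			(const struct inotify_event *)p;

		/* wd says which watch fired; cookie pairs rename halves. */
		printf("wd=%d mask=0x%x cookie=%u name=%s\n",
		       ev->wd, (unsigned)ev->mask, (unsigned)ev->cookie,
		       ev->len ? ev->name : "");

		/* Events are variable length: header plus ev->len name bytes. */
		p += sizeof(*ev) + ev->len;
		count++;
	}
	return count;
}
```

A caller would pass an array of directory names to watch_paths(), poll or
block on the one returned fd, and call drain_events() when it becomes
readable.  Because every watch feeds the same queue, a sequence such as
"mv a b ; mv b a" arrives in order, and the cookie field ties each
IN_MOVED_FROM to its matching IN_MOVED_TO.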