.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are provided by an ordinary
  userspace process. The filesystem can be accessed normally through
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metadata of the filesystem.

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-privileged (non-root) user.
  The filesystem daemon is running with the privileges of the mounting
  user. NOTE: this is not the same as mounts allowed with the "user"
  option in /etc/fstab, which is not discussed here.

Filesystem connection:
  A connection between the filesystem daemon and the kernel. The
  connection exists until either the daemon dies, or the filesystem is
  umounted. Note that detaching (or lazy umounting) the filesystem
  does *not* break the connection; in this case it will exist until
  the last reference to the filesystem is released.

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operations.

What is FUSE?
=============

FUSE is a userspace filesystem framework. It consists of a kernel
module (fuse.ko), a userspace library (libfuse.*) and a mount utility
(fusermount).

One of the most important features of FUSE is allowing secure,
non-privileged mounts. This opens up new possibilities for the use of
filesystems. A good example is sshfs: a secure network filesystem
using the sftp protocol.

The userspace library and utilities are available from the
`FUSE homepage: <https://github.com/libfuse/>`_

Filesystem type
===============

The filesystem type given to mount(2) can be one of the following:

    fuse
      This is the usual way to mount a FUSE filesystem. The first
      argument of the mount system call may contain an arbitrary string,
      which is not interpreted by the kernel.

    fuseblk
      The filesystem is block device based. The first argument of the
      mount system call is interpreted as the name of the device.

Mount options
=============

fd=N
  The file descriptor to use for communication between the userspace
  filesystem and the kernel.
  The file descriptor must have been
  obtained by opening the FUSE device ('/dev/fuse').

rootmode=M
  The file mode of the filesystem's root in octal representation.

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access permissions; the
  filesystem is free to implement its access policy or leave it to
  the underlying file access mechanism (e.g. in case of network
  filesystems). This option enables permission checking, restricting
  access based on file mode. It is usually useful together with the
  'allow_other' mount option.

allow_other
  This option overrides the security measure restricting file access
  to the user mounting the filesystem. This option is by default only
  allowed to root, but this restriction can be removed with a
  (userspace) configuration option.

max_read=N
  With this option the maximum size of read operations can be set.
  The default is infinite. Note that the size of read requests is
  limited anyway to 32 pages (which is 128kbyte on i386).

blksize=N
  Set the block size for the filesystem. The default is 512.
  This option is only valid for 'fuseblk' type mounts.

Control filesystem
==================

There's a control filesystem for FUSE, which can be mounted by::

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

    waiting
      The number of requests which are waiting to be transferred to
      userspace or being processed by the filesystem daemon. If there is
      no filesystem activity and 'waiting' is non-zero, then the
      filesystem is hung or deadlocked.

    abort
      Writing anything into this file will abort the filesystem
      connection. This means that all waiting requests will be aborted
      and an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.
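
The 'waiting' and 'abort' files described above can be driven from a
small script. The following is a minimal sketch, not part of FUSE
itself: the helper names are hypothetical, and it assumes the control
filesystem is mounted at the conventional location and that 'waiting'
holds a single decimal number.

```python
import os

# Conventional mount point for the control filesystem (see the mount
# command above). Pass a different root if it is mounted elsewhere.
FUSECTL_ROOT = "/sys/fs/fuse/connections"


def list_connections(root=FUSECTL_ROOT):
    """Return the per-connection directory names found under root."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


def read_waiting(conn_id, root=FUSECTL_ROOT):
    """Return the request count from the connection's 'waiting' file.

    Assumption: the file contains one decimal number.
    """
    with open(os.path.join(root, str(conn_id), "waiting")) as f:
        return int(f.read().strip())


def abort_connection(conn_id, root=FUSECTL_ROOT):
    """Abort a connection by writing to its 'abort' file.

    The text above says writing anything aborts the connection; "1" is
    an arbitrary choice.
    """
    with open(os.path.join(root, str(conn_id), "abort"), "w") as f:
        f.write("1")
```

A caller would typically poll ``read_waiting()`` for a connection that
shows no activity and call ``abort_connection()`` only when the count
stays non-zero. Only the mount owner can use these files, so the
script has to run as that user.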

Interrupting filesystem operations
==================================

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

  - If the request is not yet sent to userspace AND the signal is
    fatal (SIGKILL or unhandled fatal signal), then the request is
    dequeued and returns immediately.

  - If the request is not yet sent to userspace AND the signal is not
    fatal, then an interrupted flag is set for the request. When
    the request has been successfully transferred to userspace and
    this flag is set, an INTERRUPT request is queued.

  - If the request is already sent to userspace, then an INTERRUPT
    request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the *original* request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request. There are two possibilities:

  1. The INTERRUPT request is processed before the original request is
     processed

  2. The INTERRUPT request is processed after the original request has
     been answered

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error. In case
1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
reply will be ignored.

Aborting a filesystem connection
================================

It is possible to get into certain situations where the filesystem is
not responding. Reasons for this may be:

  a) Broken userspace filesystem implementation

  b) Network connection down

  c) Accidental deadlock

  d) Malicious deadlock

(For more on c) and d) see later sections)

In any of these cases it may be useful to abort the connection to
the filesystem. There are several ways to do this:

  - Kill the filesystem daemon. Works in case of a) and b)

  - Kill the filesystem daemon and all users of the filesystem.
    Works in all cases except some malicious deadlocks

  - Use forced umount (umount -f). Works in all cases but only if
    the filesystem is still attached (it hasn't been lazy unmounted)

  - Abort filesystem through the FUSE control filesystem. Most
    powerful method, always works.

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged operation, a helper
program (fusermount) is needed, which is installed setuid root.

The implication of providing non-privileged mounts is that the mount
owner must not be able to use this capability to compromise the
system. Obvious requirements arising from this are:

  A) mount owner should not be able to get elevated privileges with the
     help of the mounted filesystem

  B) mount owner should not get illegitimate access to information from
     other users' and the super user's processes

  C) mount owner should not be able to induce undesired behavior in
     other users' or the super user's processes

How are requirements fulfilled?
===============================

    A) The mount owner could gain elevated privileges by either:

       1. creating a filesystem containing a device file, then opening
          this device

       2. creating a filesystem containing a suid or sgid application,
          then executing this application

       The solution is not to allow opening device files and to ignore
       setuid and setgid bits when executing programs. To ensure this
       fusermount always adds "nosuid" and "nodev" to the mount options
       for non-privileged mounts.

    B) If another user is accessing files or directories in the
       filesystem, the filesystem daemon serving requests can record the
       exact sequence and timing of operations performed. This
       information is otherwise inaccessible to the mount owner, so this
       counts as an information leak.

       The solution to this problem will be presented in point 2) of C).

    C) There are several ways in which the mount owner can induce
       undesired behavior in other users' processes, such as:

         1) mounting a filesystem over a file or directory which the mount
            owner would otherwise not be able to modify (or could only
            make limited modifications).

            This is solved in fusermount, by checking the access
            permissions on the mountpoint and only allowing the mount if
            the mount owner can do unlimited modification (has write
            access to the mountpoint, and mountpoint is not a "sticky"
            directory)

         2) Even if 1) is solved the mount owner can change the behavior
            of other users' processes.

            i) It can slow down or indefinitely delay the execution of a
               filesystem operation, creating a DoS against the user or
               the whole system. For example a suid application locking a
               system file, and then accessing a file on the mount owner's
               filesystem could be stopped, thus causing the system
               file to be locked forever.

            ii) It can present files or directories of unlimited length, or
                directory structures of unlimited depth, possibly causing a
                system process to eat up diskspace, memory or other
                resources, again causing a *DoS*.

            The solution to this as well as B) is not to allow processes
            to access the filesystem, which could otherwise not be
            monitored or manipulated by the mount owner.
            Since a mount owner who can ptrace a process can do all of
            the above without using a FUSE mount, the same criteria as
            used in ptrace can be used to check whether a process is
            allowed to access the filesystem.

            Note that the *ptrace* check is not strictly necessary to
            prevent C/2/i; it is enough to check if the mount owner has
            enough privilege to send a signal to the process accessing
            the filesystem, since *SIGSTOP* can be used to get a similar
            effect.

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can ensure through other
measures, that system processes will never enter non-privileged
mounts, it can relax the last limitation with a 'user_allow_other'
config option. If this config option is set, the mounting user can
add the 'allow_other' mount option which disables the check for other
users' processes.

Kernel - userspace interface
============================

The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE.
::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |                                    |  >sys_read()
 |                                    |    >fuse_dev_read()
 |                                    |      >request_wait()
 |                                    |        [sleep on fc->waitq]
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |        [woken up]
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <request_wait()
 |                                    |      [remove req from fc->pending]
 |                                    |      [copy req to read buffer]
 |                                    |      [add req to fc->processing]
 |                                    |    <fuse_dev_read()
 |                                    |  <sys_read()
 |                                    |
 |                                    |  [perform unlink]
 |                                    |
 |                                    |  >sys_write()
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |          [woken up]                |      [wake up req->waitq]
 |                                    |    <fuse_dev_write()
 |                                    |  <sys_write()
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |
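
The handoff in the diagram can be mimicked in userspace to make the
queue movements easier to follow. This is a toy model only, not
kernel code: the ``Connection`` and ``Request`` classes, the string
opcodes and the "STOP" message are all hypothetical, and Python
threading primitives stand in for fc->waitq and req->waitq.

```python
import errno
import itertools
import queue
import threading


class Request:
    def __init__(self, unique, opcode, arg):
        self.unique = unique
        self.opcode = opcode
        self.arg = arg
        self.error = None
        self.done = threading.Event()    # stands in for req->waitq


class Connection:
    def __init__(self):
        self.pending = queue.Queue()     # fc->pending plus fc->waitq
        self.processing = {}             # fc->processing, keyed by id
        self.lock = threading.Lock()
        self._ids = itertools.count(1)

    def request_send(self, opcode, arg):
        """Caller side: queue a request, then sleep until it is answered."""
        with self.lock:
            req = Request(next(self._ids), opcode, arg)
        self.pending.put(req)            # queue req, wake the reader
        req.done.wait()                  # request_wait_answer()
        return req.error

    def dev_read(self):
        """Daemon side: block for a request and move it to processing."""
        req = self.pending.get()
        with self.lock:
            self.processing[req.unique] = req
        return req.unique, req.opcode, req.arg

    def dev_write(self, unique, error):
        """Daemon side: look up the request, store the reply, wake caller."""
        with self.lock:
            req = self.processing.pop(unique)
        req.error = error
        req.done.set()


def daemon(conn, files):
    """Toy filesystem daemon serving UNLINK until it is told to STOP."""
    while True:
        unique, opcode, arg = conn.dev_read()
        if opcode == "STOP":             # hypothetical shutdown message
            conn.dev_write(unique, 0)
            return
        if opcode == "UNLINK" and arg in files:
            files.remove(arg)            # [perform unlink]
            conn.dev_write(unique, 0)
        else:
            conn.dev_write(unique, -errno.ENOENT)
```

Running ``daemon()`` in one thread and calling
``conn.request_send("UNLINK", "file")`` from another reproduces the
sequence in the diagram: the caller sleeps until the daemon has read
the request, acted on it and written the reply.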

.. note:: Everything in the description above is greatly simplified

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

**Scenario 1 - Simple deadlock**::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_read()
 |                                    |  >sys_unlink("/mnt/fuse/file")
 |                                    |    [acquire inode semaphore
 |                                    |     for "file"]
 |                                    |    *DEADLOCK*

The solution for this is to allow the filesystem to be aborted.

**Scenario 2 - Tricky deadlock**

This one needs a carefully crafted filesystem. It's a variation on
the above, only the call back to the filesystem is not explicit,
but is caused by a pagefault.
::

 |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUSH triggers 'magic' flag]
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |        [queue READ request]        |
 |        [sleep on req->waitq]       |
 |                                    |  [read request to buffer]
 |                                    |  [create reply header before addr]
 |                                    |  >sys_write(addr - headerlength)
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |                                    |        >do_page_fault()
 |                                    |          [find or create page]
 |                                    |          [lock page]
 |                                    |          * DEADLOCK *

The solution is basically the same as above.

An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted. This is
because the destination address of the copy may not be valid after the
request has returned.

This is solved by doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
get_user_pages(). The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.
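
The rule that abort is delayed while the copy is in progress can be
illustrated with a small userspace model. This is a sketch under
stated assumptions rather than the kernel implementation: the
``Request`` class, its methods and the ``during_copy`` hook are
hypothetical, a condition variable stands in for the kernel's wait
queue, and only the copy phase is modelled (not the page faulting
with get_user_pages()).

```python
import threading


class Request:
    def __init__(self):
        self.cond = threading.Condition()
        self.locked = False      # models req->locked
        self.aborted = False
        self.data = b""

    def copy_in(self, buf, during_copy=None):
        """Copy a reply into the request; refuse if already aborted."""
        with self.cond:
            if self.aborted:
                return False
            self.locked = True   # the copy window starts here
        if during_copy is not None:
            during_copy()        # hypothetical hook, for observation only
        self.data = bytes(buf)
        with self.cond:
            self.locked = False  # the copy window ends here
            self.cond.notify_all()
        return True

    def abort(self):
        """Abort the request, waiting for any copy in progress to finish."""
        with self.cond:
            self.cond.wait_for(lambda: not self.locked)
            self.aborted = True
```

In this model an ``abort()`` issued while ``copy_in()`` is running
blocks until the flag is cleared, so the copy always finishes against
a request that is still valid. A copy that starts after the abort is
refused.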