.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are provided by an ordinary
  userspace process.  The filesystem can be accessed normally through
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metadata of the filesystem.

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-privileged (non-root) user.
  The filesystem daemon is running with the privileges of the mounting
  user.  NOTE: this is not the same as mounts allowed with the "user"
  option in /etc/fstab, which is not discussed here.

Filesystem connection:
  A connection between the filesystem daemon and the kernel.  The
  connection exists until either the daemon dies, or the filesystem is
  umounted.  Note that detaching (or lazy umounting) the filesystem
  does *not* break the connection; in this case it will exist until
  the last reference to the filesystem is released.

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operations.

What is FUSE?
=============

FUSE is a userspace filesystem framework.  It consists of a kernel
module (fuse.ko), a userspace library (libfuse.*) and a mount utility
(fusermount).

One of the most important features of FUSE is allowing secure,
non-privileged mounts.  This opens up new possibilities for the use of
filesystems.  A good example is sshfs: a secure network filesystem
using the sftp protocol.

The userspace library and utilities are available from the
`FUSE homepage: <https://github.com/libfuse/>`_

Filesystem type
===============

The filesystem type given to mount(2) can be one of the following:

    fuse
      This is the usual way to mount a FUSE filesystem.  The first
      argument of the mount system call may contain an arbitrary string,
      which is not interpreted by the kernel.

    fuseblk
      The filesystem is block device based.  The first argument of the
      mount system call is interpreted as the name of the device.

Mount options
=============

fd=N
  The file descriptor to use for communication between the userspace
  filesystem and the kernel.  The file descriptor must have been
  obtained by opening the FUSE device ('/dev/fuse').

rootmode=M
  The file mode of the filesystem's root in octal representation.

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access permissions; the
  filesystem is free to implement its access policy or leave it to
  the underlying file access mechanism (e.g. in case of network
  filesystems).  This option enables permission checking, restricting
  access based on file mode.  It is usually useful together with the
  'allow_other' mount option.

allow_other
  This option overrides the security measure restricting file access
  to the user mounting the filesystem.  This option is by default only
  allowed to root, but this restriction can be removed with a
  (userspace) configuration option.

max_read=N
  With this option the maximum size of read operations can be set.
  The default is infinite.  Note that the size of read requests is
  limited anyway to 32 pages (which is 128kbyte on i386).

blksize=N
  Set the block size for the filesystem.  The default is 512.  This
  option is only valid for 'fuseblk' type mounts.
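
As a rough illustration of how the options above combine, the sketch
below formats them into the comma-separated string that would be handed
to mount(2) as its data argument.  The helper name
``fuse_mount_options`` is hypothetical, and passing only the file type
bits in ``rootmode`` is an assumption made for the example; the function
formats text and does not mount anything.

```python
import stat


def fuse_mount_options(fd, uid, gid, root_mode=stat.S_IFDIR,
                       default_permissions=False, allow_other=False,
                       max_read=None):
    """Build a FUSE mount option string (hypothetical helper)."""
    # rootmode is written in octal; fd and the ids are decimal.
    opts = [
        "fd=%d" % fd,
        "rootmode=%o" % root_mode,
        "user_id=%d" % uid,
        "group_id=%d" % gid,
    ]
    if default_permissions:
        opts.append("default_permissions")
    if allow_other:
        opts.append("allow_other")
    if max_read is not None:
        opts.append("max_read=%d" % max_read)
    return ",".join(opts)


# prints: fd=3,rootmode=40000,user_id=1000,group_id=1000,default_permissions
print(fuse_mount_options(3, 1000, 1000, default_permissions=True))
```

In a real mount the value of ``fd`` would have to come from opening
'/dev/fuse', as described under ``fd=N``.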

Control filesystem
==================

There's a control filesystem for FUSE, which can be mounted by::

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

        waiting
          The number of requests which are waiting to be transferred to
          userspace or being processed by the filesystem daemon.  If there is
          no filesystem activity and 'waiting' is non-zero, then the
          filesystem is hung or deadlocked.

        abort
          Writing anything into this file will abort the filesystem
          connection.  This means that all waiting requests will be aborted and
          an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.
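
The two files can be used from a small script.  The following is a
minimal sketch, assuming the control filesystem is mounted at the path
shown above; the function names and the ``root`` parameter are
illustrative and exist only so the code can be pointed at another
directory.

```python
import os


def list_connections(root="/sys/fs/fuse/connections"):
    """Return {connection_id: waiting_count} for readable connections."""
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "waiting")
        try:
            with open(path) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Not our mount (permission denied) or the connection vanished.
            continue
    return result


def abort_connection(conn_id, root="/sys/fs/fuse/connections"):
    """Abort one connection; any data written to 'abort' has this effect."""
    with open(os.path.join(root, str(conn_id), "abort"), "w") as f:
        f.write("1")
```

A connection whose ``waiting`` count stays non-zero while nothing is
using the filesystem is a candidate for ``abort_connection()``.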

Interrupting filesystem operations
==================================

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

  -  If the request is not yet sent to userspace AND the signal is
     fatal (SIGKILL or unhandled fatal signal), then the request is
     dequeued and returns immediately.

  -  If the request is not yet sent to userspace AND the signal is not
     fatal, then an interrupted flag is set for the request.  When
     the request has been successfully transferred to userspace and
     this flag is set, an INTERRUPT request is queued.

  -  If the request is already sent to userspace, then an INTERRUPT
     request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the *original* request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request.  There are two possibilities:

  1. The INTERRUPT request is processed before the original request is
     processed

  2. The INTERRUPT request is processed after the original request has
     been answered

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error.  In case
1) the INTERRUPT request will be requeued.  In case 2) the INTERRUPT
reply will be ignored.
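
The daemon-side rules can be modelled in a few lines.  This is a toy
sketch, not the libfuse API: the class ``ToyDaemon``, its method names
and the ``(unique id, error)`` reply tuples are all assumptions made for
illustration, and the waiting period before giving up is left out.

```python
import errno


class ToyDaemon:
    """Toy model of how a daemon might react to INTERRUPT requests."""

    def __init__(self):
        self.in_flight = set()   # unique ids of requests being processed
        self.replies = []        # (unique id, error) pairs sent back

    def start(self, unique):
        self.in_flight.add(unique)

    def finish(self, unique, error=0):
        self.in_flight.discard(unique)
        self.replies.append((unique, error))

    def interrupt(self, interrupt_unique, target_unique):
        if target_unique in self.in_flight:
            # Honor the interrupt: answer the *original* request with EINTR.
            self.finish(target_unique, -errno.EINTR)
        else:
            # Original not found: it has not arrived yet or was already
            # answered.  A real daemon would wait a while before this.
            self.replies.append((interrupt_unique, -errno.EAGAIN))


d = ToyDaemon()
d.start(7)
d.interrupt(8, 7)     # original found: request 7 is answered with EINTR
d.interrupt(9, 99)    # original unknown: INTERRUPT 9 is answered with EAGAIN
print(d.replies)
```

Errors are stored as negated errno values here, which is how the sketch
chooses to represent a failed reply.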

Aborting a filesystem connection
================================

It is possible to get into certain situations where the filesystem is
not responding.  Reasons for this may be:

  a) Broken userspace filesystem implementation

  b) Network connection down

  c) Accidental deadlock

  d) Malicious deadlock

(For more on c) and d) see later sections)

In any of these cases it may be useful to abort the connection to
the filesystem.  There are several ways to do this:

  - Kill the filesystem daemon.  Works in case of a) and b)

  - Kill the filesystem daemon and all users of the filesystem.  Works
    in all cases except some malicious deadlocks

  - Use forced umount (umount -f).  Works in all cases but only if
    the filesystem is still attached (it hasn't been lazy unmounted)

  - Abort filesystem through the FUSE control filesystem.  Most
    powerful method, always works.

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged operation, a helper
program (fusermount) is needed, which is installed setuid root.

The implication of providing non-privileged mounts is that the mount
owner must not be able to use this capability to compromise the
system.  Obvious requirements arising from this are:

 A) mount owner should not be able to get elevated privileges with the
    help of the mounted filesystem

 B) mount owner should not get illegitimate access to information from
    other users' and the super user's processes

 C) mount owner should not be able to induce undesired behavior in
    other users' or the super user's processes

How are requirements fulfilled?
===============================

 A) The mount owner could gain elevated privileges by either:

    1. creating a filesystem containing a device file, then opening this device

    2. creating a filesystem containing a suid or sgid application, then executing this application

    The solution is not to allow opening device files and ignore
    setuid and setgid bits when executing programs.  To ensure this
    fusermount always adds "nosuid" and "nodev" to the mount options
    for non-privileged mounts.

 B) If another user is accessing files or directories in the
    filesystem, the filesystem daemon serving requests can record the
    exact sequence and timing of operations performed.  This
    information is otherwise inaccessible to the mount owner, so this
    counts as an information leak.

    The solution to this problem will be presented in point 2) of C).

 C) There are several ways in which the mount owner can induce
    undesired behavior in other users' processes, such as:

     1) mounting a filesystem over a file or directory which the mount
        owner would otherwise not be able to modify (or could only
        make limited modifications).

        This is solved in fusermount, by checking the access
        permissions on the mountpoint and only allowing the mount if
        the mount owner can do unlimited modification (has write
        access to the mountpoint, and mountpoint is not a "sticky"
        directory)

     2) Even if 1) is solved the mount owner can change the behavior
        of other users' processes.

         i) It can slow down or indefinitely delay the execution of a
            filesystem operation, creating a DoS against the user or the
            whole system.  For example, a suid application that locks a
            system file and then accesses a file on the mount owner's
            filesystem could be stopped, causing the system file to
            be locked forever.

         ii) It can present files or directories of unlimited length, or
             directory structures of unlimited depth, possibly causing a
             system process to eat up diskspace, memory or other
             resources, again causing *DoS*.

        The solution to this as well as B) is to deny filesystem
        access to processes which the mount owner could not otherwise
        monitor or manipulate.  Since if the
        mount owner can ptrace a process, it can do all of the above
        without using a FUSE mount, the same criteria as used in
        ptrace can be used to check if a process is allowed to access
        the filesystem or not.

        Note that the *ptrace* check is not strictly necessary to
        prevent C/2/i; it is enough to check if the mount owner has enough
        privilege to send a signal to the process accessing the
        filesystem, since *SIGSTOP* can be used to get a similar effect.
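
The checks described in A) and in C/1 can be sketched as follows.  This
is a simplified illustration of the rules as stated above, not the
actual fusermount code, which performs further checks; both function
names are hypothetical.

```python
import os
import stat


def may_mount_on(mountpoint):
    """Simplified mountpoint check from C/1.

    The caller needs write access, and a directory mountpoint must not
    have the sticky bit set.
    """
    st = os.stat(mountpoint)
    if stat.S_ISDIR(st.st_mode) and st.st_mode & stat.S_ISVTX:
        return False
    return os.access(mountpoint, os.W_OK)


def unprivileged_flags(requested):
    """Always add nosuid and nodev, as described in A)."""
    return sorted(set(requested) | {"nosuid", "nodev"})


print(unprivileged_flags(["ro"]))   # prints: ['nodev', 'nosuid', 'ro']
```

Under these assumptions a world-writable sticky directory such as a
typical /tmp would be rejected as a mountpoint, while a directory
created inside it by the mount owner would be accepted.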

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can ensure through other
measures that system processes will never enter non-privileged
mounts, they can relax the last limitation with a 'user_allow_other'
config option.  If this config option is set, the mounting user can
add the 'allow_other' mount option which disables the check for other
users' processes.
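
The effect of 'allow_other' on the access check can be pictured with a
small model.  The text only says that ptrace-like criteria are used, so
the exact rule below (all real, effective and saved ids must match the
mount owner) is an assumption for illustration; ``Cred`` and
``may_access`` are hypothetical names.

```python
from collections import namedtuple

# The six ids compared by this sketch; field names are illustrative.
Cred = namedtuple("Cred", "uid euid suid gid egid sgid")


def may_access(cred, owner_uid, owner_gid, allow_other=False):
    """Return True if a process with `cred` may use the mount."""
    if allow_other:
        # The check for other users' processes is disabled.
        return True
    return (cred.uid == cred.euid == cred.suid == owner_uid and
            cred.gid == cred.egid == cred.sgid == owner_gid)


owner_shell = Cred(1000, 1000, 1000, 1000, 1000, 1000)
suid_root_app = Cred(1000, 0, 0, 1000, 1000, 1000)
print(may_access(owner_shell, 1000, 1000))     # prints: True
print(may_access(suid_root_app, 1000, 1000))   # prints: False
```

In this model a suid root program started by the mount owner is kept
out of the mount unless 'allow_other' is given, which matches the DoS
example in C/2/i.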

Kernel - userspace interface
============================

The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE. ::


 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |                                    |  >sys_read()
 |                                    |    >fuse_dev_read()
 |                                    |      >request_wait()
 |                                    |        [sleep on fc->waitq]
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |        [woken up]
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <request_wait()
 |                                    |      [remove req from fc->pending]
 |                                    |      [copy req to read buffer]
 |                                    |      [add req to fc->processing]
 |                                    |    <fuse_dev_read()
 |                                    |  <sys_read()
 |                                    |
 |                                    |  [perform unlink]
 |                                    |
 |                                    |  >sys_write()
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |          [woken up]                |      [wake up req->waitq]
 |                                    |    <fuse_dev_write()
 |                                    |  <sys_write()
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |

.. note:: Everything in the description above is greatly simplified
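
The same round trip can be imitated in userspace with two threads.
This is only a toy model of the diagram: ``ToyConnection`` and its
methods are hypothetical names chosen to echo ``fc->pending``,
``fc->processing`` and the read/write steps, and nothing here talks to
the real FUSE device.

```python
import errno
import queue
import threading


class ToyConnection:
    """Toy model of the pending/processing queues in the diagram."""

    def __init__(self):
        self.pending = queue.Queue()   # requests waiting to be read
        self.processing = {}           # unique id -> request being served
        self.lock = threading.Lock()
        self.next_unique = 1

    def request_send(self, opcode, arg):
        """Caller side: queue a request and sleep until it is answered."""
        with self.lock:
            unique = self.next_unique
            self.next_unique += 1
        req = {"unique": unique, "opcode": opcode, "arg": arg,
               "done": threading.Event(), "error": None}
        self.pending.put(req)          # also wakes a sleeping reader
        req["done"].wait()             # like request_wait_answer()
        return req["error"]

    def dev_read(self):
        """Daemon side: block for a request, move it to 'processing'."""
        req = self.pending.get()
        with self.lock:
            self.processing[req["unique"]] = req
        return req["unique"], req["opcode"], req["arg"]

    def dev_write(self, unique, error):
        """Daemon side: look up the request, store reply, wake caller."""
        with self.lock:
            req = self.processing.pop(unique)
        req["error"] = error
        req["done"].set()


def daemon(conn, files):
    """Serve exactly one request, then return."""
    unique, opcode, name = conn.dev_read()
    if opcode == "UNLINK" and name in files:
        files.remove(name)
        conn.dev_write(unique, 0)
    else:
        conn.dev_write(unique, -errno.ENOENT)


conn = ToyConnection()
files = {"file"}
t = threading.Thread(target=daemon, args=(conn, files))
t.start()
result = conn.request_send("UNLINK", "file")
t.join()
print(result, files)   # prints: 0 set()
```

The caller blocks inside ``request_send()`` until the daemon thread
answers, which is the property the deadlock scenarios below exploit.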

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

**Scenario 1 - Simple deadlock**::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_read()
 |                                    |  >sys_unlink("/mnt/fuse/file")
 |                                    |    [acquire inode semaphore
 |                                    |     for "file"]
 |                                    |    *DEADLOCK*

The solution for this is to allow the filesystem to be aborted.

**Scenario 2 - Tricky deadlock**


This one needs a carefully crafted filesystem.  It's a variation on
the above, only the call back to the filesystem is not explicit,
but is caused by a pagefault. ::

 |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUSH triggers 'magic' flag]
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |         [queue READ request]       |
 |         [sleep on req->waitq]      |
 |                                    |  [read request to buffer]
 |                                    |  [create reply header before addr]
 |                                    |  >sys_write(addr - headerlength)
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |                                    |        >do_page_fault()
 |                                    |           [find or create page]
 |                                    |           [lock page]
 |                                    |           * DEADLOCK *

The solution is basically the same as above.

An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted.  This is
because the destination address of the copy may not be valid after the
request has returned.

This is solved by doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
get_user_pages().  The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.