=================
Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only, and the RW postfix means
read-write.
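
As an illustration only (this helper is not part of the kernel), the files
described here can be read like any other text file. The sketch below is in
Python; the function name read_queue_attr and its sysfs_root parameter are
assumptions made for this example.

.. code-block:: python

    from pathlib import Path


    def read_queue_attr(device, name, sysfs_root="/sys"):
        """Return the contents of <sysfs_root>/block/<device>/queue/<name>."""
        path = Path(sysfs_root) / "block" / device / "queue" / name
        # sysfs values end with a newline; strip it for easier comparison.
        return path.read_text().strip()

Calling read_queue_attr("sda", "logical_block_size") would return the value as
a string, where "sda" is a placeholder for a device present on the system.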

add_random (RW)
---------------
This file allows the disk entropy contribution to be turned off. The default
value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device,
which may be smaller.
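
Since the value is expressed in 512B sectors, converting it to bytes is a
single multiplication. A minimal sketch, where the names SECTOR_SIZE and
chunk_bytes are chosen for this example only:

.. code-block:: python

    SECTOR_SIZE = 512  # chunk_sectors is expressed in 512-byte sectors


    def chunk_bytes(chunk_sectors):
        """Convert a chunk_sectors value (string or int) to bytes."""
        return int(chunk_sectors) * SECTOR_SIZE

For example, chunk_bytes("524288") evaluates to 268435456, that is, zones or
stripe segments of 256 MiB.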

dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache.  It shows '1'
if true, '0' if not.

discard_granularity (RO)
------------------------
This shows the size of internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued. Setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.
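
The effect of a lower limit can be pictured as one large discard being cut
into smaller pieces. The sketch below is an illustration in Python, not the
kernel's splitting code, which may take further constraints into account; the
function name split_discard is an assumption made for this example.

.. code-block:: python

    def split_discard(offset, length, max_bytes):
        """Yield (offset, length) pieces, each no larger than max_bytes."""
        if max_bytes <= 0:
            raise ValueError("a limit of 0 means discard is not supported")
        end = offset + length
        while offset < end:
            piece = min(max_bytes, end - offset)
            yield offset, piece
            offset += piece

With a limit of 4 bytes, list(split_discard(0, 10, 4)) gives
[(0, 4), (4, 4), (8, 2)]: three smaller operations instead of one large one.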

discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access. If the FUA flag is set that means that
write requests must bypass the volatile cache of the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0).  Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this number of microseconds before entering classic
polling.
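
The three cases can be summarised in a small lookup. This is a sketch in
Python for tools that report the setting; the function name
describe_io_poll_delay and the wording of the returned strings are
assumptions made for this example.

.. code-block:: python

    def describe_io_poll_delay(value):
        """Map an io_poll_delay value to the polling mode described above."""
        delay = int(value)
        if delay == -1:
            return "classic polling"
        if delay == 0:
            return "hybrid polling, sleep time estimated by the kernel"
        if delay > 0:
            return f"sleep {delay} microseconds, then classic polling"
        raise ValueError(f"unexpected io_poll_delay value: {value!r}")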

io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete in this time then the block driver timeout handler is invoked.
That timeout handler can decide to retry the request, to fail it or to start
a device recovery strategy.

iostats (RW)
-------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.

max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
errno.

max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
errno.
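
An application that tracks its own open zones could check the limit before
opening another one, rather than waiting for the error. A minimal sketch in
Python, assuming the application keeps its own count of open zones; the
function name may_open_zone is hypothetical.

.. code-block:: python

    def may_open_zone(open_zones, max_open_zones):
        """Return True if opening one more zone stays within the limit."""
        if max_open_zones == 0:
            # A value of 0 means there is no limit.
            return True
        return open_zones + 1 <= max_open_zones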

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.
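
A tool that picks a value to write here can cap it at the hardware maximum
first. The sketch below is in Python and assumes the hardware maximum is the
value read from max_hw_sectors_kb; the function name clamp_max_sectors_kb is
hypothetical.

.. code-block:: python

    def clamp_max_sectors_kb(requested_kb, max_hw_sectors_kb):
        """Cap a requested max_sectors_kb value at the hardware maximum."""
        if requested_kb <= 0:
            raise ValueError("max_sectors_kb must be positive")
        return min(requested_kb, max_hw_sectors_kb)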

max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with merging IO
requests in the block layer. By default (0) all merges are
enabled. When set to 1 only simple one-hit merges will be tried. When
set to 2 no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).
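
For reporting purposes the three values can be kept in a small table. A
sketch in Python; the names NOMERGES_MODES and describe_nomerges are
assumptions made for this example.

.. code-block:: python

    NOMERGES_MODES = {
        0: "all merges enabled",
        1: "only simple one-hit merges",
        2: "no merging",
    }


    def describe_nomerges(value):
        """Return a description for the contents of the nomerges file."""
        return NOMERGES_MODES[int(value)]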

nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool for each cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool.  In other words, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.
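
Taken together, the two paragraphs above give an upper bound on the number of
allocated requests per queue. The sketch below only restates that arithmetic
in Python; the function name max_allocated_requests is hypothetical.

.. code-block:: python

    def max_allocated_requests(nr_requests, block_cgroups=1):
        """Upper bound on allocated requests, per the description above."""
        # The limit applies to reads and to writes separately.
        per_pool = 2 * nr_requests
        # Each block cgroup may have its own pool with the same limit.
        return per_pool * block_cgroups

With nr_requests set to 128 and four block cgroups, the bound is
2 * 128 * 4 = 1024 requests.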

nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to state whether the device is of rotational type or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
CPU "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting CPU (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.
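
The bracket convention makes the file easy to parse. The following Python
sketch separates the active scheduler from the list of available ones; the
function name parse_scheduler is an assumption made for this example, and the
scheduler names in the note after it are sample input only.

.. code-block:: python

    def parse_scheduler(text):
        """Return (active, available) from the contents of the scheduler file."""
        active = None
        available = []
        for token in text.split():
            # The active scheduler is the one enclosed in [] brackets.
            if token.startswith("[") and token.endswith("]"):
                token = token[1:-1]
                active = token
            available.append(token)
        return active, available

Given the sample contents "mq-deadline kyber [bfq] none", the function returns
("bfq", ["mq-deadline", "kyber", "bfq", "none"]). If no name is bracketed,
the first element of the result is None.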

write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command.  A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in milliseconds.
blk-throttle makes decisions based on the samples. A lower time means cgroups
have smoother throughput, but higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zoned (RO)
----------
This indicates if the device is a zoned block device and the zone model of the
device if it is indeed zoned. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".
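
A program that needs to treat zoned devices differently can test the three
possible values directly. A minimal sketch in Python; the names ZONE_MODELS
and is_zoned are assumptions made for this example.

.. code-block:: python

    ZONE_MODELS = ("none", "host-aware", "host-managed")


    def is_zoned(zoned_value):
        """Return True if the zoned file reports a zoned block device."""
        model = zoned_value.strip()
        if model not in ZONE_MODELS:
            raise ValueError(f"unexpected zoned value: {model!r}")
        # Drive-managed devices report "none" and are handled as regular ones.
        return model != "none"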

Jens Axboe <jens.axboe@oracle.com>, February 2009