=================
Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.

add_random (RW)
---------------
This file allows the disk entropy contribution to be turned off. The
default value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device,
which may be smaller.

dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache. It shows '1'
if true, '0' if not.
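
As an illustration of how these files are typically consumed, the sketch
below reads attributes from a queue directory and converts chunk_sectors from
512B sectors to bytes. The helper names (read_queue_attr, chunk_size_bytes,
supports_dax) are hypothetical and only assume that each attribute is a small
text file holding a single value.

```python
from pathlib import Path

SECTOR_BYTES = 512  # chunk_sectors is expressed in 512B sectors


def read_queue_attr(queue_dir, name):
    """Return the stripped text of one attribute file in a queue directory."""
    return (Path(queue_dir) / name).read_text().strip()


def chunk_size_bytes(queue_dir):
    """Convert the chunk_sectors value (512B sectors) to bytes."""
    return int(read_queue_attr(queue_dir, "chunk_sectors")) * SECTOR_BYTES


def supports_dax(queue_dir):
    """The dax file shows '1' if DAX is supported and '0' if not."""
    return read_queue_attr(queue_dir, "dax") == "1"
```

On a real system queue_dir would be a path such as /sys/block/xxx/queue,
with xxx replaced by the device name.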

discard_granularity (RO)
------------------------
This shows the size of the internal allocation unit of the device in bytes,
if reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued; setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.
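
The relationship between the hardware and software discard limits can be
sketched as follows. The function name pick_discard_max_bytes is hypothetical,
and rounding the result down to a multiple of discard_granularity is an
assumption of this sketch rather than something stated above.

```python
def pick_discard_max_bytes(requested, hw_max, granularity):
    """Pick a software discard limit that respects the hardware limit.

    requested   -- desired discard_max_bytes value
    hw_max      -- value read from discard_max_hw_bytes
    granularity -- value read from discard_granularity
    """
    if hw_max == 0 or granularity == 0:
        # A value of 0 means the device does not support discard.
        return 0
    limit = min(requested, hw_max)
    # Assumption: keep the limit aligned to the discard granularity.
    return limit - (limit % granularity)
```

A lower result makes Linux issue smaller discards, which may help on devices
that show large latencies for large discard operations.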

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access. If the FUA flag is set that means that
write requests must bypass the volatile cache of the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0). Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this many microseconds before entering classic
polling.

io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete in this time then the block driver timeout handler is invoked.
That timeout handler can decide to retry the request, to fail it or to start
a device recovery strategy.

iostats (RW)
------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.
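
The three io_poll_delay cases described earlier (-1, 0, and a positive
number of microseconds) can be summarized in a small helper. This is only a
sketch; the function name and the returned strings are hypothetical.

```python
def describe_io_poll_delay(value):
    """Map an io_poll_delay value to the polling mode it selects."""
    if value == -1:
        # Default: poll continuously without giving up any CPU time.
        return "classic"
    if value == 0:
        # The kernel estimates the completion time and sleeps first.
        return "hybrid"
    if value > 0:
        # Sleep for a fixed number of microseconds, then poll.
        return f"sleep {value} us, then classic"
    raise ValueError(f"unexpected io_poll_delay value: {value}")
```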

max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the total number of zones in any of the zone states
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
errno.

max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the total number of zones in any of the zone states
EXPLICIT OPEN or IMPLICIT OPEN is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
errno.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.
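
The accounting behind max_open_zones and max_active_zones can be sketched as
below: open zones are those in the EXPLICIT OPEN or IMPLICIT OPEN state,
active zones additionally include CLOSED zones, and a limit of 0 means no
limit. The function names are hypothetical and the per-state counts are
assumed to be tracked by the caller.

```python
def within_limit(count, limit):
    """A limit of 0 means that there is no limit."""
    return limit == 0 or count <= limit


def zone_limits_ok(explicit_open, implicit_open, closed,
                   max_open_zones, max_active_zones):
    """Check per-state zone counts against both zone resource limits."""
    open_count = explicit_open + implicit_open
    active_count = open_count + closed
    return (within_limit(open_count, max_open_zones)
            and within_limit(active_count, max_active_zones))
```

A host could run such a check before opening another zone to avoid the
errors described above.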

max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with merging
IO requests in the block layer. By default (0) all merges are
enabled. When set to 1 only simple one-hit merges will be tried. When
set to 2 no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).

nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool for each cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool. IOW, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.

nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file states whether the device is of rotational type or
non-rotational type.
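
Reading the nr_requests description literally, a rough upper bound on the
number of requests a queue may allocate follows from two factors: the limit
applies separately to reads and to writes, and each block cgroup gets its own
pool. The helper below is a worked example of that arithmetic only; its name
is hypothetical and it does not model actual kernel behaviour.

```python
def max_allocated_requests(nr_requests, block_cgroups=1):
    """Upper bound implied by the text: 2 * nr_requests per request pool."""
    if nr_requests < 0 or block_cgroups < 1:
        raise ValueError("nr_requests must be >= 0 and block_cgroups >= 1")
    per_pool = 2 * nr_requests  # reads and writes are counted separately
    return per_pool * block_cgroups  # one pool per block cgroup
```

For example, with nr_requests set to 128 and four block cgroups, the bound
is 2 * 128 * 4 = 1024 requests.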

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
cpu "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting cpu (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.

write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state.
This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command. A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in milliseconds.
blk-throttle makes decisions based on the samples. A lower time means cgroups
have smoother throughput, but higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
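
The bracket convention of the scheduler file described earlier in this
document, where the active IO scheduler is enclosed in [], lends itself to a
short parser. The sketch below is illustrative only: parse_scheduler is a
hypothetical helper, and it assumes the file holds whitespace-separated
scheduler names.

```python
def parse_scheduler(text):
    """Split the scheduler file content into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            # The currently active scheduler is enclosed in [] brackets.
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available
```

The first element of the result is None if no entry is bracketed; the second
lists every scheduler name in the order it appears in the file.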

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zoned (RO)
----------
This indicates if the device is a zoned block device and the zone model of the
device if it is indeed zoned. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".

Jens Axboe <jens.axboe@oracle.com>, February 2009