/* SPDX-License-Identifier: GPL-2.0
 *
 * IO cost model based controller.
 *
 * Copyright (C) 2019 Tejun Heo <tj@kernel.org>
 * Copyright (C) 2019 Andy Newell <newella@fb.com>
 * Copyright (C) 2019 Facebook
 *
 * One challenge of controlling IO resources is the lack of a trivially
 * observable cost metric.  This is distinguished from CPU and memory where
 * wallclock time and the number of bytes can serve as accurate enough
 * approximations.
 *
 * Bandwidth and iops are the most commonly used metrics for IO devices but
 * depending on the type and specifics of the device, different IO patterns
 * easily lead to multiple orders of magnitude variations rendering them
 * useless for the purpose of IO capacity distribution.  While on-device
 * time, with a lot of crutches, could serve as a useful approximation for
 * non-queued rotational devices, this is no longer viable with modern
 * devices, even the rotational ones.
 *
 * While there is no cost metric we can trivially observe, it isn't a
 * complete mystery.  For example, on a rotational device, seek cost
 * dominates while a contiguous transfer contributes a smaller amount
 * proportional to the size.  If we can characterize at least the relative
 * costs of these different types of IOs, it should be possible to
 * implement a reasonable work-conserving proportional IO resource
 * distribution.
 *
 * 1. IO Cost Model
 *
 * The IO cost model estimates the cost of an IO given its basic parameters
 * and history (e.g. the end sector of the last IO).  The cost is measured
 * in device time.  If a given IO is estimated to cost 10ms, the device
 * should be able to process ~100 of those IOs in a second.
 *
 * Currently, there's only one builtin cost model - linear.  Each IO is
 * classified as sequential or random and given a base cost accordingly.
 * On top of that, a size cost proportional to the length of the IO is
 * added.  While simple, this model captures the operational
 * characteristics of a wide variety of devices well enough.  Default
 * parameters for several different classes of devices are provided and the
 * parameters can be configured from userspace via
 * /sys/fs/cgroup/io.cost.model.
 *
 * If needed, tools/cgroup/iocost_coef_gen.py can be used to generate
 * device-specific coefficients.
 *
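 * As a rough worked example of the linear model (the numbers here are
 * illustrative only, not the defaults below): with a random-IO base cost
 * of 150us, a sequential base cost of 10us and a per-4k-page cost of 5us,
 * a 64KiB (16 page) random read would be charged 150 + 16 * 5 = 230us of
 * device time, while the same transfer issued sequentially would cost
 * 10 + 16 * 5 = 90us.
 *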
 * 2. Control Strategy
 *
 * The device virtual time (vtime) is used as the primary control metric.
 * The control strategy is composed of the following three parts.
 *
 * 2-1. Vtime Distribution
 *
 * When a cgroup becomes active in terms of IOs, its hierarchical share is
 * calculated.  Please consider the following hierarchy where the numbers
 * inside parentheses denote the configured weights.
 *
 *           root
 *         /       \
 *      A (w:100)  B (w:300)
 *      /       \
 *  A0 (w:100)  A1 (w:100)
 *
 * If B is idle and only A0 and A1 are actively issuing IOs, as the two are
 * of equal weight, each gets 50% share.  If then B starts issuing IOs, B
 * gets 300/(100+300) or 75% share, and A0 and A1 equally split the rest,
 * 12.5% each.  The distribution mechanism only cares about these flattened
 * shares.  They're called hweights (hierarchical weights) and always add
 * up to 1 (WEIGHT_ONE).
 *
 * A given cgroup's vtime runs slower in inverse proportion to its hweight.
 * For example, with 12.5% hweight, A0's time runs 8 times slower (100/12.5)
 * against the device vtime - an IO which takes 10ms on the underlying
 * device is considered to take 80ms on A0.
 *
 * This constitutes the basis of IO capacity distribution.  Each cgroup's
 * vtime is running at a rate determined by its hweight.  A cgroup tracks
 * the vtime consumed by past IOs and can issue a new IO if doing so
 * wouldn't outrun the current device vtime.  Otherwise, the IO is
 * suspended until the vtime has progressed enough to cover it.
 *
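 * Spelling out the 12.5% figure: hweight(A0) = weight(A) / (weight(A) +
 * weight(B)) * weight(A0) / (weight(A0) + weight(A1)) = 100/400 * 100/200
 * = 12.5%, stored internally as 0.125 * WEIGHT_ONE.
 *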
 * 2-2. Vrate Adjustment
 *
 * It's unrealistic to expect the cost model to be perfect.  There are too
 * many devices and even on the same device the overall performance
 * fluctuates depending on numerous factors such as IO mixture and device
 * internal garbage collection.  The controller needs to adapt dynamically.
 *
 * This is achieved by adjusting the overall IO rate according to how busy
 * the device is.  If the device becomes overloaded, we're sending down too
 * many IOs and should generally slow down.  If there are waiting issuers
 * but the device isn't saturated, we're issuing too few and should
 * generally speed up.
 *
 * To slow down, we lower the vrate - the rate at which the device vtime
 * passes compared to the wall clock.  For example, if the vtime is running
 * at a vrate of 75%, all cgroups added up would only be able to issue
 * 750ms worth of IOs per second, and vice versa for speeding up.
 *
 * Device busyness is determined using two criteria - rq wait and
 * completion latencies.
 *
 * When a device gets saturated, the on-device and then the request queues
 * fill up and a bio which is ready to be issued has to wait for a request
 * to become available.  When this delay becomes noticeable, it's a clear
 * indication that the device is saturated and we lower the vrate.  This
 * saturation signal is fairly conservative as it only triggers when both
 * hardware and software queues are filled up, and is used as the default
 * busy signal.
 *
 * As devices can have deep queues and be unfair in how the queued commands
 * are executed, solely depending on rq wait may not result in satisfactory
 * control quality.  For a better control quality, completion latency QoS
 * parameters can be configured so that the device is considered saturated
 * if the N'th percentile completion latency rises above the set point.
 *
 * The completion latency requirements are a function of both the
 * underlying device characteristics and the desired IO latency quality of
 * service.  There is an inherent trade-off - the tighter the latency QoS,
 * the higher the bandwidth loss.  Latency QoS is disabled by default
 * and can be set through /sys/fs/cgroup/io.cost.qos.
 *
 * 2-3. Work Conservation
 *
 * Imagine two cgroups A and B with equal weights.  A is issuing a small IO
 * periodically while B is sending out enough parallel IOs to saturate the
 * device on its own.  Let's say A's usage amounts to 100ms worth of IO
 * cost per second, i.e., 10% of the device capacity.  The naive
 * distribution of half and half would lead to 60% utilization of the
 * device - A uses just its 10% while B is capped at its 50% share - a
 * significant reduction in the total amount of work done compared to
 * free-for-all competition.  This is too high a cost to pay for IO
 * control.
 *
 * To conserve the total amount of work done, we keep track of how much
 * each active cgroup is actually using and yield part of its weight if
 * there are other cgroups which can make use of it.  In the above case,
 * A's weight will be lowered so that it hovers above the actual usage and
 * B would be able to use the rest.
 *
 * As we don't want to penalize a cgroup for donating its weight, the
 * surplus weight adjustment factors in a margin and has an immediate
 * snapback mechanism in case the cgroup needs more IO vtime for itself.
 *
 * Note that adjusting down surplus weights has the same effects as
 * accelerating vtime for other cgroups and work conservation can also be
 * implemented by adjusting vrate dynamically.  However, squaring who can
 * donate and should take back how much requires hweight propagations
 * anyway making it easier to implement and understand as a separate
 * mechanism.
 *
 * 3. Monitoring
 *
 * Instead of debugfs or other clumsy monitoring mechanisms, this
 * controller uses a drgn based monitoring script -
 * tools/cgroup/iocost_monitor.py.  For details on drgn, please see
 * https://github.com/osandov/drgn.  The output looks like the following.
 *
 *  sdb RUN   per=300ms cur_per=234.218:v203.695 busy= +1 vrate= 62.12%
 *                 active      weight      hweight% inflt% dbt  delay usages%
 *  test/a              *    50/   50  33.33/ 33.33  27.65   2  0*041 033:033:033
 *  test/b              *   100/  100  66.67/ 66.67  17.56   0  0*000 066:079:077
 *
 * - per	: Timer period
 * - cur_per	: Internal wall and device vtime clock
 * - vrate	: Device virtual time rate against wall clock
 * - weight	: Surplus-adjusted and configured weights
 * - hweight	: Surplus-adjusted and configured hierarchical weights
 * - inflt	: The percentage of in-flight IO cost at the end of last period
 * - del_ms	: Deferred issuer delay induction level and duration
 * - usages	: Usage history
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/timer.h>
#include <linux/time64.h>
#include <linux/parser.h>
#include <linux/sched/signal.h>
#include <asm/local.h>
#include <asm/local64.h>
#include "blk-rq-qos.h"
#include "blk-stat.h"
#include "blk-wbt.h"
#include "blk-cgroup.h"

#ifdef CONFIG_TRACEPOINTS

/* copied from TRACE_CGROUP_PATH, see cgroup-internal.h */
#define TRACE_IOCG_PATH_LEN 1024
static DEFINE_SPINLOCK(trace_iocg_path_lock);
static char trace_iocg_path[TRACE_IOCG_PATH_LEN];

#define TRACE_IOCG_PATH(type, iocg, ...)					\
	do {									\
		unsigned long flags;						\
		if (trace_iocost_##type##_enabled()) {				\
			spin_lock_irqsave(&trace_iocg_path_lock, flags);	\
			cgroup_path(iocg_to_blkg(iocg)->blkcg->css.cgroup,	\
				    trace_iocg_path, TRACE_IOCG_PATH_LEN);	\
			trace_iocost_##type(iocg, trace_iocg_path,		\
					    ##__VA_ARGS__);			\
			spin_unlock_irqrestore(&trace_iocg_path_lock, flags);	\
		}								\
	} while (0)

#else	/* CONFIG_TRACEPOINTS */
#define TRACE_IOCG_PATH(type, iocg, ...)	do { } while (0)
#endif	/* CONFIG_TRACEPOINTS */

enum {
	MILLION			= 1000000,

	/* timer period is calculated from latency requirements, bound it */
	MIN_PERIOD		= USEC_PER_MSEC,
	MAX_PERIOD		= USEC_PER_SEC,

	/*
	 * iocg->vtime is targeted at 50% behind the device vtime, which
	 * serves as its IO credit buffer.  Surplus weight adjustment is
	 * immediately canceled if the vtime margin runs below 10%.
	 */
	MARGIN_MIN_PCT		= 10,
	MARGIN_LOW_PCT		= 20,
	MARGIN_TARGET_PCT	= 50,

	INUSE_ADJ_STEP_PCT	= 25,

	/* Have some play in timer operations */
	TIMER_SLACK_PCT		= 1,

	/* 1/64k is granular enough and can easily be handled w/ u32 */
	WEIGHT_ONE		= 1 << 16,
};

enum {
	/*
	 * As vtime is used to calculate the cost of each IO, it needs to
	 * be fairly high precision.  For example, it should be able to
	 * represent the cost of a single page worth of discard with
	 * sufficient accuracy.  At the same time, it should be able to
	 * represent reasonably long enough durations to be useful and
	 * convenient during operation.
	 *
	 * 1s worth of vtime is 2^37.  This gives us both sub-nanosecond
	 * granularity and days of wrap-around time even at extreme vrates.
	 */
	VTIME_PER_SEC_SHIFT	= 37,
	VTIME_PER_SEC		= 1LLU << VTIME_PER_SEC_SHIFT,
	VTIME_PER_USEC		= VTIME_PER_SEC / USEC_PER_SEC,
	VTIME_PER_NSEC		= VTIME_PER_SEC / NSEC_PER_SEC,

	/* bound vrate adjustments within two orders of magnitude */
	VRATE_MIN_PPM		= 10000,	/* 1% */
	VRATE_MAX_PPM		= 100000000,	/* 10000% */

	VRATE_MIN		= VTIME_PER_USEC * VRATE_MIN_PPM / MILLION,
	VRATE_CLAMP_ADJ_PCT	= 4,

	/* switch iff the conditions are met for longer than this */
	AUTOP_CYCLE_NSEC	= 10LLU * NSEC_PER_SEC,
};

enum {
	/* if IOs end up waiting for requests, issue less */
	RQ_WAIT_BUSY_PCT	= 5,

	/* unbusy hysteresis */
	UNBUSY_THR_PCT		= 75,

	/*
	 * The effect of delay is indirect and non-linear and a huge amount of
	 * future debt can accumulate abruptly while unthrottled.  Linearly
	 * scale up delay as debt is going up and then let it decay
	 * exponentially.  This gives us quick ramp ups while delay is
	 * accumulating and long tails which can help reducing the frequency
	 * of debt explosions on unthrottle.  The parameters are
	 * experimentally determined.
	 *
	 * The delay mechanism provides adequate protection and behavior in
	 * many cases.  However, this is far from ideal and falls short on
	 * both fronts.  The debtors are often throttled too harshly costing
	 * a significant level of fairness and possibly total work while the
	 * protection against their impacts on the system can be choppy and
	 * unreliable.
	 *
	 * The shortcoming primarily stems from the fact that, unlike for page
	 * cache, the kernel doesn't have a well-defined back-pressure
	 * propagation mechanism and policies for anonymous memory.  Fully
	 * addressing this issue will likely require substantial improvements
	 * in the area.
	 */
	MIN_DELAY_THR_PCT	= 500,
	MAX_DELAY_THR_PCT	= 25000,
	MIN_DELAY		= 250,
	MAX_DELAY		= 250 * USEC_PER_MSEC,

	/* halve debts if avg usage over 100ms is under 50% */
	DFGV_USAGE_PCT		= 50,
	DFGV_PERIOD		= 100 * USEC_PER_MSEC,

	/* don't let cmds which take a very long time pin lagging for too long */
	MAX_LAGGING_PERIODS	= 10,

	/*
	 * Count IO size in 4k pages.  The 12bit shift helps keep
	 * size-proportional components of cost calculation in closer
	 * numbers of digits to per-IO cost components.
	 */
	IOC_PAGE_SHIFT		= 12,
	IOC_PAGE_SIZE		= 1 << IOC_PAGE_SHIFT,
	IOC_SECT_TO_PAGE_SHIFT	= IOC_PAGE_SHIFT - SECTOR_SHIFT,

	/* if apart further than 16M, consider randio for linear model */
	LCOEF_RANDIO_PAGES	= 4096,
};

enum ioc_running {
	IOC_IDLE,
	IOC_RUNNING,
	IOC_STOP,
};

/* io.cost.qos controls including per-dev enable of the whole controller */
enum {
	QOS_ENABLE,
	QOS_CTRL,
	NR_QOS_CTRL_PARAMS,
};

/* io.cost.qos params */
enum {
	QOS_RPPM,
	QOS_RLAT,
	QOS_WPPM,
	QOS_WLAT,
	QOS_MIN,
	QOS_MAX,
	NR_QOS_PARAMS,
};

/* io.cost.model controls */
enum {
	COST_CTRL,
	COST_MODEL,
	NR_COST_CTRL_PARAMS,
};

/* builtin linear cost model coefficients */
enum {
	I_LCOEF_RBPS,
	I_LCOEF_RSEQIOPS,
	I_LCOEF_RRANDIOPS,
	I_LCOEF_WBPS,
	I_LCOEF_WSEQIOPS,
	I_LCOEF_WRANDIOPS,
	NR_I_LCOEFS,
};

enum {
	LCOEF_RPAGE,
	LCOEF_RSEQIO,
	LCOEF_RRANDIO,
	LCOEF_WPAGE,
	LCOEF_WSEQIO,
	LCOEF_WRANDIO,
	NR_LCOEFS,
};

enum {
	AUTOP_INVALID,
	AUTOP_HDD,
	AUTOP_SSD_QD1,
	AUTOP_SSD_DFL,
	AUTOP_SSD_FAST,
};

struct ioc_params {
	u32				qos[NR_QOS_PARAMS];
	u64				i_lcoefs[NR_I_LCOEFS];
	u64				lcoefs[NR_LCOEFS];
	u32				too_fast_vrate_pct;
	u32				too_slow_vrate_pct;
};

struct ioc_margins {
	s64				min;
	s64				low;
	s64				target;
};

struct ioc_missed {
	local_t				nr_met;
	local_t				nr_missed;
	u32				last_met;
	u32				last_missed;
};

struct ioc_pcpu_stat {
	struct ioc_missed		missed[2];

	local64_t			rq_wait_ns;
	u64				last_rq_wait_ns;
};

/* per device */
struct ioc {
	struct rq_qos			rqos;

	bool				enabled;

	struct ioc_params		params;
	struct ioc_margins		margins;
	u32				period_us;
	u32				timer_slack_ns;
	u64				vrate_min;
	u64				vrate_max;

	spinlock_t			lock;
	struct timer_list		timer;
	struct list_head		active_iocgs;	/* active cgroups */
	struct ioc_pcpu_stat __percpu	*pcpu_stat;

	enum ioc_running		running;
	atomic64_t			vtime_rate;
	u64				vtime_base_rate;
	s64				vtime_err;

	seqcount_spinlock_t		period_seqcount;
	u64				period_at;	/* wallclock starttime */
	u64				period_at_vtime; /* vtime starttime */

	atomic64_t			cur_period;	/* inc'd each period */
	int				busy_level;	/* saturation history */

	bool				weights_updated;
	atomic_t			hweight_gen;	/* for lazy hweights */

	/* debt forgiveness */
	u64				dfgv_period_at;
	u64				dfgv_period_rem;
	u64				dfgv_usage_us_sum;

	u64				autop_too_fast_at;
	u64				autop_too_slow_at;
	int				autop_idx;
	bool				user_qos_params:1;
	bool				user_cost_model:1;
};

struct iocg_pcpu_stat {
	local64_t			abs_vusage;
};

struct iocg_stat {
	u64				usage_us;
	u64				wait_us;
	u64				indebt_us;
	u64				indelay_us;
};

/* per device-cgroup pair */
struct ioc_gq {
	struct blkg_policy_data		pd;
	struct ioc			*ioc;

	/*
	 * An iocg can get its weight from two sources - an explicit
	 * per-device-cgroup configuration or the default weight of the
	 * cgroup.  `cfg_weight` is the explicit per-device-cgroup
	 * configuration.  `weight` is the effective weight considering both
	 * sources.
	 *
	 * When an idle cgroup becomes active its `active` goes from 0 to
	 * `weight`.  `inuse` is the surplus adjusted active weight.
	 * `active` and `inuse` are used to calculate `hweight_active` and
	 * `hweight_inuse`.
	 *
	 * `last_inuse` remembers `inuse` while an iocg is idle to persist
	 * surplus adjustments.
	 *
	 * `inuse` may be adjusted dynamically during a period.  `saved_*` are
	 * used to determine and track adjustments.
	 */
	u32				cfg_weight;
	u32				weight;
	u32				active;
	u32				inuse;

	u32				last_inuse;
	s64				saved_margin;

	sector_t			cursor;		/* to detect randio */

	/*
	 * `vtime` is this iocg's vtime cursor which progresses as IOs are
	 * issued.  If lagging behind device vtime, the delta represents
	 * the currently available IO budget.  If running ahead, the
	 * overage.
	 *
	 * `done_vtime` is the same but progressed on completion rather
	 * than issue.  The delta behind `vtime` represents the cost of
	 * currently in-flight IOs.
	 */
	atomic64_t			vtime;
	atomic64_t			done_vtime;
	u64				abs_vdebt;

	/* current delay in effect and when it started */
	u64				delay;
	u64				delay_at;

	/*
	 * The period this iocg was last active in.  Used for deactivation
	 * and invalidating `vtime`.
	 */
	atomic64_t			active_period;
	struct list_head		active_list;

	/* see __propagate_weights() and current_hweight() for details */
	u64				child_active_sum;
	u64				child_inuse_sum;
	u64				child_adjusted_sum;
	int				hweight_gen;
	u32				hweight_active;
	u32				hweight_inuse;
	u32				hweight_donating;
	u32				hweight_after_donation;

	struct list_head		walk_list;
	struct list_head		surplus_list;

	struct wait_queue_head		waitq;
	struct hrtimer			waitq_timer;

	/* timestamp at the latest activation */
	u64				activated_at;

	/* statistics */
	struct iocg_pcpu_stat __percpu	*pcpu_stat;
	struct iocg_stat		stat;
	struct iocg_stat		last_stat;
	u64				last_stat_abs_vusage;
	u64				usage_delta_us;
	u64				wait_since;
	u64				indebt_since;
	u64				indelay_since;

	/* this iocg's depth in the hierarchy and ancestors including self */
	int				level;
	struct ioc_gq			*ancestors[];
};

/* per cgroup */
struct ioc_cgrp {
	struct blkcg_policy_data	cpd;
	unsigned int			dfl_weight;
};

struct ioc_now {
	u64				now_ns;
	u64				now;
	u64				vnow;
};

struct iocg_wait {
	struct wait_queue_entry		wait;
	struct bio			*bio;
	u64				abs_cost;
	bool				committed;
};

struct iocg_wake_ctx {
	struct ioc_gq			*iocg;
	u32				hw_inuse;
	s64				vbudget;
};

static const struct ioc_params autop[] = {
	[AUTOP_HDD] = {
		.qos				= {
			[QOS_RLAT]		=        250000, /* 250ms */
			[QOS_WLAT]		=        250000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     174019176,
			[I_LCOEF_RSEQIOPS]	=         41708,
			[I_LCOEF_RRANDIOPS]	=           370,
			[I_LCOEF_WBPS]		=     178075866,
			[I_LCOEF_WSEQIOPS]	=         42705,
			[I_LCOEF_WRANDIOPS]	=           378,
		},
	},
	[AUTOP_SSD_QD1] = {
		.qos				= {
			[QOS_RLAT]		=         25000, /* 25ms */
			[QOS_WLAT]		=         25000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     245855193,
			[I_LCOEF_RSEQIOPS]	=         61575,
			[I_LCOEF_RRANDIOPS]	=          6946,
			[I_LCOEF_WBPS]		=     141365009,
			[I_LCOEF_WSEQIOPS]	=         33716,
			[I_LCOEF_WRANDIOPS]	=         26796,
		},
	},
	[AUTOP_SSD_DFL] = {
		.qos				= {
			[QOS_RLAT]		=         25000, /* 25ms */
			[QOS_WLAT]		=         25000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     488636629,
			[I_LCOEF_RSEQIOPS]	=          8932,
			[I_LCOEF_RRANDIOPS]	=          8518,
			[I_LCOEF_WBPS]		=     427891549,
			[I_LCOEF_WSEQIOPS]	=         28755,
			[I_LCOEF_WRANDIOPS]	=         21940,
		},
		.too_fast_vrate_pct	=           500,
	},
	[AUTOP_SSD_FAST] = {
		.qos				= {
			[QOS_RLAT]		=          5000, /* 5ms */
			[QOS_WLAT]		=          5000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=    3102524156LLU,
			[I_LCOEF_RSEQIOPS]	=        724816,
			[I_LCOEF_RRANDIOPS]	=        778122,
			[I_LCOEF_WBPS]		=    1742780862LLU,
			[I_LCOEF_WSEQIOPS]	=        425702,
			[I_LCOEF_WRANDIOPS]	=        443193,
		},
		.too_slow_vrate_pct	=            10,
	},
};

/*
 * vrate adjust percentages indexed by ioc->busy_level.  We adjust up on
 * vtime credit shortage and down on device saturation.
 */
static u32 vrate_adj_pct[] =
	{ 0, 0, 0, 0,
	  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
	  4, 4, 4, 4, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8, 16 };

static struct blkcg_policy blkcg_policy_iocost;

/* accessors and helpers */
static struct ioc *rqos_to_ioc(struct rq_qos *rqos)
{
	return container_of(rqos, struct ioc, rqos);
}

static struct ioc *q_to_ioc(struct request_queue *q)
{
	return rqos_to_ioc(rq_qos_id(q, RQ_QOS_COST));
}

static const char __maybe_unused *ioc_name(struct ioc *ioc)
{
	struct gendisk *disk = ioc->rqos.disk;

	if (!disk)
		return "<unknown>";
	return disk->disk_name;
}

static struct ioc_gq *pd_to_iocg(struct blkg_policy_data *pd)
{
	return pd ? container_of(pd, struct ioc_gq, pd) : NULL;
}

static struct ioc_gq *blkg_to_iocg(struct blkcg_gq *blkg)
{
	return pd_to_iocg(blkg_to_pd(blkg, &blkcg_policy_iocost));
}

static struct blkcg_gq *iocg_to_blkg(struct ioc_gq *iocg)
{
	return pd_to_blkg(&iocg->pd);
}

static struct ioc_cgrp *blkcg_to_iocc(struct blkcg *blkcg)
{
	return container_of(blkcg_to_cpd(blkcg, &blkcg_policy_iocost),
			    struct ioc_cgrp, cpd);
}

/*
 * Scale @abs_cost to the inverse of @hw_inuse.  The lower the hierarchical
 * weight, the more expensive each IO.  Must round up.
 */
static u64 abs_cost_to_cost(u64 abs_cost, u32 hw_inuse)
{
	return DIV64_U64_ROUND_UP(abs_cost * WEIGHT_ONE, hw_inuse);
}
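
/*
 * Worked example (illustrative numbers, not taken from the presets above):
 * with hw_inuse at half of WEIGHT_ONE, abs_cost_to_cost(1000, hw_inuse)
 * returns 2000 - a cgroup owning 50% of the device consumes two units of
 * its vtime budget for every unit of absolute cost.
 */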

/*
 * The inverse of abs_cost_to_cost().  Must round up.
 */
static u64 cost_to_abs_cost(u64 cost, u32 hw_inuse)
{
	return DIV64_U64_ROUND_UP(cost * hw_inuse, WEIGHT_ONE);
}

static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio,
			    u64 abs_cost, u64 cost)
{
	struct iocg_pcpu_stat *gcs;

	bio->bi_iocost_cost = cost;
	atomic64_add(cost, &iocg->vtime);

	gcs = get_cpu_ptr(iocg->pcpu_stat);
	local64_add(abs_cost, &gcs->abs_vusage);
	put_cpu_ptr(gcs);
}

static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc, unsigned long *flags)
{
	if (lock_ioc) {
		spin_lock_irqsave(&iocg->ioc->lock, *flags);
		spin_lock(&iocg->waitq.lock);
	} else {
		spin_lock_irqsave(&iocg->waitq.lock, *flags);
	}
}

static void iocg_unlock(struct ioc_gq *iocg, bool unlock_ioc, unsigned long *flags)
{
	if (unlock_ioc) {
		spin_unlock(&iocg->waitq.lock);
		spin_unlock_irqrestore(&iocg->ioc->lock, *flags);
	} else {
		spin_unlock_irqrestore(&iocg->waitq.lock, *flags);
	}
}

#define CREATE_TRACE_POINTS
#include <trace/events/iocost.h>

static void ioc_refresh_margins(struct ioc *ioc)
{
	struct ioc_margins *margins = &ioc->margins;
	u32 period_us = ioc->period_us;
	u64 vrate = ioc->vtime_base_rate;

	margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate;
	margins->low = (period_us * MARGIN_LOW_PCT / 100) * vrate;
	margins->target = (period_us * MARGIN_TARGET_PCT / 100) * vrate;
}
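
/*
 * For instance (an illustration, not a special-cased configuration): with
 * period_us of 50000 and vtime_base_rate at VTIME_PER_USEC (100% vrate),
 * margins->target works out to 25000 * VTIME_PER_USEC, i.e. half a
 * period's worth of vtime, matching MARGIN_TARGET_PCT of 50.
 */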

/* latency QoS params changed, update period_us and all the dependent params */
static void ioc_refresh_period_us(struct ioc *ioc)
{
	u32 ppm, lat, multi, period_us;

	lockdep_assert_held(&ioc->lock);

	/* pick the higher latency target */
	if (ioc->params.qos[QOS_RLAT] >= ioc->params.qos[QOS_WLAT]) {
		ppm = ioc->params.qos[QOS_RPPM];
		lat = ioc->params.qos[QOS_RLAT];
	} else {
		ppm = ioc->params.qos[QOS_WPPM];
		lat = ioc->params.qos[QOS_WLAT];
	}

	/*
	 * We want the period to be long enough to contain a healthy number
	 * of IOs while short enough for granular control.  Define it as a
	 * multiple of the latency target.  Ideally, the multiplier should
	 * be scaled according to the percentile so that it would nominally
	 * contain a certain number of requests.  Let's be simpler and
	 * scale it linearly so that it's 2x >= pct(90) and 10x at pct(50).
	 */
	if (ppm)
		multi = max_t(u32, (MILLION - ppm) / 50000, 2);
	else
		multi = 2;
	period_us = multi * lat;
	period_us = clamp_t(u32, period_us, MIN_PERIOD, MAX_PERIOD);

	/* calculate dependent params */
	ioc->period_us = period_us;
	ioc->timer_slack_ns = div64_u64(
		(u64)period_us * NSEC_PER_USEC * TIMER_SLACK_PCT,
		100);
	ioc_refresh_margins(ioc);
}
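
/*
 * Example (hypothetical QoS settings, not defaults): suppose the read
 * target is the higher one with rpct=95 rlat=25000.  ppm is then 950000,
 * multi = max((1000000 - 950000) / 50000, 2) = 2 and period_us =
 * 2 * 25000 = 50000, i.e. a 50ms period bounded by [MIN_PERIOD,
 * MAX_PERIOD].
 */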

/*
 * ioc->rqos.disk isn't initialized when this function is called from
 * the init path.
 */
static int ioc_autop_idx(struct ioc *ioc, struct gendisk *disk)
{
	int idx = ioc->autop_idx;
	const struct ioc_params *p = &autop[idx];
	u32 vrate_pct;
	u64 now_ns;

	/* rotational? */
	if (!blk_queue_nonrot(disk->queue))
		return AUTOP_HDD;

	/* handle SATA SSDs w/ broken NCQ */
	if (blk_queue_depth(disk->queue) == 1)
		return AUTOP_SSD_QD1;

	/* use one of the normal ssd sets */
	if (idx < AUTOP_SSD_DFL)
		return AUTOP_SSD_DFL;

	/* if user is overriding anything, maintain what was there */
	if (ioc->user_qos_params || ioc->user_cost_model)
		return idx;

	/* step up/down based on the vrate */
	vrate_pct = div64_u64(ioc->vtime_base_rate * 100, VTIME_PER_USEC);
	now_ns = ktime_get_ns();

	if (p->too_fast_vrate_pct && p->too_fast_vrate_pct <= vrate_pct) {
		if (!ioc->autop_too_fast_at)
			ioc->autop_too_fast_at = now_ns;
		if (now_ns - ioc->autop_too_fast_at >= AUTOP_CYCLE_NSEC)
			return idx + 1;
	} else {
		ioc->autop_too_fast_at = 0;
	}

	if (p->too_slow_vrate_pct && p->too_slow_vrate_pct >= vrate_pct) {
		if (!ioc->autop_too_slow_at)
			ioc->autop_too_slow_at = now_ns;
		if (now_ns - ioc->autop_too_slow_at >= AUTOP_CYCLE_NSEC)
			return idx - 1;
	} else {
		ioc->autop_too_slow_at = 0;
	}

	return idx;
}
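
/*
 * To illustrate the stepping above: a device running the AUTOP_SSD_DFL
 * preset (too_fast_vrate_pct of 500) that keeps its base vrate at or above
 * 500% for a full AUTOP_CYCLE_NSEC (10s) steps up to AUTOP_SSD_FAST;
 * conversely, AUTOP_SSD_FAST steps back down if the vrate stays at or
 * below its too_slow_vrate_pct of 10 for the same duration.
 */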

/*
 * Take the following as input
 *
 *  @bps	maximum sequential throughput
 *  @seqiops	maximum sequential 4k iops
 *  @randiops	maximum random 4k iops
 *
 * and calculate the linear model cost coefficients.
 *
 *  *@page	per-page cost		1s / (@bps / 4096)
 *  *@seqio	base cost of a seq IO	max((1s / @seqiops) - *@page, 0)
 *  *@randio	base cost of a rand IO	max((1s / @randiops) - *@page, 0)
 */
static void calc_lcoefs(u64 bps, u64 seqiops, u64 randiops,
			u64 *page, u64 *seqio, u64 *randio)
{
	u64 v;

	*page = *seqio = *randio = 0;

	if (bps) {
		u64 bps_pages = DIV_ROUND_UP_ULL(bps, IOC_PAGE_SIZE);

		if (bps_pages)
			*page = DIV64_U64_ROUND_UP(VTIME_PER_SEC, bps_pages);
		else
			*page = 1;
	}

	if (seqiops) {
		v = DIV64_U64_ROUND_UP(VTIME_PER_SEC, seqiops);
		if (v > *page)
			*seqio = v - *page;
	}

	if (randiops) {
		v = DIV64_U64_ROUND_UP(VTIME_PER_SEC, randiops);
		if (v > *page)
			*randio = v - *page;
	}
}
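
/*
 * Rough numeric sketch using the AUTOP_HDD preset above: an RBPS of
 * ~174MB/s is ~42.5k pages/s, so *page comes out near VTIME_PER_SEC /
 * 42500.  An RRANDIOPS of 370 puts the random base cost near
 * VTIME_PER_SEC / 370 minus the per-page cost - orders of magnitude larger
 * than the sequential base cost, which is what makes random IO so
 * expensive on rotational devices under this model.
 */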

static void ioc_refresh_lcoefs(struct ioc *ioc)
{
	u64 *u = ioc->params.i_lcoefs;
	u64 *c = ioc->params.lcoefs;

	calc_lcoefs(u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
		    &c[LCOEF_RPAGE], &c[LCOEF_RSEQIO], &c[LCOEF_RRANDIO]);
	calc_lcoefs(u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS],
		    &c[LCOEF_WPAGE], &c[LCOEF_WSEQIO], &c[LCOEF_WRANDIO]);
}

/*
 * struct gendisk is required as an argument because ioc->rqos.disk
 * is not properly initialized when called from the init path.
 */
static bool ioc_refresh_params_disk(struct ioc *ioc, bool force,
				    struct gendisk *disk)
{
	const struct ioc_params *p;
	int idx;

	lockdep_assert_held(&ioc->lock);

	idx = ioc_autop_idx(ioc, disk);
	p = &autop[idx];

	if (idx == ioc->autop_idx && !force)
		return false;

	if (idx != ioc->autop_idx) {
		atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
		ioc->vtime_base_rate = VTIME_PER_USEC;
	}

	ioc->autop_idx = idx;
	ioc->autop_too_fast_at = 0;
	ioc->autop_too_slow_at = 0;

	if (!ioc->user_qos_params)
		memcpy(ioc->params.qos, p->qos, sizeof(p->qos));
	if (!ioc->user_cost_model)
		memcpy(ioc->params.i_lcoefs, p->i_lcoefs, sizeof(p->i_lcoefs));

	ioc_refresh_period_us(ioc);
	ioc_refresh_lcoefs(ioc);

	ioc->vrate_min = DIV64_U64_ROUND_UP((u64)ioc->params.qos[QOS_MIN] *
					    VTIME_PER_USEC, MILLION);
	ioc->vrate_max = DIV64_U64_ROUND_UP((u64)ioc->params.qos[QOS_MAX] *
					    VTIME_PER_USEC, MILLION);

	return true;
}

static bool ioc_refresh_params(struct ioc *ioc, bool force)
{
	return ioc_refresh_params_disk(ioc, force, ioc->rqos.disk);
}

/*
 * When an iocg accumulates too much vtime or gets deactivated, we throw away
 * some vtime, which lowers the overall device utilization.  As the exact
 * amount which is being thrown away is known, we can compensate by
 * accelerating the vrate accordingly so that the extra vtime generated in
 * the current period matches what got lost.
 */
static void ioc_refresh_vrate(struct ioc *ioc, struct ioc_now *now)
{
	s64 pleft = ioc->period_at + ioc->period_us - now->now;
	s64 vperiod = ioc->period_us * ioc->vtime_base_rate;
	s64 vcomp, vcomp_min, vcomp_max;

	lockdep_assert_held(&ioc->lock);

	/* we need some time left in this period */
	if (pleft <= 0)
		goto done;

	/*
	 * Calculate how much vrate should be adjusted to offset the error.
	 * Limit the amount of adjustment and deduct the adjusted amount from
	 * the error.
	 */
	vcomp = -div64_s64(ioc->vtime_err, pleft);
	vcomp_min = -(ioc->vtime_base_rate >> 1);
	vcomp_max = ioc->vtime_base_rate;
	vcomp = clamp(vcomp, vcomp_min, vcomp_max);

	ioc->vtime_err += vcomp * pleft;

	atomic64_set(&ioc->vtime_rate, ioc->vtime_base_rate + vcomp);
done:
	/* bound how much error can accumulate */
	ioc->vtime_err = clamp(ioc->vtime_err, -vperiod, vperiod);
}

static void ioc_adjust_base_vrate(struct ioc *ioc, u32 rq_wait_pct,
				  int nr_lagging, int nr_shortages,
				  int prev_busy_level, u32 *missed_ppm)
{
	u64 vrate = ioc->vtime_base_rate;
	u64 vrate_min = ioc->vrate_min, vrate_max = ioc->vrate_max;

	if (!ioc->busy_level || (ioc->busy_level < 0 && nr_lagging)) {
		if (ioc->busy_level != prev_busy_level || nr_lagging)
			trace_iocost_ioc_vrate_adj(ioc, vrate,
						   missed_ppm, rq_wait_pct,
						   nr_lagging, nr_shortages);

		return;
	}

	/*
	 * If vrate is out of bounds, apply clamp gradually as the
	 * bounds can change abruptly.  Otherwise, apply busy_level
	 * based adjustment.
	 */
	if (vrate < vrate_min) {
		vrate = div64_u64(vrate * (100 + VRATE_CLAMP_ADJ_PCT), 100);
		vrate = min(vrate, vrate_min);
	} else if (vrate > vrate_max) {
		vrate = div64_u64(vrate * (100 - VRATE_CLAMP_ADJ_PCT), 100);
		vrate = max(vrate, vrate_max);
	} else {
		int idx = min_t(int, abs(ioc->busy_level),
				ARRAY_SIZE(vrate_adj_pct) - 1);
		u32 adj_pct = vrate_adj_pct[idx];

		if (ioc->busy_level > 0)
			adj_pct = 100 - adj_pct;
		else
			adj_pct = 100 + adj_pct;

		vrate = clamp(DIV64_U64_ROUND_UP(vrate * adj_pct, 100),
			      vrate_min, vrate_max);
	}

	trace_iocost_ioc_vrate_adj(ioc, vrate, missed_ppm, rq_wait_pct,
				   nr_lagging, nr_shortages);

	ioc->vtime_base_rate = vrate;
	ioc_refresh_margins(ioc);
}
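
/*
 * As an example of the busy_level scaling: at busy_level -8 (eight
 * consecutive periods of shortage), vrate_adj_pct[8] is 1 so the base
 * vrate is raised by 1%; a sustained shortage walks the index further up
 * the table into the 2%, 4%, 8% and finally 16% steps, always clamped to
 * [vrate_min, vrate_max].
 */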
105862306a36Sopenharmony_ci */ 105962306a36Sopenharmony_ci do { 106062306a36Sopenharmony_ci seq = read_seqcount_begin(&ioc->period_seqcount); 106162306a36Sopenharmony_ci now->vnow = ioc->period_at_vtime + 106262306a36Sopenharmony_ci (now->now - ioc->period_at) * vrate; 106362306a36Sopenharmony_ci } while (read_seqcount_retry(&ioc->period_seqcount, seq)); 106462306a36Sopenharmony_ci} 106562306a36Sopenharmony_ci 106662306a36Sopenharmony_cistatic void ioc_start_period(struct ioc *ioc, struct ioc_now *now) 106762306a36Sopenharmony_ci{ 106862306a36Sopenharmony_ci WARN_ON_ONCE(ioc->running != IOC_RUNNING); 106962306a36Sopenharmony_ci 107062306a36Sopenharmony_ci write_seqcount_begin(&ioc->period_seqcount); 107162306a36Sopenharmony_ci ioc->period_at = now->now; 107262306a36Sopenharmony_ci ioc->period_at_vtime = now->vnow; 107362306a36Sopenharmony_ci write_seqcount_end(&ioc->period_seqcount); 107462306a36Sopenharmony_ci 107562306a36Sopenharmony_ci ioc->timer.expires = jiffies + usecs_to_jiffies(ioc->period_us); 107662306a36Sopenharmony_ci add_timer(&ioc->timer); 107762306a36Sopenharmony_ci} 107862306a36Sopenharmony_ci 107962306a36Sopenharmony_ci/* 108062306a36Sopenharmony_ci * Update @iocg's `active` and `inuse` to @active and @inuse, update level 108162306a36Sopenharmony_ci * weight sums and propagate upwards accordingly. If @save, the current margin 108262306a36Sopenharmony_ci * is saved to be used as reference for later inuse in-period adjustments. 108362306a36Sopenharmony_ci */ 108462306a36Sopenharmony_cistatic void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, 108562306a36Sopenharmony_ci bool save, struct ioc_now *now) 108662306a36Sopenharmony_ci{ 108762306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 108862306a36Sopenharmony_ci int lvl; 108962306a36Sopenharmony_ci 109062306a36Sopenharmony_ci lockdep_assert_held(&ioc->lock); 109162306a36Sopenharmony_ci 109262306a36Sopenharmony_ci /* 109362306a36Sopenharmony_ci * For an active leaf node, its inuse shouldn't be zero or exceed 109462306a36Sopenharmony_ci * @active. An active internal node's inuse is solely determined by the 109562306a36Sopenharmony_ci * inuse to active ratio of its children regardless of @inuse. 
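 *
 * For instance (made-up numbers): an internal node whose children sum to
 * child_active_sum == 200 and child_inuse_sum == 50 ends up with
 * inuse == active * 50 / 200, a quarter of its @active, regardless of the
 * @inuse passed in, while a leaf simply has @inuse clamped to [1, @active].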
109662306a36Sopenharmony_ci */ 109762306a36Sopenharmony_ci if (list_empty(&iocg->active_list) && iocg->child_active_sum) { 109862306a36Sopenharmony_ci inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum, 109962306a36Sopenharmony_ci iocg->child_active_sum); 110062306a36Sopenharmony_ci } else { 110162306a36Sopenharmony_ci inuse = clamp_t(u32, inuse, 1, active); 110262306a36Sopenharmony_ci } 110362306a36Sopenharmony_ci 110462306a36Sopenharmony_ci iocg->last_inuse = iocg->inuse; 110562306a36Sopenharmony_ci if (save) 110662306a36Sopenharmony_ci iocg->saved_margin = now->vnow - atomic64_read(&iocg->vtime); 110762306a36Sopenharmony_ci 110862306a36Sopenharmony_ci if (active == iocg->active && inuse == iocg->inuse) 110962306a36Sopenharmony_ci return; 111062306a36Sopenharmony_ci 111162306a36Sopenharmony_ci for (lvl = iocg->level - 1; lvl >= 0; lvl--) { 111262306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[lvl]; 111362306a36Sopenharmony_ci struct ioc_gq *child = iocg->ancestors[lvl + 1]; 111462306a36Sopenharmony_ci u32 parent_active = 0, parent_inuse = 0; 111562306a36Sopenharmony_ci 111662306a36Sopenharmony_ci /* update the level sums */ 111762306a36Sopenharmony_ci parent->child_active_sum += (s32)(active - child->active); 111862306a36Sopenharmony_ci parent->child_inuse_sum += (s32)(inuse - child->inuse); 111962306a36Sopenharmony_ci /* apply the updates */ 112062306a36Sopenharmony_ci child->active = active; 112162306a36Sopenharmony_ci child->inuse = inuse; 112262306a36Sopenharmony_ci 112362306a36Sopenharmony_ci /* 112462306a36Sopenharmony_ci * The delta between inuse and active sums indicates that 112562306a36Sopenharmony_ci * much of weight is being given away. Parent's inuse 112662306a36Sopenharmony_ci * and active should reflect the ratio. 112762306a36Sopenharmony_ci */ 112862306a36Sopenharmony_ci if (parent->child_active_sum) { 112962306a36Sopenharmony_ci parent_active = parent->weight; 113062306a36Sopenharmony_ci parent_inuse = DIV64_U64_ROUND_UP( 113162306a36Sopenharmony_ci parent_active * parent->child_inuse_sum, 113262306a36Sopenharmony_ci parent->child_active_sum); 113362306a36Sopenharmony_ci } 113462306a36Sopenharmony_ci 113562306a36Sopenharmony_ci /* do we need to keep walking up? 
*/ 113662306a36Sopenharmony_ci if (parent_active == parent->active && 113762306a36Sopenharmony_ci parent_inuse == parent->inuse) 113862306a36Sopenharmony_ci break; 113962306a36Sopenharmony_ci 114062306a36Sopenharmony_ci active = parent_active; 114162306a36Sopenharmony_ci inuse = parent_inuse; 114262306a36Sopenharmony_ci } 114362306a36Sopenharmony_ci 114462306a36Sopenharmony_ci ioc->weights_updated = true; 114562306a36Sopenharmony_ci} 114662306a36Sopenharmony_ci 114762306a36Sopenharmony_cistatic void commit_weights(struct ioc *ioc) 114862306a36Sopenharmony_ci{ 114962306a36Sopenharmony_ci lockdep_assert_held(&ioc->lock); 115062306a36Sopenharmony_ci 115162306a36Sopenharmony_ci if (ioc->weights_updated) { 115262306a36Sopenharmony_ci /* paired with rmb in current_hweight(), see there */ 115362306a36Sopenharmony_ci smp_wmb(); 115462306a36Sopenharmony_ci atomic_inc(&ioc->hweight_gen); 115562306a36Sopenharmony_ci ioc->weights_updated = false; 115662306a36Sopenharmony_ci } 115762306a36Sopenharmony_ci} 115862306a36Sopenharmony_ci 115962306a36Sopenharmony_cistatic void propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, 116062306a36Sopenharmony_ci bool save, struct ioc_now *now) 116162306a36Sopenharmony_ci{ 116262306a36Sopenharmony_ci __propagate_weights(iocg, active, inuse, save, now); 116362306a36Sopenharmony_ci commit_weights(iocg->ioc); 116462306a36Sopenharmony_ci} 116562306a36Sopenharmony_ci 116662306a36Sopenharmony_cistatic void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep) 116762306a36Sopenharmony_ci{ 116862306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 116962306a36Sopenharmony_ci int lvl; 117062306a36Sopenharmony_ci u32 hwa, hwi; 117162306a36Sopenharmony_ci int ioc_gen; 117262306a36Sopenharmony_ci 117362306a36Sopenharmony_ci /* hot path - if uptodate, use cached */ 117462306a36Sopenharmony_ci ioc_gen = atomic_read(&ioc->hweight_gen); 117562306a36Sopenharmony_ci if (ioc_gen == iocg->hweight_gen) 117662306a36Sopenharmony_ci goto out; 117762306a36Sopenharmony_ci 117862306a36Sopenharmony_ci /* 117962306a36Sopenharmony_ci * Paired with wmb in commit_weights(). If we saw the updated 118062306a36Sopenharmony_ci * hweight_gen, all the weight updates from __propagate_weights() are 118162306a36Sopenharmony_ci * visible too. 118262306a36Sopenharmony_ci * 118362306a36Sopenharmony_ci * We can race with weight updates during calculation and get it 118462306a36Sopenharmony_ci * wrong. However, hweight_gen would have changed and a future 118562306a36Sopenharmony_ci * reader will recalculate and we're guaranteed to discard the 118662306a36Sopenharmony_ci * wrong result soon. 
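 *
 * As an example with made-up numbers: for a level-2 cgroup whose
 * first-level ancestor holds active 100 out of a child_active_sum of 400
 * and whose own level holds active 50 out of 100, hweight_active works
 * out to WEIGHT_ONE * 100/400 * 50/100 == WEIGHT_ONE / 8; hweight_inuse
 * is accumulated the same way from the inuse ratios.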
118762306a36Sopenharmony_ci */ 118862306a36Sopenharmony_ci smp_rmb(); 118962306a36Sopenharmony_ci 119062306a36Sopenharmony_ci hwa = hwi = WEIGHT_ONE; 119162306a36Sopenharmony_ci for (lvl = 0; lvl <= iocg->level - 1; lvl++) { 119262306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[lvl]; 119362306a36Sopenharmony_ci struct ioc_gq *child = iocg->ancestors[lvl + 1]; 119462306a36Sopenharmony_ci u64 active_sum = READ_ONCE(parent->child_active_sum); 119562306a36Sopenharmony_ci u64 inuse_sum = READ_ONCE(parent->child_inuse_sum); 119662306a36Sopenharmony_ci u32 active = READ_ONCE(child->active); 119762306a36Sopenharmony_ci u32 inuse = READ_ONCE(child->inuse); 119862306a36Sopenharmony_ci 119962306a36Sopenharmony_ci /* we can race with deactivations and either may read as zero */ 120062306a36Sopenharmony_ci if (!active_sum || !inuse_sum) 120162306a36Sopenharmony_ci continue; 120262306a36Sopenharmony_ci 120362306a36Sopenharmony_ci active_sum = max_t(u64, active, active_sum); 120462306a36Sopenharmony_ci hwa = div64_u64((u64)hwa * active, active_sum); 120562306a36Sopenharmony_ci 120662306a36Sopenharmony_ci inuse_sum = max_t(u64, inuse, inuse_sum); 120762306a36Sopenharmony_ci hwi = div64_u64((u64)hwi * inuse, inuse_sum); 120862306a36Sopenharmony_ci } 120962306a36Sopenharmony_ci 121062306a36Sopenharmony_ci iocg->hweight_active = max_t(u32, hwa, 1); 121162306a36Sopenharmony_ci iocg->hweight_inuse = max_t(u32, hwi, 1); 121262306a36Sopenharmony_ci iocg->hweight_gen = ioc_gen; 121362306a36Sopenharmony_ciout: 121462306a36Sopenharmony_ci if (hw_activep) 121562306a36Sopenharmony_ci *hw_activep = iocg->hweight_active; 121662306a36Sopenharmony_ci if (hw_inusep) 121762306a36Sopenharmony_ci *hw_inusep = iocg->hweight_inuse; 121862306a36Sopenharmony_ci} 121962306a36Sopenharmony_ci 122062306a36Sopenharmony_ci/* 122162306a36Sopenharmony_ci * Calculate the hweight_inuse @iocg would get with max @inuse assuming all the 122262306a36Sopenharmony_ci * other weights stay unchanged. 
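 *
 * Rough sketch with hypothetical numbers: if @iocg's active is 100, its
 * current inuse 50 and its parent's child_inuse_sum 120, raising inuse to
 * the maximum (== active) makes the sibling inuse sum 120 + 100 - 50 == 170,
 * so for a first-level cgroup the returned bound is 100/170 of WEIGHT_ONE.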
122362306a36Sopenharmony_ci */ 122462306a36Sopenharmony_cistatic u32 current_hweight_max(struct ioc_gq *iocg) 122562306a36Sopenharmony_ci{ 122662306a36Sopenharmony_ci u32 hwm = WEIGHT_ONE; 122762306a36Sopenharmony_ci u32 inuse = iocg->active; 122862306a36Sopenharmony_ci u64 child_inuse_sum; 122962306a36Sopenharmony_ci int lvl; 123062306a36Sopenharmony_ci 123162306a36Sopenharmony_ci lockdep_assert_held(&iocg->ioc->lock); 123262306a36Sopenharmony_ci 123362306a36Sopenharmony_ci for (lvl = iocg->level - 1; lvl >= 0; lvl--) { 123462306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[lvl]; 123562306a36Sopenharmony_ci struct ioc_gq *child = iocg->ancestors[lvl + 1]; 123662306a36Sopenharmony_ci 123762306a36Sopenharmony_ci child_inuse_sum = parent->child_inuse_sum + inuse - child->inuse; 123862306a36Sopenharmony_ci hwm = div64_u64((u64)hwm * inuse, child_inuse_sum); 123962306a36Sopenharmony_ci inuse = DIV64_U64_ROUND_UP(parent->active * child_inuse_sum, 124062306a36Sopenharmony_ci parent->child_active_sum); 124162306a36Sopenharmony_ci } 124262306a36Sopenharmony_ci 124362306a36Sopenharmony_ci return max_t(u32, hwm, 1); 124462306a36Sopenharmony_ci} 124562306a36Sopenharmony_ci 124662306a36Sopenharmony_cistatic void weight_updated(struct ioc_gq *iocg, struct ioc_now *now) 124762306a36Sopenharmony_ci{ 124862306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 124962306a36Sopenharmony_ci struct blkcg_gq *blkg = iocg_to_blkg(iocg); 125062306a36Sopenharmony_ci struct ioc_cgrp *iocc = blkcg_to_iocc(blkg->blkcg); 125162306a36Sopenharmony_ci u32 weight; 125262306a36Sopenharmony_ci 125362306a36Sopenharmony_ci lockdep_assert_held(&ioc->lock); 125462306a36Sopenharmony_ci 125562306a36Sopenharmony_ci weight = iocg->cfg_weight ?: iocc->dfl_weight; 125662306a36Sopenharmony_ci if (weight != iocg->weight && iocg->active) 125762306a36Sopenharmony_ci propagate_weights(iocg, weight, iocg->inuse, true, now); 125862306a36Sopenharmony_ci iocg->weight = weight; 125962306a36Sopenharmony_ci} 126062306a36Sopenharmony_ci 126162306a36Sopenharmony_cistatic bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now) 126262306a36Sopenharmony_ci{ 126362306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 126462306a36Sopenharmony_ci u64 last_period, cur_period; 126562306a36Sopenharmony_ci u64 vtime, vtarget; 126662306a36Sopenharmony_ci int i; 126762306a36Sopenharmony_ci 126862306a36Sopenharmony_ci /* 126962306a36Sopenharmony_ci * If we seem to be already active, just update the stamp to tell the 127062306a36Sopenharmony_ci * timer that we're still active. We don't mind occasional races.
127162306a36Sopenharmony_ci */ 127262306a36Sopenharmony_ci if (!list_empty(&iocg->active_list)) { 127362306a36Sopenharmony_ci ioc_now(ioc, now); 127462306a36Sopenharmony_ci cur_period = atomic64_read(&ioc->cur_period); 127562306a36Sopenharmony_ci if (atomic64_read(&iocg->active_period) != cur_period) 127662306a36Sopenharmony_ci atomic64_set(&iocg->active_period, cur_period); 127762306a36Sopenharmony_ci return true; 127862306a36Sopenharmony_ci } 127962306a36Sopenharmony_ci 128062306a36Sopenharmony_ci /* racy check on internal node IOs, treat as root level IOs */ 128162306a36Sopenharmony_ci if (iocg->child_active_sum) 128262306a36Sopenharmony_ci return false; 128362306a36Sopenharmony_ci 128462306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 128562306a36Sopenharmony_ci 128662306a36Sopenharmony_ci ioc_now(ioc, now); 128762306a36Sopenharmony_ci 128862306a36Sopenharmony_ci /* update period */ 128962306a36Sopenharmony_ci cur_period = atomic64_read(&ioc->cur_period); 129062306a36Sopenharmony_ci last_period = atomic64_read(&iocg->active_period); 129162306a36Sopenharmony_ci atomic64_set(&iocg->active_period, cur_period); 129262306a36Sopenharmony_ci 129362306a36Sopenharmony_ci /* already activated or breaking leaf-only constraint? */ 129462306a36Sopenharmony_ci if (!list_empty(&iocg->active_list)) 129562306a36Sopenharmony_ci goto succeed_unlock; 129662306a36Sopenharmony_ci for (i = iocg->level - 1; i > 0; i--) 129762306a36Sopenharmony_ci if (!list_empty(&iocg->ancestors[i]->active_list)) 129862306a36Sopenharmony_ci goto fail_unlock; 129962306a36Sopenharmony_ci 130062306a36Sopenharmony_ci if (iocg->child_active_sum) 130162306a36Sopenharmony_ci goto fail_unlock; 130262306a36Sopenharmony_ci 130362306a36Sopenharmony_ci /* 130462306a36Sopenharmony_ci * Always start with the target budget. On deactivation, we throw away 130562306a36Sopenharmony_ci * anything above it. 130662306a36Sopenharmony_ci */ 130762306a36Sopenharmony_ci vtarget = now->vnow - ioc->margins.target; 130862306a36Sopenharmony_ci vtime = atomic64_read(&iocg->vtime); 130962306a36Sopenharmony_ci 131062306a36Sopenharmony_ci atomic64_add(vtarget - vtime, &iocg->vtime); 131162306a36Sopenharmony_ci atomic64_add(vtarget - vtime, &iocg->done_vtime); 131262306a36Sopenharmony_ci vtime = vtarget; 131362306a36Sopenharmony_ci 131462306a36Sopenharmony_ci /* 131562306a36Sopenharmony_ci * Activate, propagate weight and start period timer if not 131662306a36Sopenharmony_ci * running. Reset hweight_gen to avoid accidental match from 131762306a36Sopenharmony_ci * wrapping. 
131862306a36Sopenharmony_ci */ 131962306a36Sopenharmony_ci iocg->hweight_gen = atomic_read(&ioc->hweight_gen) - 1; 132062306a36Sopenharmony_ci list_add(&iocg->active_list, &ioc->active_iocgs); 132162306a36Sopenharmony_ci 132262306a36Sopenharmony_ci propagate_weights(iocg, iocg->weight, 132362306a36Sopenharmony_ci iocg->last_inuse ?: iocg->weight, true, now); 132462306a36Sopenharmony_ci 132562306a36Sopenharmony_ci TRACE_IOCG_PATH(iocg_activate, iocg, now, 132662306a36Sopenharmony_ci last_period, cur_period, vtime); 132762306a36Sopenharmony_ci 132862306a36Sopenharmony_ci iocg->activated_at = now->now; 132962306a36Sopenharmony_ci 133062306a36Sopenharmony_ci if (ioc->running == IOC_IDLE) { 133162306a36Sopenharmony_ci ioc->running = IOC_RUNNING; 133262306a36Sopenharmony_ci ioc->dfgv_period_at = now->now; 133362306a36Sopenharmony_ci ioc->dfgv_period_rem = 0; 133462306a36Sopenharmony_ci ioc_start_period(ioc, now); 133562306a36Sopenharmony_ci } 133662306a36Sopenharmony_ci 133762306a36Sopenharmony_cisucceed_unlock: 133862306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 133962306a36Sopenharmony_ci return true; 134062306a36Sopenharmony_ci 134162306a36Sopenharmony_cifail_unlock: 134262306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 134362306a36Sopenharmony_ci return false; 134462306a36Sopenharmony_ci} 134562306a36Sopenharmony_ci 134662306a36Sopenharmony_cistatic bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now) 134762306a36Sopenharmony_ci{ 134862306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 134962306a36Sopenharmony_ci struct blkcg_gq *blkg = iocg_to_blkg(iocg); 135062306a36Sopenharmony_ci u64 tdelta, delay, new_delay; 135162306a36Sopenharmony_ci s64 vover, vover_pct; 135262306a36Sopenharmony_ci u32 hwa; 135362306a36Sopenharmony_ci 135462306a36Sopenharmony_ci lockdep_assert_held(&iocg->waitq.lock); 135562306a36Sopenharmony_ci 135662306a36Sopenharmony_ci /* 135762306a36Sopenharmony_ci * If the delay is set by another CPU, we may be in the past. No need to 135862306a36Sopenharmony_ci * change anything if so. This avoids decay calculation underflow. 
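 *
 * Below, the still-in-effect delay decays by half for every full second
 * since delay_at, and the replacement delay scales linearly from
 * MIN_DELAY at MIN_DELAY_THR_PCT of vtime overrun up to MAX_DELAY at
 * MAX_DELAY_THR_PCT; the larger of the two is kept.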
135962306a36Sopenharmony_ci */ 136062306a36Sopenharmony_ci if (time_before64(now->now, iocg->delay_at)) 136162306a36Sopenharmony_ci return false; 136262306a36Sopenharmony_ci 136362306a36Sopenharmony_ci /* calculate the current delay in effect - 1/2 every second */ 136462306a36Sopenharmony_ci tdelta = now->now - iocg->delay_at; 136562306a36Sopenharmony_ci if (iocg->delay) 136662306a36Sopenharmony_ci delay = iocg->delay >> div64_u64(tdelta, USEC_PER_SEC); 136762306a36Sopenharmony_ci else 136862306a36Sopenharmony_ci delay = 0; 136962306a36Sopenharmony_ci 137062306a36Sopenharmony_ci /* calculate the new delay from the debt amount */ 137162306a36Sopenharmony_ci current_hweight(iocg, &hwa, NULL); 137262306a36Sopenharmony_ci vover = atomic64_read(&iocg->vtime) + 137362306a36Sopenharmony_ci abs_cost_to_cost(iocg->abs_vdebt, hwa) - now->vnow; 137462306a36Sopenharmony_ci vover_pct = div64_s64(100 * vover, 137562306a36Sopenharmony_ci ioc->period_us * ioc->vtime_base_rate); 137662306a36Sopenharmony_ci 137762306a36Sopenharmony_ci if (vover_pct <= MIN_DELAY_THR_PCT) 137862306a36Sopenharmony_ci new_delay = 0; 137962306a36Sopenharmony_ci else if (vover_pct >= MAX_DELAY_THR_PCT) 138062306a36Sopenharmony_ci new_delay = MAX_DELAY; 138162306a36Sopenharmony_ci else 138262306a36Sopenharmony_ci new_delay = MIN_DELAY + 138362306a36Sopenharmony_ci div_u64((MAX_DELAY - MIN_DELAY) * 138462306a36Sopenharmony_ci (vover_pct - MIN_DELAY_THR_PCT), 138562306a36Sopenharmony_ci MAX_DELAY_THR_PCT - MIN_DELAY_THR_PCT); 138662306a36Sopenharmony_ci 138762306a36Sopenharmony_ci /* pick the higher one and apply */ 138862306a36Sopenharmony_ci if (new_delay > delay) { 138962306a36Sopenharmony_ci iocg->delay = new_delay; 139062306a36Sopenharmony_ci iocg->delay_at = now->now; 139162306a36Sopenharmony_ci delay = new_delay; 139262306a36Sopenharmony_ci } 139362306a36Sopenharmony_ci 139462306a36Sopenharmony_ci if (delay >= MIN_DELAY) { 139562306a36Sopenharmony_ci if (!iocg->indelay_since) 139662306a36Sopenharmony_ci iocg->indelay_since = now->now; 139762306a36Sopenharmony_ci blkcg_set_delay(blkg, delay * NSEC_PER_USEC); 139862306a36Sopenharmony_ci return true; 139962306a36Sopenharmony_ci } else { 140062306a36Sopenharmony_ci if (iocg->indelay_since) { 140162306a36Sopenharmony_ci iocg->stat.indelay_us += now->now - iocg->indelay_since; 140262306a36Sopenharmony_ci iocg->indelay_since = 0; 140362306a36Sopenharmony_ci } 140462306a36Sopenharmony_ci iocg->delay = 0; 140562306a36Sopenharmony_ci blkcg_clear_delay(blkg); 140662306a36Sopenharmony_ci return false; 140762306a36Sopenharmony_ci } 140862306a36Sopenharmony_ci} 140962306a36Sopenharmony_ci 141062306a36Sopenharmony_cistatic void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost, 141162306a36Sopenharmony_ci struct ioc_now *now) 141262306a36Sopenharmony_ci{ 141362306a36Sopenharmony_ci struct iocg_pcpu_stat *gcs; 141462306a36Sopenharmony_ci 141562306a36Sopenharmony_ci lockdep_assert_held(&iocg->ioc->lock); 141662306a36Sopenharmony_ci lockdep_assert_held(&iocg->waitq.lock); 141762306a36Sopenharmony_ci WARN_ON_ONCE(list_empty(&iocg->active_list)); 141862306a36Sopenharmony_ci 141962306a36Sopenharmony_ci /* 142062306a36Sopenharmony_ci * Once in debt, debt handling owns inuse. @iocg stays at the minimum 142162306a36Sopenharmony_ci * inuse donating all of it share to others until its debt is paid off. 
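 *
 * Passing 0 as the inuse argument below relies on __propagate_weights()
 * clamping a leaf's inuse to at least 1, so entering debt parks the
 * cgroup at the minimum inuse until iocg_pay_debt() restores last_inuse
 * once the debt is cleared.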
142262306a36Sopenharmony_ci */ 142362306a36Sopenharmony_ci if (!iocg->abs_vdebt && abs_cost) { 142462306a36Sopenharmony_ci iocg->indebt_since = now->now; 142562306a36Sopenharmony_ci propagate_weights(iocg, iocg->active, 0, false, now); 142662306a36Sopenharmony_ci } 142762306a36Sopenharmony_ci 142862306a36Sopenharmony_ci iocg->abs_vdebt += abs_cost; 142962306a36Sopenharmony_ci 143062306a36Sopenharmony_ci gcs = get_cpu_ptr(iocg->pcpu_stat); 143162306a36Sopenharmony_ci local64_add(abs_cost, &gcs->abs_vusage); 143262306a36Sopenharmony_ci put_cpu_ptr(gcs); 143362306a36Sopenharmony_ci} 143462306a36Sopenharmony_ci 143562306a36Sopenharmony_cistatic void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, 143662306a36Sopenharmony_ci struct ioc_now *now) 143762306a36Sopenharmony_ci{ 143862306a36Sopenharmony_ci lockdep_assert_held(&iocg->ioc->lock); 143962306a36Sopenharmony_ci lockdep_assert_held(&iocg->waitq.lock); 144062306a36Sopenharmony_ci 144162306a36Sopenharmony_ci /* make sure that nobody messed with @iocg */ 144262306a36Sopenharmony_ci WARN_ON_ONCE(list_empty(&iocg->active_list)); 144362306a36Sopenharmony_ci WARN_ON_ONCE(iocg->inuse > 1); 144462306a36Sopenharmony_ci 144562306a36Sopenharmony_ci iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); 144662306a36Sopenharmony_ci 144762306a36Sopenharmony_ci /* if debt is paid in full, restore inuse */ 144862306a36Sopenharmony_ci if (!iocg->abs_vdebt) { 144962306a36Sopenharmony_ci iocg->stat.indebt_us += now->now - iocg->indebt_since; 145062306a36Sopenharmony_ci iocg->indebt_since = 0; 145162306a36Sopenharmony_ci 145262306a36Sopenharmony_ci propagate_weights(iocg, iocg->active, iocg->last_inuse, 145362306a36Sopenharmony_ci false, now); 145462306a36Sopenharmony_ci } 145562306a36Sopenharmony_ci} 145662306a36Sopenharmony_ci 145762306a36Sopenharmony_cistatic int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode, 145862306a36Sopenharmony_ci int flags, void *key) 145962306a36Sopenharmony_ci{ 146062306a36Sopenharmony_ci struct iocg_wait *wait = container_of(wq_entry, struct iocg_wait, wait); 146162306a36Sopenharmony_ci struct iocg_wake_ctx *ctx = key; 146262306a36Sopenharmony_ci u64 cost = abs_cost_to_cost(wait->abs_cost, ctx->hw_inuse); 146362306a36Sopenharmony_ci 146462306a36Sopenharmony_ci ctx->vbudget -= cost; 146562306a36Sopenharmony_ci 146662306a36Sopenharmony_ci if (ctx->vbudget < 0) 146762306a36Sopenharmony_ci return -1; 146862306a36Sopenharmony_ci 146962306a36Sopenharmony_ci iocg_commit_bio(ctx->iocg, wait->bio, wait->abs_cost, cost); 147062306a36Sopenharmony_ci wait->committed = true; 147162306a36Sopenharmony_ci 147262306a36Sopenharmony_ci /* 147362306a36Sopenharmony_ci * autoremove_wake_function() removes the wait entry only when it 147462306a36Sopenharmony_ci * actually changed the task state. We want the wait always removed. 147562306a36Sopenharmony_ci * Remove explicitly and use default_wake_function(). Note that the 147662306a36Sopenharmony_ci * order of operations is important as finish_wait() tests whether 147762306a36Sopenharmony_ci * @wq_entry is removed without grabbing the lock. 147862306a36Sopenharmony_ci */ 147962306a36Sopenharmony_ci default_wake_function(wq_entry, mode, flags, key); 148062306a36Sopenharmony_ci list_del_init_careful(&wq_entry->entry); 148162306a36Sopenharmony_ci return 0; 148262306a36Sopenharmony_ci} 148362306a36Sopenharmony_ci 148462306a36Sopenharmony_ci/* 148562306a36Sopenharmony_ci * Calculate the accumulated budget, pay debt if @pay_debt and wake up waiters 148662306a36Sopenharmony_ci * accordingly. 
When @pay_debt is %true, the caller must be holding ioc->lock in 148762306a36Sopenharmony_ci * addition to iocg->waitq.lock. 148862306a36Sopenharmony_ci */ 148962306a36Sopenharmony_cistatic void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt, 149062306a36Sopenharmony_ci struct ioc_now *now) 149162306a36Sopenharmony_ci{ 149262306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 149362306a36Sopenharmony_ci struct iocg_wake_ctx ctx = { .iocg = iocg }; 149462306a36Sopenharmony_ci u64 vshortage, expires, oexpires; 149562306a36Sopenharmony_ci s64 vbudget; 149662306a36Sopenharmony_ci u32 hwa; 149762306a36Sopenharmony_ci 149862306a36Sopenharmony_ci lockdep_assert_held(&iocg->waitq.lock); 149962306a36Sopenharmony_ci 150062306a36Sopenharmony_ci current_hweight(iocg, &hwa, NULL); 150162306a36Sopenharmony_ci vbudget = now->vnow - atomic64_read(&iocg->vtime); 150262306a36Sopenharmony_ci 150362306a36Sopenharmony_ci /* pay off debt */ 150462306a36Sopenharmony_ci if (pay_debt && iocg->abs_vdebt && vbudget > 0) { 150562306a36Sopenharmony_ci u64 abs_vbudget = cost_to_abs_cost(vbudget, hwa); 150662306a36Sopenharmony_ci u64 abs_vpay = min_t(u64, abs_vbudget, iocg->abs_vdebt); 150762306a36Sopenharmony_ci u64 vpay = abs_cost_to_cost(abs_vpay, hwa); 150862306a36Sopenharmony_ci 150962306a36Sopenharmony_ci lockdep_assert_held(&ioc->lock); 151062306a36Sopenharmony_ci 151162306a36Sopenharmony_ci atomic64_add(vpay, &iocg->vtime); 151262306a36Sopenharmony_ci atomic64_add(vpay, &iocg->done_vtime); 151362306a36Sopenharmony_ci iocg_pay_debt(iocg, abs_vpay, now); 151462306a36Sopenharmony_ci vbudget -= vpay; 151562306a36Sopenharmony_ci } 151662306a36Sopenharmony_ci 151762306a36Sopenharmony_ci if (iocg->abs_vdebt || iocg->delay) 151862306a36Sopenharmony_ci iocg_kick_delay(iocg, now); 151962306a36Sopenharmony_ci 152062306a36Sopenharmony_ci /* 152162306a36Sopenharmony_ci * Debt can still be outstanding if we haven't paid all yet or the 152262306a36Sopenharmony_ci * caller raced and called without @pay_debt. Shouldn't wake up waiters 152362306a36Sopenharmony_ci * under debt. Make sure @vbudget reflects the outstanding amount and is 152462306a36Sopenharmony_ci * not positive. 152562306a36Sopenharmony_ci */ 152662306a36Sopenharmony_ci if (iocg->abs_vdebt) { 152762306a36Sopenharmony_ci s64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hwa); 152862306a36Sopenharmony_ci vbudget = min_t(s64, 0, vbudget - vdebt); 152962306a36Sopenharmony_ci } 153062306a36Sopenharmony_ci 153162306a36Sopenharmony_ci /* 153262306a36Sopenharmony_ci * Wake up the ones which are due and see how much vtime we'll need for 153362306a36Sopenharmony_ci * the next one. As paying off debt restores hw_inuse, it must be read 153462306a36Sopenharmony_ci * after the above debt payment. 
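 *
 * Each waiter woken through iocg_wake_fn() subtracts its cost from
 * ctx.vbudget and the walk stops once the budget would go negative.
 * Assuming abs_cost_to_cost() scales by WEIGHT_ONE / hw_inuse, a waiter
 * with abs_cost 10 consumes 20 of the budget when hw_inuse sits at half
 * of WEIGHT_ONE (numbers illustrative).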
153562306a36Sopenharmony_ci */ 153662306a36Sopenharmony_ci ctx.vbudget = vbudget; 153762306a36Sopenharmony_ci current_hweight(iocg, NULL, &ctx.hw_inuse); 153862306a36Sopenharmony_ci 153962306a36Sopenharmony_ci __wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx); 154062306a36Sopenharmony_ci 154162306a36Sopenharmony_ci if (!waitqueue_active(&iocg->waitq)) { 154262306a36Sopenharmony_ci if (iocg->wait_since) { 154362306a36Sopenharmony_ci iocg->stat.wait_us += now->now - iocg->wait_since; 154462306a36Sopenharmony_ci iocg->wait_since = 0; 154562306a36Sopenharmony_ci } 154662306a36Sopenharmony_ci return; 154762306a36Sopenharmony_ci } 154862306a36Sopenharmony_ci 154962306a36Sopenharmony_ci if (!iocg->wait_since) 155062306a36Sopenharmony_ci iocg->wait_since = now->now; 155162306a36Sopenharmony_ci 155262306a36Sopenharmony_ci if (WARN_ON_ONCE(ctx.vbudget >= 0)) 155362306a36Sopenharmony_ci return; 155462306a36Sopenharmony_ci 155562306a36Sopenharmony_ci /* determine next wakeup, add a timer margin to guarantee chunking */ 155662306a36Sopenharmony_ci vshortage = -ctx.vbudget; 155762306a36Sopenharmony_ci expires = now->now_ns + 155862306a36Sopenharmony_ci DIV64_U64_ROUND_UP(vshortage, ioc->vtime_base_rate) * 155962306a36Sopenharmony_ci NSEC_PER_USEC; 156062306a36Sopenharmony_ci expires += ioc->timer_slack_ns; 156162306a36Sopenharmony_ci 156262306a36Sopenharmony_ci /* if already active and close enough, don't bother */ 156362306a36Sopenharmony_ci oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->waitq_timer)); 156462306a36Sopenharmony_ci if (hrtimer_is_queued(&iocg->waitq_timer) && 156562306a36Sopenharmony_ci abs(oexpires - expires) <= ioc->timer_slack_ns) 156662306a36Sopenharmony_ci return; 156762306a36Sopenharmony_ci 156862306a36Sopenharmony_ci hrtimer_start_range_ns(&iocg->waitq_timer, ns_to_ktime(expires), 156962306a36Sopenharmony_ci ioc->timer_slack_ns, HRTIMER_MODE_ABS); 157062306a36Sopenharmony_ci} 157162306a36Sopenharmony_ci 157262306a36Sopenharmony_cistatic enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer) 157362306a36Sopenharmony_ci{ 157462306a36Sopenharmony_ci struct ioc_gq *iocg = container_of(timer, struct ioc_gq, waitq_timer); 157562306a36Sopenharmony_ci bool pay_debt = READ_ONCE(iocg->abs_vdebt); 157662306a36Sopenharmony_ci struct ioc_now now; 157762306a36Sopenharmony_ci unsigned long flags; 157862306a36Sopenharmony_ci 157962306a36Sopenharmony_ci ioc_now(iocg->ioc, &now); 158062306a36Sopenharmony_ci 158162306a36Sopenharmony_ci iocg_lock(iocg, pay_debt, &flags); 158262306a36Sopenharmony_ci iocg_kick_waitq(iocg, pay_debt, &now); 158362306a36Sopenharmony_ci iocg_unlock(iocg, pay_debt, &flags); 158462306a36Sopenharmony_ci 158562306a36Sopenharmony_ci return HRTIMER_NORESTART; 158662306a36Sopenharmony_ci} 158762306a36Sopenharmony_ci 158862306a36Sopenharmony_cistatic void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p) 158962306a36Sopenharmony_ci{ 159062306a36Sopenharmony_ci u32 nr_met[2] = { }; 159162306a36Sopenharmony_ci u32 nr_missed[2] = { }; 159262306a36Sopenharmony_ci u64 rq_wait_ns = 0; 159362306a36Sopenharmony_ci int cpu, rw; 159462306a36Sopenharmony_ci 159562306a36Sopenharmony_ci for_each_online_cpu(cpu) { 159662306a36Sopenharmony_ci struct ioc_pcpu_stat *stat = per_cpu_ptr(ioc->pcpu_stat, cpu); 159762306a36Sopenharmony_ci u64 this_rq_wait_ns; 159862306a36Sopenharmony_ci 159962306a36Sopenharmony_ci for (rw = READ; rw <= WRITE; rw++) { 160062306a36Sopenharmony_ci u32 this_met = local_read(&stat->missed[rw].nr_met); 160162306a36Sopenharmony_ci u32 
this_missed = local_read(&stat->missed[rw].nr_missed); 160262306a36Sopenharmony_ci 160362306a36Sopenharmony_ci nr_met[rw] += this_met - stat->missed[rw].last_met; 160462306a36Sopenharmony_ci nr_missed[rw] += this_missed - stat->missed[rw].last_missed; 160562306a36Sopenharmony_ci stat->missed[rw].last_met = this_met; 160662306a36Sopenharmony_ci stat->missed[rw].last_missed = this_missed; 160762306a36Sopenharmony_ci } 160862306a36Sopenharmony_ci 160962306a36Sopenharmony_ci this_rq_wait_ns = local64_read(&stat->rq_wait_ns); 161062306a36Sopenharmony_ci rq_wait_ns += this_rq_wait_ns - stat->last_rq_wait_ns; 161162306a36Sopenharmony_ci stat->last_rq_wait_ns = this_rq_wait_ns; 161262306a36Sopenharmony_ci } 161362306a36Sopenharmony_ci 161462306a36Sopenharmony_ci for (rw = READ; rw <= WRITE; rw++) { 161562306a36Sopenharmony_ci if (nr_met[rw] + nr_missed[rw]) 161662306a36Sopenharmony_ci missed_ppm_ar[rw] = 161762306a36Sopenharmony_ci DIV64_U64_ROUND_UP((u64)nr_missed[rw] * MILLION, 161862306a36Sopenharmony_ci nr_met[rw] + nr_missed[rw]); 161962306a36Sopenharmony_ci else 162062306a36Sopenharmony_ci missed_ppm_ar[rw] = 0; 162162306a36Sopenharmony_ci } 162262306a36Sopenharmony_ci 162362306a36Sopenharmony_ci *rq_wait_pct_p = div64_u64(rq_wait_ns * 100, 162462306a36Sopenharmony_ci ioc->period_us * NSEC_PER_USEC); 162562306a36Sopenharmony_ci} 162662306a36Sopenharmony_ci 162762306a36Sopenharmony_ci/* was iocg idle this period? */ 162862306a36Sopenharmony_cistatic bool iocg_is_idle(struct ioc_gq *iocg) 162962306a36Sopenharmony_ci{ 163062306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 163162306a36Sopenharmony_ci 163262306a36Sopenharmony_ci /* did something get issued this period? */ 163362306a36Sopenharmony_ci if (atomic64_read(&iocg->active_period) == 163462306a36Sopenharmony_ci atomic64_read(&ioc->cur_period)) 163562306a36Sopenharmony_ci return false; 163662306a36Sopenharmony_ci 163762306a36Sopenharmony_ci /* is something in flight? */ 163862306a36Sopenharmony_ci if (atomic64_read(&iocg->done_vtime) != atomic64_read(&iocg->vtime)) 163962306a36Sopenharmony_ci return false; 164062306a36Sopenharmony_ci 164162306a36Sopenharmony_ci return true; 164262306a36Sopenharmony_ci} 164362306a36Sopenharmony_ci 164462306a36Sopenharmony_ci/* 164562306a36Sopenharmony_ci * Call this function on the target leaf @iocg's to build pre-order traversal 164662306a36Sopenharmony_ci * list of all the ancestors in @inner_walk. The inner nodes are linked through 164762306a36Sopenharmony_ci * ->walk_list and the caller is responsible for dissolving the list after use. 
164862306a36Sopenharmony_ci */ 164962306a36Sopenharmony_cistatic void iocg_build_inner_walk(struct ioc_gq *iocg, 165062306a36Sopenharmony_ci struct list_head *inner_walk) 165162306a36Sopenharmony_ci{ 165262306a36Sopenharmony_ci int lvl; 165362306a36Sopenharmony_ci 165462306a36Sopenharmony_ci WARN_ON_ONCE(!list_empty(&iocg->walk_list)); 165562306a36Sopenharmony_ci 165662306a36Sopenharmony_ci /* find the first ancestor which hasn't been visited yet */ 165762306a36Sopenharmony_ci for (lvl = iocg->level - 1; lvl >= 0; lvl--) { 165862306a36Sopenharmony_ci if (!list_empty(&iocg->ancestors[lvl]->walk_list)) 165962306a36Sopenharmony_ci break; 166062306a36Sopenharmony_ci } 166162306a36Sopenharmony_ci 166262306a36Sopenharmony_ci /* walk down and visit the inner nodes to get pre-order traversal */ 166362306a36Sopenharmony_ci while (++lvl <= iocg->level - 1) { 166462306a36Sopenharmony_ci struct ioc_gq *inner = iocg->ancestors[lvl]; 166562306a36Sopenharmony_ci 166662306a36Sopenharmony_ci /* record traversal order */ 166762306a36Sopenharmony_ci list_add_tail(&inner->walk_list, inner_walk); 166862306a36Sopenharmony_ci } 166962306a36Sopenharmony_ci} 167062306a36Sopenharmony_ci 167162306a36Sopenharmony_ci/* propagate the deltas to the parent */ 167262306a36Sopenharmony_cistatic void iocg_flush_stat_upward(struct ioc_gq *iocg) 167362306a36Sopenharmony_ci{ 167462306a36Sopenharmony_ci if (iocg->level > 0) { 167562306a36Sopenharmony_ci struct iocg_stat *parent_stat = 167662306a36Sopenharmony_ci &iocg->ancestors[iocg->level - 1]->stat; 167762306a36Sopenharmony_ci 167862306a36Sopenharmony_ci parent_stat->usage_us += 167962306a36Sopenharmony_ci iocg->stat.usage_us - iocg->last_stat.usage_us; 168062306a36Sopenharmony_ci parent_stat->wait_us += 168162306a36Sopenharmony_ci iocg->stat.wait_us - iocg->last_stat.wait_us; 168262306a36Sopenharmony_ci parent_stat->indebt_us += 168362306a36Sopenharmony_ci iocg->stat.indebt_us - iocg->last_stat.indebt_us; 168462306a36Sopenharmony_ci parent_stat->indelay_us += 168562306a36Sopenharmony_ci iocg->stat.indelay_us - iocg->last_stat.indelay_us; 168662306a36Sopenharmony_ci } 168762306a36Sopenharmony_ci 168862306a36Sopenharmony_ci iocg->last_stat = iocg->stat; 168962306a36Sopenharmony_ci} 169062306a36Sopenharmony_ci 169162306a36Sopenharmony_ci/* collect per-cpu counters and propagate the deltas to the parent */ 169262306a36Sopenharmony_cistatic void iocg_flush_stat_leaf(struct ioc_gq *iocg, struct ioc_now *now) 169362306a36Sopenharmony_ci{ 169462306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 169562306a36Sopenharmony_ci u64 abs_vusage = 0; 169662306a36Sopenharmony_ci u64 vusage_delta; 169762306a36Sopenharmony_ci int cpu; 169862306a36Sopenharmony_ci 169962306a36Sopenharmony_ci lockdep_assert_held(&iocg->ioc->lock); 170062306a36Sopenharmony_ci 170162306a36Sopenharmony_ci /* collect per-cpu counters */ 170262306a36Sopenharmony_ci for_each_possible_cpu(cpu) { 170362306a36Sopenharmony_ci abs_vusage += local64_read( 170462306a36Sopenharmony_ci per_cpu_ptr(&iocg->pcpu_stat->abs_vusage, cpu)); 170562306a36Sopenharmony_ci } 170662306a36Sopenharmony_ci vusage_delta = abs_vusage - iocg->last_stat_abs_vusage; 170762306a36Sopenharmony_ci iocg->last_stat_abs_vusage = abs_vusage; 170862306a36Sopenharmony_ci 170962306a36Sopenharmony_ci iocg->usage_delta_us = div64_u64(vusage_delta, ioc->vtime_base_rate); 171062306a36Sopenharmony_ci iocg->stat.usage_us += iocg->usage_delta_us; 171162306a36Sopenharmony_ci 171262306a36Sopenharmony_ci iocg_flush_stat_upward(iocg); 171362306a36Sopenharmony_ci} 
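/*
 * Illustrative sketch, not part of the original source: the two flush
 * helpers above follow a "delta against the last snapshot" pattern -
 * absolute counters only grow, and each flush forwards just the growth
 * since the previous flush to the parent. A minimal stand-alone
 * rendition of the same idea, with hypothetical names:
 */
#if 0	/* example only, never built */
struct example_stat {
	u64 total;	/* monotonically increasing absolute counter */
	u64 last;	/* portion already forwarded to the parent */
};

static u64 example_flush_delta(struct example_stat *s, u64 cur_total)
{
	u64 delta = cur_total - s->last;	/* growth since last flush */

	s->last = cur_total;
	return delta;	/* amount to add to the parent's counter */
}
#endif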
171462306a36Sopenharmony_ci 171562306a36Sopenharmony_ci/* get stat counters ready for reading on all active iocgs */ 171662306a36Sopenharmony_cistatic void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now) 171762306a36Sopenharmony_ci{ 171862306a36Sopenharmony_ci LIST_HEAD(inner_walk); 171962306a36Sopenharmony_ci struct ioc_gq *iocg, *tiocg; 172062306a36Sopenharmony_ci 172162306a36Sopenharmony_ci /* flush leaves and build inner node walk list */ 172262306a36Sopenharmony_ci list_for_each_entry(iocg, target_iocgs, active_list) { 172362306a36Sopenharmony_ci iocg_flush_stat_leaf(iocg, now); 172462306a36Sopenharmony_ci iocg_build_inner_walk(iocg, &inner_walk); 172562306a36Sopenharmony_ci } 172662306a36Sopenharmony_ci 172762306a36Sopenharmony_ci /* keep flushing upwards by walking the inner list backwards */ 172862306a36Sopenharmony_ci list_for_each_entry_safe_reverse(iocg, tiocg, &inner_walk, walk_list) { 172962306a36Sopenharmony_ci iocg_flush_stat_upward(iocg); 173062306a36Sopenharmony_ci list_del_init(&iocg->walk_list); 173162306a36Sopenharmony_ci } 173262306a36Sopenharmony_ci} 173362306a36Sopenharmony_ci 173462306a36Sopenharmony_ci/* 173562306a36Sopenharmony_ci * Determine what @iocg's hweight_inuse should be after donating unused 173662306a36Sopenharmony_ci * capacity. @hwm is the upper bound and used to signal no donation. This 173762306a36Sopenharmony_ci * function also throws away @iocg's excess budget. 173862306a36Sopenharmony_ci */ 173962306a36Sopenharmony_cistatic u32 hweight_after_donation(struct ioc_gq *iocg, u32 old_hwi, u32 hwm, 174062306a36Sopenharmony_ci u32 usage, struct ioc_now *now) 174162306a36Sopenharmony_ci{ 174262306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 174362306a36Sopenharmony_ci u64 vtime = atomic64_read(&iocg->vtime); 174462306a36Sopenharmony_ci s64 excess, delta, target, new_hwi; 174562306a36Sopenharmony_ci 174662306a36Sopenharmony_ci /* debt handling owns inuse for debtors */ 174762306a36Sopenharmony_ci if (iocg->abs_vdebt) 174862306a36Sopenharmony_ci return 1; 174962306a36Sopenharmony_ci 175062306a36Sopenharmony_ci /* see whether minimum margin requirement is met */ 175162306a36Sopenharmony_ci if (waitqueue_active(&iocg->waitq) || 175262306a36Sopenharmony_ci time_after64(vtime, now->vnow - ioc->margins.min)) 175362306a36Sopenharmony_ci return hwm; 175462306a36Sopenharmony_ci 175562306a36Sopenharmony_ci /* throw away excess above target */ 175662306a36Sopenharmony_ci excess = now->vnow - vtime - ioc->margins.target; 175762306a36Sopenharmony_ci if (excess > 0) { 175862306a36Sopenharmony_ci atomic64_add(excess, &iocg->vtime); 175962306a36Sopenharmony_ci atomic64_add(excess, &iocg->done_vtime); 176062306a36Sopenharmony_ci vtime += excess; 176162306a36Sopenharmony_ci ioc->vtime_err -= div64_u64(excess * old_hwi, WEIGHT_ONE); 176262306a36Sopenharmony_ci } 176362306a36Sopenharmony_ci 176462306a36Sopenharmony_ci /* 176562306a36Sopenharmony_ci * Let's say the distance between iocg's and device's vtimes as a 176662306a36Sopenharmony_ci * fraction of period duration is delta. Assuming that the iocg will 176762306a36Sopenharmony_ci * consume the usage determined above, we want to determine new_hwi so 176862306a36Sopenharmony_ci * that delta equals MARGIN_TARGET at the end of the next period. 
176962306a36Sopenharmony_ci * 177062306a36Sopenharmony_ci * We need to execute usage worth of IOs while spending the sum of the 177162306a36Sopenharmony_ci * new budget (1 - MARGIN_TARGET) and the leftover from the last period 177262306a36Sopenharmony_ci * (delta): 177362306a36Sopenharmony_ci * 177462306a36Sopenharmony_ci * usage = (1 - MARGIN_TARGET + delta) * new_hwi 177562306a36Sopenharmony_ci * 177662306a36Sopenharmony_ci * Therefore, the new_hwi is: 177762306a36Sopenharmony_ci * 177862306a36Sopenharmony_ci * new_hwi = usage / (1 - MARGIN_TARGET + delta) 177962306a36Sopenharmony_ci */ 178062306a36Sopenharmony_ci delta = div64_s64(WEIGHT_ONE * (now->vnow - vtime), 178162306a36Sopenharmony_ci now->vnow - ioc->period_at_vtime); 178262306a36Sopenharmony_ci target = WEIGHT_ONE * MARGIN_TARGET_PCT / 100; 178362306a36Sopenharmony_ci new_hwi = div64_s64(WEIGHT_ONE * usage, WEIGHT_ONE - target + delta); 178462306a36Sopenharmony_ci 178562306a36Sopenharmony_ci return clamp_t(s64, new_hwi, 1, hwm); 178662306a36Sopenharmony_ci} 178762306a36Sopenharmony_ci 178862306a36Sopenharmony_ci/* 178962306a36Sopenharmony_ci * For work-conservation, an iocg which isn't using all of its share should 179062306a36Sopenharmony_ci * donate the leftover to other iocgs. There are two ways to achieve this - 1. 179162306a36Sopenharmony_ci * bumping up vrate accordingly 2. lowering the donating iocg's inuse weight. 179262306a36Sopenharmony_ci * 179362306a36Sopenharmony_ci * #1 is mathematically simpler but has the drawback of requiring synchronous 179462306a36Sopenharmony_ci * global hweight_inuse updates when idle iocg's get activated or inuse weights 179562306a36Sopenharmony_ci * change due to donation snapbacks as it has the possibility of grossly 179662306a36Sopenharmony_ci * overshooting what's allowed by the model and vrate. 179762306a36Sopenharmony_ci * 179862306a36Sopenharmony_ci * #2 is inherently safe with local operations. The donating iocg can easily 179962306a36Sopenharmony_ci * snap back to higher weights when needed without worrying about impacts on 180062306a36Sopenharmony_ci * other nodes as the impacts will be inherently correct. This also makes idle 180162306a36Sopenharmony_ci * iocg activations safe. The only effect activations have is decreasing 180262306a36Sopenharmony_ci * hweight_inuse of others, the right solution to which is for those iocgs to 180362306a36Sopenharmony_ci * snap back to higher weights. 180462306a36Sopenharmony_ci * 180562306a36Sopenharmony_ci * So, we go with #2. The challenge is calculating how each donating iocg's 180662306a36Sopenharmony_ci * inuse should be adjusted to achieve the target donation amounts. This is done 180762306a36Sopenharmony_ci * using Andy's method described in the following pdf. 180862306a36Sopenharmony_ci * 180962306a36Sopenharmony_ci * https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo 181062306a36Sopenharmony_ci * 181162306a36Sopenharmony_ci * Given the weights and target after-donation hweight_inuse values, Andy's 181262306a36Sopenharmony_ci * method determines how the proportional distribution should look like at each 181362306a36Sopenharmony_ci * sibling level to maintain the relative relationship between all non-donating 181462306a36Sopenharmony_ci * pairs. 
To roughly summarize, it divides the tree into donating and 181562306a36Sopenharmony_ci * non-donating parts, calculates global donation rate which is used to 181662306a36Sopenharmony_ci * determine the target hweight_inuse for each node, and then derives per-level 181762306a36Sopenharmony_ci * proportions. 181862306a36Sopenharmony_ci * 181962306a36Sopenharmony_ci * The following pdf shows that global distribution calculated this way can be 182062306a36Sopenharmony_ci * achieved by scaling inuse weights of donating leaves and propagating the 182162306a36Sopenharmony_ci * adjustments upwards proportionally. 182262306a36Sopenharmony_ci * 182362306a36Sopenharmony_ci * https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE 182462306a36Sopenharmony_ci * 182562306a36Sopenharmony_ci * Combining the above two, we can determine how each leaf iocg's inuse should 182662306a36Sopenharmony_ci * be adjusted to achieve the target donation. 182762306a36Sopenharmony_ci * 182862306a36Sopenharmony_ci * https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN 182962306a36Sopenharmony_ci * 183062306a36Sopenharmony_ci * The inline comments use symbols from the last pdf. 183162306a36Sopenharmony_ci * 183262306a36Sopenharmony_ci * b is the sum of the absolute budgets in the subtree. 1 for the root node. 183362306a36Sopenharmony_ci * f is the sum of the absolute budgets of non-donating nodes in the subtree. 183462306a36Sopenharmony_ci * t is the sum of the absolute budgets of donating nodes in the subtree. 183562306a36Sopenharmony_ci * w is the weight of the node. w = w_f + w_t 183662306a36Sopenharmony_ci * w_f is the non-donating portion of w. w_f = w * f / b 183762306a36Sopenharmony_ci * w_t is the donating portion of w. w_t = w * t / b 183862306a36Sopenharmony_ci * s is the sum of all sibling weights. s = Sum(w) for siblings 183962306a36Sopenharmony_ci * s_f and s_t are the non-donating and donating portions of s. 184062306a36Sopenharmony_ci * 184162306a36Sopenharmony_ci * Subscript p denotes the parent's counterpart and ' the adjusted value - e.g. 184262306a36Sopenharmony_ci * w_pt is the donating portion of the parent's weight and w'_pt the same value 184362306a36Sopenharmony_ci * after adjustments. Subscript r denotes the root node's values. 184462306a36Sopenharmony_ci */ 184562306a36Sopenharmony_cistatic void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now) 184662306a36Sopenharmony_ci{ 184762306a36Sopenharmony_ci LIST_HEAD(over_hwa); 184862306a36Sopenharmony_ci LIST_HEAD(inner_walk); 184962306a36Sopenharmony_ci struct ioc_gq *iocg, *tiocg, *root_iocg; 185062306a36Sopenharmony_ci u32 after_sum, over_sum, over_target, gamma; 185162306a36Sopenharmony_ci 185262306a36Sopenharmony_ci /* 185362306a36Sopenharmony_ci * It's pretty unlikely but possible for the total sum of 185462306a36Sopenharmony_ci * hweight_after_donation's to be higher than WEIGHT_ONE, which will 185562306a36Sopenharmony_ci * confuse the following calculations. If such a condition is detected, 185662306a36Sopenharmony_ci * scale down everyone over its full share equally to keep the sum below 185762306a36Sopenharmony_ci * WEIGHT_ONE.
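 *
 * Worked example (made-up numbers): if the hweight_after_donation values
 * sum to WEIGHT_ONE + 100 and the entries currently above their hwa sum
 * to over_sum == 1000, over_delta is 101 and each of those entries is
 * scaled by 899/1000, bringing the total back under WEIGHT_ONE.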
185862306a36Sopenharmony_ci */ 185962306a36Sopenharmony_ci after_sum = 0; 186062306a36Sopenharmony_ci over_sum = 0; 186162306a36Sopenharmony_ci list_for_each_entry(iocg, surpluses, surplus_list) { 186262306a36Sopenharmony_ci u32 hwa; 186362306a36Sopenharmony_ci 186462306a36Sopenharmony_ci current_hweight(iocg, &hwa, NULL); 186562306a36Sopenharmony_ci after_sum += iocg->hweight_after_donation; 186662306a36Sopenharmony_ci 186762306a36Sopenharmony_ci if (iocg->hweight_after_donation > hwa) { 186862306a36Sopenharmony_ci over_sum += iocg->hweight_after_donation; 186962306a36Sopenharmony_ci list_add(&iocg->walk_list, &over_hwa); 187062306a36Sopenharmony_ci } 187162306a36Sopenharmony_ci } 187262306a36Sopenharmony_ci 187362306a36Sopenharmony_ci if (after_sum >= WEIGHT_ONE) { 187462306a36Sopenharmony_ci /* 187562306a36Sopenharmony_ci * The delta should be deducted from the over_sum, calculate 187662306a36Sopenharmony_ci * target over_sum value. 187762306a36Sopenharmony_ci */ 187862306a36Sopenharmony_ci u32 over_delta = after_sum - (WEIGHT_ONE - 1); 187962306a36Sopenharmony_ci WARN_ON_ONCE(over_sum <= over_delta); 188062306a36Sopenharmony_ci over_target = over_sum - over_delta; 188162306a36Sopenharmony_ci } else { 188262306a36Sopenharmony_ci over_target = 0; 188362306a36Sopenharmony_ci } 188462306a36Sopenharmony_ci 188562306a36Sopenharmony_ci list_for_each_entry_safe(iocg, tiocg, &over_hwa, walk_list) { 188662306a36Sopenharmony_ci if (over_target) 188762306a36Sopenharmony_ci iocg->hweight_after_donation = 188862306a36Sopenharmony_ci div_u64((u64)iocg->hweight_after_donation * 188962306a36Sopenharmony_ci over_target, over_sum); 189062306a36Sopenharmony_ci list_del_init(&iocg->walk_list); 189162306a36Sopenharmony_ci } 189262306a36Sopenharmony_ci 189362306a36Sopenharmony_ci /* 189462306a36Sopenharmony_ci * Build pre-order inner node walk list and prepare for donation 189562306a36Sopenharmony_ci * adjustment calculations. 189662306a36Sopenharmony_ci */ 189762306a36Sopenharmony_ci list_for_each_entry(iocg, surpluses, surplus_list) { 189862306a36Sopenharmony_ci iocg_build_inner_walk(iocg, &inner_walk); 189962306a36Sopenharmony_ci } 190062306a36Sopenharmony_ci 190162306a36Sopenharmony_ci root_iocg = list_first_entry(&inner_walk, struct ioc_gq, walk_list); 190262306a36Sopenharmony_ci WARN_ON_ONCE(root_iocg->level > 0); 190362306a36Sopenharmony_ci 190462306a36Sopenharmony_ci list_for_each_entry(iocg, &inner_walk, walk_list) { 190562306a36Sopenharmony_ci iocg->child_adjusted_sum = 0; 190662306a36Sopenharmony_ci iocg->hweight_donating = 0; 190762306a36Sopenharmony_ci iocg->hweight_after_donation = 0; 190862306a36Sopenharmony_ci } 190962306a36Sopenharmony_ci 191062306a36Sopenharmony_ci /* 191162306a36Sopenharmony_ci * Propagate the donating budget (b_t) and after donation budget (b'_t) 191262306a36Sopenharmony_ci * up the hierarchy. 
191362306a36Sopenharmony_ci */ 191462306a36Sopenharmony_ci list_for_each_entry(iocg, surpluses, surplus_list) { 191562306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; 191662306a36Sopenharmony_ci 191762306a36Sopenharmony_ci parent->hweight_donating += iocg->hweight_donating; 191862306a36Sopenharmony_ci parent->hweight_after_donation += iocg->hweight_after_donation; 191962306a36Sopenharmony_ci } 192062306a36Sopenharmony_ci 192162306a36Sopenharmony_ci list_for_each_entry_reverse(iocg, &inner_walk, walk_list) { 192262306a36Sopenharmony_ci if (iocg->level > 0) { 192362306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; 192462306a36Sopenharmony_ci 192562306a36Sopenharmony_ci parent->hweight_donating += iocg->hweight_donating; 192662306a36Sopenharmony_ci parent->hweight_after_donation += iocg->hweight_after_donation; 192762306a36Sopenharmony_ci } 192862306a36Sopenharmony_ci } 192962306a36Sopenharmony_ci 193062306a36Sopenharmony_ci /* 193162306a36Sopenharmony_ci * Calculate inner hwa's (b) and make sure the donation values are 193262306a36Sopenharmony_ci * within the accepted ranges as we're doing low res calculations with 193362306a36Sopenharmony_ci * roundups. 193462306a36Sopenharmony_ci */ 193562306a36Sopenharmony_ci list_for_each_entry(iocg, &inner_walk, walk_list) { 193662306a36Sopenharmony_ci if (iocg->level) { 193762306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; 193862306a36Sopenharmony_ci 193962306a36Sopenharmony_ci iocg->hweight_active = DIV64_U64_ROUND_UP( 194062306a36Sopenharmony_ci (u64)parent->hweight_active * iocg->active, 194162306a36Sopenharmony_ci parent->child_active_sum); 194262306a36Sopenharmony_ci 194362306a36Sopenharmony_ci } 194462306a36Sopenharmony_ci 194562306a36Sopenharmony_ci iocg->hweight_donating = min(iocg->hweight_donating, 194662306a36Sopenharmony_ci iocg->hweight_active); 194762306a36Sopenharmony_ci iocg->hweight_after_donation = min(iocg->hweight_after_donation, 194862306a36Sopenharmony_ci iocg->hweight_donating - 1); 194962306a36Sopenharmony_ci if (WARN_ON_ONCE(iocg->hweight_active <= 1 || 195062306a36Sopenharmony_ci iocg->hweight_donating <= 1 || 195162306a36Sopenharmony_ci iocg->hweight_after_donation == 0)) { 195262306a36Sopenharmony_ci pr_warn("iocg: invalid donation weights in "); 195362306a36Sopenharmony_ci pr_cont_cgroup_path(iocg_to_blkg(iocg)->blkcg->css.cgroup); 195462306a36Sopenharmony_ci pr_cont(": active=%u donating=%u after=%u\n", 195562306a36Sopenharmony_ci iocg->hweight_active, iocg->hweight_donating, 195662306a36Sopenharmony_ci iocg->hweight_after_donation); 195762306a36Sopenharmony_ci } 195862306a36Sopenharmony_ci } 195962306a36Sopenharmony_ci 196062306a36Sopenharmony_ci /* 196162306a36Sopenharmony_ci * Calculate the global donation rate (gamma) - the rate to adjust 196262306a36Sopenharmony_ci * non-donating budgets by. 196362306a36Sopenharmony_ci * 196462306a36Sopenharmony_ci * No need to use 64bit multiplication here as the first operand is 196562306a36Sopenharmony_ci * guaranteed to be smaller than WEIGHT_ONE (1<<16). 196662306a36Sopenharmony_ci * 196762306a36Sopenharmony_ci * We know that there are beneficiary nodes and the sum of the donating 196862306a36Sopenharmony_ci * hweights can't be whole; however, due to the round-ups during hweight 196962306a36Sopenharmony_ci * calculations, root_iocg->hweight_donating might still end up equal to 197062306a36Sopenharmony_ci * or greater than whole. Limit the range when calculating the divider. 
197162306a36Sopenharmony_ci * 197262306a36Sopenharmony_ci * gamma = (1 - t_r') / (1 - t_r) 197362306a36Sopenharmony_ci */ 197462306a36Sopenharmony_ci gamma = DIV_ROUND_UP( 197562306a36Sopenharmony_ci (WEIGHT_ONE - root_iocg->hweight_after_donation) * WEIGHT_ONE, 197662306a36Sopenharmony_ci WEIGHT_ONE - min_t(u32, root_iocg->hweight_donating, WEIGHT_ONE - 1)); 197762306a36Sopenharmony_ci 197862306a36Sopenharmony_ci /* 197962306a36Sopenharmony_ci * Calculate adjusted hwi, child_adjusted_sum and inuse for the inner 198062306a36Sopenharmony_ci * nodes. 198162306a36Sopenharmony_ci */ 198262306a36Sopenharmony_ci list_for_each_entry(iocg, &inner_walk, walk_list) { 198362306a36Sopenharmony_ci struct ioc_gq *parent; 198462306a36Sopenharmony_ci u32 inuse, wpt, wptp; 198562306a36Sopenharmony_ci u64 st, sf; 198662306a36Sopenharmony_ci 198762306a36Sopenharmony_ci if (iocg->level == 0) { 198862306a36Sopenharmony_ci /* adjusted weight sum for 1st level: s' = s * b_pf / b'_pf */ 198962306a36Sopenharmony_ci iocg->child_adjusted_sum = DIV64_U64_ROUND_UP( 199062306a36Sopenharmony_ci iocg->child_active_sum * (WEIGHT_ONE - iocg->hweight_donating), 199162306a36Sopenharmony_ci WEIGHT_ONE - iocg->hweight_after_donation); 199262306a36Sopenharmony_ci continue; 199362306a36Sopenharmony_ci } 199462306a36Sopenharmony_ci 199562306a36Sopenharmony_ci parent = iocg->ancestors[iocg->level - 1]; 199662306a36Sopenharmony_ci 199762306a36Sopenharmony_ci /* b' = gamma * b_f + b_t' */ 199862306a36Sopenharmony_ci iocg->hweight_inuse = DIV64_U64_ROUND_UP( 199962306a36Sopenharmony_ci (u64)gamma * (iocg->hweight_active - iocg->hweight_donating), 200062306a36Sopenharmony_ci WEIGHT_ONE) + iocg->hweight_after_donation; 200162306a36Sopenharmony_ci 200262306a36Sopenharmony_ci /* w' = s' * b' / b'_p */ 200362306a36Sopenharmony_ci inuse = DIV64_U64_ROUND_UP( 200462306a36Sopenharmony_ci (u64)parent->child_adjusted_sum * iocg->hweight_inuse, 200562306a36Sopenharmony_ci parent->hweight_inuse); 200662306a36Sopenharmony_ci 200762306a36Sopenharmony_ci /* adjusted weight sum for children: s' = s_f + s_t * w'_pt / w_pt */ 200862306a36Sopenharmony_ci st = DIV64_U64_ROUND_UP( 200962306a36Sopenharmony_ci iocg->child_active_sum * iocg->hweight_donating, 201062306a36Sopenharmony_ci iocg->hweight_active); 201162306a36Sopenharmony_ci sf = iocg->child_active_sum - st; 201262306a36Sopenharmony_ci wpt = DIV64_U64_ROUND_UP( 201362306a36Sopenharmony_ci (u64)iocg->active * iocg->hweight_donating, 201462306a36Sopenharmony_ci iocg->hweight_active); 201562306a36Sopenharmony_ci wptp = DIV64_U64_ROUND_UP( 201662306a36Sopenharmony_ci (u64)inuse * iocg->hweight_after_donation, 201762306a36Sopenharmony_ci iocg->hweight_inuse); 201862306a36Sopenharmony_ci 201962306a36Sopenharmony_ci iocg->child_adjusted_sum = sf + DIV64_U64_ROUND_UP(st * wptp, wpt); 202062306a36Sopenharmony_ci } 202162306a36Sopenharmony_ci 202262306a36Sopenharmony_ci /* 202362306a36Sopenharmony_ci * All inner nodes now have ->hweight_inuse and ->child_adjusted_sum and 202462306a36Sopenharmony_ci * we can finally determine leaf adjustments. 202562306a36Sopenharmony_ci */ 202662306a36Sopenharmony_ci list_for_each_entry(iocg, surpluses, surplus_list) { 202762306a36Sopenharmony_ci struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; 202862306a36Sopenharmony_ci u32 inuse; 202962306a36Sopenharmony_ci 203062306a36Sopenharmony_ci /* 203162306a36Sopenharmony_ci * In-debt iocgs participated in the donation calculation with 203262306a36Sopenharmony_ci * the minimum target hweight_inuse. 
Configuring inuse 203362306a36Sopenharmony_ci * accordingly would work fine but debt handling expects 203462306a36Sopenharmony_ci * @iocg->inuse stay at the minimum and we don't wanna 203562306a36Sopenharmony_ci * interfere. 203662306a36Sopenharmony_ci */ 203762306a36Sopenharmony_ci if (iocg->abs_vdebt) { 203862306a36Sopenharmony_ci WARN_ON_ONCE(iocg->inuse > 1); 203962306a36Sopenharmony_ci continue; 204062306a36Sopenharmony_ci } 204162306a36Sopenharmony_ci 204262306a36Sopenharmony_ci /* w' = s' * b' / b'_p, note that b' == b'_t for donating leaves */ 204362306a36Sopenharmony_ci inuse = DIV64_U64_ROUND_UP( 204462306a36Sopenharmony_ci parent->child_adjusted_sum * iocg->hweight_after_donation, 204562306a36Sopenharmony_ci parent->hweight_inuse); 204662306a36Sopenharmony_ci 204762306a36Sopenharmony_ci TRACE_IOCG_PATH(inuse_transfer, iocg, now, 204862306a36Sopenharmony_ci iocg->inuse, inuse, 204962306a36Sopenharmony_ci iocg->hweight_inuse, 205062306a36Sopenharmony_ci iocg->hweight_after_donation); 205162306a36Sopenharmony_ci 205262306a36Sopenharmony_ci __propagate_weights(iocg, iocg->active, inuse, true, now); 205362306a36Sopenharmony_ci } 205462306a36Sopenharmony_ci 205562306a36Sopenharmony_ci /* walk list should be dissolved after use */ 205662306a36Sopenharmony_ci list_for_each_entry_safe(iocg, tiocg, &inner_walk, walk_list) 205762306a36Sopenharmony_ci list_del_init(&iocg->walk_list); 205862306a36Sopenharmony_ci} 205962306a36Sopenharmony_ci 206062306a36Sopenharmony_ci/* 206162306a36Sopenharmony_ci * A low weight iocg can amass a large amount of debt, for example, when 206262306a36Sopenharmony_ci * anonymous memory gets reclaimed aggressively. If the system has a lot of 206362306a36Sopenharmony_ci * memory paired with a slow IO device, the debt can span multiple seconds or 206462306a36Sopenharmony_ci * more. If there are no other subsequent IO issuers, the in-debt iocg may end 206562306a36Sopenharmony_ci * up blocked paying its debt while the IO device is idle. 206662306a36Sopenharmony_ci * 206762306a36Sopenharmony_ci * The following protects against such cases. If the device has been 206862306a36Sopenharmony_ci * sufficiently idle for a while, the debts are halved and delays are 206962306a36Sopenharmony_ci * recalculated. 207062306a36Sopenharmony_ci */ 207162306a36Sopenharmony_cistatic void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors, 207262306a36Sopenharmony_ci struct ioc_now *now) 207362306a36Sopenharmony_ci{ 207462306a36Sopenharmony_ci struct ioc_gq *iocg; 207562306a36Sopenharmony_ci u64 dur, usage_pct, nr_cycles; 207662306a36Sopenharmony_ci 207762306a36Sopenharmony_ci /* if no debtor, reset the cycle */ 207862306a36Sopenharmony_ci if (!nr_debtors) { 207962306a36Sopenharmony_ci ioc->dfgv_period_at = now->now; 208062306a36Sopenharmony_ci ioc->dfgv_period_rem = 0; 208162306a36Sopenharmony_ci ioc->dfgv_usage_us_sum = 0; 208262306a36Sopenharmony_ci return; 208362306a36Sopenharmony_ci } 208462306a36Sopenharmony_ci 208562306a36Sopenharmony_ci /* 208662306a36Sopenharmony_ci * Debtors can pass through a lot of writes choking the device and we 208762306a36Sopenharmony_ci * don't want to be forgiving debts while the device is struggling from 208862306a36Sopenharmony_ci * write bursts. If we're missing latency targets, consider the device 208962306a36Sopenharmony_ci * fully utilized. 
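 *
 * Raising usage_us_sum to at least a full period's worth below makes the
 * later usage_pct calculation read as (close to) fully busy, so no debt
 * is forgiven while the device is overloaded.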
209062306a36Sopenharmony_ci */ 209162306a36Sopenharmony_ci if (ioc->busy_level > 0) 209262306a36Sopenharmony_ci usage_us_sum = max_t(u64, usage_us_sum, ioc->period_us); 209362306a36Sopenharmony_ci 209462306a36Sopenharmony_ci ioc->dfgv_usage_us_sum += usage_us_sum; 209562306a36Sopenharmony_ci if (time_before64(now->now, ioc->dfgv_period_at + DFGV_PERIOD)) 209662306a36Sopenharmony_ci return; 209762306a36Sopenharmony_ci 209862306a36Sopenharmony_ci /* 209962306a36Sopenharmony_ci * At least DFGV_PERIOD has passed since the last period. Calculate the 210062306a36Sopenharmony_ci * average usage and reset the period counters. 210162306a36Sopenharmony_ci */ 210262306a36Sopenharmony_ci dur = now->now - ioc->dfgv_period_at; 210362306a36Sopenharmony_ci usage_pct = div64_u64(100 * ioc->dfgv_usage_us_sum, dur); 210462306a36Sopenharmony_ci 210562306a36Sopenharmony_ci ioc->dfgv_period_at = now->now; 210662306a36Sopenharmony_ci ioc->dfgv_usage_us_sum = 0; 210762306a36Sopenharmony_ci 210862306a36Sopenharmony_ci /* if was too busy, reset everything */ 210962306a36Sopenharmony_ci if (usage_pct > DFGV_USAGE_PCT) { 211062306a36Sopenharmony_ci ioc->dfgv_period_rem = 0; 211162306a36Sopenharmony_ci return; 211262306a36Sopenharmony_ci } 211362306a36Sopenharmony_ci 211462306a36Sopenharmony_ci /* 211562306a36Sopenharmony_ci * Usage is lower than threshold. Let's forgive some debts. Debt 211662306a36Sopenharmony_ci * forgiveness runs off of the usual ioc timer but its period usually 211762306a36Sopenharmony_ci * doesn't match ioc's. Compensate the difference by performing the 211862306a36Sopenharmony_ci * reduction as many times as would fit in the duration since the last 211962306a36Sopenharmony_ci * run and carrying over the left-over duration in @ioc->dfgv_period_rem 212062306a36Sopenharmony_ci * - if ioc period is 75% of DFGV_PERIOD, one out of three consecutive 212162306a36Sopenharmony_ci * reductions is doubled. 
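 *
 * Illustrative example: if dur + dfgv_period_rem adds up to 2.5
 * DFGV_PERIODs, nr_cycles below comes out as 2 with half a DFGV_PERIOD
 * carried over in dfgv_period_rem, and each debtor's abs_vdebt and
 * delay are shifted right by 2, i.e. quartered (floored at 1).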
212262306a36Sopenharmony_ci	 */
212362306a36Sopenharmony_ci	nr_cycles = dur + ioc->dfgv_period_rem;
212462306a36Sopenharmony_ci	ioc->dfgv_period_rem = do_div(nr_cycles, DFGV_PERIOD);
212562306a36Sopenharmony_ci
212662306a36Sopenharmony_ci	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
212762306a36Sopenharmony_ci		u64 __maybe_unused old_debt, __maybe_unused old_delay;
212862306a36Sopenharmony_ci
212962306a36Sopenharmony_ci		if (!iocg->abs_vdebt && !iocg->delay)
213062306a36Sopenharmony_ci			continue;
213162306a36Sopenharmony_ci
213262306a36Sopenharmony_ci		spin_lock(&iocg->waitq.lock);
213362306a36Sopenharmony_ci
213462306a36Sopenharmony_ci		old_debt = iocg->abs_vdebt;
213562306a36Sopenharmony_ci		old_delay = iocg->delay;
213662306a36Sopenharmony_ci
213762306a36Sopenharmony_ci		if (iocg->abs_vdebt)
213862306a36Sopenharmony_ci			iocg->abs_vdebt = iocg->abs_vdebt >> nr_cycles ?: 1;
213962306a36Sopenharmony_ci		if (iocg->delay)
214062306a36Sopenharmony_ci			iocg->delay = iocg->delay >> nr_cycles ?: 1;
214162306a36Sopenharmony_ci
214262306a36Sopenharmony_ci		iocg_kick_waitq(iocg, true, now);
214362306a36Sopenharmony_ci
214462306a36Sopenharmony_ci		TRACE_IOCG_PATH(iocg_forgive_debt, iocg, now, usage_pct,
214562306a36Sopenharmony_ci				old_debt, iocg->abs_vdebt,
214662306a36Sopenharmony_ci				old_delay, iocg->delay);
214762306a36Sopenharmony_ci
214862306a36Sopenharmony_ci		spin_unlock(&iocg->waitq.lock);
214962306a36Sopenharmony_ci	}
215062306a36Sopenharmony_ci}
215162306a36Sopenharmony_ci
215262306a36Sopenharmony_ci/*
215362306a36Sopenharmony_ci * Check the active iocgs' state to avoid oversleeping and deactivate
215462306a36Sopenharmony_ci * idle iocgs.
215562306a36Sopenharmony_ci *
215662306a36Sopenharmony_ci * Since waiters determine the sleep durations based on the vrate
215762306a36Sopenharmony_ci * they saw at the time of sleep, if vrate has increased, some
215862306a36Sopenharmony_ci * waiters could be sleeping for too long. Wake up tardy waiters
215962306a36Sopenharmony_ci * which should have woken up in the last period and expire idle
216062306a36Sopenharmony_ci * iocgs.
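 *
 * For example, a waiter that armed a 10ms wakeup based on the vrate it
 * saw may only have needed ~5ms if the vrate has since doubled; the
 * per-period kick below wakes such tardy waiters instead of leaving
 * them to their own timers.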
216162306a36Sopenharmony_ci */ 216262306a36Sopenharmony_cistatic int ioc_check_iocgs(struct ioc *ioc, struct ioc_now *now) 216362306a36Sopenharmony_ci{ 216462306a36Sopenharmony_ci int nr_debtors = 0; 216562306a36Sopenharmony_ci struct ioc_gq *iocg, *tiocg; 216662306a36Sopenharmony_ci 216762306a36Sopenharmony_ci list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) { 216862306a36Sopenharmony_ci if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt && 216962306a36Sopenharmony_ci !iocg->delay && !iocg_is_idle(iocg)) 217062306a36Sopenharmony_ci continue; 217162306a36Sopenharmony_ci 217262306a36Sopenharmony_ci spin_lock(&iocg->waitq.lock); 217362306a36Sopenharmony_ci 217462306a36Sopenharmony_ci /* flush wait and indebt stat deltas */ 217562306a36Sopenharmony_ci if (iocg->wait_since) { 217662306a36Sopenharmony_ci iocg->stat.wait_us += now->now - iocg->wait_since; 217762306a36Sopenharmony_ci iocg->wait_since = now->now; 217862306a36Sopenharmony_ci } 217962306a36Sopenharmony_ci if (iocg->indebt_since) { 218062306a36Sopenharmony_ci iocg->stat.indebt_us += 218162306a36Sopenharmony_ci now->now - iocg->indebt_since; 218262306a36Sopenharmony_ci iocg->indebt_since = now->now; 218362306a36Sopenharmony_ci } 218462306a36Sopenharmony_ci if (iocg->indelay_since) { 218562306a36Sopenharmony_ci iocg->stat.indelay_us += 218662306a36Sopenharmony_ci now->now - iocg->indelay_since; 218762306a36Sopenharmony_ci iocg->indelay_since = now->now; 218862306a36Sopenharmony_ci } 218962306a36Sopenharmony_ci 219062306a36Sopenharmony_ci if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt || 219162306a36Sopenharmony_ci iocg->delay) { 219262306a36Sopenharmony_ci /* might be oversleeping vtime / hweight changes, kick */ 219362306a36Sopenharmony_ci iocg_kick_waitq(iocg, true, now); 219462306a36Sopenharmony_ci if (iocg->abs_vdebt || iocg->delay) 219562306a36Sopenharmony_ci nr_debtors++; 219662306a36Sopenharmony_ci } else if (iocg_is_idle(iocg)) { 219762306a36Sopenharmony_ci /* no waiter and idle, deactivate */ 219862306a36Sopenharmony_ci u64 vtime = atomic64_read(&iocg->vtime); 219962306a36Sopenharmony_ci s64 excess; 220062306a36Sopenharmony_ci 220162306a36Sopenharmony_ci /* 220262306a36Sopenharmony_ci * @iocg has been inactive for a full duration and will 220362306a36Sopenharmony_ci * have a high budget. Account anything above target as 220462306a36Sopenharmony_ci * error and throw away. On reactivation, it'll start 220562306a36Sopenharmony_ci * with the target budget. 
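 *
 * The excess is scaled by the iocg's current hweight_inuse below so
 * that only the device-wide share it represents is charged to
 * vtime_err; e.g. an iocg holding 25% hweight with 40ms worth of
 * excess budget contributes 10ms of error.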
220662306a36Sopenharmony_ci */ 220762306a36Sopenharmony_ci excess = now->vnow - vtime - ioc->margins.target; 220862306a36Sopenharmony_ci if (excess > 0) { 220962306a36Sopenharmony_ci u32 old_hwi; 221062306a36Sopenharmony_ci 221162306a36Sopenharmony_ci current_hweight(iocg, NULL, &old_hwi); 221262306a36Sopenharmony_ci ioc->vtime_err -= div64_u64(excess * old_hwi, 221362306a36Sopenharmony_ci WEIGHT_ONE); 221462306a36Sopenharmony_ci } 221562306a36Sopenharmony_ci 221662306a36Sopenharmony_ci TRACE_IOCG_PATH(iocg_idle, iocg, now, 221762306a36Sopenharmony_ci atomic64_read(&iocg->active_period), 221862306a36Sopenharmony_ci atomic64_read(&ioc->cur_period), vtime); 221962306a36Sopenharmony_ci __propagate_weights(iocg, 0, 0, false, now); 222062306a36Sopenharmony_ci list_del_init(&iocg->active_list); 222162306a36Sopenharmony_ci } 222262306a36Sopenharmony_ci 222362306a36Sopenharmony_ci spin_unlock(&iocg->waitq.lock); 222462306a36Sopenharmony_ci } 222562306a36Sopenharmony_ci 222662306a36Sopenharmony_ci commit_weights(ioc); 222762306a36Sopenharmony_ci return nr_debtors; 222862306a36Sopenharmony_ci} 222962306a36Sopenharmony_ci 223062306a36Sopenharmony_cistatic void ioc_timer_fn(struct timer_list *timer) 223162306a36Sopenharmony_ci{ 223262306a36Sopenharmony_ci struct ioc *ioc = container_of(timer, struct ioc, timer); 223362306a36Sopenharmony_ci struct ioc_gq *iocg, *tiocg; 223462306a36Sopenharmony_ci struct ioc_now now; 223562306a36Sopenharmony_ci LIST_HEAD(surpluses); 223662306a36Sopenharmony_ci int nr_debtors, nr_shortages = 0, nr_lagging = 0; 223762306a36Sopenharmony_ci u64 usage_us_sum = 0; 223862306a36Sopenharmony_ci u32 ppm_rthr; 223962306a36Sopenharmony_ci u32 ppm_wthr; 224062306a36Sopenharmony_ci u32 missed_ppm[2], rq_wait_pct; 224162306a36Sopenharmony_ci u64 period_vtime; 224262306a36Sopenharmony_ci int prev_busy_level; 224362306a36Sopenharmony_ci 224462306a36Sopenharmony_ci /* how were the latencies during the period? */ 224562306a36Sopenharmony_ci ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct); 224662306a36Sopenharmony_ci 224762306a36Sopenharmony_ci /* take care of active iocgs */ 224862306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 224962306a36Sopenharmony_ci 225062306a36Sopenharmony_ci ppm_rthr = MILLION - ioc->params.qos[QOS_RPPM]; 225162306a36Sopenharmony_ci ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM]; 225262306a36Sopenharmony_ci ioc_now(ioc, &now); 225362306a36Sopenharmony_ci 225462306a36Sopenharmony_ci period_vtime = now.vnow - ioc->period_at_vtime; 225562306a36Sopenharmony_ci if (WARN_ON_ONCE(!period_vtime)) { 225662306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 225762306a36Sopenharmony_ci return; 225862306a36Sopenharmony_ci } 225962306a36Sopenharmony_ci 226062306a36Sopenharmony_ci nr_debtors = ioc_check_iocgs(ioc, &now); 226162306a36Sopenharmony_ci 226262306a36Sopenharmony_ci /* 226362306a36Sopenharmony_ci * Wait and indebt stat are flushed above and the donation calculation 226462306a36Sopenharmony_ci * below needs updated usage stat. Let's bring stat up-to-date. 
226562306a36Sopenharmony_ci */ 226662306a36Sopenharmony_ci iocg_flush_stat(&ioc->active_iocgs, &now); 226762306a36Sopenharmony_ci 226862306a36Sopenharmony_ci /* calc usage and see whether some weights need to be moved around */ 226962306a36Sopenharmony_ci list_for_each_entry(iocg, &ioc->active_iocgs, active_list) { 227062306a36Sopenharmony_ci u64 vdone, vtime, usage_us; 227162306a36Sopenharmony_ci u32 hw_active, hw_inuse; 227262306a36Sopenharmony_ci 227362306a36Sopenharmony_ci /* 227462306a36Sopenharmony_ci * Collect unused and wind vtime closer to vnow to prevent 227562306a36Sopenharmony_ci * iocgs from accumulating a large amount of budget. 227662306a36Sopenharmony_ci */ 227762306a36Sopenharmony_ci vdone = atomic64_read(&iocg->done_vtime); 227862306a36Sopenharmony_ci vtime = atomic64_read(&iocg->vtime); 227962306a36Sopenharmony_ci current_hweight(iocg, &hw_active, &hw_inuse); 228062306a36Sopenharmony_ci 228162306a36Sopenharmony_ci /* 228262306a36Sopenharmony_ci * Latency QoS detection doesn't account for IOs which are 228362306a36Sopenharmony_ci * in-flight for longer than a period. Detect them by 228462306a36Sopenharmony_ci * comparing vdone against period start. If lagging behind 228562306a36Sopenharmony_ci * IOs from past periods, don't increase vrate. 228662306a36Sopenharmony_ci */ 228762306a36Sopenharmony_ci if ((ppm_rthr != MILLION || ppm_wthr != MILLION) && 228862306a36Sopenharmony_ci !atomic_read(&iocg_to_blkg(iocg)->use_delay) && 228962306a36Sopenharmony_ci time_after64(vtime, vdone) && 229062306a36Sopenharmony_ci time_after64(vtime, now.vnow - 229162306a36Sopenharmony_ci MAX_LAGGING_PERIODS * period_vtime) && 229262306a36Sopenharmony_ci time_before64(vdone, now.vnow - period_vtime)) 229362306a36Sopenharmony_ci nr_lagging++; 229462306a36Sopenharmony_ci 229562306a36Sopenharmony_ci /* 229662306a36Sopenharmony_ci * Determine absolute usage factoring in in-flight IOs to avoid 229762306a36Sopenharmony_ci * high-latency completions appearing as idle. 
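 *
 * Concretely, the in-flight portion is estimated below from the gap
 * between issued and completed vtime (vtime - vdone), converted back
 * to absolute cost and then to microseconds via vtime_base_rate, and
 * the larger of that and the completed usage_delta_us is used.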
229862306a36Sopenharmony_ci */ 229962306a36Sopenharmony_ci usage_us = iocg->usage_delta_us; 230062306a36Sopenharmony_ci usage_us_sum += usage_us; 230162306a36Sopenharmony_ci 230262306a36Sopenharmony_ci /* see whether there's surplus vtime */ 230362306a36Sopenharmony_ci WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); 230462306a36Sopenharmony_ci if (hw_inuse < hw_active || 230562306a36Sopenharmony_ci (!waitqueue_active(&iocg->waitq) && 230662306a36Sopenharmony_ci time_before64(vtime, now.vnow - ioc->margins.low))) { 230762306a36Sopenharmony_ci u32 hwa, old_hwi, hwm, new_hwi, usage; 230862306a36Sopenharmony_ci u64 usage_dur; 230962306a36Sopenharmony_ci 231062306a36Sopenharmony_ci if (vdone != vtime) { 231162306a36Sopenharmony_ci u64 inflight_us = DIV64_U64_ROUND_UP( 231262306a36Sopenharmony_ci cost_to_abs_cost(vtime - vdone, hw_inuse), 231362306a36Sopenharmony_ci ioc->vtime_base_rate); 231462306a36Sopenharmony_ci 231562306a36Sopenharmony_ci usage_us = max(usage_us, inflight_us); 231662306a36Sopenharmony_ci } 231762306a36Sopenharmony_ci 231862306a36Sopenharmony_ci /* convert to hweight based usage ratio */ 231962306a36Sopenharmony_ci if (time_after64(iocg->activated_at, ioc->period_at)) 232062306a36Sopenharmony_ci usage_dur = max_t(u64, now.now - iocg->activated_at, 1); 232162306a36Sopenharmony_ci else 232262306a36Sopenharmony_ci usage_dur = max_t(u64, now.now - ioc->period_at, 1); 232362306a36Sopenharmony_ci 232462306a36Sopenharmony_ci usage = clamp_t(u32, 232562306a36Sopenharmony_ci DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, 232662306a36Sopenharmony_ci usage_dur), 232762306a36Sopenharmony_ci 1, WEIGHT_ONE); 232862306a36Sopenharmony_ci 232962306a36Sopenharmony_ci /* 233062306a36Sopenharmony_ci * Already donating or accumulated enough to start. 233162306a36Sopenharmony_ci * Determine the donation amount. 233262306a36Sopenharmony_ci */ 233362306a36Sopenharmony_ci current_hweight(iocg, &hwa, &old_hwi); 233462306a36Sopenharmony_ci hwm = current_hweight_max(iocg); 233562306a36Sopenharmony_ci new_hwi = hweight_after_donation(iocg, old_hwi, hwm, 233662306a36Sopenharmony_ci usage, &now); 233762306a36Sopenharmony_ci /* 233862306a36Sopenharmony_ci * Donation calculation assumes hweight_after_donation 233962306a36Sopenharmony_ci * to be positive, a condition that a donor w/ hwa < 2 234062306a36Sopenharmony_ci * can't meet. Don't bother with donation if hwa is 234162306a36Sopenharmony_ci * below 2. It's not gonna make a meaningful difference 234262306a36Sopenharmony_ci * anyway. 234362306a36Sopenharmony_ci */ 234462306a36Sopenharmony_ci if (new_hwi < hwm && hwa >= 2) { 234562306a36Sopenharmony_ci iocg->hweight_donating = hwa; 234662306a36Sopenharmony_ci iocg->hweight_after_donation = new_hwi; 234762306a36Sopenharmony_ci list_add(&iocg->surplus_list, &surpluses); 234862306a36Sopenharmony_ci } else if (!iocg->abs_vdebt) { 234962306a36Sopenharmony_ci /* 235062306a36Sopenharmony_ci * @iocg doesn't have enough to donate. Reset 235162306a36Sopenharmony_ci * its inuse to active. 235262306a36Sopenharmony_ci * 235362306a36Sopenharmony_ci * Don't reset debtors as their inuse's are 235462306a36Sopenharmony_ci * owned by debt handling. This shouldn't affect 235562306a36Sopenharmony_ci * donation calculuation in any meaningful way 235662306a36Sopenharmony_ci * as @iocg doesn't have a meaningful amount of 235762306a36Sopenharmony_ci * share anyway. 
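 *
 * Note that this reset path still bumps nr_shortages below, so the
 * surplus/shortage accounting sees @iocg as a shortage rather than
 * a donor.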
235862306a36Sopenharmony_ci */ 235962306a36Sopenharmony_ci TRACE_IOCG_PATH(inuse_shortage, iocg, &now, 236062306a36Sopenharmony_ci iocg->inuse, iocg->active, 236162306a36Sopenharmony_ci iocg->hweight_inuse, new_hwi); 236262306a36Sopenharmony_ci 236362306a36Sopenharmony_ci __propagate_weights(iocg, iocg->active, 236462306a36Sopenharmony_ci iocg->active, true, &now); 236562306a36Sopenharmony_ci nr_shortages++; 236662306a36Sopenharmony_ci } 236762306a36Sopenharmony_ci } else { 236862306a36Sopenharmony_ci /* genuinely short on vtime */ 236962306a36Sopenharmony_ci nr_shortages++; 237062306a36Sopenharmony_ci } 237162306a36Sopenharmony_ci } 237262306a36Sopenharmony_ci 237362306a36Sopenharmony_ci if (!list_empty(&surpluses) && nr_shortages) 237462306a36Sopenharmony_ci transfer_surpluses(&surpluses, &now); 237562306a36Sopenharmony_ci 237662306a36Sopenharmony_ci commit_weights(ioc); 237762306a36Sopenharmony_ci 237862306a36Sopenharmony_ci /* surplus list should be dissolved after use */ 237962306a36Sopenharmony_ci list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list) 238062306a36Sopenharmony_ci list_del_init(&iocg->surplus_list); 238162306a36Sopenharmony_ci 238262306a36Sopenharmony_ci /* 238362306a36Sopenharmony_ci * If q is getting clogged or we're missing too much, we're issuing 238462306a36Sopenharmony_ci * too much IO and should lower vtime rate. If we're not missing 238562306a36Sopenharmony_ci * and experiencing shortages but not surpluses, we're too stingy 238662306a36Sopenharmony_ci * and should increase vtime rate. 238762306a36Sopenharmony_ci */ 238862306a36Sopenharmony_ci prev_busy_level = ioc->busy_level; 238962306a36Sopenharmony_ci if (rq_wait_pct > RQ_WAIT_BUSY_PCT || 239062306a36Sopenharmony_ci missed_ppm[READ] > ppm_rthr || 239162306a36Sopenharmony_ci missed_ppm[WRITE] > ppm_wthr) { 239262306a36Sopenharmony_ci /* clearly missing QoS targets, slow down vrate */ 239362306a36Sopenharmony_ci ioc->busy_level = max(ioc->busy_level, 0); 239462306a36Sopenharmony_ci ioc->busy_level++; 239562306a36Sopenharmony_ci } else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 && 239662306a36Sopenharmony_ci missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 && 239762306a36Sopenharmony_ci missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) { 239862306a36Sopenharmony_ci /* QoS targets are being met with >25% margin */ 239962306a36Sopenharmony_ci if (nr_shortages) { 240062306a36Sopenharmony_ci /* 240162306a36Sopenharmony_ci * We're throttling while the device has spare 240262306a36Sopenharmony_ci * capacity. If vrate was being slowed down, stop. 240362306a36Sopenharmony_ci */ 240462306a36Sopenharmony_ci ioc->busy_level = min(ioc->busy_level, 0); 240562306a36Sopenharmony_ci 240662306a36Sopenharmony_ci /* 240762306a36Sopenharmony_ci * If there are IOs spanning multiple periods, wait 240862306a36Sopenharmony_ci * them out before pushing the device harder. 240962306a36Sopenharmony_ci */ 241062306a36Sopenharmony_ci if (!nr_lagging) 241162306a36Sopenharmony_ci ioc->busy_level--; 241262306a36Sopenharmony_ci } else { 241362306a36Sopenharmony_ci /* 241462306a36Sopenharmony_ci * Nobody is being throttled and the users aren't 241562306a36Sopenharmony_ci * issuing enough IOs to saturate the device. We 241662306a36Sopenharmony_ci * simply don't know how close the device is to 241762306a36Sopenharmony_ci * saturation. Coast. 
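 *
 * "Coasting" simply means holding the current vrate: busy_level is
 * zeroed below so the subsequent vrate adjustment sees neither a
 * shortage nor a surplus signal.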
241862306a36Sopenharmony_ci */ 241962306a36Sopenharmony_ci ioc->busy_level = 0; 242062306a36Sopenharmony_ci } 242162306a36Sopenharmony_ci } else { 242262306a36Sopenharmony_ci /* inside the hysterisis margin, we're good */ 242362306a36Sopenharmony_ci ioc->busy_level = 0; 242462306a36Sopenharmony_ci } 242562306a36Sopenharmony_ci 242662306a36Sopenharmony_ci ioc->busy_level = clamp(ioc->busy_level, -1000, 1000); 242762306a36Sopenharmony_ci 242862306a36Sopenharmony_ci ioc_adjust_base_vrate(ioc, rq_wait_pct, nr_lagging, nr_shortages, 242962306a36Sopenharmony_ci prev_busy_level, missed_ppm); 243062306a36Sopenharmony_ci 243162306a36Sopenharmony_ci ioc_refresh_params(ioc, false); 243262306a36Sopenharmony_ci 243362306a36Sopenharmony_ci ioc_forgive_debts(ioc, usage_us_sum, nr_debtors, &now); 243462306a36Sopenharmony_ci 243562306a36Sopenharmony_ci /* 243662306a36Sopenharmony_ci * This period is done. Move onto the next one. If nothing's 243762306a36Sopenharmony_ci * going on with the device, stop the timer. 243862306a36Sopenharmony_ci */ 243962306a36Sopenharmony_ci atomic64_inc(&ioc->cur_period); 244062306a36Sopenharmony_ci 244162306a36Sopenharmony_ci if (ioc->running != IOC_STOP) { 244262306a36Sopenharmony_ci if (!list_empty(&ioc->active_iocgs)) { 244362306a36Sopenharmony_ci ioc_start_period(ioc, &now); 244462306a36Sopenharmony_ci } else { 244562306a36Sopenharmony_ci ioc->busy_level = 0; 244662306a36Sopenharmony_ci ioc->vtime_err = 0; 244762306a36Sopenharmony_ci ioc->running = IOC_IDLE; 244862306a36Sopenharmony_ci } 244962306a36Sopenharmony_ci 245062306a36Sopenharmony_ci ioc_refresh_vrate(ioc, &now); 245162306a36Sopenharmony_ci } 245262306a36Sopenharmony_ci 245362306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 245462306a36Sopenharmony_ci} 245562306a36Sopenharmony_ci 245662306a36Sopenharmony_cistatic u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime, 245762306a36Sopenharmony_ci u64 abs_cost, struct ioc_now *now) 245862306a36Sopenharmony_ci{ 245962306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 246062306a36Sopenharmony_ci struct ioc_margins *margins = &ioc->margins; 246162306a36Sopenharmony_ci u32 __maybe_unused old_inuse = iocg->inuse, __maybe_unused old_hwi; 246262306a36Sopenharmony_ci u32 hwi, adj_step; 246362306a36Sopenharmony_ci s64 margin; 246462306a36Sopenharmony_ci u64 cost, new_inuse; 246562306a36Sopenharmony_ci unsigned long flags; 246662306a36Sopenharmony_ci 246762306a36Sopenharmony_ci current_hweight(iocg, NULL, &hwi); 246862306a36Sopenharmony_ci old_hwi = hwi; 246962306a36Sopenharmony_ci cost = abs_cost_to_cost(abs_cost, hwi); 247062306a36Sopenharmony_ci margin = now->vnow - vtime - cost; 247162306a36Sopenharmony_ci 247262306a36Sopenharmony_ci /* debt handling owns inuse for debtors */ 247362306a36Sopenharmony_ci if (iocg->abs_vdebt) 247462306a36Sopenharmony_ci return cost; 247562306a36Sopenharmony_ci 247662306a36Sopenharmony_ci /* 247762306a36Sopenharmony_ci * We only increase inuse during period and do so if the margin has 247862306a36Sopenharmony_ci * deteriorated since the previous adjustment. 
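 *
 * In other words, as checked right below, the bump is skipped when the
 * margin is no worse than at the previous adjustment, still at or above
 * the low margin, or when inuse already equals active and can't be
 * raised any further.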
247962306a36Sopenharmony_ci */ 248062306a36Sopenharmony_ci if (margin >= iocg->saved_margin || margin >= margins->low || 248162306a36Sopenharmony_ci iocg->inuse == iocg->active) 248262306a36Sopenharmony_ci return cost; 248362306a36Sopenharmony_ci 248462306a36Sopenharmony_ci spin_lock_irqsave(&ioc->lock, flags); 248562306a36Sopenharmony_ci 248662306a36Sopenharmony_ci /* we own inuse only when @iocg is in the normal active state */ 248762306a36Sopenharmony_ci if (iocg->abs_vdebt || list_empty(&iocg->active_list)) { 248862306a36Sopenharmony_ci spin_unlock_irqrestore(&ioc->lock, flags); 248962306a36Sopenharmony_ci return cost; 249062306a36Sopenharmony_ci } 249162306a36Sopenharmony_ci 249262306a36Sopenharmony_ci /* 249362306a36Sopenharmony_ci * Bump up inuse till @abs_cost fits in the existing budget. 249462306a36Sopenharmony_ci * adj_step must be determined after acquiring ioc->lock - we might 249562306a36Sopenharmony_ci * have raced and lost to another thread for activation and could 249662306a36Sopenharmony_ci * be reading 0 iocg->active before ioc->lock which will lead to 249762306a36Sopenharmony_ci * infinite loop. 249862306a36Sopenharmony_ci */ 249962306a36Sopenharmony_ci new_inuse = iocg->inuse; 250062306a36Sopenharmony_ci adj_step = DIV_ROUND_UP(iocg->active * INUSE_ADJ_STEP_PCT, 100); 250162306a36Sopenharmony_ci do { 250262306a36Sopenharmony_ci new_inuse = new_inuse + adj_step; 250362306a36Sopenharmony_ci propagate_weights(iocg, iocg->active, new_inuse, true, now); 250462306a36Sopenharmony_ci current_hweight(iocg, NULL, &hwi); 250562306a36Sopenharmony_ci cost = abs_cost_to_cost(abs_cost, hwi); 250662306a36Sopenharmony_ci } while (time_after64(vtime + cost, now->vnow) && 250762306a36Sopenharmony_ci iocg->inuse != iocg->active); 250862306a36Sopenharmony_ci 250962306a36Sopenharmony_ci spin_unlock_irqrestore(&ioc->lock, flags); 251062306a36Sopenharmony_ci 251162306a36Sopenharmony_ci TRACE_IOCG_PATH(inuse_adjust, iocg, now, 251262306a36Sopenharmony_ci old_inuse, iocg->inuse, old_hwi, hwi); 251362306a36Sopenharmony_ci 251462306a36Sopenharmony_ci return cost; 251562306a36Sopenharmony_ci} 251662306a36Sopenharmony_ci 251762306a36Sopenharmony_cistatic void calc_vtime_cost_builtin(struct bio *bio, struct ioc_gq *iocg, 251862306a36Sopenharmony_ci bool is_merge, u64 *costp) 251962306a36Sopenharmony_ci{ 252062306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 252162306a36Sopenharmony_ci u64 coef_seqio, coef_randio, coef_page; 252262306a36Sopenharmony_ci u64 pages = max_t(u64, bio_sectors(bio) >> IOC_SECT_TO_PAGE_SHIFT, 1); 252362306a36Sopenharmony_ci u64 seek_pages = 0; 252462306a36Sopenharmony_ci u64 cost = 0; 252562306a36Sopenharmony_ci 252662306a36Sopenharmony_ci /* Can't calculate cost for empty bio */ 252762306a36Sopenharmony_ci if (!bio->bi_iter.bi_size) 252862306a36Sopenharmony_ci goto out; 252962306a36Sopenharmony_ci 253062306a36Sopenharmony_ci switch (bio_op(bio)) { 253162306a36Sopenharmony_ci case REQ_OP_READ: 253262306a36Sopenharmony_ci coef_seqio = ioc->params.lcoefs[LCOEF_RSEQIO]; 253362306a36Sopenharmony_ci coef_randio = ioc->params.lcoefs[LCOEF_RRANDIO]; 253462306a36Sopenharmony_ci coef_page = ioc->params.lcoefs[LCOEF_RPAGE]; 253562306a36Sopenharmony_ci break; 253662306a36Sopenharmony_ci case REQ_OP_WRITE: 253762306a36Sopenharmony_ci coef_seqio = ioc->params.lcoefs[LCOEF_WSEQIO]; 253862306a36Sopenharmony_ci coef_randio = ioc->params.lcoefs[LCOEF_WRANDIO]; 253962306a36Sopenharmony_ci coef_page = ioc->params.lcoefs[LCOEF_WPAGE]; 254062306a36Sopenharmony_ci break; 
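	/*
	 * With the coefficients selected above, the linear model charges
	 * (after this switch) a seek-dependent base cost plus a per-page
	 * size cost:
	 *
	 *   cost = (seek_pages > LCOEF_RANDIO_PAGES ? coef_randio : coef_seqio)
	 *          + pages * coef_page
	 *
	 * with the base component skipped for merges. The actual coefficient
	 * values come from ioc->params.lcoefs.
	 */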
254162306a36Sopenharmony_ci default: 254262306a36Sopenharmony_ci goto out; 254362306a36Sopenharmony_ci } 254462306a36Sopenharmony_ci 254562306a36Sopenharmony_ci if (iocg->cursor) { 254662306a36Sopenharmony_ci seek_pages = abs(bio->bi_iter.bi_sector - iocg->cursor); 254762306a36Sopenharmony_ci seek_pages >>= IOC_SECT_TO_PAGE_SHIFT; 254862306a36Sopenharmony_ci } 254962306a36Sopenharmony_ci 255062306a36Sopenharmony_ci if (!is_merge) { 255162306a36Sopenharmony_ci if (seek_pages > LCOEF_RANDIO_PAGES) { 255262306a36Sopenharmony_ci cost += coef_randio; 255362306a36Sopenharmony_ci } else { 255462306a36Sopenharmony_ci cost += coef_seqio; 255562306a36Sopenharmony_ci } 255662306a36Sopenharmony_ci } 255762306a36Sopenharmony_ci cost += pages * coef_page; 255862306a36Sopenharmony_ciout: 255962306a36Sopenharmony_ci *costp = cost; 256062306a36Sopenharmony_ci} 256162306a36Sopenharmony_ci 256262306a36Sopenharmony_cistatic u64 calc_vtime_cost(struct bio *bio, struct ioc_gq *iocg, bool is_merge) 256362306a36Sopenharmony_ci{ 256462306a36Sopenharmony_ci u64 cost; 256562306a36Sopenharmony_ci 256662306a36Sopenharmony_ci calc_vtime_cost_builtin(bio, iocg, is_merge, &cost); 256762306a36Sopenharmony_ci return cost; 256862306a36Sopenharmony_ci} 256962306a36Sopenharmony_ci 257062306a36Sopenharmony_cistatic void calc_size_vtime_cost_builtin(struct request *rq, struct ioc *ioc, 257162306a36Sopenharmony_ci u64 *costp) 257262306a36Sopenharmony_ci{ 257362306a36Sopenharmony_ci unsigned int pages = blk_rq_stats_sectors(rq) >> IOC_SECT_TO_PAGE_SHIFT; 257462306a36Sopenharmony_ci 257562306a36Sopenharmony_ci switch (req_op(rq)) { 257662306a36Sopenharmony_ci case REQ_OP_READ: 257762306a36Sopenharmony_ci *costp = pages * ioc->params.lcoefs[LCOEF_RPAGE]; 257862306a36Sopenharmony_ci break; 257962306a36Sopenharmony_ci case REQ_OP_WRITE: 258062306a36Sopenharmony_ci *costp = pages * ioc->params.lcoefs[LCOEF_WPAGE]; 258162306a36Sopenharmony_ci break; 258262306a36Sopenharmony_ci default: 258362306a36Sopenharmony_ci *costp = 0; 258462306a36Sopenharmony_ci } 258562306a36Sopenharmony_ci} 258662306a36Sopenharmony_ci 258762306a36Sopenharmony_cistatic u64 calc_size_vtime_cost(struct request *rq, struct ioc *ioc) 258862306a36Sopenharmony_ci{ 258962306a36Sopenharmony_ci u64 cost; 259062306a36Sopenharmony_ci 259162306a36Sopenharmony_ci calc_size_vtime_cost_builtin(rq, ioc, &cost); 259262306a36Sopenharmony_ci return cost; 259362306a36Sopenharmony_ci} 259462306a36Sopenharmony_ci 259562306a36Sopenharmony_cistatic void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) 259662306a36Sopenharmony_ci{ 259762306a36Sopenharmony_ci struct blkcg_gq *blkg = bio->bi_blkg; 259862306a36Sopenharmony_ci struct ioc *ioc = rqos_to_ioc(rqos); 259962306a36Sopenharmony_ci struct ioc_gq *iocg = blkg_to_iocg(blkg); 260062306a36Sopenharmony_ci struct ioc_now now; 260162306a36Sopenharmony_ci struct iocg_wait wait; 260262306a36Sopenharmony_ci u64 abs_cost, cost, vtime; 260362306a36Sopenharmony_ci bool use_debt, ioc_locked; 260462306a36Sopenharmony_ci unsigned long flags; 260562306a36Sopenharmony_ci 260662306a36Sopenharmony_ci /* bypass IOs if disabled, still initializing, or for root cgroup */ 260762306a36Sopenharmony_ci if (!ioc->enabled || !iocg || !iocg->level) 260862306a36Sopenharmony_ci return; 260962306a36Sopenharmony_ci 261062306a36Sopenharmony_ci /* calculate the absolute vtime cost */ 261162306a36Sopenharmony_ci abs_cost = calc_vtime_cost(bio, iocg, false); 261262306a36Sopenharmony_ci if (!abs_cost) 261362306a36Sopenharmony_ci return; 
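	/*
	 * From here on: activate @iocg for this period, then either commit
	 * the bio right away if it fits within the vtime budget, charge it
	 * as debt for IOs that must not block (root-issued or fatally
	 * signaled issuers), or put the issuer to sleep on iocg->waitq until
	 * the budget catches up.
	 */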
261462306a36Sopenharmony_ci 261562306a36Sopenharmony_ci if (!iocg_activate(iocg, &now)) 261662306a36Sopenharmony_ci return; 261762306a36Sopenharmony_ci 261862306a36Sopenharmony_ci iocg->cursor = bio_end_sector(bio); 261962306a36Sopenharmony_ci vtime = atomic64_read(&iocg->vtime); 262062306a36Sopenharmony_ci cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now); 262162306a36Sopenharmony_ci 262262306a36Sopenharmony_ci /* 262362306a36Sopenharmony_ci * If no one's waiting and within budget, issue right away. The 262462306a36Sopenharmony_ci * tests are racy but the races aren't systemic - we only miss once 262562306a36Sopenharmony_ci * in a while which is fine. 262662306a36Sopenharmony_ci */ 262762306a36Sopenharmony_ci if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt && 262862306a36Sopenharmony_ci time_before_eq64(vtime + cost, now.vnow)) { 262962306a36Sopenharmony_ci iocg_commit_bio(iocg, bio, abs_cost, cost); 263062306a36Sopenharmony_ci return; 263162306a36Sopenharmony_ci } 263262306a36Sopenharmony_ci 263362306a36Sopenharmony_ci /* 263462306a36Sopenharmony_ci * We're over budget. This can be handled in two ways. IOs which may 263562306a36Sopenharmony_ci * cause priority inversions are punted to @ioc->aux_iocg and charged as 263662306a36Sopenharmony_ci * debt. Otherwise, the issuer is blocked on @iocg->waitq. Debt handling 263762306a36Sopenharmony_ci * requires @ioc->lock, waitq handling @iocg->waitq.lock. Determine 263862306a36Sopenharmony_ci * whether debt handling is needed and acquire locks accordingly. 263962306a36Sopenharmony_ci */ 264062306a36Sopenharmony_ci use_debt = bio_issue_as_root_blkg(bio) || fatal_signal_pending(current); 264162306a36Sopenharmony_ci ioc_locked = use_debt || READ_ONCE(iocg->abs_vdebt); 264262306a36Sopenharmony_ciretry_lock: 264362306a36Sopenharmony_ci iocg_lock(iocg, ioc_locked, &flags); 264462306a36Sopenharmony_ci 264562306a36Sopenharmony_ci /* 264662306a36Sopenharmony_ci * @iocg must stay activated for debt and waitq handling. Deactivation 264762306a36Sopenharmony_ci * is synchronized against both ioc->lock and waitq.lock and we won't 264862306a36Sopenharmony_ci * get deactivated as long as we're waiting or has debt, so we're good 264962306a36Sopenharmony_ci * if we're activated here. In the unlikely cases that we aren't, just 265062306a36Sopenharmony_ci * issue the IO. 265162306a36Sopenharmony_ci */ 265262306a36Sopenharmony_ci if (unlikely(list_empty(&iocg->active_list))) { 265362306a36Sopenharmony_ci iocg_unlock(iocg, ioc_locked, &flags); 265462306a36Sopenharmony_ci iocg_commit_bio(iocg, bio, abs_cost, cost); 265562306a36Sopenharmony_ci return; 265662306a36Sopenharmony_ci } 265762306a36Sopenharmony_ci 265862306a36Sopenharmony_ci /* 265962306a36Sopenharmony_ci * We're over budget. If @bio has to be issued regardless, remember 266062306a36Sopenharmony_ci * the abs_cost instead of advancing vtime. iocg_kick_waitq() will pay 266162306a36Sopenharmony_ci * off the debt before waking more IOs. 266262306a36Sopenharmony_ci * 266362306a36Sopenharmony_ci * This way, the debt is continuously paid off each period with the 266462306a36Sopenharmony_ci * actual budget available to the cgroup. If we just wound vtime, we 266562306a36Sopenharmony_ci * would incorrectly use the current hw_inuse for the entire amount 266662306a36Sopenharmony_ci * which, for example, can lead to the cgroup staying blocked for a 266762306a36Sopenharmony_ci * long time even with substantially raised hw_inuse. 
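 *
 * Illustrative example: 10ms of absolute cost charged while hw_inuse is
 * 10% would wind vtime by 100ms; even if hw_inuse were later raised to
 * 50%, that inflated charge would remain, whereas debt repaid period by
 * period is converted at whatever budget is actually available then.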
266862306a36Sopenharmony_ci * 266962306a36Sopenharmony_ci * An iocg with vdebt should stay online so that the timer can keep 267062306a36Sopenharmony_ci * deducting its vdebt and [de]activate use_delay mechanism 267162306a36Sopenharmony_ci * accordingly. We don't want to race against the timer trying to 267262306a36Sopenharmony_ci * clear them and leave @iocg inactive w/ dangling use_delay heavily 267362306a36Sopenharmony_ci * penalizing the cgroup and its descendants. 267462306a36Sopenharmony_ci */ 267562306a36Sopenharmony_ci if (use_debt) { 267662306a36Sopenharmony_ci iocg_incur_debt(iocg, abs_cost, &now); 267762306a36Sopenharmony_ci if (iocg_kick_delay(iocg, &now)) 267862306a36Sopenharmony_ci blkcg_schedule_throttle(rqos->disk, 267962306a36Sopenharmony_ci (bio->bi_opf & REQ_SWAP) == REQ_SWAP); 268062306a36Sopenharmony_ci iocg_unlock(iocg, ioc_locked, &flags); 268162306a36Sopenharmony_ci return; 268262306a36Sopenharmony_ci } 268362306a36Sopenharmony_ci 268462306a36Sopenharmony_ci /* guarantee that iocgs w/ waiters have maximum inuse */ 268562306a36Sopenharmony_ci if (!iocg->abs_vdebt && iocg->inuse != iocg->active) { 268662306a36Sopenharmony_ci if (!ioc_locked) { 268762306a36Sopenharmony_ci iocg_unlock(iocg, false, &flags); 268862306a36Sopenharmony_ci ioc_locked = true; 268962306a36Sopenharmony_ci goto retry_lock; 269062306a36Sopenharmony_ci } 269162306a36Sopenharmony_ci propagate_weights(iocg, iocg->active, iocg->active, true, 269262306a36Sopenharmony_ci &now); 269362306a36Sopenharmony_ci } 269462306a36Sopenharmony_ci 269562306a36Sopenharmony_ci /* 269662306a36Sopenharmony_ci * Append self to the waitq and schedule the wakeup timer if we're 269762306a36Sopenharmony_ci * the first waiter. The timer duration is calculated based on the 269862306a36Sopenharmony_ci * current vrate. vtime and hweight changes can make it too short 269962306a36Sopenharmony_ci * or too long. Each wait entry records the absolute cost it's 270062306a36Sopenharmony_ci * waiting for to allow re-evaluation using a custom wait entry. 270162306a36Sopenharmony_ci * 270262306a36Sopenharmony_ci * If too short, the timer simply reschedules itself. If too long, 270362306a36Sopenharmony_ci * the period timer will notice and trigger wakeups. 270462306a36Sopenharmony_ci * 270562306a36Sopenharmony_ci * All waiters are on iocg->waitq and the wait states are 270662306a36Sopenharmony_ci * synchronized using waitq.lock. 
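 *
 * The wait entry below lives on this task's stack; the waker pays
 * wait.abs_cost out of the budget and sets wait.committed before waking
 * us, which is what the wait loop further down checks.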
270762306a36Sopenharmony_ci */ 270862306a36Sopenharmony_ci init_waitqueue_func_entry(&wait.wait, iocg_wake_fn); 270962306a36Sopenharmony_ci wait.wait.private = current; 271062306a36Sopenharmony_ci wait.bio = bio; 271162306a36Sopenharmony_ci wait.abs_cost = abs_cost; 271262306a36Sopenharmony_ci wait.committed = false; /* will be set true by waker */ 271362306a36Sopenharmony_ci 271462306a36Sopenharmony_ci __add_wait_queue_entry_tail(&iocg->waitq, &wait.wait); 271562306a36Sopenharmony_ci iocg_kick_waitq(iocg, ioc_locked, &now); 271662306a36Sopenharmony_ci 271762306a36Sopenharmony_ci iocg_unlock(iocg, ioc_locked, &flags); 271862306a36Sopenharmony_ci 271962306a36Sopenharmony_ci while (true) { 272062306a36Sopenharmony_ci set_current_state(TASK_UNINTERRUPTIBLE); 272162306a36Sopenharmony_ci if (wait.committed) 272262306a36Sopenharmony_ci break; 272362306a36Sopenharmony_ci io_schedule(); 272462306a36Sopenharmony_ci } 272562306a36Sopenharmony_ci 272662306a36Sopenharmony_ci /* waker already committed us, proceed */ 272762306a36Sopenharmony_ci finish_wait(&iocg->waitq, &wait.wait); 272862306a36Sopenharmony_ci} 272962306a36Sopenharmony_ci 273062306a36Sopenharmony_cistatic void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq, 273162306a36Sopenharmony_ci struct bio *bio) 273262306a36Sopenharmony_ci{ 273362306a36Sopenharmony_ci struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg); 273462306a36Sopenharmony_ci struct ioc *ioc = rqos_to_ioc(rqos); 273562306a36Sopenharmony_ci sector_t bio_end = bio_end_sector(bio); 273662306a36Sopenharmony_ci struct ioc_now now; 273762306a36Sopenharmony_ci u64 vtime, abs_cost, cost; 273862306a36Sopenharmony_ci unsigned long flags; 273962306a36Sopenharmony_ci 274062306a36Sopenharmony_ci /* bypass if disabled, still initializing, or for root cgroup */ 274162306a36Sopenharmony_ci if (!ioc->enabled || !iocg || !iocg->level) 274262306a36Sopenharmony_ci return; 274362306a36Sopenharmony_ci 274462306a36Sopenharmony_ci abs_cost = calc_vtime_cost(bio, iocg, true); 274562306a36Sopenharmony_ci if (!abs_cost) 274662306a36Sopenharmony_ci return; 274762306a36Sopenharmony_ci 274862306a36Sopenharmony_ci ioc_now(ioc, &now); 274962306a36Sopenharmony_ci 275062306a36Sopenharmony_ci vtime = atomic64_read(&iocg->vtime); 275162306a36Sopenharmony_ci cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now); 275262306a36Sopenharmony_ci 275362306a36Sopenharmony_ci /* update cursor if backmerging into the request at the cursor */ 275462306a36Sopenharmony_ci if (blk_rq_pos(rq) < bio_end && 275562306a36Sopenharmony_ci blk_rq_pos(rq) + blk_rq_sectors(rq) == iocg->cursor) 275662306a36Sopenharmony_ci iocg->cursor = bio_end; 275762306a36Sopenharmony_ci 275862306a36Sopenharmony_ci /* 275962306a36Sopenharmony_ci * Charge if there's enough vtime budget and the existing request has 276062306a36Sopenharmony_ci * cost assigned. 276162306a36Sopenharmony_ci */ 276262306a36Sopenharmony_ci if (rq->bio && rq->bio->bi_iocost_cost && 276362306a36Sopenharmony_ci time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow)) { 276462306a36Sopenharmony_ci iocg_commit_bio(iocg, bio, abs_cost, cost); 276562306a36Sopenharmony_ci return; 276662306a36Sopenharmony_ci } 276762306a36Sopenharmony_ci 276862306a36Sopenharmony_ci /* 276962306a36Sopenharmony_ci * Otherwise, account it as debt if @iocg is online, which it should 277062306a36Sopenharmony_ci * be for the vast majority of cases. See debt handling in 277162306a36Sopenharmony_ci * ioc_rqos_throttle() for details. 
277262306a36Sopenharmony_ci */ 277362306a36Sopenharmony_ci spin_lock_irqsave(&ioc->lock, flags); 277462306a36Sopenharmony_ci spin_lock(&iocg->waitq.lock); 277562306a36Sopenharmony_ci 277662306a36Sopenharmony_ci if (likely(!list_empty(&iocg->active_list))) { 277762306a36Sopenharmony_ci iocg_incur_debt(iocg, abs_cost, &now); 277862306a36Sopenharmony_ci if (iocg_kick_delay(iocg, &now)) 277962306a36Sopenharmony_ci blkcg_schedule_throttle(rqos->disk, 278062306a36Sopenharmony_ci (bio->bi_opf & REQ_SWAP) == REQ_SWAP); 278162306a36Sopenharmony_ci } else { 278262306a36Sopenharmony_ci iocg_commit_bio(iocg, bio, abs_cost, cost); 278362306a36Sopenharmony_ci } 278462306a36Sopenharmony_ci 278562306a36Sopenharmony_ci spin_unlock(&iocg->waitq.lock); 278662306a36Sopenharmony_ci spin_unlock_irqrestore(&ioc->lock, flags); 278762306a36Sopenharmony_ci} 278862306a36Sopenharmony_ci 278962306a36Sopenharmony_cistatic void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio) 279062306a36Sopenharmony_ci{ 279162306a36Sopenharmony_ci struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg); 279262306a36Sopenharmony_ci 279362306a36Sopenharmony_ci if (iocg && bio->bi_iocost_cost) 279462306a36Sopenharmony_ci atomic64_add(bio->bi_iocost_cost, &iocg->done_vtime); 279562306a36Sopenharmony_ci} 279662306a36Sopenharmony_ci 279762306a36Sopenharmony_cistatic void ioc_rqos_done(struct rq_qos *rqos, struct request *rq) 279862306a36Sopenharmony_ci{ 279962306a36Sopenharmony_ci struct ioc *ioc = rqos_to_ioc(rqos); 280062306a36Sopenharmony_ci struct ioc_pcpu_stat *ccs; 280162306a36Sopenharmony_ci u64 on_q_ns, rq_wait_ns, size_nsec; 280262306a36Sopenharmony_ci int pidx, rw; 280362306a36Sopenharmony_ci 280462306a36Sopenharmony_ci if (!ioc->enabled || !rq->alloc_time_ns || !rq->start_time_ns) 280562306a36Sopenharmony_ci return; 280662306a36Sopenharmony_ci 280762306a36Sopenharmony_ci switch (req_op(rq)) { 280862306a36Sopenharmony_ci case REQ_OP_READ: 280962306a36Sopenharmony_ci pidx = QOS_RLAT; 281062306a36Sopenharmony_ci rw = READ; 281162306a36Sopenharmony_ci break; 281262306a36Sopenharmony_ci case REQ_OP_WRITE: 281362306a36Sopenharmony_ci pidx = QOS_WLAT; 281462306a36Sopenharmony_ci rw = WRITE; 281562306a36Sopenharmony_ci break; 281662306a36Sopenharmony_ci default: 281762306a36Sopenharmony_ci return; 281862306a36Sopenharmony_ci } 281962306a36Sopenharmony_ci 282062306a36Sopenharmony_ci on_q_ns = ktime_get_ns() - rq->alloc_time_ns; 282162306a36Sopenharmony_ci rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns; 282262306a36Sopenharmony_ci size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC); 282362306a36Sopenharmony_ci 282462306a36Sopenharmony_ci ccs = get_cpu_ptr(ioc->pcpu_stat); 282562306a36Sopenharmony_ci 282662306a36Sopenharmony_ci if (on_q_ns <= size_nsec || 282762306a36Sopenharmony_ci on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC) 282862306a36Sopenharmony_ci local_inc(&ccs->missed[rw].nr_met); 282962306a36Sopenharmony_ci else 283062306a36Sopenharmony_ci local_inc(&ccs->missed[rw].nr_missed); 283162306a36Sopenharmony_ci 283262306a36Sopenharmony_ci local64_add(rq_wait_ns, &ccs->rq_wait_ns); 283362306a36Sopenharmony_ci 283462306a36Sopenharmony_ci put_cpu_ptr(ccs); 283562306a36Sopenharmony_ci} 283662306a36Sopenharmony_ci 283762306a36Sopenharmony_cistatic void ioc_rqos_queue_depth_changed(struct rq_qos *rqos) 283862306a36Sopenharmony_ci{ 283962306a36Sopenharmony_ci struct ioc *ioc = rqos_to_ioc(rqos); 284062306a36Sopenharmony_ci 284162306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 284262306a36Sopenharmony_ci 
ioc_refresh_params(ioc, false); 284362306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 284462306a36Sopenharmony_ci} 284562306a36Sopenharmony_ci 284662306a36Sopenharmony_cistatic void ioc_rqos_exit(struct rq_qos *rqos) 284762306a36Sopenharmony_ci{ 284862306a36Sopenharmony_ci struct ioc *ioc = rqos_to_ioc(rqos); 284962306a36Sopenharmony_ci 285062306a36Sopenharmony_ci blkcg_deactivate_policy(rqos->disk, &blkcg_policy_iocost); 285162306a36Sopenharmony_ci 285262306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 285362306a36Sopenharmony_ci ioc->running = IOC_STOP; 285462306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 285562306a36Sopenharmony_ci 285662306a36Sopenharmony_ci timer_shutdown_sync(&ioc->timer); 285762306a36Sopenharmony_ci free_percpu(ioc->pcpu_stat); 285862306a36Sopenharmony_ci kfree(ioc); 285962306a36Sopenharmony_ci} 286062306a36Sopenharmony_ci 286162306a36Sopenharmony_cistatic const struct rq_qos_ops ioc_rqos_ops = { 286262306a36Sopenharmony_ci .throttle = ioc_rqos_throttle, 286362306a36Sopenharmony_ci .merge = ioc_rqos_merge, 286462306a36Sopenharmony_ci .done_bio = ioc_rqos_done_bio, 286562306a36Sopenharmony_ci .done = ioc_rqos_done, 286662306a36Sopenharmony_ci .queue_depth_changed = ioc_rqos_queue_depth_changed, 286762306a36Sopenharmony_ci .exit = ioc_rqos_exit, 286862306a36Sopenharmony_ci}; 286962306a36Sopenharmony_ci 287062306a36Sopenharmony_cistatic int blk_iocost_init(struct gendisk *disk) 287162306a36Sopenharmony_ci{ 287262306a36Sopenharmony_ci struct ioc *ioc; 287362306a36Sopenharmony_ci int i, cpu, ret; 287462306a36Sopenharmony_ci 287562306a36Sopenharmony_ci ioc = kzalloc(sizeof(*ioc), GFP_KERNEL); 287662306a36Sopenharmony_ci if (!ioc) 287762306a36Sopenharmony_ci return -ENOMEM; 287862306a36Sopenharmony_ci 287962306a36Sopenharmony_ci ioc->pcpu_stat = alloc_percpu(struct ioc_pcpu_stat); 288062306a36Sopenharmony_ci if (!ioc->pcpu_stat) { 288162306a36Sopenharmony_ci kfree(ioc); 288262306a36Sopenharmony_ci return -ENOMEM; 288362306a36Sopenharmony_ci } 288462306a36Sopenharmony_ci 288562306a36Sopenharmony_ci for_each_possible_cpu(cpu) { 288662306a36Sopenharmony_ci struct ioc_pcpu_stat *ccs = per_cpu_ptr(ioc->pcpu_stat, cpu); 288762306a36Sopenharmony_ci 288862306a36Sopenharmony_ci for (i = 0; i < ARRAY_SIZE(ccs->missed); i++) { 288962306a36Sopenharmony_ci local_set(&ccs->missed[i].nr_met, 0); 289062306a36Sopenharmony_ci local_set(&ccs->missed[i].nr_missed, 0); 289162306a36Sopenharmony_ci } 289262306a36Sopenharmony_ci local64_set(&ccs->rq_wait_ns, 0); 289362306a36Sopenharmony_ci } 289462306a36Sopenharmony_ci 289562306a36Sopenharmony_ci spin_lock_init(&ioc->lock); 289662306a36Sopenharmony_ci timer_setup(&ioc->timer, ioc_timer_fn, 0); 289762306a36Sopenharmony_ci INIT_LIST_HEAD(&ioc->active_iocgs); 289862306a36Sopenharmony_ci 289962306a36Sopenharmony_ci ioc->running = IOC_IDLE; 290062306a36Sopenharmony_ci ioc->vtime_base_rate = VTIME_PER_USEC; 290162306a36Sopenharmony_ci atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC); 290262306a36Sopenharmony_ci seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock); 290362306a36Sopenharmony_ci ioc->period_at = ktime_to_us(ktime_get()); 290462306a36Sopenharmony_ci atomic64_set(&ioc->cur_period, 0); 290562306a36Sopenharmony_ci atomic_set(&ioc->hweight_gen, 0); 290662306a36Sopenharmony_ci 290762306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 290862306a36Sopenharmony_ci ioc->autop_idx = AUTOP_INVALID; 290962306a36Sopenharmony_ci ioc_refresh_params_disk(ioc, true, disk); 291062306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 
291162306a36Sopenharmony_ci 291262306a36Sopenharmony_ci /* 291362306a36Sopenharmony_ci * rqos must be added before activation to allow ioc_pd_init() to 291462306a36Sopenharmony_ci * lookup the ioc from q. This means that the rqos methods may get 291562306a36Sopenharmony_ci * called before policy activation completion, can't assume that the 291662306a36Sopenharmony_ci * target bio has an iocg associated and need to test for NULL iocg. 291762306a36Sopenharmony_ci */ 291862306a36Sopenharmony_ci ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops); 291962306a36Sopenharmony_ci if (ret) 292062306a36Sopenharmony_ci goto err_free_ioc; 292162306a36Sopenharmony_ci 292262306a36Sopenharmony_ci ret = blkcg_activate_policy(disk, &blkcg_policy_iocost); 292362306a36Sopenharmony_ci if (ret) 292462306a36Sopenharmony_ci goto err_del_qos; 292562306a36Sopenharmony_ci return 0; 292662306a36Sopenharmony_ci 292762306a36Sopenharmony_cierr_del_qos: 292862306a36Sopenharmony_ci rq_qos_del(&ioc->rqos); 292962306a36Sopenharmony_cierr_free_ioc: 293062306a36Sopenharmony_ci free_percpu(ioc->pcpu_stat); 293162306a36Sopenharmony_ci kfree(ioc); 293262306a36Sopenharmony_ci return ret; 293362306a36Sopenharmony_ci} 293462306a36Sopenharmony_ci 293562306a36Sopenharmony_cistatic struct blkcg_policy_data *ioc_cpd_alloc(gfp_t gfp) 293662306a36Sopenharmony_ci{ 293762306a36Sopenharmony_ci struct ioc_cgrp *iocc; 293862306a36Sopenharmony_ci 293962306a36Sopenharmony_ci iocc = kzalloc(sizeof(struct ioc_cgrp), gfp); 294062306a36Sopenharmony_ci if (!iocc) 294162306a36Sopenharmony_ci return NULL; 294262306a36Sopenharmony_ci 294362306a36Sopenharmony_ci iocc->dfl_weight = CGROUP_WEIGHT_DFL * WEIGHT_ONE; 294462306a36Sopenharmony_ci return &iocc->cpd; 294562306a36Sopenharmony_ci} 294662306a36Sopenharmony_ci 294762306a36Sopenharmony_cistatic void ioc_cpd_free(struct blkcg_policy_data *cpd) 294862306a36Sopenharmony_ci{ 294962306a36Sopenharmony_ci kfree(container_of(cpd, struct ioc_cgrp, cpd)); 295062306a36Sopenharmony_ci} 295162306a36Sopenharmony_ci 295262306a36Sopenharmony_cistatic struct blkg_policy_data *ioc_pd_alloc(struct gendisk *disk, 295362306a36Sopenharmony_ci struct blkcg *blkcg, gfp_t gfp) 295462306a36Sopenharmony_ci{ 295562306a36Sopenharmony_ci int levels = blkcg->css.cgroup->level + 1; 295662306a36Sopenharmony_ci struct ioc_gq *iocg; 295762306a36Sopenharmony_ci 295862306a36Sopenharmony_ci iocg = kzalloc_node(struct_size(iocg, ancestors, levels), gfp, 295962306a36Sopenharmony_ci disk->node_id); 296062306a36Sopenharmony_ci if (!iocg) 296162306a36Sopenharmony_ci return NULL; 296262306a36Sopenharmony_ci 296362306a36Sopenharmony_ci iocg->pcpu_stat = alloc_percpu_gfp(struct iocg_pcpu_stat, gfp); 296462306a36Sopenharmony_ci if (!iocg->pcpu_stat) { 296562306a36Sopenharmony_ci kfree(iocg); 296662306a36Sopenharmony_ci return NULL; 296762306a36Sopenharmony_ci } 296862306a36Sopenharmony_ci 296962306a36Sopenharmony_ci return &iocg->pd; 297062306a36Sopenharmony_ci} 297162306a36Sopenharmony_ci 297262306a36Sopenharmony_cistatic void ioc_pd_init(struct blkg_policy_data *pd) 297362306a36Sopenharmony_ci{ 297462306a36Sopenharmony_ci struct ioc_gq *iocg = pd_to_iocg(pd); 297562306a36Sopenharmony_ci struct blkcg_gq *blkg = pd_to_blkg(&iocg->pd); 297662306a36Sopenharmony_ci struct ioc *ioc = q_to_ioc(blkg->q); 297762306a36Sopenharmony_ci struct ioc_now now; 297862306a36Sopenharmony_ci struct blkcg_gq *tblkg; 297962306a36Sopenharmony_ci unsigned long flags; 298062306a36Sopenharmony_ci 298162306a36Sopenharmony_ci ioc_now(ioc, &now); 
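	/*
	 * Start the new iocg's vtime clocks at the current device vtime and
	 * record its ancestry below so that hweight propagation can walk the
	 * path from the root down to this node.
	 */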
298262306a36Sopenharmony_ci 298362306a36Sopenharmony_ci iocg->ioc = ioc; 298462306a36Sopenharmony_ci atomic64_set(&iocg->vtime, now.vnow); 298562306a36Sopenharmony_ci atomic64_set(&iocg->done_vtime, now.vnow); 298662306a36Sopenharmony_ci atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period)); 298762306a36Sopenharmony_ci INIT_LIST_HEAD(&iocg->active_list); 298862306a36Sopenharmony_ci INIT_LIST_HEAD(&iocg->walk_list); 298962306a36Sopenharmony_ci INIT_LIST_HEAD(&iocg->surplus_list); 299062306a36Sopenharmony_ci iocg->hweight_active = WEIGHT_ONE; 299162306a36Sopenharmony_ci iocg->hweight_inuse = WEIGHT_ONE; 299262306a36Sopenharmony_ci 299362306a36Sopenharmony_ci init_waitqueue_head(&iocg->waitq); 299462306a36Sopenharmony_ci hrtimer_init(&iocg->waitq_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 299562306a36Sopenharmony_ci iocg->waitq_timer.function = iocg_waitq_timer_fn; 299662306a36Sopenharmony_ci 299762306a36Sopenharmony_ci iocg->level = blkg->blkcg->css.cgroup->level; 299862306a36Sopenharmony_ci 299962306a36Sopenharmony_ci for (tblkg = blkg; tblkg; tblkg = tblkg->parent) { 300062306a36Sopenharmony_ci struct ioc_gq *tiocg = blkg_to_iocg(tblkg); 300162306a36Sopenharmony_ci iocg->ancestors[tiocg->level] = tiocg; 300262306a36Sopenharmony_ci } 300362306a36Sopenharmony_ci 300462306a36Sopenharmony_ci spin_lock_irqsave(&ioc->lock, flags); 300562306a36Sopenharmony_ci weight_updated(iocg, &now); 300662306a36Sopenharmony_ci spin_unlock_irqrestore(&ioc->lock, flags); 300762306a36Sopenharmony_ci} 300862306a36Sopenharmony_ci 300962306a36Sopenharmony_cistatic void ioc_pd_free(struct blkg_policy_data *pd) 301062306a36Sopenharmony_ci{ 301162306a36Sopenharmony_ci struct ioc_gq *iocg = pd_to_iocg(pd); 301262306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 301362306a36Sopenharmony_ci unsigned long flags; 301462306a36Sopenharmony_ci 301562306a36Sopenharmony_ci if (ioc) { 301662306a36Sopenharmony_ci spin_lock_irqsave(&ioc->lock, flags); 301762306a36Sopenharmony_ci 301862306a36Sopenharmony_ci if (!list_empty(&iocg->active_list)) { 301962306a36Sopenharmony_ci struct ioc_now now; 302062306a36Sopenharmony_ci 302162306a36Sopenharmony_ci ioc_now(ioc, &now); 302262306a36Sopenharmony_ci propagate_weights(iocg, 0, 0, false, &now); 302362306a36Sopenharmony_ci list_del_init(&iocg->active_list); 302462306a36Sopenharmony_ci } 302562306a36Sopenharmony_ci 302662306a36Sopenharmony_ci WARN_ON_ONCE(!list_empty(&iocg->walk_list)); 302762306a36Sopenharmony_ci WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); 302862306a36Sopenharmony_ci 302962306a36Sopenharmony_ci spin_unlock_irqrestore(&ioc->lock, flags); 303062306a36Sopenharmony_ci 303162306a36Sopenharmony_ci hrtimer_cancel(&iocg->waitq_timer); 303262306a36Sopenharmony_ci } 303362306a36Sopenharmony_ci free_percpu(iocg->pcpu_stat); 303462306a36Sopenharmony_ci kfree(iocg); 303562306a36Sopenharmony_ci} 303662306a36Sopenharmony_ci 303762306a36Sopenharmony_cistatic void ioc_pd_stat(struct blkg_policy_data *pd, struct seq_file *s) 303862306a36Sopenharmony_ci{ 303962306a36Sopenharmony_ci struct ioc_gq *iocg = pd_to_iocg(pd); 304062306a36Sopenharmony_ci struct ioc *ioc = iocg->ioc; 304162306a36Sopenharmony_ci 304262306a36Sopenharmony_ci if (!ioc->enabled) 304362306a36Sopenharmony_ci return; 304462306a36Sopenharmony_ci 304562306a36Sopenharmony_ci if (iocg->level == 0) { 304662306a36Sopenharmony_ci unsigned vp10k = DIV64_U64_ROUND_CLOSEST( 304762306a36Sopenharmony_ci ioc->vtime_base_rate * 10000, 304862306a36Sopenharmony_ci VTIME_PER_USEC); 304962306a36Sopenharmony_ci seq_printf(s, 
" cost.vrate=%u.%02u", vp10k / 100, vp10k % 100); 305062306a36Sopenharmony_ci } 305162306a36Sopenharmony_ci 305262306a36Sopenharmony_ci seq_printf(s, " cost.usage=%llu", iocg->last_stat.usage_us); 305362306a36Sopenharmony_ci 305462306a36Sopenharmony_ci if (blkcg_debug_stats) 305562306a36Sopenharmony_ci seq_printf(s, " cost.wait=%llu cost.indebt=%llu cost.indelay=%llu", 305662306a36Sopenharmony_ci iocg->last_stat.wait_us, 305762306a36Sopenharmony_ci iocg->last_stat.indebt_us, 305862306a36Sopenharmony_ci iocg->last_stat.indelay_us); 305962306a36Sopenharmony_ci} 306062306a36Sopenharmony_ci 306162306a36Sopenharmony_cistatic u64 ioc_weight_prfill(struct seq_file *sf, struct blkg_policy_data *pd, 306262306a36Sopenharmony_ci int off) 306362306a36Sopenharmony_ci{ 306462306a36Sopenharmony_ci const char *dname = blkg_dev_name(pd->blkg); 306562306a36Sopenharmony_ci struct ioc_gq *iocg = pd_to_iocg(pd); 306662306a36Sopenharmony_ci 306762306a36Sopenharmony_ci if (dname && iocg->cfg_weight) 306862306a36Sopenharmony_ci seq_printf(sf, "%s %u\n", dname, iocg->cfg_weight / WEIGHT_ONE); 306962306a36Sopenharmony_ci return 0; 307062306a36Sopenharmony_ci} 307162306a36Sopenharmony_ci 307262306a36Sopenharmony_ci 307362306a36Sopenharmony_cistatic int ioc_weight_show(struct seq_file *sf, void *v) 307462306a36Sopenharmony_ci{ 307562306a36Sopenharmony_ci struct blkcg *blkcg = css_to_blkcg(seq_css(sf)); 307662306a36Sopenharmony_ci struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg); 307762306a36Sopenharmony_ci 307862306a36Sopenharmony_ci seq_printf(sf, "default %u\n", iocc->dfl_weight / WEIGHT_ONE); 307962306a36Sopenharmony_ci blkcg_print_blkgs(sf, blkcg, ioc_weight_prfill, 308062306a36Sopenharmony_ci &blkcg_policy_iocost, seq_cft(sf)->private, false); 308162306a36Sopenharmony_ci return 0; 308262306a36Sopenharmony_ci} 308362306a36Sopenharmony_ci 308462306a36Sopenharmony_cistatic ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf, 308562306a36Sopenharmony_ci size_t nbytes, loff_t off) 308662306a36Sopenharmony_ci{ 308762306a36Sopenharmony_ci struct blkcg *blkcg = css_to_blkcg(of_css(of)); 308862306a36Sopenharmony_ci struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg); 308962306a36Sopenharmony_ci struct blkg_conf_ctx ctx; 309062306a36Sopenharmony_ci struct ioc_now now; 309162306a36Sopenharmony_ci struct ioc_gq *iocg; 309262306a36Sopenharmony_ci u32 v; 309362306a36Sopenharmony_ci int ret; 309462306a36Sopenharmony_ci 309562306a36Sopenharmony_ci if (!strchr(buf, ':')) { 309662306a36Sopenharmony_ci struct blkcg_gq *blkg; 309762306a36Sopenharmony_ci 309862306a36Sopenharmony_ci if (!sscanf(buf, "default %u", &v) && !sscanf(buf, "%u", &v)) 309962306a36Sopenharmony_ci return -EINVAL; 310062306a36Sopenharmony_ci 310162306a36Sopenharmony_ci if (v < CGROUP_WEIGHT_MIN || v > CGROUP_WEIGHT_MAX) 310262306a36Sopenharmony_ci return -EINVAL; 310362306a36Sopenharmony_ci 310462306a36Sopenharmony_ci spin_lock_irq(&blkcg->lock); 310562306a36Sopenharmony_ci iocc->dfl_weight = v * WEIGHT_ONE; 310662306a36Sopenharmony_ci hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) { 310762306a36Sopenharmony_ci struct ioc_gq *iocg = blkg_to_iocg(blkg); 310862306a36Sopenharmony_ci 310962306a36Sopenharmony_ci if (iocg) { 311062306a36Sopenharmony_ci spin_lock(&iocg->ioc->lock); 311162306a36Sopenharmony_ci ioc_now(iocg->ioc, &now); 311262306a36Sopenharmony_ci weight_updated(iocg, &now); 311362306a36Sopenharmony_ci spin_unlock(&iocg->ioc->lock); 311462306a36Sopenharmony_ci } 311562306a36Sopenharmony_ci } 311662306a36Sopenharmony_ci 
spin_unlock_irq(&blkcg->lock); 311762306a36Sopenharmony_ci 311862306a36Sopenharmony_ci return nbytes; 311962306a36Sopenharmony_ci } 312062306a36Sopenharmony_ci 312162306a36Sopenharmony_ci blkg_conf_init(&ctx, buf); 312262306a36Sopenharmony_ci 312362306a36Sopenharmony_ci ret = blkg_conf_prep(blkcg, &blkcg_policy_iocost, &ctx); 312462306a36Sopenharmony_ci if (ret) 312562306a36Sopenharmony_ci goto err; 312662306a36Sopenharmony_ci 312762306a36Sopenharmony_ci iocg = blkg_to_iocg(ctx.blkg); 312862306a36Sopenharmony_ci 312962306a36Sopenharmony_ci if (!strncmp(ctx.body, "default", 7)) { 313062306a36Sopenharmony_ci v = 0; 313162306a36Sopenharmony_ci } else { 313262306a36Sopenharmony_ci if (!sscanf(ctx.body, "%u", &v)) 313362306a36Sopenharmony_ci goto einval; 313462306a36Sopenharmony_ci if (v < CGROUP_WEIGHT_MIN || v > CGROUP_WEIGHT_MAX) 313562306a36Sopenharmony_ci goto einval; 313662306a36Sopenharmony_ci } 313762306a36Sopenharmony_ci 313862306a36Sopenharmony_ci spin_lock(&iocg->ioc->lock); 313962306a36Sopenharmony_ci iocg->cfg_weight = v * WEIGHT_ONE; 314062306a36Sopenharmony_ci ioc_now(iocg->ioc, &now); 314162306a36Sopenharmony_ci weight_updated(iocg, &now); 314262306a36Sopenharmony_ci spin_unlock(&iocg->ioc->lock); 314362306a36Sopenharmony_ci 314462306a36Sopenharmony_ci blkg_conf_exit(&ctx); 314562306a36Sopenharmony_ci return nbytes; 314662306a36Sopenharmony_ci 314762306a36Sopenharmony_cieinval: 314862306a36Sopenharmony_ci ret = -EINVAL; 314962306a36Sopenharmony_cierr: 315062306a36Sopenharmony_ci blkg_conf_exit(&ctx); 315162306a36Sopenharmony_ci return ret; 315262306a36Sopenharmony_ci} 315362306a36Sopenharmony_ci 315462306a36Sopenharmony_cistatic u64 ioc_qos_prfill(struct seq_file *sf, struct blkg_policy_data *pd, 315562306a36Sopenharmony_ci int off) 315662306a36Sopenharmony_ci{ 315762306a36Sopenharmony_ci const char *dname = blkg_dev_name(pd->blkg); 315862306a36Sopenharmony_ci struct ioc *ioc = pd_to_iocg(pd)->ioc; 315962306a36Sopenharmony_ci 316062306a36Sopenharmony_ci if (!dname) 316162306a36Sopenharmony_ci return 0; 316262306a36Sopenharmony_ci 316362306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 316462306a36Sopenharmony_ci seq_printf(sf, "%s enable=%d ctrl=%s rpct=%u.%02u rlat=%u wpct=%u.%02u wlat=%u min=%u.%02u max=%u.%02u\n", 316562306a36Sopenharmony_ci dname, ioc->enabled, ioc->user_qos_params ? 
"user" : "auto", 316662306a36Sopenharmony_ci ioc->params.qos[QOS_RPPM] / 10000, 316762306a36Sopenharmony_ci ioc->params.qos[QOS_RPPM] % 10000 / 100, 316862306a36Sopenharmony_ci ioc->params.qos[QOS_RLAT], 316962306a36Sopenharmony_ci ioc->params.qos[QOS_WPPM] / 10000, 317062306a36Sopenharmony_ci ioc->params.qos[QOS_WPPM] % 10000 / 100, 317162306a36Sopenharmony_ci ioc->params.qos[QOS_WLAT], 317262306a36Sopenharmony_ci ioc->params.qos[QOS_MIN] / 10000, 317362306a36Sopenharmony_ci ioc->params.qos[QOS_MIN] % 10000 / 100, 317462306a36Sopenharmony_ci ioc->params.qos[QOS_MAX] / 10000, 317562306a36Sopenharmony_ci ioc->params.qos[QOS_MAX] % 10000 / 100); 317662306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 317762306a36Sopenharmony_ci return 0; 317862306a36Sopenharmony_ci} 317962306a36Sopenharmony_ci 318062306a36Sopenharmony_cistatic int ioc_qos_show(struct seq_file *sf, void *v) 318162306a36Sopenharmony_ci{ 318262306a36Sopenharmony_ci struct blkcg *blkcg = css_to_blkcg(seq_css(sf)); 318362306a36Sopenharmony_ci 318462306a36Sopenharmony_ci blkcg_print_blkgs(sf, blkcg, ioc_qos_prfill, 318562306a36Sopenharmony_ci &blkcg_policy_iocost, seq_cft(sf)->private, false); 318662306a36Sopenharmony_ci return 0; 318762306a36Sopenharmony_ci} 318862306a36Sopenharmony_ci 318962306a36Sopenharmony_cistatic const match_table_t qos_ctrl_tokens = { 319062306a36Sopenharmony_ci { QOS_ENABLE, "enable=%u" }, 319162306a36Sopenharmony_ci { QOS_CTRL, "ctrl=%s" }, 319262306a36Sopenharmony_ci { NR_QOS_CTRL_PARAMS, NULL }, 319362306a36Sopenharmony_ci}; 319462306a36Sopenharmony_ci 319562306a36Sopenharmony_cistatic const match_table_t qos_tokens = { 319662306a36Sopenharmony_ci { QOS_RPPM, "rpct=%s" }, 319762306a36Sopenharmony_ci { QOS_RLAT, "rlat=%u" }, 319862306a36Sopenharmony_ci { QOS_WPPM, "wpct=%s" }, 319962306a36Sopenharmony_ci { QOS_WLAT, "wlat=%u" }, 320062306a36Sopenharmony_ci { QOS_MIN, "min=%s" }, 320162306a36Sopenharmony_ci { QOS_MAX, "max=%s" }, 320262306a36Sopenharmony_ci { NR_QOS_PARAMS, NULL }, 320362306a36Sopenharmony_ci}; 320462306a36Sopenharmony_ci 320562306a36Sopenharmony_cistatic ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input, 320662306a36Sopenharmony_ci size_t nbytes, loff_t off) 320762306a36Sopenharmony_ci{ 320862306a36Sopenharmony_ci struct blkg_conf_ctx ctx; 320962306a36Sopenharmony_ci struct gendisk *disk; 321062306a36Sopenharmony_ci struct ioc *ioc; 321162306a36Sopenharmony_ci u32 qos[NR_QOS_PARAMS]; 321262306a36Sopenharmony_ci bool enable, user; 321362306a36Sopenharmony_ci char *body, *p; 321462306a36Sopenharmony_ci int ret; 321562306a36Sopenharmony_ci 321662306a36Sopenharmony_ci blkg_conf_init(&ctx, input); 321762306a36Sopenharmony_ci 321862306a36Sopenharmony_ci ret = blkg_conf_open_bdev(&ctx); 321962306a36Sopenharmony_ci if (ret) 322062306a36Sopenharmony_ci goto err; 322162306a36Sopenharmony_ci 322262306a36Sopenharmony_ci body = ctx.body; 322362306a36Sopenharmony_ci disk = ctx.bdev->bd_disk; 322462306a36Sopenharmony_ci if (!queue_is_mq(disk->queue)) { 322562306a36Sopenharmony_ci ret = -EOPNOTSUPP; 322662306a36Sopenharmony_ci goto err; 322762306a36Sopenharmony_ci } 322862306a36Sopenharmony_ci 322962306a36Sopenharmony_ci ioc = q_to_ioc(disk->queue); 323062306a36Sopenharmony_ci if (!ioc) { 323162306a36Sopenharmony_ci ret = blk_iocost_init(disk); 323262306a36Sopenharmony_ci if (ret) 323362306a36Sopenharmony_ci goto err; 323462306a36Sopenharmony_ci ioc = q_to_ioc(disk->queue); 323562306a36Sopenharmony_ci } 323662306a36Sopenharmony_ci 323762306a36Sopenharmony_ci 
	blk_mq_freeze_queue(disk->queue);
	blk_mq_quiesce_queue(disk->queue);

	spin_lock_irq(&ioc->lock);
	memcpy(qos, ioc->params.qos, sizeof(qos));
	enable = ioc->enabled;
	user = ioc->user_qos_params;

	while ((p = strsep(&body, " \t\n"))) {
		substring_t args[MAX_OPT_ARGS];
		char buf[32];
		int tok;
		s64 v;

		if (!*p)
			continue;

		switch (match_token(p, qos_ctrl_tokens, args)) {
		case QOS_ENABLE:
			if (match_u64(&args[0], &v))
				goto einval;
			enable = v;
			continue;
		case QOS_CTRL:
			match_strlcpy(buf, &args[0], sizeof(buf));
			if (!strcmp(buf, "auto"))
				user = false;
			else if (!strcmp(buf, "user"))
				user = true;
			else
				goto einval;
			continue;
		}

		tok = match_token(p, qos_tokens, args);
		switch (tok) {
		case QOS_RPPM:
		case QOS_WPPM:
			if (match_strlcpy(buf, &args[0], sizeof(buf)) >=
			    sizeof(buf))
				goto einval;
			if (cgroup_parse_float(buf, 2, &v))
				goto einval;
			if (v < 0 || v > 10000)
				goto einval;
			qos[tok] = v * 100;
			break;
		case QOS_RLAT:
		case QOS_WLAT:
			if (match_u64(&args[0], &v))
				goto einval;
			qos[tok] = v;
			break;
		case QOS_MIN:
		case QOS_MAX:
			if (match_strlcpy(buf, &args[0], sizeof(buf)) >=
			    sizeof(buf))
				goto einval;
			if (cgroup_parse_float(buf, 2, &v))
				goto einval;
			if (v < 0)
				goto einval;
			qos[tok] = clamp_t(s64, v * 100,
					   VRATE_MIN_PPM, VRATE_MAX_PPM);
			break;
		default:
			goto einval;
		}
		user = true;
	}

	if (qos[QOS_MIN] > qos[QOS_MAX])
		goto einval;

	if (enable && !ioc->enabled) {
		blk_stat_enable_accounting(disk->queue);
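		/* rq alloc timestamps feed the latency tracking used for vrate adjustment */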
		blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
		ioc->enabled = true;
	} else if (!enable && ioc->enabled) {
		blk_stat_disable_accounting(disk->queue);
		blk_queue_flag_clear(QUEUE_FLAG_RQ_ALLOC_TIME, disk->queue);
		ioc->enabled = false;
	}

	if (user) {
		memcpy(ioc->params.qos, qos, sizeof(qos));
		ioc->user_qos_params = true;
	} else {
		ioc->user_qos_params = false;
	}

	ioc_refresh_params(ioc, true);
	spin_unlock_irq(&ioc->lock);

	if (enable)
		wbt_disable_default(disk);
	else
		wbt_enable_default(disk);

	blk_mq_unquiesce_queue(disk->queue);
	blk_mq_unfreeze_queue(disk->queue);

	blkg_conf_exit(&ctx);
	return nbytes;
einval:
	spin_unlock_irq(&ioc->lock);

	blk_mq_unquiesce_queue(disk->queue);
	blk_mq_unfreeze_queue(disk->queue);

	ret = -EINVAL;
err:
	blkg_conf_exit(&ctx);
	return ret;
}

static u64 ioc_cost_model_prfill(struct seq_file *sf,
				 struct blkg_policy_data *pd, int off)
{
	const char *dname = blkg_dev_name(pd->blkg);
	struct ioc *ioc = pd_to_iocg(pd)->ioc;
	u64 *u = ioc->params.i_lcoefs;

	if (!dname)
		return 0;

	spin_lock_irq(&ioc->lock);
	seq_printf(sf, "%s ctrl=%s model=linear "
		   "rbps=%llu rseqiops=%llu rrandiops=%llu "
		   "wbps=%llu wseqiops=%llu wrandiops=%llu\n",
		   dname, ioc->user_cost_model ?
"user" : "auto", 336862306a36Sopenharmony_ci u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS], 336962306a36Sopenharmony_ci u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS]); 337062306a36Sopenharmony_ci spin_unlock_irq(&ioc->lock); 337162306a36Sopenharmony_ci return 0; 337262306a36Sopenharmony_ci} 337362306a36Sopenharmony_ci 337462306a36Sopenharmony_cistatic int ioc_cost_model_show(struct seq_file *sf, void *v) 337562306a36Sopenharmony_ci{ 337662306a36Sopenharmony_ci struct blkcg *blkcg = css_to_blkcg(seq_css(sf)); 337762306a36Sopenharmony_ci 337862306a36Sopenharmony_ci blkcg_print_blkgs(sf, blkcg, ioc_cost_model_prfill, 337962306a36Sopenharmony_ci &blkcg_policy_iocost, seq_cft(sf)->private, false); 338062306a36Sopenharmony_ci return 0; 338162306a36Sopenharmony_ci} 338262306a36Sopenharmony_ci 338362306a36Sopenharmony_cistatic const match_table_t cost_ctrl_tokens = { 338462306a36Sopenharmony_ci { COST_CTRL, "ctrl=%s" }, 338562306a36Sopenharmony_ci { COST_MODEL, "model=%s" }, 338662306a36Sopenharmony_ci { NR_COST_CTRL_PARAMS, NULL }, 338762306a36Sopenharmony_ci}; 338862306a36Sopenharmony_ci 338962306a36Sopenharmony_cistatic const match_table_t i_lcoef_tokens = { 339062306a36Sopenharmony_ci { I_LCOEF_RBPS, "rbps=%u" }, 339162306a36Sopenharmony_ci { I_LCOEF_RSEQIOPS, "rseqiops=%u" }, 339262306a36Sopenharmony_ci { I_LCOEF_RRANDIOPS, "rrandiops=%u" }, 339362306a36Sopenharmony_ci { I_LCOEF_WBPS, "wbps=%u" }, 339462306a36Sopenharmony_ci { I_LCOEF_WSEQIOPS, "wseqiops=%u" }, 339562306a36Sopenharmony_ci { I_LCOEF_WRANDIOPS, "wrandiops=%u" }, 339662306a36Sopenharmony_ci { NR_I_LCOEFS, NULL }, 339762306a36Sopenharmony_ci}; 339862306a36Sopenharmony_ci 339962306a36Sopenharmony_cistatic ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input, 340062306a36Sopenharmony_ci size_t nbytes, loff_t off) 340162306a36Sopenharmony_ci{ 340262306a36Sopenharmony_ci struct blkg_conf_ctx ctx; 340362306a36Sopenharmony_ci struct request_queue *q; 340462306a36Sopenharmony_ci struct ioc *ioc; 340562306a36Sopenharmony_ci u64 u[NR_I_LCOEFS]; 340662306a36Sopenharmony_ci bool user; 340762306a36Sopenharmony_ci char *body, *p; 340862306a36Sopenharmony_ci int ret; 340962306a36Sopenharmony_ci 341062306a36Sopenharmony_ci blkg_conf_init(&ctx, input); 341162306a36Sopenharmony_ci 341262306a36Sopenharmony_ci ret = blkg_conf_open_bdev(&ctx); 341362306a36Sopenharmony_ci if (ret) 341462306a36Sopenharmony_ci goto err; 341562306a36Sopenharmony_ci 341662306a36Sopenharmony_ci body = ctx.body; 341762306a36Sopenharmony_ci q = bdev_get_queue(ctx.bdev); 341862306a36Sopenharmony_ci if (!queue_is_mq(q)) { 341962306a36Sopenharmony_ci ret = -EOPNOTSUPP; 342062306a36Sopenharmony_ci goto err; 342162306a36Sopenharmony_ci } 342262306a36Sopenharmony_ci 342362306a36Sopenharmony_ci ioc = q_to_ioc(q); 342462306a36Sopenharmony_ci if (!ioc) { 342562306a36Sopenharmony_ci ret = blk_iocost_init(ctx.bdev->bd_disk); 342662306a36Sopenharmony_ci if (ret) 342762306a36Sopenharmony_ci goto err; 342862306a36Sopenharmony_ci ioc = q_to_ioc(q); 342962306a36Sopenharmony_ci } 343062306a36Sopenharmony_ci 343162306a36Sopenharmony_ci blk_mq_freeze_queue(q); 343262306a36Sopenharmony_ci blk_mq_quiesce_queue(q); 343362306a36Sopenharmony_ci 343462306a36Sopenharmony_ci spin_lock_irq(&ioc->lock); 343562306a36Sopenharmony_ci memcpy(u, ioc->params.i_lcoefs, sizeof(u)); 343662306a36Sopenharmony_ci user = ioc->user_cost_model; 343762306a36Sopenharmony_ci 343862306a36Sopenharmony_ci while ((p = strsep(&body, " \t\n"))) { 343962306a36Sopenharmony_ci 
		substring_t args[MAX_OPT_ARGS];
		char buf[32];
		int tok;
		u64 v;

		if (!*p)
			continue;

		switch (match_token(p, cost_ctrl_tokens, args)) {
		case COST_CTRL:
			match_strlcpy(buf, &args[0], sizeof(buf));
			if (!strcmp(buf, "auto"))
				user = false;
			else if (!strcmp(buf, "user"))
				user = true;
			else
				goto einval;
			continue;
		case COST_MODEL:
			match_strlcpy(buf, &args[0], sizeof(buf));
			if (strcmp(buf, "linear"))
				goto einval;
			continue;
		}

		tok = match_token(p, i_lcoef_tokens, args);
		if (tok == NR_I_LCOEFS)
			goto einval;
		if (match_u64(&args[0], &v))
			goto einval;
		u[tok] = v;
		user = true;
	}

	if (user) {
		memcpy(ioc->params.i_lcoefs, u, sizeof(u));
		ioc->user_cost_model = true;
	} else {
		ioc->user_cost_model = false;
	}
	ioc_refresh_params(ioc, true);
	spin_unlock_irq(&ioc->lock);

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);

	blkg_conf_exit(&ctx);
	return nbytes;

einval:
	spin_unlock_irq(&ioc->lock);

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);

	ret = -EINVAL;
err:
	blkg_conf_exit(&ctx);
	return ret;
}

static struct cftype ioc_files[] = {
	{
		.name = "weight",
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = ioc_weight_show,
		.write = ioc_weight_write,
	},
	{
		.name = "cost.qos",
		.flags = CFTYPE_ONLY_ON_ROOT,
		.seq_show = ioc_qos_show,
		.write = ioc_qos_write,
	},
	{
		.name = "cost.model",
		.flags = CFTYPE_ONLY_ON_ROOT,
		.seq_show = ioc_cost_model_show,
		.write = ioc_cost_model_write,
	},
	{}
};

static struct blkcg_policy blkcg_policy_iocost = {
	.dfl_cftypes = ioc_files,
	.cpd_alloc_fn = ioc_cpd_alloc,
	.cpd_free_fn = ioc_cpd_free,
	.pd_alloc_fn = ioc_pd_alloc,
	.pd_init_fn = ioc_pd_init,
	.pd_free_fn = ioc_pd_free,
	.pd_stat_fn = ioc_pd_stat,
};

static int __init ioc_init(void)
{
	return blkcg_policy_register(&blkcg_policy_iocost);
}

static void __exit ioc_exit(void)
{
	blkcg_policy_unregister(&blkcg_policy_iocost);
}

module_init(ioc_init);
module_exit(ioc_exit);
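
/*
 * Illustrative userspace usage of the interface files defined above. The
 * device number 8:16 and all coefficient values are made up for the example;
 * substitute the MAJ:MIN of the target device and measured coefficients.
 *
 *   # per-cgroup weight (io.weight, not available in the root cgroup)
 *   echo "default 300" > io.weight
 *
 *   # enable QoS control with user latency targets (io.cost.qos, root only)
 *   echo "8:16 enable=1 rpct=95.00 rlat=5000 wpct=95.00 wlat=10000" > io.cost.qos
 *
 *   # supply linear cost model coefficients (io.cost.model, root only)
 *   echo "8:16 ctrl=user model=linear rbps=100000000 rseqiops=10000 rrandiops=5000 wbps=50000000 wseqiops=5000 wrandiops=2000" > io.cost.model
 */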