===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why is RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources which can be limited for the
processes of a cgroup.

1-2. Why is RDMA controller needed?
-----------------------------------

Currently, user space applications can easily take away all the RDMA verb
specific resources such as AH, CQ, QP, MR etc. As a result, applications in
other cgroups or kernel space ULPs may not even get a chance to allocate any
RDMA resources. This can lead to service unavailability.

The RDMA controller is therefore needed to limit the resource consumption of
processes. The controller also allows the different RDMA resources to be
accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured for resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each resource pool can track up to 64 resources; this limit can be extended
later if required.

The resource pool object is linked to the cgroup css. In most use cases there
are typically 0 to 4 resource pool instances per cgroup, per device, but
nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no known use
case or requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes which share the address space, RDMA resources are
always owned by the creator cgroup css. This allows a process to migrate from
one cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources to the css also ensures that
cgroups can be deleted after their processes have migrated. This also allows
process migration with active resources, even though that is not a primary
use case.

Whenever an RDMA resource is charged, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This allows a process that migrated with active RDMA resources to
charge new resources to its new owner cgroup. It also allows a resource of a
process that has migrated to a new cgroup to be uncharged from the previously
charged cgroup, even though that is not a primary use case.

A resource pool object is created in the following situations:

(a) The user sets a limit and no resource pool exists yet for the device
    of interest in the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
    charge a resource. The pool is created so that resources charged while
    applications run without limits are uncharged correctly if limits are
    enforced later; otherwise the usage count would drop below zero during
    uncharging.

A resource pool is destroyed when all of its resource limits are set to max
and the last resource charged to it is deallocated.

To remove/unconfigure the resource pool for a particular device, the user
should set all of its limits to max.

The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user has configured for the given cgroup and
what the IB device supports.

The following resources can be accounted by the RDMA controller.

  ==========  =============================
  hca_handle  Maximum number of HCA Handles
  hca_object  Maximum number of HCA Objects
  ==========  =============================

2. Usage Examples
=================

(a) Configure resource limit::

	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

	cat /sys/fs/cgroup/rdma/2/rdma.max
	#Output:
	mlx4_0 hca_handle=2 hca_object=2000
	ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

	cat /sys/fs/cgroup/rdma/2/rdma.current
	#Output:
	mlx4_0 hca_handle=1 hca_object=20
	ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit::

	echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
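
The ``rdma.max`` and ``rdma.current`` files shown in the usage examples share
one line format: a device name followed by ``resource=value`` pairs. The
following is a minimal sketch of how a script might compare the two to see how
much of a limit is still unused. The helper names ``rdma_value`` and
``rdma_headroom`` are hypothetical and exist only for this illustration, and
the sample data is copied from the outputs above rather than read from a live
cgroup hierarchy.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of the
# kernel, libibverbs, or any RDMA tool.

# rdma_value DEVICE RESOURCE
# Reads rdma.max / rdma.current style lines ("<device> <key>=<value> ...")
# on stdin and prints the value of RESOURCE for DEVICE, if present.
rdma_value() {
    awk -v dev="$1" -v key="$2" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] == key) { print kv[2]; exit }
            }
        }'
}

# rdma_headroom LIMIT USED
# Prints how many more resources may be charged; "max" means no limit is set.
rdma_headroom() {
    if [ "$1" = "max" ]; then
        echo max
    else
        echo $(( $1 - $2 ))
    fi
}

# Sample data copied from the usage examples; on a live system, read the
# rdma.max and rdma.current files of the cgroup of interest instead.
limits='mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max'
usage='mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23'

limit=$(printf '%s\n' "$limits" | rdma_value mlx4_0 hca_object)
used=$(printf '%s\n' "$usage" | rdma_value mlx4_0 hca_object)
echo "mlx4_0 hca_object headroom: $(rdma_headroom "$limit" "$used")"
```

With the sample data this reports a headroom of 1980 for ``mlx4_0``
``hca_object``. A resource whose limit is ``max`` is reported as ``max``,
since the controller places no cap on it.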