==============
Control Groups
==============

Written by Paul Menage <menage@google.com> based on
Documentation/admin-guide/cgroup-v1/cpusets.rst

Original copyright statements from cpusets.txt:

Portions Copyright (C) 2004 BULL SA.

Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.

Modified by Paul Jackson <pj@sgi.com>

Modified by Christoph Lameter <cl@linux.com>

.. CONTENTS:

    1. Control Groups
      1.1 What are cgroups ?
      1.2 Why are cgroups needed ?
      1.3 How are cgroups implemented ?
      1.4 What does notify_on_release do ?
      1.5 What does clone_children do ?
      1.6 How do I use cgroups ?
    2. Usage Examples and Syntax
      2.1 Basic Usage
      2.2 Attaching processes
      2.3 Mounting hierarchies by name
    3. Kernel API
      3.1 Overview
      3.2 Synchronization
      3.3 Subsystem API
    4. Extended attributes usage
    5. Questions

1. Control Groups
=================

1.1 What are cgroups ?
----------------------

Control Groups provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with
specialized behaviour.

Definitions:

A *cgroup* associates a set of tasks with a set of parameters for one
or more subsystems.

A *subsystem* is a module that makes use of the task grouping
facilities provided by cgroups to treat groups of tasks in
particular ways. A subsystem is typically a "resource controller" that
schedules a resource or applies per-cgroup limits, but it may be
anything that wants to act on a group of processes, e.g. a
virtualization subsystem.

A *hierarchy* is a set of cgroups arranged in a tree, together with a
set of subsystems, such that every task in the system is in exactly
one of the cgroups in the hierarchy. Each subsystem has
subsystem-specific state attached to each cgroup in the hierarchy.
Each hierarchy has an instance of the cgroup virtual filesystem
associated with it.

At any one time there may be multiple active hierarchies of task
cgroups. Each hierarchy is a partition of all tasks in the system.
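
To make the "partition" property concrete, here is a toy C model, not
kernel code. Each hypothetical ``struct hierarchy`` maps every task to
exactly one cgroup index, so a task can sit in different cgroups in
different hierarchies, and moving it in one hierarchy leaves the others
untouched. The names ``struct hierarchy``, ``move_task()`` and
``nr_tasks_in()`` and the array sizes are illustrative assumptions.

```c
#include <assert.h>

/* Toy sizes, chosen only for illustration. */
#define NR_TASKS   4
#define NR_CGROUPS 3

/* One hierarchy: every task maps to exactly one cgroup index. */
struct hierarchy {
    const char *name;             /* label only, e.g. "cpu" */
    int cgroup_of[NR_TASKS];      /* cgroup index of each task */
};

/* Moving a task changes its cgroup in this hierarchy only. */
static void move_task(struct hierarchy *h, int task, int cgroup)
{
    assert(task >= 0 && task < NR_TASKS);
    assert(cgroup >= 0 && cgroup < NR_CGROUPS);
    h->cgroup_of[task] = cgroup;
}

/* Count the tasks currently in one cgroup of the hierarchy. */
static int nr_tasks_in(const struct hierarchy *h, int cgroup)
{
    int n = 0;

    for (int t = 0; t < NR_TASKS; t++)
        if (h->cgroup_of[t] == cgroup)
            n++;
    return n;
}
```

Summing ``nr_tasks_in()`` over all cgroups of one hierarchy always gives
``NR_TASKS``, which is exactly the partition property stated above.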

User-level code may create and destroy cgroups by name in an
instance of the cgroup virtual file system, specify and query to
which cgroup a task is assigned, and list the task PIDs assigned to
a cgroup. Those creations and assignments only affect the hierarchy
associated with that instance of the cgroup file system.

On their own, cgroups are only useful for simple job tracking. The
intention is that other subsystems hook into the generic cgroup
support to provide new attributes for cgroups, such as
accounting/limiting the resources which processes in a cgroup can
access. For example, cpusets (see Documentation/admin-guide/cgroup-v1/cpusets.rst)
allow you to associate a set of CPUs and a set of memory nodes with the
tasks in each cgroup.

.. _cgroups-why-needed:

1.2 Why are cgroups needed ?
----------------------------

There are multiple efforts to provide process aggregations in the
Linux kernel, mainly for resource-tracking purposes. Such efforts
include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
namespaces. These all require the basic notion of a
grouping/partitioning of processes, with newly forked processes ending
up in the same group (cgroup) as their parent process.

The kernel cgroup patch provides the minimum essential kernel
mechanisms required to efficiently implement such groups. It has
minimal impact on the system fast paths, and provides hooks for
specific subsystems such as cpusets to provide additional behaviour as
desired.

Multiple hierarchy support is provided to allow for situations where
the division of tasks into cgroups is distinctly different for
different subsystems. Having parallel hierarchies allows each
hierarchy to be a natural division of tasks, without having to handle
the complex combinations of tasks that would arise if several
unrelated subsystems had to be forced into the same tree of
cgroups.

At one extreme, each resource controller or subsystem could be in a
separate hierarchy; at the other extreme, all subsystems
would be attached to the same hierarchy.

As an example of a scenario (originally proposed by vatsa@in.ibm.com)
that can benefit from multiple hierarchies, consider a large
university server with various users: students, professors, system
tasks, etc.
The resource planning for this server could be along the
following lines::

       CPU :          "Top cpuset"
                       /       \
               CPUSet1         CPUSet2
                  |               |
              (Professors)    (Students)

       In addition, (system tasks) are attached to the top cpuset (so
       that they can run anywhere) with a limit of 20%

       Memory : Professors (50%), Students (30%), system (20%)

       Disk : Professors (50%), Students (30%), system (20%)

       Network : WWW browsing (20%), Network File System (60%), others (20%)
                       /        \
            Professors (15%)  Students (5%)

Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
into the NFS network class.

At the same time, Firefox/Lynx will share an appropriate CPU/Memory class
depending on who launched it (professor or student).

With the ability to classify tasks differently for different resources
(by putting those resource subsystems in different hierarchies),
the admin can easily set up a script which receives exec notifications
and, depending on who is launching the browser, runs::

    # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks

With only a single hierarchy, the admin would potentially have to create
a separate cgroup for every browser launched and associate it with
the appropriate network and other resource classes. This may lead to a
proliferation of such cgroups.

Also, suppose the administrator would like to temporarily give enhanced
network access to a student's browser (since it is night and the user
wants to do online gaming :)) or give one of the student's simulation
apps enhanced CPU power.

With the ability to write PIDs directly to resource classes, it's just a
matter of::

    # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
    (after some time)
    # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks

Without this ability, the administrator would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the
new resource classes.
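
The ``echo pid > .../tasks`` steps above can also be performed from a
program such as the exec-notification script. The sketch below assumes a
cgroup v1 hierarchy is already mounted and that the caller has write
permission on the tasks file; ``attach_pids()`` is a hypothetical helper,
not part of any library. Because the tasks file accepts one PID per
write (see section 2.2), each PID gets its own ``write()`` call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach each PID in `pids` to the cgroup whose
 * "tasks" file is `tasks_path`. The file takes one PID per write(), so
 * each PID is written separately. Returns 0 or the first errno seen.
 */
static int attach_pids(const char *tasks_path, const pid_t *pids, size_t n)
{
    int fd = open(tasks_path, O_WRONLY);
    if (fd < 0)
        return errno;

    for (size_t i = 0; i < n; i++) {
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%d\n", (int)pids[i]);

        assert(len > 0 && (size_t)len < sizeof(buf));
        ssize_t w = write(fd, buf, (size_t)len);
        if (w != len) {
            int err = (w < 0) ? errno : EIO;  /* short write: report EIO */
            close(fd);
            return err;
        }
    }
    close(fd);
    return 0;
}
```

Moving a browser to ``<new_class>`` and back later is then two calls
with different tasks paths; a failed move (for example, a restriction
imposed by a subsystem) shows up as a non-zero return value.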

1.3 How are cgroups implemented ?
---------------------------------

Control Groups extend the kernel as follows:

 - Each task in the system has a reference-counted pointer to a
   css_set.

 - A css_set contains a set of reference-counted pointers to
   cgroup_subsys_state objects, one for each cgroup subsystem
   registered in the system. There is no direct link from a task to
   the cgroup of which it's a member in each hierarchy, but this
   can be determined by following pointers through the
   cgroup_subsys_state objects. This is because accessing the
   subsystem state is expected to happen frequently and in
   performance-critical code, whereas operations that require a
   task's actual cgroup assignments (in particular, moving between
   cgroups) are less common. A linked list runs through the cg_list
   field of each task_struct using the css_set, anchored at
   css_set->tasks.

 - A cgroup hierarchy filesystem can be mounted for browsing and
   manipulation from user space.

 - You can list all the tasks (by PID) attached to any cgroup.
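
The pointer layout described above can be sketched as a simplified
userspace model. The structure and field names loosely follow the
kernel's (``css_set``, ``cgroup_subsys_state``, ``task_css()``), but
these are illustrative stand-ins, not the kernel definitions: the real
code uses atomic reference counts, RCU-protected pointers and many more
fields, and ``task_set_css_set()`` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBSYS 2      /* toy value: two registered subsystems */

struct cgroup {
    const char *path;    /* e.g. "/Professors" */
};

/* Per-cgroup state of one subsystem; it points back to its cgroup. */
struct cgroup_subsys_state {
    struct cgroup *cgroup;
    int refcnt;
};

/* Shared by all tasks that use the same set of subsystem states. */
struct css_set {
    int refcnt;                                    /* number of users */
    struct cgroup_subsys_state *subsys[NR_SUBSYS]; /* one per subsystem */
};

struct task {
    struct css_set *cgroups;                       /* counted reference */
};

/* Point a task at a css_set, taking a reference and dropping the old one. */
static void task_set_css_set(struct task *t, struct css_set *cset)
{
    cset->refcnt++;
    if (t->cgroups)
        t->cgroups->refcnt--;
    t->cgroups = cset;
}

/* Frequent case: a subsystem reaches its state in two pointer hops. */
static struct cgroup_subsys_state *task_css(const struct task *t, int id)
{
    assert(id >= 0 && id < NR_SUBSYS);
    return t->cgroups->subsys[id];
}

/* Rarer case: the cgroup itself is found via the state's back pointer. */
static struct cgroup *task_cgroup(const struct task *t, int id)
{
    return task_css(t, id)->cgroup;
}
```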

The implementation of cgroups requires a few simple hooks
into the rest of the kernel, none in performance-critical paths:

 - in init/main.c, to initialize the root cgroups and initial
   css_set at system boot.

 - in fork and exit, to attach a task to its css_set and detach it again.

In addition, a new file system of type "cgroup" may be mounted, to
enable browsing and modifying the cgroups presently known to the
kernel. When mounting a cgroup hierarchy, you may specify a
comma-separated list of subsystems to mount as the filesystem mount
options. By default, mounting the cgroup filesystem attempts to
mount a hierarchy containing all registered subsystems.

If an active hierarchy with exactly the same set of subsystems already
exists, it will be reused for the new mount. If no existing hierarchy
matches, and any of the requested subsystems are in use in an existing
hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy
is activated, associated with the requested subsystems.

It's not currently possible to bind a new subsystem to an active
cgroup hierarchy, or to unbind a subsystem from an active cgroup
hierarchy. This may be possible in the future, but is fraught with nasty
error-recovery issues.
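
The mount rules in the previous paragraphs can be expressed as a small
decision function. This is a hypothetical model, not kernel code: each
active hierarchy is represented only by a bitmask of subsystem IDs, and
the "mount all registered subsystems by default" case and named
hierarchies are left out. ``struct hier_table`` and
``choose_hierarchy()`` are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: each active hierarchy is a bitmask of subsystems. */
struct hier_table {
    unsigned int masks[8];
    size_t nr;
};

/*
 * Return the index of the hierarchy to use for a mount requesting the
 * subsystems in `want`: reuse an exact match, refuse with -EBUSY if any
 * requested subsystem is bound elsewhere, otherwise activate a new one.
 */
static int choose_hierarchy(struct hier_table *t, unsigned int want)
{
    unsigned int in_use = 0;

    for (size_t i = 0; i < t->nr; i++) {
        if (t->masks[i] == want)
            return (int)i;              /* exact match: reuse it */
        in_use |= t->masks[i];
    }
    if (want & in_use)
        return -EBUSY;                  /* partial overlap: refuse */

    assert(t->nr < sizeof(t->masks) / sizeof(t->masks[0]));
    t->masks[t->nr] = want;             /* activate a new hierarchy */
    return (int)t->nr++;
}
```

For example, after mounting ``cpuset,memory``, a second
``cpuset,memory`` mount reuses the same hierarchy, while a ``cpuset``
mount on its own fails with -EBUSY.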

When a cgroup filesystem is unmounted, if there are any
child cgroups created below the top-level cgroup, that hierarchy
will remain active even though unmounted; if there are no
child cgroups then the hierarchy will be deactivated.

No new system calls are added for cgroups; all support for
querying and modifying cgroups is via this cgroup file system.

Each task's directory under /proc contains a file named 'cgroup' that
displays, for each active hierarchy, the subsystem names and the cgroup
name as a path relative to the root of the cgroup file system.

Each cgroup is represented by a directory in the cgroup file system
containing the following files describing that cgroup:

 - tasks: list of tasks (by PID) attached to that cgroup. This list
   is not guaranteed to be sorted. Writing a thread ID into this file
   moves the thread into this cgroup.
 - cgroup.procs: list of thread group IDs in the cgroup. This list is
   not guaranteed to be sorted or free of duplicate TGIDs, and userspace
   should sort/uniquify the list if this property is required.
   Writing a thread group ID into this file moves all threads in that
   group into this cgroup.
 - notify_on_release: flag controlling whether the release agent is run
   when the cgroup is released (see section 1.4).
24862306a36Sopenharmony_ci - release_agent: the path to use for release notifications (this file 24962306a36Sopenharmony_ci exists in the top cgroup only) 25062306a36Sopenharmony_ci 25162306a36Sopenharmony_ciOther subsystems such as cpusets may add additional files in each 25262306a36Sopenharmony_cicgroup dir. 25362306a36Sopenharmony_ci 25462306a36Sopenharmony_ciNew cgroups are created using the mkdir system call or shell 25562306a36Sopenharmony_cicommand. The properties of a cgroup, such as its flags, are 25662306a36Sopenharmony_cimodified by writing to the appropriate file in that cgroups 25762306a36Sopenharmony_cidirectory, as listed above. 25862306a36Sopenharmony_ci 25962306a36Sopenharmony_ciThe named hierarchical structure of nested cgroups allows partitioning 26062306a36Sopenharmony_cia large system into nested, dynamically changeable, "soft-partitions". 26162306a36Sopenharmony_ci 26262306a36Sopenharmony_ciThe attachment of each task, automatically inherited at fork by any 26362306a36Sopenharmony_cichildren of that task, to a cgroup allows organizing the work load 26462306a36Sopenharmony_cion a system into related sets of tasks. A task may be re-attached to 26562306a36Sopenharmony_ciany other cgroup, if allowed by the permissions on the necessary 26662306a36Sopenharmony_cicgroup file system directories. 26762306a36Sopenharmony_ci 26862306a36Sopenharmony_ciWhen a task is moved from one cgroup to another, it gets a new 26962306a36Sopenharmony_cicss_set pointer - if there's an already existing css_set with the 27062306a36Sopenharmony_cidesired collection of cgroups then that group is reused, otherwise a new 27162306a36Sopenharmony_cicss_set is allocated. The appropriate existing css_set is located by 27262306a36Sopenharmony_cilooking into a hash table. 

To allow access from a cgroup to the css_sets (and hence tasks)
that comprise it, a set of cg_cgroup_link objects forms a lattice;
each cg_cgroup_link is linked into a list of cg_cgroup_links for
a single cgroup on its cgrp_link_list field, and a list of
cg_cgroup_links for a single css_set on its cg_link_list.

Thus the set of tasks in a cgroup can be listed by iterating over
each css_set that references the cgroup, and sub-iterating over
each css_set's task set.

The use of a Linux virtual file system (vfs) to represent the
cgroup hierarchy provides a familiar permission model and namespace
for cgroups, with a minimum of additional kernel code.

1.4 What does notify_on_release do ?
------------------------------------

If the notify_on_release flag is enabled (1) in a cgroup, then
whenever the last task in the cgroup leaves (exits or attaches to
some other cgroup) and the last child cgroup of that cgroup
is removed, the kernel runs the command specified by the contents
of the "release_agent" file in that hierarchy's root directory,
supplying the pathname (relative to the mount point of the cgroup
file system) of the abandoned cgroup. This enables automatic
removal of abandoned cgroups.
At system boot, notify_on_release is disabled (0) in the root cgroup.
A newly created cgroup inherits the current value of its parent's
notify_on_release setting. A cgroup hierarchy's release_agent path is
empty by default.

1.5 What does clone_children do ?
---------------------------------

This flag only affects the cpuset controller. If the clone_children
flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its
configuration from the parent during initialization.

1.6 How do I use cgroups ?
--------------------------

To start a new job that is to be contained within a cgroup, using
the "cpuset" cgroup subsystem, the steps are something like::

 1) mount -t tmpfs cgroup_root /sys/fs/cgroup
 2) mkdir /sys/fs/cgroup/cpuset
 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
 4) Create the new cgroup by doing mkdirs and writes (or echos) in
    the /sys/fs/cgroup/cpuset virtual file system.
 5) Start a task that will be the "founding father" of the new job.
 6) Attach that task to the new cgroup by writing its PID to the
    /sys/fs/cgroup/cpuset tasks file for that cgroup.
 7) fork, exec or clone the job tasks from this founding father task.
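
Steps 4 to 6 above can also be done from C. The sketch below assumes
that the cpuset hierarchy is already mounted at ``mnt`` (steps 1 to 3)
and that the caller has the necessary permissions;
``setup_cpuset_job()`` and ``write_cgroup_file()`` are hypothetical
helpers. It writes cpuset.cpus and cpuset.mems before attaching the
task, because a new cpuset cannot be used until both are populated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `value` to <dir>/<file>, like `/bin/echo value > file`. */
static int write_cgroup_file(const char *dir, const char *file,
                             const char *value)
{
    char path[4096];

    if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
        return ENAMETOOLONG;
    errno = 0;
    FILE *f = fopen(path, "w");
    if (!f)
        return errno;
    int put_ok = fputs(value, f) >= 0;
    int close_ok = fclose(f) == 0;   /* the flush may report a rejected value */
    if (!put_ok || !close_ok)
        return errno ? errno : EIO;
    return 0;
}

/*
 * Hypothetical helper for steps 4-6: create <mnt>/<name>, assign CPUs and
 * memory nodes, then attach the founding task. Children it forks later
 * inherit the cgroup (step 7).
 */
static int setup_cpuset_job(const char *mnt, const char *name,
                            const char *cpus, const char *mems,
                            pid_t founder)
{
    char dir[4096], pid[32];
    int err;

    assert(mnt && name && cpus && mems);
    if (snprintf(dir, sizeof(dir), "%s/%s", mnt, name) >= (int)sizeof(dir))
        return ENAMETOOLONG;
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return errno;
    if ((err = write_cgroup_file(dir, "cpuset.cpus", cpus)) != 0)
        return err;
    if ((err = write_cgroup_file(dir, "cpuset.mems", mems)) != 0)
        return err;
    snprintf(pid, sizeof(pid), "%d", (int)founder);
    return write_cgroup_file(dir, "tasks", pid);
}
```

A caller would typically pass ``getpid()`` as ``founder`` and then
``fork()``/``exec()`` the job, so that every job task starts inside the
new cgroup.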

For example, the following sequence of commands will set up a cgroup
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cgroup::

  mount -t tmpfs cgroup_root /sys/fs/cgroup
  mkdir /sys/fs/cgroup/cpuset
  mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
  cd /sys/fs/cgroup/cpuset
  mkdir Charlie
  cd Charlie
  /bin/echo 2-3 > cpuset.cpus
  /bin/echo 1 > cpuset.mems
  /bin/echo $$ > tasks
  sh
  # The subshell 'sh' is now running in cgroup Charlie
  # In the next command's output, the cpuset line should end in ':/Charlie'
  cat /proc/self/cgroup

2. Usage Examples and Syntax
============================

2.1 Basic Usage
---------------

Creating, modifying and using cgroups can be done through the cgroup
virtual filesystem.

To mount a cgroup hierarchy with all available subsystems, type::

  # mount -t cgroup xxx /sys/fs/cgroup

The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts, so it may be any useful identifying string that you like.

Note: Some subsystems do not work without some user input first.
For instance, if cpusets are enabled, the user will have to populate the
cpuset.cpus and cpuset.mems files for each new cgroup created before
that group can be used.

As explained in section :ref:`cgroups-why-needed`, you should create a
separate hierarchy of cgroups for each resource or group of resources
you want to control. Therefore, you should mount a tmpfs on
/sys/fs/cgroup and create directories for each cgroup resource or resource
group::

  # mount -t tmpfs cgroup_root /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/rg1

To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type::

  # mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1

While remounting cgroups is currently supported, it is not recommended
to use it. Remounting allows changing bound subsystems and
release_agent. Rebinding is hardly useful as it only works when the
hierarchy is empty, and release_agent itself should be replaced with
conventional fsnotify. The support for remounting will be removed in
the future.
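
The same mount can be issued from a program with mount(2), whose last
argument carries the comma-separated option string that ``mount -o``
would pass. A minimal sketch, assuming root privileges and an existing
mount point; ``build_cgroup_opts()`` and ``mount_cgroup_v1()`` are
hypothetical helpers, and the optional ``name=`` handling anticipates
section 2.3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Join subsystem names (and an optional name=) into the option string
 * used by `mount -o`, e.g. "cpuset,memory". Returns 0, or -1 if `buf`
 * is too small.
 */
static int build_cgroup_opts(char *buf, size_t len,
                             const char *const *subsys, size_t n,
                             const char *name)
{
    size_t used = 0;
    int w;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + used, len - used, "%s%s",
                     used ? "," : "", subsys[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    if (name) {
        w = snprintf(buf + used, len - used, "%sname=%s",
                     used ? "," : "", name);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
    }
    return 0;
}

/* Equivalent of: mount -t cgroup -o <opts> <source> <target> (needs root). */
static int mount_cgroup_v1(const char *source, const char *target,
                           const char *opts)
{
    return mount(source, target, "cgroup", 0, opts);
}
```

With ``{"cpuset", "memory"}`` the builder produces ``cpuset,memory``,
and ``mount_cgroup_v1("hier1", "/sys/fs/cgroup/rg1", buf)`` would then
correspond to the shell command above.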

To specify a hierarchy's release_agent::

  # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
    xxx /sys/fs/cgroup/rg1

Note that specifying 'release_agent' more than once causes the mount to fail.

Note that changing the set of subsystems is currently only supported
when the hierarchy consists of a single (root) cgroup. Supporting
the ability to arbitrarily bind/unbind subsystems from an existing
cgroup hierarchy is intended to be implemented in the future.

Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
itself is the root cgroup, which holds the whole system.

If you want to change the value of release_agent::

  # echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent

It can also be changed via remount.
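
A release agent is an ordinary program. As described in section 1.4,
the kernel supplies the abandoned cgroup's path relative to the
hierarchy's mount point, so a minimal agent only needs to prepend its
mount point and remove the directory. The sketch below is hypothetical:
``release_cgroup()`` is an illustrative name, the mount point is assumed
to be known to the agent, and a leading '/' on the supplied path is
tolerated but not required.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Core of a hypothetical release agent: join the agent's mount point
 * with the relative cgroup path it was given and remove that (now
 * empty) cgroup directory. Returns 0 or an errno value.
 */
static int release_cgroup(const char *mount_point, const char *rel_path)
{
    char path[4096];
    const char *sep;

    assert(mount_point && rel_path);
    sep = (rel_path[0] == '/') ? "" : "/";   /* avoid a missing separator */
    if (snprintf(path, sizeof(path), "%s%s%s", mount_point, sep, rel_path)
        >= (int)sizeof(path))
        return ENAMETOOLONG;
    return rmdir(path) == 0 ? 0 : errno;
}
```

A real agent's ``main()`` would call ``release_cgroup()`` with its own
mount point (for example /sys/fs/cgroup/rg1) and its first command-line
argument; rmdir fails harmlessly if a task re-entered the cgroup in the
meantime.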

If you want to create a new cgroup under /sys/fs/cgroup/rg1::

  # cd /sys/fs/cgroup/rg1
  # mkdir my_cgroup

To work with this cgroup, change into its directory::

  # cd my_cgroup

In this directory you can find several files::

  # ls
  cgroup.procs notify_on_release tasks
  (plus whatever files are added by the attached subsystems)

Now attach your shell to this cgroup::

  # /bin/echo $$ > tasks

You can also create cgroups inside your cgroup by using mkdir in this
directory::

  # mkdir my_sub_cs

To remove a cgroup, just use rmdir::

  # rmdir my_sub_cs

This will fail if the cgroup is in use (it has child cgroups, has
processes attached, or is held alive by another subsystem-specific
reference).

2.2 Attaching processes
-----------------------

::

  # /bin/echo PID > tasks

Note that it is PID, not PIDs. You can only attach ONE task at a time.
44862306a36Sopenharmony_ciIf you have several tasks to attach, you have to do it one after another:: 44962306a36Sopenharmony_ci 45062306a36Sopenharmony_ci # /bin/echo PID1 > tasks 45162306a36Sopenharmony_ci # /bin/echo PID2 > tasks 45262306a36Sopenharmony_ci ... 45362306a36Sopenharmony_ci # /bin/echo PIDn > tasks 45462306a36Sopenharmony_ci 45562306a36Sopenharmony_ciYou can attach the current shell task by echoing 0:: 45662306a36Sopenharmony_ci 45762306a36Sopenharmony_ci # echo 0 > tasks 45862306a36Sopenharmony_ci 45962306a36Sopenharmony_ciYou can use the cgroup.procs file instead of the tasks file to move all 46062306a36Sopenharmony_cithreads in a threadgroup at once. Echoing the PID of any task in a 46162306a36Sopenharmony_cithreadgroup to cgroup.procs causes all tasks in that threadgroup to be 46262306a36Sopenharmony_ciattached to the cgroup. Writing 0 to cgroup.procs moves all tasks 46362306a36Sopenharmony_ciin the writing task's threadgroup. 46462306a36Sopenharmony_ci 46562306a36Sopenharmony_ciNote: Since every task is always a member of exactly one cgroup in each 46662306a36Sopenharmony_cimounted hierarchy, to remove a task from its current cgroup you must 46762306a36Sopenharmony_cimove it into a new cgroup (possibly the root cgroup) by writing to the 46862306a36Sopenharmony_cinew cgroup's tasks file. 46962306a36Sopenharmony_ci 47062306a36Sopenharmony_ciNote: Due to some restrictions enforced by some cgroup subsystems, moving 47162306a36Sopenharmony_cia process to another cgroup can fail. 47262306a36Sopenharmony_ci 47362306a36Sopenharmony_ci2.3 Mounting hierarchies by name 47462306a36Sopenharmony_ci-------------------------------- 47562306a36Sopenharmony_ci 47662306a36Sopenharmony_ciPassing the name=<x> option when mounting a cgroups hierarchy 47762306a36Sopenharmony_ciassociates the given name with the hierarchy. 
This can be used when mounting a pre-existing hierarchy, in order to
refer to it by name rather than by its set of active subsystems. Each
hierarchy is either nameless, or has a unique name.

The name should match the regular expression ``[\w.-]+``.

When passing a name=<x> option for a new hierarchy, you need to
specify subsystems manually; the legacy behaviour of mounting all
subsystems when none are explicitly specified is not supported when
you give the hierarchy a name.

The name of the hierarchy appears as part of the hierarchy description
in /proc/mounts and /proc/<pid>/cgroup.

3. Kernel API
=============

3.1 Overview
------------

Each kernel subsystem that wants to hook into the generic cgroup
system needs to create a cgroup_subsys object. This contains
various methods, which are callbacks from the cgroup system, along
with a subsystem ID which will be assigned by the cgroup system.

Other fields in the cgroup_subsys object include:

- subsys_id: a unique array index for the subsystem, indicating which
  entry in cgroup->subsys[] this subsystem should be managing.

- name: should be initialized to a unique subsystem name.
  Should be no longer than MAX_CGROUP_TYPE_NAMELEN.

- early_init: indicates whether the subsystem needs early
  initialization at system boot.

Each cgroup object created by the system has an array of pointers,
indexed by subsystem ID; each pointer is entirely managed by its
subsystem, and the generic cgroup code will never touch it.

3.2 Synchronization
-------------------

There is a global mutex, cgroup_mutex, used by the cgroup
system. This should be taken by anything that wants to modify a
cgroup. It may also be taken to prevent cgroups from being
modified, but more specific locks may be more appropriate in that
situation.

See kernel/cgroup.c for more details.

Subsystems can take/release the cgroup_mutex via the functions
cgroup_lock()/cgroup_unlock().
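
The arrangement described in 3.1 and 3.2, a per-cgroup array of state
pointers indexed by subsystem ID whose slots are modified only while a
global mutex is held, can be sketched as a small user-space model. This
is a hypothetical illustration, not kernel code: the ``model_*`` names,
the fixed ``MODEL_SUBSYS_COUNT``, and the pthread mutex standing in for
cgroup_mutex are all assumptions, and the real structure definitions in
the kernel differ.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, NOT kernel code: it only mirrors the
 * shape of the data described in 3.1 and 3.2.
 */
#define MODEL_SUBSYS_COUNT 4   /* number of registered subsystems (assumed) */
#define MODEL_NAME_MAX     32  /* stand-in for MAX_CGROUP_TYPE_NAMELEN */

/* Stand-in for struct cgroup_subsys_state; real subsystems embed it. */
struct model_css {
    long value;
};

/* Stand-in for struct cgroup: one state pointer per subsystem ID. */
struct model_cgroup {
    struct model_css *subsys[MODEL_SUBSYS_COUNT];
};

/* Stand-in for struct cgroup_subsys. */
struct model_subsys {
    int  subsys_id;              /* index into model_cgroup.subsys[] */
    char name[MODEL_NAME_MAX];
    int  early_init;             /* nonzero: initialize early at boot */
};

/* Plays the role of the global cgroup_mutex. */
static pthread_mutex_t model_cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;

static void model_cgroup_lock(void)   { pthread_mutex_lock(&model_cgroup_mutex); }
static void model_cgroup_unlock(void) { pthread_mutex_unlock(&model_cgroup_mutex); }

/*
 * A subsystem installs its own state in the slot selected by its ID.
 * Only the owning subsystem writes this slot, and it does so with the
 * global lock held, as required for modifying a cgroup.
 */
static void model_install_state(struct model_cgroup *cgrp,
                                const struct model_subsys *ss,
                                struct model_css *css)
{
    model_cgroup_lock();
    cgrp->subsys[ss->subsys_id] = css;
    model_cgroup_unlock();
}

/* Look up a subsystem's state; NULL if the subsystem has none here. */
static struct model_css *model_get_state(struct model_cgroup *cgrp,
                                         const struct model_subsys *ss)
{
    struct model_css *css;

    model_cgroup_lock();
    css = cgrp->subsys[ss->subsys_id];
    model_cgroup_unlock();
    return css;
}
```

Keeping one slot per subsystem ID means lookups are a plain array
index, and subsystems never need to coordinate with each other about
who owns which pointer.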

Accessing a task's cgroup pointer may be done in the following ways:

- while holding cgroup_mutex
- while holding the task's alloc_lock (via task_lock())
- inside an rcu_read_lock() section via rcu_dereference()

3.3 Subsystem API
-----------------

Each subsystem should:

- add an entry in linux/cgroup_subsys.h
- define a cgroup_subsys object called <name>_cgrp_subsys

Each subsystem may export the following methods. The only mandatory
methods are css_alloc() and css_free(). Any others that are null are
presumed to be successful no-ops.

``struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

Called to allocate a subsystem state object for a cgroup. The
subsystem should allocate its subsystem state object for the passed
cgroup, returning a pointer to the new object on success or an
ERR_PTR() value on failure. On success, the subsystem pointer should
point to a structure of type cgroup_subsys_state (typically embedded
in a larger subsystem-specific object), which will be initialized by
the cgroup system.
Note that this will be called at initialization to
create the root subsystem state for this subsystem; this case can be
identified by the passed cgroup object having a NULL parent (since
it's the root of the hierarchy) and may be an appropriate place for
initialization code.

``int css_online(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

Called after @cgrp has successfully completed all allocations and has
been made visible to the cgroup_for_each_child/descendant_*()
iterators. The subsystem may choose to fail creation by returning
-errno. This callback can be used to implement reliable state sharing
and propagation along the hierarchy. See the comment on
cgroup_for_each_descendant_pre() for details.

``void css_offline(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

This is the counterpart of css_online() and is called if and only if
css_online() has succeeded on @cgrp. This signifies the beginning of
the end of @cgrp. @cgrp is being removed and the subsystem should
start dropping all references it is holding on @cgrp. When all
references are dropped, cgroup removal will proceed to the next step,
css_free(). After this callback, @cgrp should be considered dead to
the subsystem.
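
The ordering rules above (css_alloc() first, css_online() able to veto
creation, css_offline() only after a successful css_online(), and
css_free() last, including on the creation error path) can be sketched
as a user-space state model. The ``model_*`` and ``demo_*`` names, the
callback table, and the call counters are hypothetical and are not the
kernel's struct cgroup_subsys; in particular, the real css_free() runs
only once all references are dropped, which this model does not try
to reproduce.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a subsystem state object; NOT kernel code. */
struct model_css {
    int online;                 /* set once css_online() has succeeded */
};

/* Hypothetical callback table mirroring the methods described above. */
struct model_ops {
    struct model_css *(*css_alloc)(void);
    int  (*css_online)(struct model_css *css);   /* 0 or -errno */
    void (*css_offline)(struct model_css *css);
    void (*css_free)(struct model_css *css);
};

/* Creation: css_alloc(), then css_online(); on failure, free without offline. */
static struct model_css *model_create(const struct model_ops *ops, int *err)
{
    struct model_css *css = ops->css_alloc();

    if (!css) {
        *err = -ENOMEM;
        return NULL;
    }
    *err = ops->css_online(css);
    if (*err) {
        ops->css_free(css);     /* css_offline() is NOT called here */
        return NULL;
    }
    css->online = 1;
    return css;
}

/* Removal: css_offline() (only because online succeeded), then css_free(). */
static void model_destroy(const struct model_ops *ops, struct model_css *css)
{
    if (css->online)
        ops->css_offline(css);
    /* The kernel defers css_free() until all references are dropped;
     * this model frees immediately. */
    ops->css_free(css);
}

/* A hypothetical demo subsystem that counts how often each callback runs. */
static int n_alloc, n_online, n_offline, n_free;
static int fail_online;         /* nonzero: make css_online() veto creation */

static struct model_css *demo_alloc(void)
{
    n_alloc++;
    return calloc(1, sizeof(struct model_css));
}

static int demo_online(struct model_css *css)
{
    (void)css;
    n_online++;
    return fail_online ? -EBUSY : 0;
}

static void demo_offline(struct model_css *css)
{
    (void)css;
    n_offline++;
}

static void demo_free(struct model_css *css)
{
    n_free++;
    free(css);
}

static const struct model_ops demo_ops = {
    demo_alloc, demo_online, demo_offline, demo_free
};
```

Running model_create() with fail_online set shows the error path:
css_alloc(), css_online() and css_free() each run once, while
css_offline() is skipped.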

``void css_free(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

The cgroup system is about to free @cgrp; the subsystem should free
its subsystem state object. By the time this method is called, @cgrp
is completely unused; @cgrp->parent is still valid. (Note: this can
also be called for a newly-created cgroup if an error occurs after
this subsystem's css_alloc() method has been called for the new
cgroup.)

``int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called prior to moving one or more tasks into a cgroup; if the
subsystem returns an error, this will abort the attach operation.
@tset contains the tasks to be attached and is guaranteed to have at
least one task in it.

If there are multiple tasks in the taskset, then:

- it's guaranteed that all are from the same thread group
- @tset contains all tasks from the thread group whether or not
  they're switching cgroups
- the first task is the leader

Each @tset entry also records the task's old cgroup, so tasks which
aren't switching cgroups can easily be skipped using the
cgroup_taskset_for_each() iterator. Note that this isn't called on a
fork.
If this method returns 0 (success), that result should remain valid
while the caller holds cgroup_mutex, and it is guaranteed that either
attach() or cancel_attach() will be called afterwards.

``void css_reset(struct cgroup_subsys_state *css)``
(cgroup_mutex held by caller)

An optional operation which should restore @css's configuration to the
initial state. This is currently only used on the unified hierarchy
when a subsystem is disabled on a cgroup through
"cgroup.subtree_control" but should remain enabled because other
subsystems depend on it. cgroup core makes such a css invisible by
removing the associated interface files and invokes this callback so
that the hidden subsystem can return to the initial neutral state.
This prevents unexpected resource control from a hidden css and
ensures that the configuration is in the initial state when it is made
visible again later.

``void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called when a task attach operation has failed after can_attach() has
succeeded. A subsystem whose can_attach() has side effects should
provide this function so that it can roll those side effects back;
otherwise it is not needed. This is called only for subsystems whose
can_attach() operation has succeeded.
The parameters are identical to can_attach().

``void attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called after the tasks have been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
The parameters are identical to can_attach().

``void fork(struct task_struct *task)``

Called when a task is forked into a cgroup.

``void exit(struct task_struct *task)``

Called during task exit.

``void free(struct task_struct *task)``

Called when the task_struct is freed.

``void bind(struct cgroup *root)``
(cgroup_mutex held by caller)

Called when a cgroup subsystem is rebound to a different hierarchy
and root cgroup. Currently this will only involve movement between
the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups).

4. Extended attribute usage
===========================

The cgroup filesystem supports certain types of extended attributes in
its directories and files.
The currently supported types are:

- Trusted (XATTR_TRUSTED)
- Security (XATTR_SECURITY)

Both require the CAP_SYS_ADMIN capability to set.

Like in tmpfs, the extended attributes in the cgroup filesystem are
stored using kernel memory, and it's advised to keep their usage to a
minimum. This is the reason why user-defined extended attributes are
not supported: any user could set them and there is no limit on the
value size.

The current known users of this feature are SELinux, to limit cgroup
usage in containers, and systemd, for assorted metadata such as the
main PID in a cgroup (systemd creates a cgroup per service).

5. Questions
============

::

  Q: What's up with this '/bin/echo'?
  A: bash's builtin 'echo' command does not check calls to write() against
     errors. If you use it in the cgroup file system, you won't be
     able to tell whether a command succeeded or failed.

  Q: When I attach processes, only the first PID on the line actually
     gets attached!
  A: We can only return one error code per call to write(). So you should
     write only ONE PID per call.
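
The advice in the answers above (check the result of every write() and
write only one PID per call) applies equally when attaching tasks from
C instead of /bin/echo. The following is a minimal sketch, assuming the
caller passes the path of a cgroup's tasks or cgroup.procs file; the
attach_pids() name and its return convention are hypothetical, and
actually moving tasks still requires appropriate privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write each PID to a cgroup "tasks" (or
 * "cgroup.procs") file with its own open()/write(), like running
 * "/bin/echo PID > tasks" once per PID, so every failure is reported.
 * Returns the number of PIDs that were written successfully.
 */
static int attach_pids(const char *path, const pid_t *pids, int n)
{
    int ok = 0;

    for (int i = 0; i < n; i++) {
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%d\n", (int)pids[i]);
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
            fprintf(stderr, "open %s: %s\n", path, strerror(errno));
            return ok;
        }
        ssize_t w = write(fd, buf, (size_t)len);
        if (w < 0)
            fprintf(stderr, "attach %d: %s\n", (int)pids[i], strerror(errno));
        else if (w != len)
            fprintf(stderr, "attach %d: short write\n", (int)pids[i]);
        else
            ok++;
        close(fd);
    }
    return ok;
}
```

Reopening the file for every PID mirrors running /bin/echo once per
PID, and the returned count tells the caller how many PIDs were
accepted, so a partial failure is never silently lost.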