==============
Control Groups
==============

Written by Paul Menage <menage@google.com> based on
Documentation/admin-guide/cgroup-v1/cpusets.rst

Original copyright statements from cpusets.txt:

Portions Copyright (C) 2004 BULL SA.

Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.

Modified by Paul Jackson <pj@sgi.com>

Modified by Christoph Lameter <cl@linux.com>

.. CONTENTS:

	1. Control Groups
	1.1 What are cgroups ?
	1.2 Why are cgroups needed ?
	1.3 How are cgroups implemented ?
	1.4 What does notify_on_release do ?
	1.5 What does clone_children do ?
	1.6 How do I use cgroups ?
	2. Usage Examples and Syntax
	2.1 Basic Usage
	2.2 Attaching processes
	2.3 Mounting hierarchies by name
	3. Kernel API
	3.1 Overview
	3.2 Synchronization
	3.3 Subsystem API
	4. Extended attributes usage
	5. Questions

1. Control Groups
=================

1.1 What are cgroups ?
----------------------

Control Groups provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with
specialized behaviour.

Definitions:

A *cgroup* associates a set of tasks with a set of parameters for one
or more subsystems.

A *subsystem* is a module that makes use of the task grouping
facilities provided by cgroups to treat groups of tasks in
particular ways. A subsystem is typically a "resource controller" that
schedules a resource or applies per-cgroup limits, but it may be
anything that wants to act on a group of processes, e.g. a
virtualization subsystem.

A *hierarchy* is a set of cgroups arranged in a tree, such that
every task in the system is in exactly one of the cgroups in the
hierarchy, and a set of subsystems; each subsystem has system-specific
state attached to each cgroup in the hierarchy.  Each hierarchy has
an instance of the cgroup virtual filesystem associated with it.

At any one time there may be multiple active hierarchies of task
cgroups. Each hierarchy is a partition of all tasks in the system.

User-level code may create and destroy cgroups by name in an
instance of the cgroup virtual file system, specify and query to
which cgroup a task is assigned, and list the task PIDs assigned to
a cgroup. Those creations and assignments only affect the hierarchy
associated with that instance of the cgroup file system.

On their own, the only use for cgroups is for simple job
tracking. The intention is that other subsystems hook into the generic
cgroup support to provide new attributes for cgroups, such as
accounting/limiting the resources which processes in a cgroup can
access. For example, cpusets (see Documentation/admin-guide/cgroup-v1/cpusets.rst) allow
you to associate a set of CPUs and a set of memory nodes with the
tasks in each cgroup.

1.2 Why are cgroups needed ?
----------------------------

There are multiple efforts to provide process aggregations in the
Linux kernel, mainly for resource-tracking purposes. Such efforts
include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
namespaces. These all require the basic notion of a
grouping/partitioning of processes, with newly forked processes ending
up in the same group (cgroup) as their parent process.

The kernel cgroup patch provides the minimum essential kernel
mechanisms required to efficiently implement such groups. It has
minimal impact on the system fast paths, and provides hooks for
specific subsystems such as cpusets to provide additional behaviour as
desired.

Multiple hierarchy support is provided to allow for situations where
the division of tasks into cgroups is distinctly different for
different subsystems - having parallel hierarchies allows each
hierarchy to be a natural division of tasks, without having to handle
complex combinations of tasks that would be present if several
unrelated subsystems needed to be forced into the same tree of
cgroups.

At one extreme, each resource controller or subsystem could be in a
separate hierarchy; at the other extreme, all subsystems
would be attached to the same hierarchy.

As an example of a scenario (originally proposed by vatsa@in.ibm.com)
that can benefit from multiple hierarchies, consider a large
university server with various users - students, professors, system
tasks etc. The resource planning for this server could be along the
following lines::

       CPU :          "Top cpuset"
                       /       \
               CPUSet1         CPUSet2
                  |               |
               (Professors)    (Students)

               In addition (system tasks) are attached to topcpuset (so
               that they can run anywhere) with a limit of 20%

       Memory : Professors (50%), Students (30%), system (20%)

       Disk : Professors (50%), Students (30%), system (20%)

       Network : WWW browsing (20%), Network File System (60%), others (20%)
                               / \
               Professors (15%)  students (5%)

Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
into the NFS network class.

At the same time Firefox/Lynx will share an appropriate CPU/Memory class
depending on who launched it (prof/student).

With the ability to classify tasks differently for different resources
(by putting those resource subsystems in different hierarchies),
the admin can easily set up a script which receives exec notifications
and, depending on who is launching the browser, runs::

    # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks

With only a single hierarchy, the admin would potentially have to create
a separate cgroup for every browser launched and associate it with
the appropriate network and other resource classes.  This may lead to
a proliferation of such cgroups.

Also let's say that the administrator would like to give enhanced network
access temporarily to a student's browser (since it is night and the user
wants to do online gaming :)) or give one of the student's simulation
apps enhanced CPU power.

With the ability to write PIDs directly to resource classes, it's just a
matter of::

       # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
       (after some time)
       # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks

Without this ability, the administrator would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the
new resource classes.

1.3 How are cgroups implemented ?
---------------------------------

Control Groups extend the kernel as follows:

 - Each task in the system has a reference-counted pointer to a
   css_set.

 - A css_set contains a set of reference-counted pointers to
   cgroup_subsys_state objects, one for each cgroup subsystem
   registered in the system. There is no direct link from a task to
   the cgroup of which it's a member in each hierarchy, but this
   can be determined by following pointers through the
   cgroup_subsys_state objects. This is because accessing the
   subsystem state is something that's expected to happen frequently
   and in performance-critical code, whereas operations that require a
   task's actual cgroup assignments (in particular, moving between
   cgroups) are less common. A linked list runs through the cg_list
   field of each task_struct using the css_set, anchored at
   css_set->tasks.

 - A cgroup hierarchy filesystem can be mounted for browsing and
   manipulation from user space.

 - You can list all the tasks (by PID) attached to any cgroup.

The implementation of cgroups requires a few, simple hooks
into the rest of the kernel, none in performance-critical paths:

 - in init/main.c, to initialize the root cgroups and initial
   css_set at system boot.

 - in fork and exit, to attach and detach a task from its css_set.

In addition, a new file system of type "cgroup" may be mounted, to
enable browsing and modifying the cgroups presently known to the
kernel.  When mounting a cgroup hierarchy, you may specify a
comma-separated list of subsystems to mount as the filesystem mount
options.  By default, mounting the cgroup filesystem attempts to
mount a hierarchy containing all registered subsystems.

If an active hierarchy with exactly the same set of subsystems already
exists, it will be reused for the new mount. If no existing hierarchy
matches, and any of the requested subsystems are in use in an existing
hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy
is activated, associated with the requested subsystems.
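
The three outcomes of a mount request can be written down as a small
decision function. This is only a sketch of the rule as stated here, not
kernel code: the function names ``normalize`` and ``mount_outcome`` are made
up for illustration, and each hierarchy is modelled as a comma-separated
list of subsystem names.

```shell
# Sort a comma-separated subsystem list so that ordering does not matter.
normalize() {
    printf '%s\n' "$1" | tr ',' '\n' | sort | paste -s -d, -
}

# mount_outcome REQUESTED [ACTIVE_HIERARCHY...]
# Prints "reuse", "EBUSY" or "new" (hypothetical labels for the three cases).
mount_outcome() {
    requested=$(normalize "$1")
    shift
    # Case 1: an active hierarchy has exactly the requested set.
    for active in "$@"; do
        if [ "$(normalize "$active")" = "$requested" ]; then
            echo reuse
            return 0
        fi
    done
    # Case 2: no exact match, but a requested subsystem is already bound.
    for active in "$@"; do
        for subsys in $(printf '%s\n' "$requested" | tr ',' ' '); do
            case ",$active," in
                *",$subsys,"*) echo EBUSY; return 0 ;;
            esac
        done
    done
    # Case 3: nothing conflicts, so a new hierarchy would be activated.
    echo new
}
```

For instance, ``mount_outcome cpuset cpuset,memory`` reports the busy case,
because cpuset is already bound to a hierarchy with a different subsystem set.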

It's not currently possible to bind a new subsystem to an active
cgroup hierarchy, or to unbind a subsystem from an active cgroup
hierarchy. This may be possible in future, but is fraught with nasty
error-recovery issues.

When a cgroup filesystem is unmounted, if there are any
child cgroups created below the top-level cgroup, that hierarchy
will remain active even though unmounted; if there are no
child cgroups then the hierarchy will be deactivated.

No new system calls are added for cgroups - all support for
querying and modifying cgroups is via this cgroup file system.

Each task under /proc has an added file named 'cgroup' displaying,
for each active hierarchy, the subsystem names and the cgroup name
as the path relative to the root of the cgroup file system.
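
As an illustration, the per-task file can be parsed from a script. The
sketch below assumes the usual cgroup v1 line layout of
``hierarchy-ID:subsystem-list:path``; the helper name ``cgroup_path_for`` is
hypothetical and not part of any tool.

```shell
# cgroup_path_for SUBSYSTEM [FILE]
# Print the cgroup path of the hierarchy that SUBSYSTEM is bound to.
# FILE defaults to /proc/self/cgroup; returns 1 if SUBSYSTEM is not listed.
cgroup_path_for() {
    subsys=$1
    file=${2:-/proc/self/cgroup}
    # Split each line on ':' into ID, comma-separated subsystems, and path.
    while IFS=: read -r _id subsystems path; do
        case ",$subsystems," in
            *",$subsys,"*)
                printf '%s\n' "$path"
                return 0
                ;;
        esac
    done < "$file"
    return 1
}
```

Calling ``cgroup_path_for cpuset`` from a shell prints that shell's path in
the cpuset hierarchy, if one is mounted.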

Each cgroup is represented by a directory in the cgroup file system
containing the following files describing that cgroup:

 - tasks: list of tasks (by PID) attached to that cgroup.  This list
   is not guaranteed to be sorted.  Writing a thread ID into this file
   moves the thread into this cgroup.
 - cgroup.procs: list of thread group IDs in the cgroup.  This list is
   not guaranteed to be sorted or free of duplicate TGIDs, and userspace
   should sort/uniquify the list if this property is required.
   Writing a thread group ID into this file moves all threads in that
   group into this cgroup.
 - notify_on_release flag: run the release agent on exit?
 - release_agent: the path to use for release notifications (this file
   exists in the top cgroup only)
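
Because cgroup.procs may be unsorted and may repeat a TGID, a reader that
needs a clean list has to post-process it. A minimal sketch, with
``list_tgids`` as a made-up helper name:

```shell
# list_tgids CGROUP_DIR
# Print each thread group ID in the cgroup once, in ascending numeric order.
list_tgids() {
    sort -n -u "$1/cgroup.procs"
}
```

With the "Charlie" cgroup used later in this document, that would be
``list_tgids /sys/fs/cgroup/cpuset/Charlie``.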

Other subsystems such as cpusets may add additional files in each
cgroup dir.

New cgroups are created using the mkdir system call or shell
command.  The properties of a cgroup, such as its flags, are
modified by writing to the appropriate file in that cgroup's
directory, as listed above.

The named hierarchical structure of nested cgroups allows partitioning
a large system into nested, dynamically changeable, "soft-partitions".

The attachment of each task, automatically inherited at fork by any
children of that task, to a cgroup allows organizing the work load
on a system into related sets of tasks.  A task may be re-attached to
any other cgroup, if allowed by the permissions on the necessary
cgroup file system directories.

When a task is moved from one cgroup to another, it gets a new
css_set pointer - if there's an already existing css_set with the
desired collection of cgroups then that group is reused, otherwise a new
css_set is allocated. The appropriate existing css_set is located by
looking into a hash table.

To allow access from a cgroup to the css_sets (and hence tasks)
that comprise it, a set of cg_cgroup_link objects form a lattice;
each cg_cgroup_link is linked into a list of cg_cgroup_links for
a single cgroup on its cgrp_link_list field, and a list of
cg_cgroup_links for a single css_set on its cg_link_list.

Thus the set of tasks in a cgroup can be listed by iterating over
each css_set that references the cgroup, and sub-iterating over
each css_set's task set.

The use of a Linux virtual file system (vfs) to represent the
cgroup hierarchy provides for a familiar permission and name space
for cgroups, with a minimum of additional kernel code.

1.4 What does notify_on_release do ?
------------------------------------

If the notify_on_release flag is enabled (1) in a cgroup, then
whenever the last task in the cgroup leaves (exits or attaches to
some other cgroup) and the last child cgroup of that cgroup
is removed, then the kernel runs the command specified by the contents
of the "release_agent" file in that hierarchy's root directory,
supplying the pathname (relative to the mount point of the cgroup
file system) of the abandoned cgroup.  This enables automatic
removal of abandoned cgroups.  The default value of
notify_on_release in the root cgroup at system boot is disabled
(0).  The default value of other cgroups at creation is the current
value of their parents' notify_on_release settings. The default value of
a cgroup hierarchy's release_agent path is empty.
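
A release agent that removes abandoned cgroups could look roughly like the
sketch below. It assumes the relative pathname arrives as a command-line
argument, and it takes the mount point as a parameter so the logic can be
tried on an ordinary directory tree; ``release_agent_cleanup`` is a
hypothetical name.

```shell
# release_agent_cleanup MOUNT_POINT RELATIVE_PATH
# Remove the abandoned cgroup directory reported by the kernel.
release_agent_cleanup() {
    mount_point=$1
    released=$2    # e.g. "/Charlie", relative to the mount point
    # rmdir refuses to remove a directory that still has entries,
    # so a cgroup that still has child cgroups is left alone.
    rmdir "$mount_point$released"
}
```

A real agent script would typically hard-code its own mount point and call
the function with its first argument, for example
``release_agent_cleanup /sys/fs/cgroup/rg1 "$1"``.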

1.5 What does clone_children do ?
---------------------------------

This flag only affects the cpuset controller. If the clone_children
flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its
configuration from the parent during initialization.

1.6 How do I use cgroups ?
--------------------------

To start a new job that is to be contained within a cgroup, using
the "cpuset" cgroup subsystem, the steps are something like::

 1) mount -t tmpfs cgroup_root /sys/fs/cgroup
 2) mkdir /sys/fs/cgroup/cpuset
 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
 4) Create the new cgroup by doing mkdir's and write's (or echo's) in
    the /sys/fs/cgroup/cpuset virtual file system.
 5) Start a task that will be the "founding father" of the new job.
 6) Attach that task to the new cgroup by writing its PID to the
    /sys/fs/cgroup/cpuset tasks file for that cgroup.
 7) fork, exec or clone the job tasks from this founding father task.

For example, the following sequence of commands will set up a cgroup
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cgroup::

  mount -t tmpfs cgroup_root /sys/fs/cgroup
  mkdir /sys/fs/cgroup/cpuset
  mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
  cd /sys/fs/cgroup/cpuset
  mkdir Charlie
  cd Charlie
  /bin/echo 2-3 > cpuset.cpus
  /bin/echo 1 > cpuset.mems
  /bin/echo $$ > tasks
  sh
  # The subshell 'sh' is now running in cgroup Charlie
  # The next line should display the path '/Charlie' for the cpuset hierarchy
  cat /proc/self/cgroup

2. Usage Examples and Syntax
============================

2.1 Basic Usage
---------------

Creating, modifying, and using cgroups can be done through the cgroup
virtual filesystem.

To mount a cgroup hierarchy with all available subsystems, type::

  # mount -t cgroup xxx /sys/fs/cgroup

The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.

Note: Some subsystems do not work without some user input first.  For instance,
if cpusets are enabled the user will have to populate the cpus and mems files
for each new cgroup created before that group can be used.

As explained in section `1.2 Why are cgroups needed?`, you should create
different hierarchies of cgroups for each single resource or group of
resources you want to control. Therefore, you should mount a tmpfs on
/sys/fs/cgroup and create directories for each cgroup resource or resource
group::

  # mount -t tmpfs cgroup_root /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/rg1

To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type::

  # mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1

While remounting cgroups is currently supported, it is not recommended.
Remounting allows changing bound subsystems and
release_agent. Rebinding is hardly useful as it only works when the
hierarchy is empty, and release_agent itself should be replaced with
conventional fsnotify. The support for remounting will be removed in
the future.

To specify a hierarchy's release_agent::

  # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
    xxx /sys/fs/cgroup/rg1

Note that specifying 'release_agent' more than once will return failure.

Note that changing the set of subsystems is currently only supported
when the hierarchy consists of a single (root) cgroup. Supporting
the ability to arbitrarily bind/unbind subsystems from an existing
cgroup hierarchy is intended to be implemented in the future.

Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
is the cgroup that holds the whole system.

If you want to change the value of release_agent::

  # echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent

It can also be changed via remount.

If you want to create a new cgroup under /sys/fs/cgroup/rg1::

  # cd /sys/fs/cgroup/rg1
  # mkdir my_cgroup

Now you want to do something with this cgroup::

  # cd my_cgroup

In this directory you can find several files::

  # ls
  cgroup.procs notify_on_release tasks
  (plus whatever files are added by the attached subsystems)

Now attach your shell to this cgroup::

  # /bin/echo $$ > tasks

You can also create cgroups inside your cgroup by using mkdir in this
directory::

  # mkdir my_sub_cs

To remove a cgroup, just use rmdir::

  # rmdir my_sub_cs

This will fail if the cgroup is in use (has cgroups inside, or
has processes attached, or is held alive by other subsystem-specific
reference).
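
A script can check the first two conditions before calling rmdir. The
helper below is a sketch only: ``cgroup_in_use`` is a made-up name, and it
cannot see subsystem-specific references, so rmdir may still fail after it
reports that the cgroup is idle.

```shell
# cgroup_in_use CGROUP_DIR
# Return 0 if the cgroup has attached tasks or child cgroups, 1 otherwise.
cgroup_in_use() {
    dir=$1
    # Read the file instead of testing its size: files in virtual
    # filesystems commonly report a size of 0 even when they have content.
    if grep -q . "$dir/tasks" 2>/dev/null; then
        return 0
    fi
    # Any subdirectory is a child cgroup.
    for child in "$dir"/*/; do
        if [ -d "$child" ]; then
            return 0
        fi
    done
    return 1
}
```

Typical use would be ``cgroup_in_use my_sub_cs || rmdir my_sub_cs``.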

2.2 Attaching processes
-----------------------

::

  # /bin/echo PID > tasks

Note that it is PID, not PIDs. You can only attach ONE task at a time.
If you have several tasks to attach, you have to do it one after another::

  # /bin/echo PID1 > tasks
  # /bin/echo PID2 > tasks
  ...
  # /bin/echo PIDn > tasks
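
The one-PID-per-write rule is easy to wrap in a loop. The function name
``attach_tasks`` is hypothetical; the sketch stops at the first PID that
cannot be attached.

```shell
# attach_tasks CGROUP_DIR PID...
# Attach each PID with its own write to the tasks file.
attach_tasks() {
    cgroup_dir=$1
    shift
    for pid in "$@"; do
        # One write per PID; give up as soon as one attach fails.
        /bin/echo "$pid" > "$cgroup_dir/tasks" || return 1
    done
}
```

For example, ``attach_tasks /sys/fs/cgroup/rg1/my_cgroup 1234 1235`` issues
two separate writes, one per PID.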

You can attach the current shell task by echoing 0::

  # echo 0 > tasks

You can use the cgroup.procs file instead of the tasks file to move all
threads in a threadgroup at once. Echoing the PID of any task in a
threadgroup to cgroup.procs causes all tasks in that threadgroup to be
attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
in the writing task's threadgroup.

Note: Since every task is always a member of exactly one cgroup in each
mounted hierarchy, to remove a task from its current cgroup you must
move it into a new cgroup (possibly the root cgroup) by writing to the
new cgroup's tasks file.

Note: Due to some restrictions enforced by some cgroup subsystems, moving
a process to another cgroup can fail.

2.3 Mounting hierarchies by name
--------------------------------

Passing the name=<x> option when mounting a cgroups hierarchy
associates the given name with the hierarchy.  This can be used when
mounting a pre-existing hierarchy, in order to refer to it by name
rather than by its set of active subsystems.  Each hierarchy is either
nameless, or has a unique name.

The name should match [\w.-]+
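
A script can validate a candidate name before attempting the mount. The
sketch below assumes that ``\w`` means ASCII letters, digits and underscore;
``valid_hierarchy_name`` is a made-up helper.

```shell
# valid_hierarchy_name NAME
# Return 0 if NAME is non-empty and contains only letters, digits,
# underscore, dot and hyphen; return 1 otherwise.
valid_hierarchy_name() {
    case $1 in
        '' | *[!A-Za-z0-9_.-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

Checking the name first gives a clearer error than a failed mount, for
example ``valid_hierarchy_name "$name" || echo "bad hierarchy name" >&2``.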

When passing a name=<x> option for a new hierarchy, you need to
specify subsystems manually; the legacy behaviour of mounting all
subsystems when none are explicitly specified is not supported when
you give a hierarchy a name.

The name of the hierarchy appears as part of the hierarchy description
in /proc/mounts and /proc/<pid>/cgroup.


3. Kernel API
=============

3.1 Overview
------------

Each kernel subsystem that wants to hook into the generic cgroup
system needs to create a cgroup_subsys object. This contains
various methods, which are callbacks from the cgroup system, along
with a subsystem ID which will be assigned by the cgroup system.

Other fields in the cgroup_subsys object include:

- subsys_id: a unique array index for the subsystem, indicating which
  entry in cgroup->subsys[] this subsystem should be managing.

- name: should be initialized to a unique subsystem name. Should be
  no longer than MAX_CGROUP_TYPE_NAMELEN.

- early_init: indicates whether the subsystem needs early
  initialization at system boot.

Each cgroup object created by the system has an array of pointers,
indexed by subsystem ID; each pointer is managed entirely by the
corresponding subsystem, and the generic cgroup code never touches it.
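
The relationship between subsys_id and the per-cgroup pointer array can
be pictured with a small userspace model. All model_* names and the
fixed array size are assumptions made for illustration; they are not the
kernel's definitions.

```c
#include <stddef.h>

enum { MODEL_SUBSYS_COUNT = 4 };	/* assumed, for illustration only */

struct model_css {			/* stand-in for cgroup_subsys_state */
	int refcnt;
};

struct model_cgroup {
	/* One slot per subsystem, indexed by subsystem ID. */
	struct model_css *subsys[MODEL_SUBSYS_COUNT];
};

struct model_subsys {
	const char *name;	/* unique subsystem name */
	int subsys_id;		/* index into model_cgroup.subsys[] */
	int early_init;		/* non-zero if needed early at boot */
};

/* A subsystem looks up only its own slot in the cgroup. */
static struct model_css *model_css_of(struct model_cgroup *cgrp,
				      const struct model_subsys *ss)
{
	if (ss->subsys_id < 0 || ss->subsys_id >= MODEL_SUBSYS_COUNT)
		return NULL;
	return cgrp->subsys[ss->subsys_id];
}
```

In this model a subsystem with subsys_id 2 reads and writes only
subsys[2]; the other slots belong to other subsystems.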

3.2 Synchronization
-------------------

There is a global mutex, cgroup_mutex, used by the cgroup
system. This should be taken by anything that wants to modify a
cgroup. It may also be taken to prevent cgroups from being
modified, but more specific locks may be more appropriate in that
situation.

See kernel/cgroup/cgroup.c for more details.

Subsystems can take/release the cgroup_mutex via the functions
cgroup_lock()/cgroup_unlock().

Accessing a task's cgroup pointer may be done in the following ways:

- while holding cgroup_mutex
- while holding the task's alloc_lock (via task_lock())
- inside an rcu_read_lock() section via rcu_dereference()
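
The first access rule can be modelled in userspace with a pthread mutex
standing in for cgroup_mutex. This is only an analogy under that
assumption: the model_* names are hypothetical, and task_lock() and RCU
have no counterpart here.

```c
#include <pthread.h>
#include <stddef.h>

struct model_cgroup {
	int id;
};

struct model_task {
	struct model_cgroup *cgrp;	/* protected by model_cgroup_mutex */
};

/* Stand-in for the global cgroup_mutex. */
static pthread_mutex_t model_cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;

static void model_cgroup_lock(void)
{
	pthread_mutex_lock(&model_cgroup_mutex);
}

static void model_cgroup_unlock(void)
{
	pthread_mutex_unlock(&model_cgroup_mutex);
}

/* Reader: dereference the task's cgroup pointer only under the lock. */
static int model_task_cgroup_id(struct model_task *task)
{
	int id;

	model_cgroup_lock();
	id = task->cgrp ? task->cgrp->id : -1;
	model_cgroup_unlock();
	return id;
}

/* Writer: changing the pointer also requires the lock. */
static void model_move_task(struct model_task *task,
			    struct model_cgroup *dst)
{
	model_cgroup_lock();
	task->cgrp = dst;
	model_cgroup_unlock();
}
```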

3.3 Subsystem API
-----------------

Each subsystem should:

- add an entry in linux/cgroup_subsys.h
- define a cgroup_subsys object called <name>_cgrp_subsys

Each subsystem may export the following methods. The only mandatory
methods are css_alloc/free. Any others that are null are presumed to
be successful no-ops.

``struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

Called to allocate a subsystem state object for a cgroup. The
subsystem should allocate its subsystem state object for the passed
cgroup, returning a pointer to the new object on success or an
ERR_PTR() value on failure. On success, the subsystem pointer should
point to a structure of type cgroup_subsys_state (typically embedded
in a larger subsystem-specific object), which will be initialized by
the cgroup system. Note that this will be called at initialization to
create the root subsystem state for this subsystem; this case can be
identified by the passed cgroup object having a NULL parent (since
it's the root of the hierarchy) and may be an appropriate place for
initialization code.
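
The embedding and error-pointer conventions described above can be
sketched in userspace. The model_* helpers below are simplified
stand-ins written for this example (they imitate ERR_PTR(), IS_ERR(),
PTR_ERR() and container_of() but are not the kernel macros), and
demo_state is a hypothetical subsystem object.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_css {			/* stand-in for cgroup_subsys_state */
	int online;
};

struct model_cgroup {
	struct model_cgroup *parent;	/* NULL for the root cgroup */
};

/* Hypothetical subsystem object that embeds the generic state. */
struct demo_state {
	int is_root;
	struct model_css css;
};

static struct model_css *demo_css_alloc(struct model_cgroup *cgrp)
{
	struct demo_state *st = calloc(1, sizeof(*st));

	if (!st)
		return model_err_ptr(-ENOMEM);
	/* A NULL parent identifies the root; one-time setup could go here. */
	st->is_root = (cgrp->parent == NULL);
	return &st->css;
}

static void demo_css_free(struct model_css *css)
{
	/* Recover the enclosing object from the embedded member. */
	free(model_container_of(css, struct demo_state, css));
}
```

Returning the address of the embedded member lets the generic code work
with one common type, while the subsystem gets back to its own object
through the container_of() style conversion.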

``int css_online(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

Called after @cgrp has successfully completed all allocations and has
been made visible to cgroup_for_each_child/descendant_*() iterators.
The subsystem may choose to fail creation by returning -errno. This
callback can be used to implement reliable state sharing and
propagation along the hierarchy. See the comment on
cgroup_for_each_descendant_pre() for details.

``void css_offline(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

This is the counterpart of css_online() and is called if and only if
css_online() has succeeded on @cgrp. This signifies the beginning of
the end of @cgrp. @cgrp is being removed and the subsystem should
start dropping all references it's holding on @cgrp. When all
references are dropped, cgroup removal will proceed to the next step -
css_free(). After this callback, @cgrp should be considered dead to
the subsystem.

``void css_free(struct cgroup *cgrp)``
(cgroup_mutex held by caller)

The cgroup system is about to free @cgrp; the subsystem should free
its subsystem state object. By the time this method is called, @cgrp
is completely unused; @cgrp->parent is still valid. (Note - can also
be called for a newly-created cgroup if an error occurs after this
subsystem's css_alloc() method has been called for the new cgroup).
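
The ordering rules for these four callbacks can be summarised with a
small userspace model. It is a simplified sketch: the callbacks take the
state object instead of a cgroup, allocation failure is reported as NULL
rather than ERR_PTR(), and every model_* and demo_* name is
hypothetical.

```c
#include <stdlib.h>

enum model_phase { MODEL_ALLOCATED, MODEL_ONLINE, MODEL_OFFLINE };

struct model_css {
	enum model_phase phase;
};

struct model_ops {
	struct model_css *(*css_alloc)(void);		/* mandatory */
	int (*css_online)(struct model_css *css);	/* optional */
	void (*css_offline)(struct model_css *css);	/* optional */
	void (*css_free)(struct model_css *css);	/* mandatory */
};

/* Creation: css_alloc, then css_online; on failure css_free only. */
static struct model_css *model_create(const struct model_ops *ops)
{
	struct model_css *css = ops->css_alloc();

	if (!css)
		return NULL;
	css->phase = MODEL_ALLOCATED;
	if (ops->css_online && ops->css_online(css) != 0) {
		ops->css_free(css);	/* css_offline is skipped */
		return NULL;
	}
	css->phase = MODEL_ONLINE;
	return css;
}

/* Removal: css_offline only if css_online succeeded, then css_free. */
static void model_destroy(const struct model_ops *ops, struct model_css *css)
{
	if (css->phase == MODEL_ONLINE) {
		if (ops->css_offline)
			ops->css_offline(css);
		css->phase = MODEL_OFFLINE;
	}
	ops->css_free(css);
}

/* Demo subsystem used to observe the call order. */
static int demo_fail_online, demo_offline_calls, demo_free_calls;

static struct model_css *demo_alloc(void)
{
	return calloc(1, sizeof(struct model_css));
}

static int demo_online(struct model_css *css)
{
	(void)css;
	return demo_fail_online ? -1 : 0;
}

static void demo_offline(struct model_css *css)
{
	(void)css;
	demo_offline_calls++;
}

static void demo_free(struct model_css *css)
{
	demo_free_calls++;
	free(css);
}

static const struct model_ops demo_ops = {
	demo_alloc, demo_online, demo_offline, demo_free
};
```

When demo_fail_online is set, model_create() frees the state without
ever calling the offline callback, which mirrors the note on css_free()
above.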

``int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called prior to moving one or more tasks into a cgroup; if the
subsystem returns an error, this will abort the attach operation.
@tset contains the tasks to be attached and is guaranteed to have at
least one task in it.

If there are multiple tasks in the taskset, then:

  - it's guaranteed that all are from the same thread group
  - @tset contains all tasks from the thread group whether or not
    they're switching cgroups
  - the first task is the leader

Each @tset entry also contains the task's old cgroup, so tasks which
aren't switching cgroups can be skipped easily using the
cgroup_taskset_for_each() iterator. Note that this isn't called on a
fork. If this method returns 0 (success) then this should remain valid
while the caller holds cgroup_mutex and it is guaranteed that either
attach() or cancel_attach() will be called in the future.

``void css_reset(struct cgroup_subsys_state *css)``
(cgroup_mutex held by caller)

An optional operation which should restore @css's configuration to the
initial state.  This is currently only used on the unified hierarchy
when a subsystem is disabled on a cgroup through
"cgroup.subtree_control" but should remain enabled because other
subsystems depend on it.  cgroup core makes such a css invisible by
removing the associated interface files and invokes this callback so
that the hidden subsystem can return to the initial neutral state.
This prevents unexpected resource control from a hidden css and
ensures that the configuration is in the initial state when it is made
visible again later.

``void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called when a task attach operation has failed after can_attach() has
succeeded. A subsystem whose can_attach() has side effects should
provide this function so that it can implement a rollback; otherwise
it is not needed. This will be called only for subsystems whose
can_attach() operation has succeeded. The parameters are identical to
can_attach().

``void attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
(cgroup_mutex held by caller)

Called after the task has been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
The parameters are identical to can_attach().
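
Taken together, can_attach(), cancel_attach() and attach() form a
check, rollback and commit sequence. The userspace sketch below models
only that ordering, under the assumption that subsystems are consulted
one after another; the model_* and demo_* names are hypothetical and the
callbacks take just a task set.

```c
#include <stddef.h>

struct model_taskset {
	int ntasks;
};

struct model_subsys {
	int (*can_attach)(struct model_taskset *tset);		/* optional */
	void (*cancel_attach)(struct model_taskset *tset);	/* optional */
	void (*attach)(struct model_taskset *tset);		/* optional */
};

/* Returns 0 on success, or the first error reported by can_attach(). */
static int model_attach(struct model_subsys *const *ss, size_t n,
			struct model_taskset *tset)
{
	size_t i, failed;
	int ret = 0;

	/* Phase 1: ask every subsystem; a NULL method counts as success. */
	for (i = 0; i < n; i++) {
		if (ss[i]->can_attach) {
			ret = ss[i]->can_attach(tset);
			if (ret)
				break;
		}
	}

	/* Rollback: only subsystems whose can_attach() already succeeded. */
	if (ret) {
		failed = i;
		for (i = 0; i < failed; i++)
			if (ss[i]->cancel_attach)
				ss[i]->cancel_attach(tset);
		return ret;
	}

	/* Phase 2: the tasks would be moved here, then attach() runs. */
	for (i = 0; i < n; i++)
		if (ss[i]->attach)
			ss[i]->attach(tset);
	return 0;
}

/* Demo callbacks used to observe which phases ran. */
static int demo_cancelled, demo_attached;

static int demo_ok(struct model_taskset *t)
{
	(void)t;
	return 0;
}

static int demo_refuse(struct model_taskset *t)
{
	(void)t;
	return -1;
}

static void demo_cancel(struct model_taskset *t)
{
	(void)t;
	demo_cancelled++;
}

static void demo_attach(struct model_taskset *t)
{
	(void)t;
	demo_attached++;
}
```

If the second of two subsystems refuses, only the first one sees
cancel_attach(), and no subsystem sees attach().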

``void fork(struct task_struct *task)``

Called when a task is forked into a cgroup.

``void exit(struct task_struct *task)``

Called during task exit.

``void free(struct task_struct *task)``

Called when the task_struct is freed.

``void bind(struct cgroup *root)``
(cgroup_mutex held by caller)

Called when a cgroup subsystem is rebound to a different hierarchy
and root cgroup. Currently this will only involve movement between
the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups).

4. Extended attribute usage
===========================

The cgroup filesystem supports certain types of extended attributes in
its directories and files.  The currently supported types are:

	- Trusted (XATTR_TRUSTED)
	- Security (XATTR_SECURITY)

Both require CAP_SYS_ADMIN capability to set.

Like in tmpfs, the extended attributes in cgroup filesystem are stored
using kernel memory and it's advised to keep the usage to a minimum.
This is the reason why user defined extended attributes are not
supported, since any user could set them and there's no limit on the
value size.
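
A program can check an attribute name against the supported types
before trying to set it. This is a minimal sketch that assumes the usual
"trusted." and "security." name prefixes for the two types;
cgroup_xattr_name_supported() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for names in a supported namespace. */
static bool cgroup_xattr_name_supported(const char *name)
{
	static const char *const prefixes[] = { "trusted.", "security." };
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t plen = strlen(prefixes[i]);

		/* Require at least one character after the prefix. */
		if (strncmp(name, prefixes[i], plen) == 0 &&
		    name[plen] != '\0')
			return true;
	}
	return false;
}
```

A name that passes this check could then be given to setxattr(2) by a
process with CAP_SYS_ADMIN; a "user." name is rejected by the sketch, in
line with the restriction described above.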

The current known users for this feature are SELinux to limit cgroup usage
in containers and systemd for assorted metadata like main PID in a cgroup
(systemd creates a cgroup per service).

5. Questions
============

::

  Q: What's up with this '/bin/echo'?
  A: bash's builtin 'echo' command does not check calls to write() against
     errors. If you use it in the cgroup file system, you won't be
     able to tell whether a command succeeded or failed.

  Q: When I attach processes, only the first PID on the line actually
     gets attached!
  A: We can only return one error code per call to write(). So you should
     write only ONE PID per call.
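
Both answers apply equally to a program that attaches tasks itself:
check the result of every write() and send one PID per call. The sketch
below uses only standard POSIX calls; the helper names and the path
argument (for example a hierarchy's tasks file) are chosen for
illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a single PID with one write() call; returns 0 or -errno. */
static int write_one_pid(const char *path, pid_t pid)
{
	char buf[32];
	int fd, len, ret = 0;
	ssize_t n;

	len = snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n < 0)
		ret = -errno;		/* the kernel rejected this PID */
	else if (n != len)
		ret = -EIO;		/* short write, treated as an error */

	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}

/* Attach several PIDs, one write() each, stopping at the first error. */
static int attach_pids(const char *path, const pid_t *pids, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = write_one_pid(path, pids[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

Because each PID gets its own write(), the error code returned for a
failing PID can be attributed to exactly that PID.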