=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink
sockets in the system. However, their output format tends to be fixed, and if
users want more information about these sockets, they have to patch the kernel,
which often takes time to publish upstream and release. The same is true for popular
tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any
additional information needs a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out the kernel data with no kernel change. However, the main drawback for
drgn is performance, as it cannot do pointer tracing inside the kernel. In
addition, drgn cannot validate a pointer value and may read invalid data if the
pointer becomes invalid inside the kernel.

The BPF iterator solves the above problem by providing flexibility on what data
(e.g., tasks, bpf_maps, etc.) to collect by calling BPF programs for each kernel
data object.


----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs that
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that
should be executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once
activated, the program callback will be invoked whenever the tracepoint is
triggered in the main kernel.
For BPF iterator programs, a ``bpf_link`` to the
program is obtained using ``bpf_link_create()``, and the program callback is
invoked by issuing system calls from user space.

Next, let us see how you can use the iterators to iterate on kernel objects and
read data.

------------------------
How to Use BPF Iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()``
  with the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` received in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, get a new ``bpf_iter_fd`` and do the read again.

The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space.

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator. The suffix ``<iter_name>`` represents the type of
iterator.

::

    struct bpf_iter__task_file {
        union {
            struct bpf_iter_meta *meta;
        };
        union {
            struct task_struct *task;
        };
        u32 fd;
        union {
            struct file *file;
        };
    };

In the above code, the field 'meta' contains the metadata, which is the same for
all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
'task', 'fd' and 'file' field values. The 'task' and 'file' are `reference
counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away when the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
        struct seq_file *seq = ctx->meta->seq;
        struct task_struct *task = ctx->task;
        struct file *file = ctx->file;
        __u32 fd = ctx->fd;

        if (task == NULL || file == NULL)
            return 0;

        if (ctx->meta->seq_num == 0) {
            count = 0;
            BPF_SEQ_PRINTF(seq, "    tgid      gid       fd      file\n");
        }

        if (tgid == task->tgid && task->tgid != task->pid)
            count++;

        if (last_tgid != task->tgid) {
            last_tgid = task->tgid;
            unique_tgid_count++;
        }

        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                       (long)file->f_op);
        return 0;
    }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program that iterates over all files from all
tasks. The context of the program is the ``bpf_iter__task_file`` struct.
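
The naming rule that ties a section name to its context struct can be sketched
in plain C. The helper below, ``iter_ctx_struct_name()``, is hypothetical and
is not part of libbpf or the kernel; it only illustrates the
``bpf_iter__<iter_name>`` convention described above, assuming the section
name uses the ``iter/`` prefix.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: writes "bpf_iter__<iter_name>"
 * into out for a section name such as "iter/task_file".
 * Returns 0 on success, or -1 if sec lacks the "iter/" prefix, names an
 * empty target, or the result does not fit in out.
 */
static int iter_ctx_struct_name(const char *sec, char *out, size_t out_sz)
{
    static const char prefix[] = "iter/";
    size_t plen = sizeof(prefix) - 1;
    int n;

    if (strncmp(sec, prefix, plen) != 0 || sec[plen] == '\0')
        return -1;

    n = snprintf(out, out_sz, "bpf_iter__%s", sec + plen);
    return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```

For the section ``iter/task_file`` this yields ``bpf_iter__task_file``, the
context struct used by ``dump_task_file()``.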

The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF
program can export data to user space using a variety of BPF helper functions.
Use ``bpf_seq_printf()`` (or the ``BPF_SEQ_PRINTF`` helper macro) for formatted
output, or ``bpf_seq_write()`` for binary data. For binary-encoded data, the
user space applications can process the data from ``bpf_seq_write()`` as needed.
For the formatted data, you can use ``cat <path>`` to print the results, similar
to ``cat /proc/net/netlink``, after pinning the BPF iterator to the bpffs mount.
Later, use ``rm -f <path>`` to remove the pinned iterator.
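
For the binary case, the reader has to reassemble records itself, because a
single ``read()`` may return fewer bytes than one record. The following is a
minimal sketch of such a reader, not code from the selftests. It assumes a
hypothetical record layout, ``struct file_rec``, that the BPF program would
pass to ``bpf_seq_write()``; the function works on any readable file
descriptor, which in a real program would be the iterator fd or a pinned
iterator path opened with ``open()``.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical record layout. The BPF program and the reader must agree on
 * whatever struct is actually passed to bpf_seq_write().
 */
struct file_rec {
    uint32_t tgid;
    uint32_t pid;
    uint32_t fd;
};

/*
 * Reads whole records from fd until EOF or until max records are stored.
 * Returns the number of records stored in out, or -1 on a read error or if
 * the stream ends in the middle of a record.
 */
static long read_records(int fd, struct file_rec *out, size_t max)
{
    unsigned char buf[sizeof(struct file_rec)];
    size_t have = 0;    /* bytes of the current record collected so far */
    size_t n = 0;       /* complete records stored in out */

    while (n < max) {
        ssize_t len = read(fd, buf + have, sizeof(buf) - have);

        if (len < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the read */
            return -1;
        }
        if (len == 0)
            break;              /* EOF: no more data available */

        have += (size_t)len;
        if (have == sizeof(buf)) {
            memcpy(&out[n++], buf, sizeof(buf));
            have = 0;
        }
    }
    return have ? -1 : (long)n;
}
```

Copying through a byte buffer with ``memcpy()`` avoids alignment assumptions
about the data returned by ``read()``.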

For example, you can use the following command to create a BPF iterator from the
``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route``
path:

::

    $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

And then print out the results using the following command:

::

    $ cat /sys/fs/bpf/my_route


-------------------------------------------------------
Implement Kernel Support for BPF Iterator Program Types
-------------------------------------------------------

To implement a BPF iterator in the kernel, the developer must make a one-time
change to the following key data structure defined in the `bpf.h
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
file.

::

    struct bpf_iter_reg {
        const char *target;
        bpf_iter_attach_target_t attach_target;
        bpf_iter_detach_target_t detach_target;
        bpf_iter_show_fdinfo_t show_fdinfo;
        bpf_iter_fill_link_info_t fill_link_info;
        bpf_iter_get_func_proto_t get_func_proto;
        u32 ctx_arg_info_size;
        u32 feature;
        struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
        const struct bpf_iter_seq_info *seq_info;
    };

After filling the data structure fields, call ``bpf_iter_reg_target()`` to
register the iterator to the main BPF iterator subsystem.

The following is the breakdown for each field in struct ``bpf_iter_reg``.

.. list-table::
   :widths: 25 50
   :header-rows: 1

   * - Fields
     - Description
   * - target
     - Specifies the name of the BPF iterator. For example: ``bpf_map``,
       ``bpf_map_elem``. The name should be different from other ``bpf_iter`` target names in the kernel.
   * - attach_target and detach_target
     - Allows for target specific ``link_create`` action since some targets
       may need special processing. Called during the user space link_create stage.
   * - show_fdinfo and fill_link_info
     - Called to fill target specific information when user tries to get link
       info associated with the iterator.
   * - get_func_proto
     - Permits a BPF iterator to access BPF helpers specific to the iterator.
   * - ctx_arg_info_size and ctx_arg_info
     - Specifies the verifier states for BPF program arguments associated with
       the bpf iterator.
   * - feature
     - Specifies certain action requests in the kernel BPF iterator
       infrastructure. Currently, only ``BPF_ITER_RESCHED`` is supported. This
       means that the kernel function ``cond_resched()`` is called to keep
       other kernel subsystems (e.g., RCU) from misbehaving.
   * - seq_info
     - Specifies the set of seq operations for the BPF iterator and helpers to
       initialize/free the private data for the corresponding ``seq_file``.

`Click here
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
to see an implementation of the ``task_vma`` BPF iterator in the kernel.

---------------------------------
Parameterizing BPF Task Iterators
---------------------------------

By default, BPF iterators walk through all the objects of the specified types
(processes, cgroups, maps, etc.) across the entire system to read relevant
kernel data.
But often, there are cases where we only care about a much smaller
subset of iterable kernel objects, such as only iterating tasks within a
specific process. Therefore, BPF iterator programs support filtering out objects
from iteration by allowing user space to configure the iterator program when it
is attached.

--------------------------
BPF Task Iterator Program
--------------------------

The following code is a BPF iterator program to print files and task information
through the ``seq_file`` of the iterator. It is a standard BPF iterator program
that visits every file of an iterator. We will use this BPF program in our
example later.

::

    #include <vmlinux.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h> /* provides the BPF_SEQ_PRINTF macro */

    char _license[] SEC("license") = "GPL";

    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
        struct seq_file *seq = ctx->meta->seq;
        struct task_struct *task = ctx->task;
        struct file *file = ctx->file;
        __u32 fd = ctx->fd;

        /* Both pointers are NULL on the final call, after the last file */
        if (task == NULL || file == NULL)
            return 0;

        /* Print the header once, before the first entry */
        if (ctx->meta->seq_num == 0) {
            BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
        }

        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                       (long)file->f_op);
        return 0;
    }

----------------------------------------
Creating a File Iterator with Parameters
----------------------------------------

Now, let us look at how to create an iterator that includes only files of a
process.

First, fill the ``bpf_iter_attach_opts`` struct as shown below:

::

    LIBBPF_OPTS(bpf_iter_attach_opts, opts);
    union bpf_iter_link_info linfo;
    memset(&linfo, 0, sizeof(linfo));
    linfo.task.pid = getpid();
    opts.link_info = &linfo;
    opts.link_info_len = sizeof(linfo);

``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator
that only includes opened files for the process with the specified ``pid``. In
this example, we will only be iterating files for our process. If
``linfo.task.pid`` is zero, the iterator will visit every opened file of every
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
that visits opened files of a specific thread, not a process. In this example,
using ``linfo.task.tid`` gives a different result from using ``linfo.task.pid``
only if the thread has a separate file descriptor table.
In most circumstances, all process threads share
a single file descriptor table.

Now, in the user space program, pass a pointer to the struct to
``bpf_program__attach_iter()``.

::

    link = bpf_program__attach_iter(prog, &opts);
    iter_fd = bpf_iter_create(bpf_link__fd(link));

If both *tid* and *pid* are zero, an iterator created from this struct
``bpf_iter_attach_opts`` will include every opened file of every task in the
system (more precisely, in the current *pid* namespace). It is the same as
passing a NULL as the second argument to ``bpf_program__attach_iter()``.

The whole program looks like the following code:

::

    #include <stdio.h>
    #include <string.h> /* memset() */
    #include <unistd.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>
    #include "bpf_iter_task_ex.skel.h"

    static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
    {
        struct bpf_link *link;
        char buf[16] = {};
        int iter_fd = -1, len;
        int ret = 0;

        link = bpf_program__attach_iter(prog, opts);
        if (!link) {
            fprintf(stderr, "bpf_program__attach_iter() fails\n");
            return -1;
        }
        iter_fd = bpf_iter_create(bpf_link__fd(link));
        if (iter_fd < 0) {
            fprintf(stderr, "bpf_iter_create() fails\n");
            ret = -1;
            goto free_link;
        }
        /* Print the iterator output until read() reports EOF or an error */
        while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
            buf[len] = 0;
            printf("%s", buf);
        }
        if (len < 0) {
            fprintf(stderr, "read() fails\n");
            ret = -1;
        }
        printf("\n");
    free_link:
        if (iter_fd >= 0)
            close(iter_fd);
        bpf_link__destroy(link);
        return ret;
    }

    static void test_task_file(void)
    {
        LIBBPF_OPTS(bpf_iter_attach_opts, opts);
        struct bpf_iter_task_ex *skel;
        union bpf_iter_link_info linfo;
        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
            return;
        memset(&linfo, 0, sizeof(linfo));
        linfo.task.pid = getpid();
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        printf("PID %d\n", getpid());
        do_read_opts(skel->progs.dump_task_file, &opts);
        bpf_iter_task_ex__destroy(skel);
    }

    int main(int argc, const char * const * argv)
    {
        test_task_file();
        return 0;
    }

The following lines are the output of the program.

::

    PID 1859

        tgid      pid       fd      file
        1859     1859        0 ffffffff82270aa0
        1859     1859        1 ffffffff82270aa0
        1859     1859        2 ffffffff82270aa0
        1859     1859        3 ffffffff82272980
        1859     1859        4 ffffffff8225e120
        1859     1859        5 ffffffff82255120
        1859     1859        6 ffffffff82254f00
        1859     1859        7 ffffffff82254d80
        1859     1859        8 ffffffff8225abe0

------------------
Without Parameters
------------------

Let us look at how a BPF iterator without parameters skips files of other
processes in the system. In this case, the BPF program has to check the pid or
the tid of tasks, or it will receive every opened file in the system (in the
current *pid* namespace, actually). So, we usually add a global variable in the
BPF program to pass a *pid* to the BPF program.

The BPF program would look like the following block.

::

    ......
    int target_pid = 0;

    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
        ......
        if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
            return 0;
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                       (long)file->f_op);
        return 0;
    }

The user space program would look like the following block:

::

    ......
    static void test_task_file(void)
    {
        ......
        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
            return;
        skel->bss->target_pid = getpid(); /* process ID. For thread id, use gettid() */
        memset(&linfo, 0, sizeof(linfo));
        linfo.task.pid = getpid();
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        ......
    }

``target_pid`` is a global variable in the BPF program. The user space program
should initialize the variable with a process ID to skip opened files of other
processes in the BPF program. When you parameterize a BPF iterator, the iterator
calls the BPF program fewer times, which can save significant resources.


----------------------------
Parameterizing VMA Iterators
----------------------------

By default, a BPF VMA iterator includes every VMA in every process. However,
you can still specify a process or a thread to include only its VMAs. Unlike
files, a thread cannot have a separate address space (since Linux 2.6.0-test6).
Here, using *tid* makes no difference from using *pid*.

-----------------------------
Parameterizing Task Iterators
-----------------------------

A BPF task iterator with *pid* includes all tasks (threads) of a process. The
BPF program receives these tasks one after another. You can specify a BPF task
iterator with the *tid* parameter to include only the tasks that match the given
*tid*.