.. SPDX-License-Identifier: GPL-2.0

===============================
Kernel level exception handling
===============================

Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>

When a process runs in kernel mode, it often has to access user
mode memory whose address has been passed by an untrusted program.
To protect itself the kernel has to verify this address.

In older versions of Linux this was done with the
int verify_area(int type, const void * addr, unsigned long size)
function (which has since been replaced by access_ok()).

This function verified that the memory area starting at address
'addr' and of size 'size' was accessible for the operation specified
in type (read or write). To do this, verify_area had to look up the
virtual memory area (vma) that contained the address addr. In the
normal case (correctly working program), this test was successful.
It only failed for a few buggy programs. In some kernel profiling
tests, this normally unneeded verification used up a considerable
amount of time.

To overcome this situation, Linus decided to let the virtual memory
hardware present in every Linux-capable CPU handle this test.

How does this work?

Whenever the kernel tries to access an address that is currently not
accessible, the CPU generates a page fault exception and calls the
page fault handler::

  void do_page_fault(struct pt_regs *regs, unsigned long error_code)

in arch/x86/mm/fault.c. The parameters on the stack are set up by
the low level assembly glue in arch/x86/entry/entry_32.S. The parameter
regs is a pointer to the saved registers on the stack, error_code
contains a reason code for the exception.

do_page_fault first obtains the inaccessible address from the CPU
control register CR2. If the address is within the virtual address
space of the process, the fault probably occurred because the page
was not swapped in, was write protected, or something similar. However,
we are interested in the other case: the address is not valid, there
is no vma that contains this address. In this case, the kernel jumps
to the bad_area label.

There it uses the address of the instruction that caused the exception
(i.e. regs->eip) to find an address where the execution can continue
(fixup). If this search is successful, the fault handler modifies the
return address (again regs->eip) and returns. The execution will
continue at the address in fixup.

Where does fixup point to?

Since we jump to the contents of fixup, fixup obviously points
to executable code. This code is hidden inside the user access macros.
I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h
as an example. The definition is somewhat hard to follow, so let's peek at
the code generated by the preprocessor and the compiler. I selected
the get_user call in drivers/char/sysrq.c for a detailed examination.

The original code in sysrq.c line 587::

        get_user(c, buf);

The preprocessor output (edited to become somewhat readable)::

  (
    {
      long __gu_err = - 14 , __gu_val = 0;
      const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
      if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
        (((sizeof(*(buf))) <= 0xC0000000UL) &&
        ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
        do {
          __gu_err = 0;
          switch ((sizeof(*(buf)))) {
            case 1:
              __asm__ __volatile__(
                "1: mov" "b" " %2,%" "b" "1\n"
                "2:\n"
                ".section .fixup,\"ax\"\n"
                "3: movl %3,%0\n"
                " xor" "b" " %" "b" "1,%" "b" "1\n"
                " jmp 2b\n"
                ".section __ex_table,\"a\"\n"
                " .align 4\n"
                " .long 1b,3b\n"
".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) 938c2ecf20Sopenharmony_ci ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; 948c2ecf20Sopenharmony_ci break; 958c2ecf20Sopenharmony_ci case 2: 968c2ecf20Sopenharmony_ci __asm__ __volatile__( 978c2ecf20Sopenharmony_ci "1: mov" "w" " %2,%" "w" "1\n" 988c2ecf20Sopenharmony_ci "2:\n" 998c2ecf20Sopenharmony_ci ".section .fixup,\"ax\"\n" 1008c2ecf20Sopenharmony_ci "3: movl %3,%0\n" 1018c2ecf20Sopenharmony_ci " xor" "w" " %" "w" "1,%" "w" "1\n" 1028c2ecf20Sopenharmony_ci " jmp 2b\n" 1038c2ecf20Sopenharmony_ci ".section __ex_table,\"a\"\n" 1048c2ecf20Sopenharmony_ci " .align 4\n" 1058c2ecf20Sopenharmony_ci " .long 1b,3b\n" 1068c2ecf20Sopenharmony_ci ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) 1078c2ecf20Sopenharmony_ci ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); 1088c2ecf20Sopenharmony_ci break; 1098c2ecf20Sopenharmony_ci case 4: 1108c2ecf20Sopenharmony_ci __asm__ __volatile__( 1118c2ecf20Sopenharmony_ci "1: mov" "l" " %2,%" "" "1\n" 1128c2ecf20Sopenharmony_ci "2:\n" 1138c2ecf20Sopenharmony_ci ".section .fixup,\"ax\"\n" 1148c2ecf20Sopenharmony_ci "3: movl %3,%0\n" 1158c2ecf20Sopenharmony_ci " xor" "l" " %" "" "1,%" "" "1\n" 1168c2ecf20Sopenharmony_ci " jmp 2b\n" 1178c2ecf20Sopenharmony_ci ".section __ex_table,\"a\"\n" 1188c2ecf20Sopenharmony_ci " .align 4\n" " .long 1b,3b\n" 1198c2ecf20Sopenharmony_ci ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) 1208c2ecf20Sopenharmony_ci ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); 1218c2ecf20Sopenharmony_ci break; 1228c2ecf20Sopenharmony_ci default: 1238c2ecf20Sopenharmony_ci (__gu_val) = __get_user_bad(); 1248c2ecf20Sopenharmony_ci } 1258c2ecf20Sopenharmony_ci } while (0) ; 1268c2ecf20Sopenharmony_ci ((c)) = (__typeof__(*((buf))))__gu_val; 1278c2ecf20Sopenharmony_ci __gu_err; 1288c2ecf20Sopenharmony_ci } 1298c2ecf20Sopenharmony_ci ); 1308c2ecf20Sopenharmony_ci 1318c2ecf20Sopenharmony_ciWOW! 
Black GCC/assembly magic. This is impossible to follow, so let's
see what code gcc generates::

 >         xorl %edx,%edx
 >         movl current_set,%eax
 >         cmpl $24,788(%eax)
 >         je .L1424
 >         cmpl $-1073741825,64(%esp)
 >         ja .L1423
 > .L1424:
 >         movl %edx,%eax
 >         movl 64(%esp),%ebx
 > #APP
 > 1:      movb (%ebx),%dl                /* this is the actual user access */
 > 2:
 > .section .fixup,"ax"
 > 3:      movl $-14,%eax
 >         xorb %dl,%dl
 >         jmp 2b
 > .section __ex_table,"a"
 >         .align 4
 >         .long 1b,3b
 > .text
 > #NO_APP
 > .L1423:
 >         movzbl %dl,%esi

The optimizer does a good job and gives us something we can actually
understand. Can we? The actual user access is quite obvious. Thanks
to the unified address space we can just access the address in user
memory. But what does the .section stuff do?

To understand this we have to look at the final kernel::

 > objdump --section-headers vmlinux
 >
 > vmlinux:     file format elf32-i386
 >
 > Sections:
 > Idx Name          Size      VMA       LMA       File off  Algn
 >   0 .text         00098f40  c0100000  c0100000  00001000  2**4
 >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 >   1 .fixup        000016bc  c0198f40  c0198f40  00099f40  2**0
 >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
 >   2 .rodata       0000f127  c019a5fc  c019a5fc  0009b5fc  2**2
 >                   CONTENTS, ALLOC, LOAD, READONLY, DATA
 >   3 __ex_table    000015c0  c01a9724  c01a9724  000aa724  2**2
 >                   CONTENTS, ALLOC, LOAD, READONLY, DATA
 >   4 .data         0000ea58  c01abcf0  c01abcf0  000abcf0  2**4
 >                   CONTENTS, ALLOC, LOAD, DATA
 >   5 .bss          00018e21  c01ba748  c01ba748  000ba748  2**2
 >                   ALLOC
 >   6 .comment      00000ec4  00000000  00000000  000ba748  2**0
 >                   CONTENTS, READONLY
 >   7 .note         00001068  00000ec4  00000ec4  000bb60c  2**0
 >                   CONTENTS, READONLY

There are obviously two non-standard ELF sections (.fixup and __ex_table)
in the generated object file.
But first we want to find out what happened to our code in the
final kernel executable::

 > objdump --disassemble --section=.text vmlinux
 >
 > c017e785 <do_con_write+c1> xorl   %edx,%edx
 > c017e787 <do_con_write+c3> movl   0xc01c7bec,%eax
 > c017e78c <do_con_write+c8> cmpl   $0x18,0x314(%eax)
 > c017e793 <do_con_write+cf> je     c017e79f <do_con_write+db>
 > c017e795 <do_con_write+d1> cmpl   $0xbfffffff,0x40(%esp,1)
 > c017e79d <do_con_write+d9> ja     c017e7a7 <do_con_write+e3>
 > c017e79f <do_con_write+db> movl   %edx,%eax
 > c017e7a1 <do_con_write+dd> movl   0x40(%esp,1),%ebx
 > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
 > c017e7a7 <do_con_write+e3> movzbl %dl,%esi

The whole user memory access is reduced to 10 x86 machine instructions.
The instructions bracketed in the .section directives are no longer
in the normal execution path.
They are located in a different section
of the executable file::

 > objdump --disassemble --section=.fixup vmlinux
 >
 > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
 > c0199ffa <.fixup+10ba> xorb   %dl,%dl
 > c0199ffc <.fixup+10bc> jmp    c017e7a7 <do_con_write+e3>

And finally::

 > objdump --full-contents --section=__ex_table vmlinux
 >
 >  c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0  ................
 >  c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0  ................
 >  c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0  ................

or in human readable byte order::

 > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................
 > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
                              ^^^^^^^^^^^^^^^^^
                              this is the interesting part!
 > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................

What happened? The assembly directives::

 .section .fixup,"ax"
 .section __ex_table,"a"

told the assembler to move the following code to the specified
sections in the ELF object file.
So the instructions::

 3:      movl $-14,%eax
         xorb %dl,%dl
         jmp 2b

ended up in the .fixup section of the object file and the addresses::

         .long 1b,3b

ended up in the __ex_table section of the object file. 1b and 3b
are local labels. The local label 1b (1b stands for next label 1
backward) is the address of the instruction that might fault, i.e.
in our case the address of the label 1 is c017e7a5::

 the original assembly code: > 1:      movb (%ebx),%dl
 and linked in vmlinux     : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl

The local label 3b (backward again) is the address of the code to handle
the fault, in our case the actual value is c0199ff5::

 the original assembly code: > 3:      movl $-14,%eax
 and linked in vmlinux     : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax

If the fixup was able to handle the exception, control flow may be returned
to the instruction after the one that triggered the fault, i.e. local label 2b.

The assembly code::

 > .section __ex_table,"a"
 >         .align 4
 >         .long 1b,3b

becomes the value pair::

 > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
                              ^this is ^this is
                              1b       3b

c017e7a5,c0199ff5 in the exception table of the kernel.

So, what actually happens if a fault from kernel mode with no suitable
vma occurs?

#. access to invalid address::

    > c017e7a5 <do_con_write+e1> movb (%ebx),%dl

#. MMU generates exception
#. CPU calls do_page_fault
#. do_page_fault calls search_exception_table (regs->eip == c017e7a5);
#. search_exception_table looks up the address c017e7a5 in the
   exception table (i.e. the contents of the ELF section __ex_table)
   and returns the address of the associated fault handling code c0199ff5.
#. do_page_fault modifies its own return address to point to the fault
   handling code and returns.
#. execution continues in the fault handling code.
#. a) EAX becomes -EFAULT (== -14)
   b) DL becomes zero (the value we "read" from user space)
   c) execution continues at local label 2 (address of the
      instruction immediately after the faulting user access).

The steps 8a to 8c in a certain way emulate the faulting instruction.

That's it, mostly. If you look at our example, you might ask why
we set EAX to -EFAULT in the exception handler code.
Well, the
get_user macro actually returns a value: 0 if the user access was
successful, -EFAULT on failure. Our original code did not test this
return value; however, the inline assembly code in get_user tries to
return -EFAULT. GCC selected EAX to return this value.

NOTE:
Due to the way that the exception table is built and needs to be ordered,
only use exceptions for code in the .text section. Any other section
will cause the exception table to not be sorted correctly, and the
exceptions will fail.

Things changed when 64-bit support was added to x86 Linux. Rather than
double the size of the exception table by expanding the two entries
from 32 bits to 64 bits, a clever trick was used to store addresses
as relative offsets from the table itself. The assembly code changed
from::

    .long 1b,3b

to::

    .long (from) - .
    .long (to) - .

and the C-code that uses these values converts back to absolute addresses
like this::

    unsigned long ex_insn_addr(const struct exception_table_entry *x)
    {
        /* insn holds the offset from its own address to the instruction */
        return (unsigned long)&x->insn + x->insn;
    }

In v4.6 the exception table entry was expanded with a new field "handler".
This is also 32 bits wide and contains a third relative function
pointer which points to one of:

1) ``int ex_handler_default(const struct exception_table_entry *fixup)``
     This is the legacy case that just jumps to the fixup code.

2) ``int ex_handler_fault(const struct exception_table_entry *fixup)``
     This case provides the fault number of the trap that occurred at
     entry->insn. It is used to distinguish page faults from machine
     checks.

More functions can easily be added.

CONFIG_BUILDTIME_TABLE_SORT allows the __ex_table section to be sorted post
link of the kernel image, via a host utility scripts/sorttable. It will set the
symbol main_extable_sort_needed to 0, avoiding sorting the __ex_table section
at boot time. With the exception table sorted, at runtime when an exception
occurs we can quickly look up the __ex_table entry via binary search.

This is not just a boot time optimization; some architectures require this
table to be sorted in order to handle exceptions relatively early in the boot
process. For example, i386 makes use of this form of exception handling before
paging support is even enabled!
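
The relative-offset encoding and the binary search over a sorted table can
be tried out in user space. The following is only a minimal sketch under
stated assumptions, not the kernel's implementation: ``struct ex_entry``,
``struct abs_pair``, ``rel32``, ``ex_table_build``, ``ex_table_search`` and
``ex_table_demo`` are hypothetical names made up for this illustration, the
"instructions" are just bytes in a static array, and the "handler" field is
left out. It assumes that all static objects of the program lie within 2 GiB
of each other, so that every offset fits into 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a table entry: two 32-bit self-relative offsets. */
struct ex_entry {
    int32_t insn;   /* offset from &insn to the instruction that may fault */
    int32_t fixup;  /* offset from &fixup to the fixup code */
};

/* Absolute address pair, used only while building the table. */
struct abs_pair {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Convert a stored offset back to an absolute address (cf. ex_insn_addr). */
static uintptr_t ex_insn_addr(const struct ex_entry *x)
{
    return (uintptr_t)&x->insn + (intptr_t)x->insn;
}

static uintptr_t ex_fixup_addr(const struct ex_entry *x)
{
    return (uintptr_t)&x->fixup + (intptr_t)x->fixup;
}

/* Encode "target - ." relative to the field that will hold the offset. */
static int32_t rel32(uintptr_t target, const void *field)
{
    intptr_t delta = (intptr_t)(target - (uintptr_t)field);

    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    return (int32_t)delta;
}

static int cmp_pair(const void *a, const void *b)
{
    const struct abs_pair *x = a, *y = b;

    return (x->insn > y->insn) - (x->insn < y->insn);
}

/* Sort by faulting address first, then encode each pair into its slot. */
static void ex_table_build(struct ex_entry *table, struct abs_pair *pairs,
                           size_t n)
{
    qsort(pairs, n, sizeof(*pairs), cmp_pair);
    for (size_t i = 0; i < n; i++) {
        table[i].insn = rel32(pairs[i].insn, &table[i].insn);
        table[i].fixup = rel32(pairs[i].fixup, &table[i].fixup);
    }
}

/* Binary search for the entry whose insn address equals addr. */
static const struct ex_entry *ex_table_search(const struct ex_entry *table,
                                              size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uintptr_t insn = ex_insn_addr(&table[mid]);

        if (insn == addr)
            return &table[mid];
        if (insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Stand-ins for the .text and .fixup sections. */
static char fake_text[64];
static char fake_fixup[16];
static struct ex_entry demo_table[3];

/* Returns 1 if a known address is found and an unknown one is not. */
static int ex_table_demo(void)
{
    struct abs_pair pairs[] = {
        { (uintptr_t)&fake_text[40], (uintptr_t)&fake_fixup[8] },
        { (uintptr_t)&fake_text[5],  (uintptr_t)&fake_fixup[0] },
        { (uintptr_t)&fake_text[17], (uintptr_t)&fake_fixup[4] },
    };
    const struct ex_entry *hit, *miss;

    ex_table_build(demo_table, pairs, 3);
    hit = ex_table_search(demo_table, 3, (uintptr_t)&fake_text[17]);
    miss = ex_table_search(demo_table, 3, (uintptr_t)&fake_text[18]);
    return hit != NULL &&
           ex_fixup_addr(hit) == (uintptr_t)&fake_fixup[4] &&
           miss == NULL;
}
```

The sketch sorts the absolute addresses before encoding them because an
entry's offsets are only valid at the slot they were computed for: moving an
already encoded entry to a different slot would require adjusting both
offsets by the distance moved.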