============================
Transactional Memory support
============================

POWER kernel support for this feature is currently limited to supporting
its use by user programs. It is not currently used by the kernel itself.

This file aims to sum up how it is supported by Linux and what behaviour you
can expect from your user programs.


Basic overview
==============

Hardware Transactional Memory is supported on POWER8 processors, and is a
feature that enables a different form of atomic memory access. Several new
instructions are provided to delimit transactions; transactions are
guaranteed to either complete atomically or roll back and undo any partial
changes.

A simple transaction looks like this::

  begin_move_money:
    tbegin
    beq   abort_handler

    ld    r4, SAVINGS_ACCT(r3)
    ld    r5, CURRENT_ACCT(r3)
    subi  r5, r5, 1
    addi  r4, r4, 1
    std   r4, SAVINGS_ACCT(r3)
    std   r5, CURRENT_ACCT(r3)

    tend

    b     continue

  abort_handler:
    ... test for odd failures ...

    /* Retry the transaction if it failed because it conflicted with
     * someone else: */
    b     begin_move_money


The 'tbegin' instruction denotes the start point, and 'tend' the end point.
Between these points the processor is in 'Transactional' state; any memory
references will complete in one go if there are no conflicts with other
transactional or non-transactional accesses within the system. In this
example, the transaction completes as though it were normal straight-line code
IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
atomic move of money from the current account to the savings account has been
performed. Even though the normal ld/std instructions are used (note no
lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
updated, or neither will be updated.

If, in the meantime, there is a conflict with the locations accessed by the
transaction, the transaction will be aborted by the CPU. Register and memory
state will roll back to that at the 'tbegin', and control will continue from
'tbegin+4'. The branch to abort_handler will be taken this second time; the
abort handler can check the cause of the failure, and retry.

Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
and a few other status/flag regs; see the ISA for details.

Causes of transaction aborts
============================

- Conflicts with cache lines used by other processors
- Signals
- Context switches
- See the ISA for full documentation of everything that will abort transactions.


Syscalls
========

Syscalls made from within an active transaction will not be performed and the
transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
| TM_CAUSE_PERSISTENT.

Syscalls made from within a suspended transaction are performed as normal and
the transaction is not explicitly doomed by the kernel. However, what the
kernel does to perform the syscall may result in the transaction being doomed
by the hardware. The syscall is performed in suspended mode so any side
effects will be persistent, independent of transaction success or failure. No
guarantees are provided by the kernel about which syscalls will affect
transaction success.

Care must be taken when relying on syscalls to abort during active transactions
if the calls are made via a library. Libraries may cache values (which may
give the appearance of success) or perform operations that cause transaction
failure before entering the kernel (which may produce different failure codes).
Examples are glibc's getpid() and lazy symbol resolution.


Signals
=======

Delivery of signals (both sync and async) during transactions provides a second
thread state (ucontext/mcontext) to represent the second transactional register
state. Signal delivery 'treclaim's to capture both register states, so signals
abort transactions. The usual ucontext_t passed to the signal handler
represents the checkpointed/original register state; the signal appears to have
arisen at 'tbegin+4'.

If the sighandler ucontext has uc_link set, a second ucontext has been
delivered. For future compatibility the MSR.TS field should be checked to
determine the transactional state -- if so, the second ucontext in uc->uc_link
represents the active transactional registers at the point of the signal.

For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
field shows the transactional mode.

For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
bits are stored in the MSR of the second ucontext, i.e. in
uc->uc_link->uc_mcontext.regs->msr. The top word contains the transactional
state TS.
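
The MSR handling described above can be sketched in plain C. This is only an
illustrative model, not kernel or libc code: the helper names, the enum and the
self-test are hypothetical, and the bit positions (33 for "suspended", 34 for
"transactional", counted from the least significant bit) are assumed to match
MSR_TS_S_LG and MSR_TS_T_LG in <asm/reg.h>; check them against your headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed TS bit positions, counted from the least significant bit. */
#define TM_MSR_TS_S    (1ULL << 33)   /* suspended */
#define TM_MSR_TS_T    (1ULL << 34)   /* transactional */
#define TM_MSR_TS_MASK (TM_MSR_TS_S | TM_MSR_TS_T)

/* Hypothetical names for the states encoded by the TS field. */
enum tm_state {
    TM_STATE_NONE,
    TM_STATE_SUSPENDED,
    TM_STATE_TRANSACTIONAL,
    TM_STATE_RESERVED
};

/*
 * 32-bit process: the first ucontext carries the low MSR word and the
 * ucontext reached through uc_link carries the high word.
 */
static uint64_t tm_full_msr32(uint32_t msr_low, uint32_t msr_high)
{
    return ((uint64_t)msr_high << 32) | msr_low;
}

/* Decode the TS field of a full 64-bit MSR value. */
static enum tm_state tm_state_from_msr(uint64_t msr)
{
    switch (msr & TM_MSR_TS_MASK) {
    case 0:
        return TM_STATE_NONE;
    case TM_MSR_TS_S:
        return TM_STATE_SUSPENDED;
    case TM_MSR_TS_T:
        return TM_STATE_TRANSACTIONAL;
    default:
        return TM_STATE_RESERVED;
    }
}

/* Usage: a high word of 0x2 sets bit 33, i.e. the suspended state. */
static void tm_msr_selftest(void)
{
    uint64_t msr = tm_full_msr32(0x0000d032u, 0x00000002u);

    assert(tm_state_from_msr(msr) == TM_STATE_SUSPENDED);
}
```

In a real handler the two words would come from uc->uc_mcontext.regs->msr and
uc->uc_link->uc_mcontext.regs->msr; a 64-bit process can pass its MSR to the
decoder directly.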

However, basic signal handlers don't need to be aware of transactions
and simply returning from the handler will deal with things correctly.

Transaction-aware signal handlers can read the transactional register state
from the second ucontext. This will be necessary for crash handlers to
determine, for example, the address of the instruction causing the SIGSEGV.

Example signal handler::

    void crash_handler(int sig, siginfo_t *si, void *uc)
    {
      ucontext_t *ucp = uc;
      ucontext_t *transactional_ucp = ucp->uc_link;

      /* uc_link is only set if a second ucontext was delivered. */
      if (transactional_ucp) {
        u64 msr = ucp->uc_mcontext.regs->msr;
        /* May have transactional ucontext! */
    #ifndef __powerpc64__
        /* 32-bit: the top MSR word lives in the second ucontext. */
        msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
    #endif
        if (MSR_TM_ACTIVE(msr)) {
          /* Yes, we crashed during a transaction. Oops. */
          fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
                          "crashy instruction was at 0x%llx\n",
                          (unsigned long long)ucp->uc_mcontext.regs->nip,
                          (unsigned long long)transactional_ucp->uc_mcontext.regs->nip);
        }
      }

      fix_the_problem(ucp->uc_mcontext.regs->dar);
    }

When in an active transaction that takes a signal, we need to be careful with
the stack. It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend. In this case, the stack is part of the checkpointed
transactional memory state. If we write over this non-transactionally or in
suspend, we are in trouble because if we get a TM abort, the program counter and
stack pointer will be back at the tbegin but our in-memory stack won't be valid
anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state. This ensures that the signal context (written TM suspended) will be
written below the stack required for the rollback. The transaction is aborted
because of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Any transaction initiated inside a sighandler and suspended on return
from the sighandler to the kernel will get reclaimed and discarded.

Failure cause codes used by kernel
==================================

These are defined in <asm/reg.h>, and distinguish different reasons why the
kernel aborted a transaction:

 ====================== ================================
 TM_CAUSE_RESCHED       Thread was rescheduled.
 TM_CAUSE_TLBI          Software TLB invalid.
 TM_CAUSE_FAC_UNAV      FP/VEC/VSX unavailable trap.
 TM_CAUSE_SYSCALL       Syscall from active transaction.
 TM_CAUSE_SIGNAL        Signal delivered.
 TM_CAUSE_MISC          Currently unused.
 TM_CAUSE_ALIGNMENT     Alignment fault.
 TM_CAUSE_EMULATE       Emulation that touched memory.
 ====================== ================================

These can be checked by the user program's abort handler as TEXASR[0:7]. If
bit 7 is set, it indicates that the error is considered persistent. For example
a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.

GDB
===

GDB and ptrace are not currently TM-aware. If one stops during a transaction,
it looks like the transaction has just started (the checkpointed state is
presented). The transaction cannot then be continued and will take the failure
handler route. Furthermore, the transactional 2nd register state will be
inaccessible. GDB can currently be used on programs using TM, but not sensibly
in parts within transactions.

POWER9
======

TM on POWER9 has issues with storing the complete register state. This
is described in this commit::

    commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
    Author: Paul Mackerras <paulus@ozlabs.org>
    Date:   Wed Mar 21 21:32:01 2018 +1100
    KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

To account for this, different POWER9 chips have TM enabled in
different ways.

On POWER9N DD2.01 and below, TM is disabled, i.e.
HWCAP2[PPC_FEATURE2_HTM] is not set.

On POWER9N DD2.1 TM is configured by firmware to always abort a
transaction when TM suspend occurs. So tsuspend will cause a
transaction to be aborted and rolled back. Kernel exceptions will also
cause the transaction to be aborted and rolled back and the exception
will not occur. If userspace constructs a sigcontext that enables TM
suspend, the sigcontext will be rejected by the kernel. This mode is
advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.

On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
described in commit 4bb3c7a0208f), hence TM is enabled for guests,
i.e. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
make heavy use of TM suspend (tsuspend or kernel suspend) will trap
into the hypervisor and hence will suffer a performance
degradation. Host userspace has TM disabled,
i.e. HWCAP2[PPC_FEATURE2_HTM] is not set (although we may enable it
at some point in the future if we bring the emulation into host
userspace context switching).

POWER9C DD1.2 and above are only available with POWERVM and hence
Linux only runs as a guest. On these systems TM is emulated like on
POWER9N DD2.2.

Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
POWER9C DD1.2. Since earlier POWER9 processors don't support TM
emulation, migration from POWER8 to POWER9 is not supported there.

Kernel implementation
=====================

h/rfid mtmsrd quirk
-------------------

As defined in the ISA, rfid has a quirk which is useful in early
exception handling. When in a userspace transaction and we enter the
kernel via some exception, MSR will end up as TM=0 and TS=01 (i.e. TM
off but TM suspended). Regularly the kernel will want to change bits in
the MSR and will perform an rfid to do this. In this case rfid can
have SRR1 TM=0 and TS=00 (i.e. TM off and non-transactional) and the
resulting MSR will retain TM=0 and TS=01 from before (i.e. stay in
suspend). This is a quirk in the architecture as this would normally
be a transition from TS=01 to TS=00 (i.e. suspend -> non-transactional)
which is an illegal transition.

This quirk is described in the architecture in the definition of rfid
with these lines::

  if (MSR 29:31 ¬ = 0b010 | SRR1 29:31 ¬ = 0b000) then
     MSR 29:31 <- SRR1 29:31

hrfid and mtmsrd have the same quirk.

The Linux kernel uses this quirk in its early exception handling.
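
The two pseudo-code lines quoted above can be modelled in a few lines of C to
see which combinations keep the old state. This is a sketch under stated
assumptions, not the kernel's or the hardware's implementation: the function
name is hypothetical, the 3-bit argument stands for MSR[29:31] (TS followed by
TM, in ISA bit numbering), and the model covers only the quoted update rule,
not the separate checks the ISA applies to illegal state transitions.

```c
#include <assert.h>

/* 0b010: TS=01 (suspended), TM=0. */
#define TS_TM_SUSPENDED_TM_OFF 0x2u
/* 0b000: TS=00 (non-transactional), TM=0. */
#define TS_TM_NONTX_TM_OFF     0x0u

/*
 * Hypothetical model of the MSR[29:31] update done by rfid.
 * Returns the new 3-bit field given the current MSR field and the
 * field supplied in SRR1.
 */
static unsigned rfid_ts_tm(unsigned msr_field, unsigned srr1_field)
{
    /* Both arguments are 3-bit fields. */
    assert(msr_field <= 7u && srr1_field <= 7u);

    if (msr_field != TS_TM_SUSPENDED_TM_OFF ||
        srr1_field != TS_TM_NONTX_TM_OFF)
        return srr1_field;  /* normal case: take the bits from SRR1 */

    return msr_field;       /* the quirk: stay suspended with TM off */
}
```

Only the combination "MSR is suspended with TM off and SRR1 requests
non-transactional with TM off" leaves the field unchanged; every other
combination copies the SRR1 bits.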