============================
Transactional Memory support
============================

POWER kernel support for this feature is currently limited to supporting
its use by user programs.  It is not currently used by the kernel itself.

This file aims to sum up how it is supported by Linux and what behaviour you
can expect from your user programs.


Basic overview
==============

Hardware Transactional Memory is supported on POWER8 processors, and is a
feature that enables a different form of atomic memory access.  Several new
instructions are provided to delimit transactions; transactions are
guaranteed to either complete atomically or roll back and undo any partial
changes.

A simple transaction looks like this::

  begin_move_money:
    tbegin
    beq   abort_handler

    ld    r4, SAVINGS_ACCT(r3)
    ld    r5, CURRENT_ACCT(r3)
    subi  r5, r5, 1
    addi  r4, r4, 1
    std   r4, SAVINGS_ACCT(r3)
    std   r5, CURRENT_ACCT(r3)

    tend

    b     continue

  abort_handler:
    ... test for odd failures ...

    /* Retry the transaction if it failed because it conflicted with
     * someone else: */
    b     begin_move_money


The 'tbegin' instruction denotes the start point, and 'tend' the end point.
Between these points the processor is in 'Transactional' state; any memory
references will complete in one go if there are no conflicts with other
transactional or non-transactional accesses within the system.  In this
example, the transaction completes as though it were normal straight-line code
IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
atomic move of money from the current account to the savings account has been
performed.  Even though the normal ld/std instructions are used (note no
lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
updated, or neither will be updated.

If, in the meantime, there is a conflict with the locations accessed by the
transaction, the transaction will be aborted by the CPU.  Register and memory
state will roll back to that at the 'tbegin', and control will continue from
'tbegin+4'.  The branch to abort_handler will be taken this second time; the
abort handler can check the cause of the failure, and retry.

Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
and a few other status/flag regs; see the ISA for details.

Causes of transaction aborts
============================

- Conflicts with cache lines used by other processors
- Signals
- Context switches
- See the ISA for full documentation of everything that will abort transactions.


Syscalls
========

Syscalls made from within an active transaction will not be performed and the
transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
| TM_CAUSE_PERSISTENT.

Syscalls made from within a suspended transaction are performed as normal and
the transaction is not explicitly doomed by the kernel.  However, what the
kernel does to perform the syscall may result in the transaction being doomed
by the hardware.  The syscall is performed in suspended mode so any side
effects will be persistent, independent of transaction success or failure.  No
guarantees are provided by the kernel about which syscalls will affect
transaction success.

Care must be taken when relying on syscalls to abort during active transactions
if the calls are made via a library.  Libraries may cache values (which may
give the appearance of success) or perform operations that cause transaction
failure before entering the kernel (which may produce different failure codes).
Examples are glibc's getpid() and lazy symbol resolution.


Signals
=======

Delivery of signals (both sync and async) during transactions provides a second
thread state (ucontext/mcontext) to represent the second transactional register
state.  Signal delivery 'treclaim's to capture both register states, so signals
abort transactions.  The usual ucontext_t passed to the signal handler
represents the checkpointed/original register state; the signal appears to have
arisen at 'tbegin+4'.

If the sighandler ucontext has uc_link set, a second ucontext has been
delivered.  For future compatibility the MSR.TS field should be checked to
determine the transactional state.  If it shows that a transaction was in
progress, the second ucontext in uc->uc_link represents the active
transactional registers at the point of the signal.

For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
field shows the transactional mode.

For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
bits are stored in the MSR of the second ucontext, i.e. in
uc->uc_link->uc_mcontext.regs->msr.  The top word contains the transactional
state TS.

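
The 64-bit and 32-bit cases above can be sketched with two small helpers.
This is only an illustration: the helper names are made up, and the TS bit
positions (bits 33 and 34, counting from the least significant bit) are an
assumption based on our reading of the MSR_TS_* definitions in <asm/reg.h>::

    #include <stdint.h>

    /* Assumed to mirror MSR_TS_S and MSR_TS_T from <asm/reg.h>. */
    #define SKETCH_MSR_TS_S    (1ULL << 33)   /* suspended */
    #define SKETCH_MSR_TS_T    (1ULL << 34)   /* transactional */
    #define SKETCH_MSR_TS_MASK (SKETCH_MSR_TS_S | SKETCH_MSR_TS_T)

    /*
     * Rebuild a 64-bit MSR value (hypothetical helper).
     * A 64-bit process passes its full MSR as 'msr' and 0 as 'link_msr'.
     * A 32-bit process passes the low word from uc->uc_mcontext as 'msr'
     * and the word from uc->uc_link->uc_mcontext, which carries the top
     * 32 bits, as 'link_msr'.
     */
    static inline uint64_t full_msr(uint64_t msr, uint32_t link_msr)
    {
        return msr | ((uint64_t)link_msr << 32);
    }

    /* Nonzero if TS shows a transactional or suspended state. */
    static inline int msr_tm_active(uint64_t msr)
    {
        return (msr & SKETCH_MSR_TS_MASK) != 0;
    }

A handler would call msr_tm_active(full_msr(...)) before trusting the
contents of uc->uc_link.
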
However, basic signal handlers don't need to be aware of transactions
and simply returning from the handler will deal with things correctly.

Transaction-aware signal handlers can read the transactional register state
from the second ucontext.  This will be necessary for crash handlers to
determine, for example, the address of the instruction causing the SIGSEGV.

Example signal handler::

    void crash_handler(int sig, siginfo_t *si, void *uc)
    {
      ucontext_t *ucp = uc;
      ucontext_t *transactional_ucp = ucp->uc_link;

      /* uc_link is only set when a second ucontext was delivered. */
      if (transactional_ucp) {
        u64 msr = ucp->uc_mcontext.regs->msr;
        /* May have transactional ucontext! */
    #ifndef __powerpc64__
        /* 32-bit: the top half of the MSR is in the second ucontext. */
        msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
    #endif
        if (MSR_TM_ACTIVE(msr)) {
          /* Yes, we crashed during a transaction.  Oops. */
          fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
                          "crashy instruction was at 0x%llx\n",
                          (unsigned long long)ucp->uc_mcontext.regs->nip,
                          (unsigned long long)transactional_ucp->uc_mcontext.regs->nip);
        }
      }

      /* The faulting data address is held in the saved registers. */
      fix_the_problem(ucp->uc_mcontext.regs->dar);
    }

When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non-transactionally or in
suspend, we are in trouble because if we get a TM abort, the program counter and
stack pointer will be back at the tbegin but our in-memory stack won't be valid
anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written TM suspended) will be
written below the stack required for the rollback.  The transaction is aborted
because of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

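
The stack pointer rule above can be modelled as a small selection function.
This is a sketch rather than the kernel's actual code: the function name and
parameters are hypothetical, and the TS bit positions are assumed to match
<asm/reg.h>::

    #include <stdint.h>

    /* Assumed to mirror MSR_TS_S and MSR_TS_T from <asm/reg.h>. */
    #define SKETCH_MSR_TS_S    (1ULL << 33)   /* suspended */
    #define SKETCH_MSR_TS_T    (1ULL << 34)   /* transactional (active) */
    #define SKETCH_MSR_TS_MASK (SKETCH_MSR_TS_S | SKETCH_MSR_TS_T)

    /*
     * Pick the stack pointer below which the signal frame is written.
     * 'live_sp' is the current, possibly speculative, stack pointer and
     * 'ckpt_sp' is the stack pointer from the checkpointed state.
     */
    static inline uint64_t signal_frame_sp(uint64_t msr, uint64_t live_sp,
                                           uint64_t ckpt_sp)
    {
        /* Active transaction: the checkpointed stack must stay intact. */
        if ((msr & SKETCH_MSR_TS_MASK) == SKETCH_MSR_TS_T)
            return ckpt_sp;

        /* Non-TM or suspended: use the normal stack pointer. */
        return live_sp;
    }

In the problematic case described above, the live stack pointer has moved
back up, so it is numerically higher than the checkpointed one; the
function then returns the lower, checkpointed value.
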
Any transaction initiated inside a sighandler and suspended on return
from the sighandler to the kernel will get reclaimed and discarded.

Failure cause codes used by kernel
==================================

These are defined in <asm/reg.h>, and distinguish different reasons why the
kernel aborted a transaction:

 ====================== ================================
 TM_CAUSE_RESCHED       Thread was rescheduled.
 TM_CAUSE_TLBI          Software TLB invalidation.
 TM_CAUSE_FAC_UNAV      FP/VEC/VSX unavailable trap.
 TM_CAUSE_SYSCALL       Syscall from active transaction.
 TM_CAUSE_SIGNAL        Signal delivered.
 TM_CAUSE_MISC          Currently unused.
 TM_CAUSE_ALIGNMENT     Alignment fault.
 TM_CAUSE_EMULATE       Emulation that touched memory.
 ====================== ================================

These can be checked by the user program's abort handler as TEXASR[0:7].  If
bit 7 is set, it indicates that the error is considered persistent.  For
example, a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will
not.

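
A minimal sketch of how an abort handler might split a TEXASR value into the
cause and the persistent flag is shown below.  It assumes that TEXASR is read
as a 64-bit value, so that TEXASR[0:7] in the ISA's bit numbering is the most
significant byte; the helper names are invented for this illustration::

    #include <stdint.h>

    /* Bit 7 of TEXASR[0:7] is the low bit of the failure code byte. */
    #define SKETCH_TM_CAUSE_PERSISTENT 0x01

    /* Extract TEXASR[0:7], the most significant byte of the register. */
    static inline uint8_t texasr_failure_code(uint64_t texasr)
    {
        return (uint8_t)(texasr >> 56);
    }

    /* Nonzero if the failure is flagged as persistent. */
    static inline int texasr_is_persistent(uint64_t texasr)
    {
        return (texasr_failure_code(texasr) & SKETCH_TM_CAUSE_PERSISTENT) != 0;
    }

    /* Failure code with the persistent bit cleared, for comparison
     * against the TM_CAUSE_* values from the kernel header. */
    static inline uint8_t texasr_cause(uint64_t texasr)
    {
        return texasr_failure_code(texasr) & (uint8_t)~SKETCH_TM_CAUSE_PERSISTENT;
    }

An abort handler could use texasr_is_persistent() to decide whether a retry
is worthwhile, and fall back to a non-transactional path when it is not.
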
GDB
===

GDB and ptrace are not currently TM-aware.  If one stops during a transaction,
it looks like the transaction has just started (the checkpointed state is
presented).  The transaction cannot then be continued and will take the failure
handler route.  Furthermore, the transactional 2nd register state will be
inaccessible.  GDB can currently be used on programs using TM, but not sensibly
in parts within transactions.

POWER9
======

TM on POWER9 has issues with storing the complete register state. This
is described in this commit::

    commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
    Author: Paul Mackerras <paulus@ozlabs.org>
    Date:   Wed Mar 21 21:32:01 2018 +1100
    KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

To account for this, different POWER9 chips have TM enabled in
different ways.

On POWER9N DD2.01 and below, TM is disabled, i.e.
HWCAP2[PPC_FEATURE2_HTM] is not set.

On POWER9N DD2.1, TM is configured by firmware to always abort a
transaction when TM suspend occurs. So tsuspend will cause a
transaction to be aborted and rolled back. Kernel exceptions will also
cause the transaction to be aborted and rolled back and the exception
will not occur. If userspace constructs a sigcontext that enables TM
suspend, the sigcontext will be rejected by the kernel. This mode is
advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.

On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
described in commit 4bb3c7a0208f), hence TM is enabled for guests,
i.e. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
make heavy use of TM suspend (tsuspend or kernel suspend) will trap
into the hypervisor and hence will suffer a performance
degradation. Host userspace has TM disabled,
i.e. HWCAP2[PPC_FEATURE2_HTM] is not set (although we may enable it
at some point in the future if we bring the emulation into host
userspace context switching).

POWER9C DD1.2 and above are only available with POWERVM and hence
Linux only runs as a guest. On these systems TM is emulated like on
POWER9N DD2.2.

Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
POWER9C DD1.2. Since earlier POWER9 processors don't support TM
emulation, migration from POWER8 to POWER9 is not supported there.

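
The HWCAP2 combinations described in this section can be turned into a small
classification helper.  This is a sketch under assumptions: the enum and
function names are hypothetical, and the fallback bit values are what we
believe the kernel's uapi <asm/cputable.h> defines, so they should be checked
against the installed headers::

    /* Fallback values; assumed to match the uapi <asm/cputable.h>. */
    #ifndef PPC_FEATURE2_HTM
    #define PPC_FEATURE2_HTM            0x40000000
    #endif
    #ifndef PPC_FEATURE2_HTM_NO_SUSPEND
    #define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000
    #endif

    /* Hypothetical names for the three cases described in the text. */
    enum tm_mode {
        TM_MODE_NONE,        /* neither bit set: TM is not usable */
        TM_MODE_NO_SUSPEND,  /* TM works, but suspend aborts the transaction */
        TM_MODE_FULL         /* PPC_FEATURE2_HTM set: TM including suspend */
    };

    /* Classify the TM support advertised in an AT_HWCAP2 value. */
    static inline enum tm_mode tm_mode_from_hwcap2(unsigned long hwcap2)
    {
        if (hwcap2 & PPC_FEATURE2_HTM)
            return TM_MODE_FULL;
        if (hwcap2 & PPC_FEATURE2_HTM_NO_SUSPEND)
            return TM_MODE_NO_SUSPEND;
        return TM_MODE_NONE;
    }

On Linux with glibc, the argument would typically come from
getauxval(AT_HWCAP2), declared in <sys/auxv.h>.
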
Kernel implementation
=====================

h/rfid mtmsrd quirk
-------------------

As defined in the ISA, rfid has a quirk which is useful in early
exception handling. When in a userspace transaction and we enter the
kernel via some exception, MSR will end up as TM=0 and TS=01 (i.e. TM
off but TM suspended). Regularly the kernel will want to change bits in
the MSR and will perform an rfid to do this. In this case rfid can
have SRR1 TM=0 and TS=00 (i.e. TM off and non-transactional) and the
resulting MSR will retain TM=0 and TS=01 from before (i.e. stay in
suspend). This is a quirk in the architecture as this would normally
be a transition from TS=01 to TS=00 (i.e. suspend -> non-transactional)
which is an illegal transition.

This quirk is described in the architecture in the definition of rfid
with these lines::

  if (MSR 29:31 ¬= 0b010 | SRR1 29:31 ¬= 0b000) then
     MSR 29:31 <- SRR1 29:31

hrfid and mtmsrd have the same quirk.

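
The pseudocode above can be modelled in C as follows.  This is a simplified
sketch that only tracks MSR[29:31] (TS in the upper two bits of the field and
TM in the lowest) and leaves every other MSR bit untouched, which a real rfid
does not do; the function name is hypothetical::

    #include <stdint.h>

    /* MSR[29:31] in ISA numbering is bits 34:32 counting from the LSB. */
    #define SKETCH_TS_TM_SHIFT 32
    #define SKETCH_TS_TM_MASK  (7ULL << SKETCH_TS_TM_SHIFT)

    /* Model how rfid updates MSR[29:31] from SRR1[29:31]. */
    static inline uint64_t rfid_ts_tm_bits(uint64_t msr, uint64_t srr1)
    {
        uint64_t cur = (msr & SKETCH_TS_TM_MASK) >> SKETCH_TS_TM_SHIFT;
        uint64_t req = (srr1 & SKETCH_TS_TM_MASK) >> SKETCH_TS_TM_SHIFT;

        /*
         * The quirk: suspended with TM off (0b010) and SRR1 asking for
         * non-transactional with TM off (0b000) leaves the field alone.
         */
        if (cur == 0x2 && req == 0x0)
            return msr;

        /* Otherwise the field is copied from SRR1. */
        return (msr & ~SKETCH_TS_TM_MASK) | (req << SKETCH_TS_TM_SHIFT);
    }

For example, starting from TS=01 and TM=0, an rfid whose SRR1 has all three
bits clear returns the MSR unchanged, so the thread stays suspended.
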
The Linux kernel uses this quirk in its early exception handling.