:Original: Documentation/mm/mmu_notifier.rst

:Translator:

 司延腾 Yanteng Si <siyanteng@loongson.cn>

:Proofreader:



When do you need to notify inside the page table lock?
======================================================

When clearing a pte/pmd, we can choose to notify the event under the page
table lock (the notify version of \*_clear_flush calls
mmu_notifier_invalidate_range). But this notification is not necessary in all
cases.

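For a secondary TLB (a non-CPU TLB) such as an IOMMU TLB or a device TLB (when
a device uses something like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process's virtual address space), there are only two cases
in which you need to notify those secondary TLBs while holding the page table
lock when clearing a pte/pmd:
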
  A) the page backing the address is freed before mmu_notifier_invalidate_range_end()
  B) a page table entry is updated to point to a new page (COW, write fault on the zero page, __replace_page(), ...)

Case A is obvious: you do not want to risk the device writing to a page that
might now be used by some completely different task.

Case B is more subtle. For correctness, it requires the following sequence to
happen:

  - take the page table lock
  - clear the page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
  - set the page table entry to point to the new page

If clearing the page table entry is not followed by a notification before the
new pte/pmd value is set, then you can break a memory model such as C11 or
C++11 for the device.

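The ordering requirement can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: the ``model_*`` names, the
``locked`` flag standing in for the page table lock, and the single-entry
device TLB are all hypothetical and are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model of one page table entry; not a kernel structure. */
struct model_pte {
    bool locked; /* stands in for the page table lock */
    int page;    /* page the entry points to, -1 while cleared */
};

/* Hypothetical model of a secondary (device) TLB with a single entry. */
struct model_dev_tlb {
    bool valid; /* true if a translation is cached */
    int page;   /* cached page, meaningful only when valid */
};

/* Device access: reuse the cached translation, else walk the page table. */
static int model_dev_translate(struct model_dev_tlb *tlb,
                               const struct model_pte *pte)
{
    if (!tlb->valid) {
        tlb->page = pte->page;
        tlb->valid = true;
    }
    return tlb->page;
}

/*
 * Case B sequence: take the lock, clear the entry (optionally notifying the
 * secondary TLB while still under the lock), then set the new entry.
 */
static void model_replace_page(struct model_pte *pte,
                               struct model_dev_tlb *tlb,
                               int new_page, bool notify_under_lock)
{
    pte->locked = true;      /* take the page table lock */
    pte->page = -1;          /* clear the page table entry */
    if (notify_under_lock)
        tlb->valid = false;  /* notify: invalidate the secondary TLB */
    pte->page = new_page;    /* point the entry at the new page */
    pte->locked = false;     /* release the page table lock */
}
```

In this model, when ``notify_under_lock`` is true the next device translation
walks the table again and reaches the new page. When it is false, the device
keeps using its stale translation to the old page until something else
invalidates it.
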
Consider the following scenario (the device uses a feature similar to
ATS/PASID):

Take two addresses, addrA and addrB, such that \|addrA - addrB\| >= PAGE_SIZE.
We assume they are write-protected for COW (the other variants of case B apply
too).

::

 [Time N] --------------------------------------------------------------------
 CPU-thread-0  {try to write to addrA}
 CPU-thread-1  {try to write to addrB}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA and populate device TLB}
 DEV-thread-2  {read addrB and populate device TLB}
 [Time N+1] ------------------------------------------------------------------
 CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
 CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+2] ------------------------------------------------------------------
 CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
 CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+3] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {write to addrA which is a write to new page}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+3] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {}
 CPU-thread-3  {write to addrB which is a write to new page}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+4] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+5] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA from old page}
 DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the clearing of the page table entry was not
paired with a notification to invalidate the secondary TLB, the device sees
the new value for addrB before it sees the new value for addrA. This breaks
total memory ordering for the device.

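The timeline can be replayed in a single-threaded user-space simulation. This
is a rough sketch, not kernel code: the ``sim_*`` names, the array layout and
the values written (0 for the old content, 1 for the new) are assumptions made
purely for illustration, and preemption is modelled simply by skipping the
range_end step for addrA.

```c
#include <stdbool.h>

enum { ADDR_A, ADDR_B, NR_ADDRS };
enum { OLD_PAGE, NEW_PAGE, NR_PAGES };

/* Hypothetical simulation state; none of this mirrors kernel structures. */
struct sim {
    int memory[NR_ADDRS][NR_PAGES]; /* value stored in each page */
    int pte[NR_ADDRS];              /* page each address maps to (CPU view) */
    bool tlb_valid[NR_ADDRS];       /* device TLB: is a translation cached? */
    int tlb_page[NR_ADDRS];         /* device TLB: cached page */
};

/* What the device observes at the end of the timeline. */
struct observed {
    int a;
    int b;
};

/* Device read: use the cached translation, else walk the page table. */
static int sim_dev_read(struct sim *s, int addr)
{
    if (!s->tlb_valid[addr]) {
        s->tlb_page[addr] = s->pte[addr];
        s->tlb_valid[addr] = true;
    }
    return s->memory[addr][s->tlb_page[addr]];
}

static struct observed sim_run(bool notify_under_lock)
{
    struct sim s = {0}; /* old content is 0, ptes map OLD_PAGE, TLB empty */
    struct observed seen;

    /* Time N: the device reads both addresses and fills its TLB. */
    sim_dev_read(&s, ADDR_A);
    sim_dev_read(&s, ADDR_B);

    /* Time N+2: COW replaces the page behind both addresses. */
    for (int addr = 0; addr < NR_ADDRS; addr++) {
        s.memory[addr][NEW_PAGE] = s.memory[addr][OLD_PAGE]; /* copy */
        if (notify_under_lock)
            s.tlb_valid[addr] = false; /* clear and notify under the lock */
        s.pte[addr] = NEW_PAGE;        /* set the new entry */
    }

    /* Time N+3: CPU writes addrA, then addrB; both land in the new pages. */
    s.memory[ADDR_A][NEW_PAGE] = 1;
    s.memory[ADDR_B][NEW_PAGE] = 1;

    /* Time N+4: only the addrB thread reaches range_end(); addrA's thread
     * is still preempted, so only addrB's device translation is dropped. */
    s.tlb_valid[ADDR_B] = false;

    /* Time N+5: the device reads both addresses again. */
    seen.a = sim_dev_read(&s, ADDR_A);
    seen.b = sim_dev_read(&s, ADDR_B);
    return seen;
}
```

In this model, ``sim_run(false)`` leaves the device reading the old value of
addrA next to the new value of addrB, even though the CPU wrote addrA first.
``sim_run(true)`` makes the device see the new value for both addresses.
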
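When changing a pte to write-protect it, or to point it to a new
write-protected page with the same content (KSM), it is fine to delay the
mmu_notifier_invalidate_range call until mmu_notifier_invalidate_range_end(),
outside the page table lock. This holds even if the thread doing the page
table update is preempted after releasing the page table lock but before
calling mmu_notifier_invalidate_range_end().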