==================
Partial Parity Log
==================

Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
addressed by PPL is that after a dirty shutdown, parity of a particular stripe
may become inconsistent with data on other member disks. If the array is also
in degraded state, there is no way to recalculate parity, because one of the
disks is missing. This can lead to silent data corruption when rebuilding the
array or using it as degraded - data calculated from parity for array blocks
that have not been touched by a write request during the unclean shutdown can
be incorrect. This condition is known as the RAID5 Write Hole. Because of
this, md by default does not allow starting a dirty degraded array.

Partial parity for a write operation is the XOR of stripe data chunks not
modified by this write. It is just enough data needed for recovering from the
write hole. XORing partial parity with the modified chunks produces parity for
the stripe, consistent with its state before the write operation, regardless of
which chunk writes have completed. If one of the unmodified data disks of
this stripe is missing, this updated parity can be used to recover its
contents. PPL recovery is also performed when starting an array after an
unclean shutdown and all disks are available, eliminating the need to resync
the array.
Because of this, using a write-intent bitmap and PPL together is not
supported.

When handling a write request, PPL writes partial parity before new data and
parity are dispatched to disks. PPL is a distributed log - it is stored on
array member drives in the metadata area, on the parity drive of a particular
stripe. It does not require a dedicated journaling drive. Write performance is
reduced by up to 30%-40%, but it scales with the number of drives in the array,
and there is no journaling drive to become a bottleneck or a single point of
failure.

Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
not a true journal. It does not protect from losing in-flight data, only from
silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
performed for this stripe (parity is not updated). So it is possible to have
arbitrary data in the written part of a stripe if that disk is lost. In such a
case the behavior is the same as in plain raid5.

PPL is available for md version-1 metadata and external (specifically IMSM)
metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.

There is a limitation of maximum 64 disks in the array for PPL. This keeps
data structures and implementation simple. RAID5 arrays with so many disks
are not likely due to the high risk of multiple disk failures.
Such a restriction should not be a real-life limitation.
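
The partial parity arithmetic described earlier can be sketched in Python.
This is an illustrative model only, not md code: the chunk contents, the
four-data-chunk stripe, and the helper names ``xor_chunks`` and
``recover_unmodified`` are assumptions made up for the example. It assumes a
write that modifies two chunks, a crash at an unknown point, and the loss of
a disk holding one of the unmodified chunks. It then compares recovery using
partial parity against recovery using the parity from before the write.

```python
from functools import reduce
from itertools import product


def xor_chunks(*chunks: bytes) -> bytes:
    """Return the bytewise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover_unmodified(partial_parity, modified_on_disk, surviving_unmodified):
    """Rebuild one missing unmodified chunk (hypothetical helper, not md code)."""
    # Step 1: parity consistent with whatever the modified chunks hold now.
    parity = xor_chunks(partial_parity, *modified_on_disk)
    # Step 2: ordinary RAID5 reconstruction from parity and surviving data.
    return xor_chunks(parity, *modified_on_disk, *surviving_unmodified)


# Made-up stripe: four data chunks; the write touches d0 and d1 only.
old = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
new_d0, new_d1 = b"\xa1\xa2", b"\xb1\xb2"

partial_parity = xor_chunks(old[2], old[3])  # XOR of the unmodified chunks
stale_parity = xor_chunks(*old)              # parity as it was before the write

# Crash with the disk holding d3 missing: try every combination of
# completed / not completed writes for d0 and d1.
results = []
for d0, d1 in product((old[0], new_d0), (old[1], new_d1)):
    with_ppl = recover_unmodified(partial_parity, [d0, d1], [old[2]])
    without_ppl = xor_chunks(stale_parity, d0, d1, old[2])
    results.append((with_ppl == old[3], without_ppl == old[3]))
    print(f"d0={d0.hex()} d1={d1.hex()} ppl_ok={with_ppl == old[3]} "
          f"stale_parity_ok={without_ppl == old[3]}")
```

In this model the partial parity path rebuilds the missing chunk for every
combination of completed writes, while the pre-write parity only gives the
right answer when neither data write reached the disk. The sketch ignores
where the log is stored and how md tracks which stripes are dirty.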