18c2ecf20Sopenharmony_ci.. _array_rcu_doc:
28c2ecf20Sopenharmony_ci
38c2ecf20Sopenharmony_ciUsing RCU to Protect Read-Mostly Arrays
48c2ecf20Sopenharmony_ci=======================================
58c2ecf20Sopenharmony_ci
68c2ecf20Sopenharmony_ciAlthough RCU is more commonly used to protect linked lists, it can
78c2ecf20Sopenharmony_cialso be used to protect arrays.  Three situations are as follows:
88c2ecf20Sopenharmony_ci
98c2ecf20Sopenharmony_ci1.  :ref:`Hash Tables <hash_tables>`
108c2ecf20Sopenharmony_ci
118c2ecf20Sopenharmony_ci2.  :ref:`Static Arrays <static_arrays>`
128c2ecf20Sopenharmony_ci
138c2ecf20Sopenharmony_ci3.  :ref:`Resizable Arrays <resizable_arrays>`
148c2ecf20Sopenharmony_ci
158c2ecf20Sopenharmony_ciEach of these three situations involves an RCU-protected pointer to an
168c2ecf20Sopenharmony_ciarray that is separately indexed.  It might be tempting to consider use
178c2ecf20Sopenharmony_ciof RCU to instead protect the index into an array, however, this use
188c2ecf20Sopenharmony_cicase is **not** supported.  The problem with RCU-protected indexes into
198c2ecf20Sopenharmony_ciarrays is that compilers can play way too many optimization games with
208c2ecf20Sopenharmony_ciintegers, which means that the rules governing handling of these indexes
218c2ecf20Sopenharmony_ciare far more trouble than they are worth.  If RCU-protected indexes into
228c2ecf20Sopenharmony_ciarrays prove to be particularly valuable (which they have not thus far),
238c2ecf20Sopenharmony_ciexplicit cooperation from the compiler will be required to permit them
248c2ecf20Sopenharmony_cito be safely used.
258c2ecf20Sopenharmony_ci
268c2ecf20Sopenharmony_ciThat aside, each of the three RCU-protected pointer situations are
278c2ecf20Sopenharmony_cidescribed in the following sections.
288c2ecf20Sopenharmony_ci
298c2ecf20Sopenharmony_ci.. _hash_tables:
308c2ecf20Sopenharmony_ci
318c2ecf20Sopenharmony_ciSituation 1: Hash Tables
328c2ecf20Sopenharmony_ci------------------------
338c2ecf20Sopenharmony_ci
348c2ecf20Sopenharmony_ciHash tables are often implemented as an array, where each array entry
358c2ecf20Sopenharmony_cihas a linked-list hash chain.  Each hash chain can be protected by RCU
368c2ecf20Sopenharmony_cias described in the listRCU.txt document.  This approach also applies
378c2ecf20Sopenharmony_cito other array-of-list situations, such as radix trees.
388c2ecf20Sopenharmony_ci
398c2ecf20Sopenharmony_ci.. _static_arrays:
408c2ecf20Sopenharmony_ci
418c2ecf20Sopenharmony_ciSituation 2: Static Arrays
428c2ecf20Sopenharmony_ci--------------------------
438c2ecf20Sopenharmony_ci
448c2ecf20Sopenharmony_ciStatic arrays, where the data (rather than a pointer to the data) is
458c2ecf20Sopenharmony_cilocated in each array element, and where the array is never resized,
468c2ecf20Sopenharmony_cihave not been used with RCU.  Rik van Riel recommends using seqlock in
478c2ecf20Sopenharmony_cithis situation, which would also have minimal read-side overhead as long
488c2ecf20Sopenharmony_cias updates are rare.
498c2ecf20Sopenharmony_ci
508c2ecf20Sopenharmony_ciQuick Quiz:
518c2ecf20Sopenharmony_ci		Why is it so important that updates be rare when using seqlock?
528c2ecf20Sopenharmony_ci
538c2ecf20Sopenharmony_ci:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>`
548c2ecf20Sopenharmony_ci
558c2ecf20Sopenharmony_ci.. _resizable_arrays:
568c2ecf20Sopenharmony_ci
578c2ecf20Sopenharmony_ciSituation 3: Resizable Arrays
588c2ecf20Sopenharmony_ci------------------------------
598c2ecf20Sopenharmony_ci
608c2ecf20Sopenharmony_ciUse of RCU for resizable arrays is demonstrated by the grow_ary()
618c2ecf20Sopenharmony_cifunction formerly used by the System V IPC code.  The array is used
628c2ecf20Sopenharmony_cito map from semaphore, message-queue, and shared-memory IDs to the data
638c2ecf20Sopenharmony_cistructure that represents the corresponding IPC construct.  The grow_ary()
648c2ecf20Sopenharmony_cifunction does not acquire any locks; instead its caller must hold the
658c2ecf20Sopenharmony_ciids->sem semaphore.
668c2ecf20Sopenharmony_ci
678c2ecf20Sopenharmony_ciThe grow_ary() function, shown below, does some limit checks, allocates a
688c2ecf20Sopenharmony_cinew ipc_id_ary, copies the old to the new portion of the new, initializes
698c2ecf20Sopenharmony_cithe remainder of the new, updates the ids->entries pointer to point to
708c2ecf20Sopenharmony_cithe new array, and invokes ipc_rcu_putref() to free up the old array.
718c2ecf20Sopenharmony_ciNote that rcu_assign_pointer() is used to update the ids->entries pointer,
728c2ecf20Sopenharmony_ciwhich includes any memory barriers required on whatever architecture
738c2ecf20Sopenharmony_ciyou are running on::
748c2ecf20Sopenharmony_ci
758c2ecf20Sopenharmony_ci	static int grow_ary(struct ipc_ids* ids, int newsize)
768c2ecf20Sopenharmony_ci	{
778c2ecf20Sopenharmony_ci		struct ipc_id_ary* new;
788c2ecf20Sopenharmony_ci		struct ipc_id_ary* old;
798c2ecf20Sopenharmony_ci		int i;
808c2ecf20Sopenharmony_ci		int size = ids->entries->size;
818c2ecf20Sopenharmony_ci
828c2ecf20Sopenharmony_ci		if(newsize > IPCMNI)
838c2ecf20Sopenharmony_ci			newsize = IPCMNI;
848c2ecf20Sopenharmony_ci		if(newsize <= size)
858c2ecf20Sopenharmony_ci			return newsize;
868c2ecf20Sopenharmony_ci
878c2ecf20Sopenharmony_ci		new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize +
888c2ecf20Sopenharmony_ci				    sizeof(struct ipc_id_ary));
898c2ecf20Sopenharmony_ci		if(new == NULL)
908c2ecf20Sopenharmony_ci			return size;
918c2ecf20Sopenharmony_ci		new->size = newsize;
928c2ecf20Sopenharmony_ci		memcpy(new->p, ids->entries->p,
938c2ecf20Sopenharmony_ci		       sizeof(struct kern_ipc_perm *)*size +
948c2ecf20Sopenharmony_ci		       sizeof(struct ipc_id_ary));
958c2ecf20Sopenharmony_ci		for(i=size;i<newsize;i++) {
968c2ecf20Sopenharmony_ci			new->p[i] = NULL;
978c2ecf20Sopenharmony_ci		}
988c2ecf20Sopenharmony_ci		old = ids->entries;
998c2ecf20Sopenharmony_ci
1008c2ecf20Sopenharmony_ci		/*
1018c2ecf20Sopenharmony_ci		 * Use rcu_assign_pointer() to make sure the memcpyed
1028c2ecf20Sopenharmony_ci		 * contents of the new array are visible before the new
1038c2ecf20Sopenharmony_ci		 * array becomes visible.
1048c2ecf20Sopenharmony_ci		 */
1058c2ecf20Sopenharmony_ci		rcu_assign_pointer(ids->entries, new);
1068c2ecf20Sopenharmony_ci
1078c2ecf20Sopenharmony_ci		ipc_rcu_putref(old);
1088c2ecf20Sopenharmony_ci		return newsize;
1098c2ecf20Sopenharmony_ci	}
1108c2ecf20Sopenharmony_ci
1118c2ecf20Sopenharmony_ciThe ipc_rcu_putref() function decrements the array's reference count
1128c2ecf20Sopenharmony_ciand then, if the reference count has dropped to zero, uses call_rcu()
1138c2ecf20Sopenharmony_cito free the array after a grace period has elapsed.
1148c2ecf20Sopenharmony_ci
1158c2ecf20Sopenharmony_ciThe array is traversed by the ipc_lock() function.  This function
1168c2ecf20Sopenharmony_ciindexes into the array under the protection of rcu_read_lock(),
1178c2ecf20Sopenharmony_ciusing rcu_dereference() to pick up the pointer to the array so
1188c2ecf20Sopenharmony_cithat it may later safely be dereferenced -- memory barriers are
1198c2ecf20Sopenharmony_cirequired on the Alpha CPU.  Since the size of the array is stored
1208c2ecf20Sopenharmony_ciwith the array itself, there can be no array-size mismatches, so
1218c2ecf20Sopenharmony_cia simple check suffices.  The pointer to the structure corresponding
1228c2ecf20Sopenharmony_cito the desired IPC object is placed in "out", with NULL indicating
1238c2ecf20Sopenharmony_cia non-existent entry.  After acquiring "out->lock", the "out->deleted"
1248c2ecf20Sopenharmony_ciflag indicates whether the IPC object is in the process of being
1258c2ecf20Sopenharmony_cideleted, and, if not, the pointer is returned::
1268c2ecf20Sopenharmony_ci
1278c2ecf20Sopenharmony_ci	struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
1288c2ecf20Sopenharmony_ci	{
1298c2ecf20Sopenharmony_ci		struct kern_ipc_perm* out;
1308c2ecf20Sopenharmony_ci		int lid = id % SEQ_MULTIPLIER;
1318c2ecf20Sopenharmony_ci		struct ipc_id_ary* entries;
1328c2ecf20Sopenharmony_ci
1338c2ecf20Sopenharmony_ci		rcu_read_lock();
1348c2ecf20Sopenharmony_ci		entries = rcu_dereference(ids->entries);
1358c2ecf20Sopenharmony_ci		if(lid >= entries->size) {
1368c2ecf20Sopenharmony_ci			rcu_read_unlock();
1378c2ecf20Sopenharmony_ci			return NULL;
1388c2ecf20Sopenharmony_ci		}
1398c2ecf20Sopenharmony_ci		out = entries->p[lid];
1408c2ecf20Sopenharmony_ci		if(out == NULL) {
1418c2ecf20Sopenharmony_ci			rcu_read_unlock();
1428c2ecf20Sopenharmony_ci			return NULL;
1438c2ecf20Sopenharmony_ci		}
1448c2ecf20Sopenharmony_ci		spin_lock(&out->lock);
1458c2ecf20Sopenharmony_ci
1468c2ecf20Sopenharmony_ci		/* ipc_rmid() may have already freed the ID while ipc_lock
1478c2ecf20Sopenharmony_ci		 * was spinning: here verify that the structure is still valid
1488c2ecf20Sopenharmony_ci		 */
1498c2ecf20Sopenharmony_ci		if (out->deleted) {
1508c2ecf20Sopenharmony_ci			spin_unlock(&out->lock);
1518c2ecf20Sopenharmony_ci			rcu_read_unlock();
1528c2ecf20Sopenharmony_ci			return NULL;
1538c2ecf20Sopenharmony_ci		}
1548c2ecf20Sopenharmony_ci		return out;
1558c2ecf20Sopenharmony_ci	}
1568c2ecf20Sopenharmony_ci
1578c2ecf20Sopenharmony_ci.. _answer_quick_quiz_seqlock:
1588c2ecf20Sopenharmony_ci
1598c2ecf20Sopenharmony_ciAnswer to Quick Quiz:
1608c2ecf20Sopenharmony_ci	Why is it so important that updates be rare when using seqlock?
1618c2ecf20Sopenharmony_ci
1628c2ecf20Sopenharmony_ci	The reason that it is important that updates be rare when
1638c2ecf20Sopenharmony_ci	using seqlock is that frequent updates can livelock readers.
1648c2ecf20Sopenharmony_ci	One way to avoid this problem is to assign a seqlock for
1658c2ecf20Sopenharmony_ci	each array entry rather than to the entire array.
166