This document describes some caveats about the use of Valgrind with
Python.  Valgrind is used periodically by Python developers to try
to ensure there are no memory leaks or invalid memory reads/writes.

If you want to enable Valgrind support in Python, you will need to
configure Python with the --with-valgrind option or the older
--without-pymalloc option.

UPDATE: Python 3.6 and later support the PYTHONMALLOC=malloc environment
variable setting, which forces the use of the C library's malloc() allocator.
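
As a minimal sketch of setting this variable for a child process, the
helper below copies an environment and adds PYTHONMALLOC=malloc.  The
name malloc_env is hypothetical and only for illustration, as are the
"./python" and "script.py" placeholders in the comment.

```python
import os


def malloc_env(base_env=None):
    """Return a copy of the environment with PYTHONMALLOC=malloc set.

    malloc_env is a hypothetical helper name, not part of Python.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = "malloc"
    return env


# Possible use (placeholders, not run here):
#   subprocess.run(["valgrind", "./python", "script.py"], env=malloc_env())
```

The variable is inherited by the Python process that Valgrind starts,
so no rebuild of Python should be needed for this approach.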

If you don't want to read about the details of using Valgrind, there
are still two things you must do to suppress the warnings.  First,
you must use a suppressions file.  One is supplied in
Misc/valgrind-python.supp.  Second, you must uncomment the lines in
Misc/valgrind-python.supp that suppress the warnings for PyObject_Free and
PyObject_Realloc.
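
The sketch below shows one way the suppressions file might be passed
to Valgrind.  The helper name valgrind_command and the "./python"
default are assumptions for illustration; --tool=memcheck and
--suppressions=FILE are standard Valgrind options.

```python
def valgrind_command(python="./python",
                     supp="Misc/valgrind-python.supp",
                     args=()):
    """Build a Valgrind argument list that loads a suppressions file.

    valgrind_command is a hypothetical helper; the paths are defaults
    chosen for a build run from the top of the source tree.
    """
    return ["valgrind", "--tool=memcheck",
            "--suppressions=" + supp, python, *args]


# Example: run a single test module under Valgrind (not executed here).
cmd = valgrind_command(args=["-m", "test", "test_dict"])
```

The resulting list can be handed to subprocess.run() or joined into a
shell command line.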

If you want to use Valgrind more effectively and catch even more
memory leaks, you will need to configure Python --without-pymalloc.
PyMalloc allocates a few large blocks, and most object allocations
don't call malloc; they use chunks doled out by PyMalloc from those
blocks.  This means Valgrind can't detect many allocations (and
frees), except for those that are forwarded to the system malloc.
Note: configuring Python --without-pymalloc makes Python run much
slower, especially when running under Valgrind.  You may need to run
the tests in batches under Valgrind to keep the memory usage down to
allow the tests to complete.  It seems to take about 5 times longer
to run --without-pymalloc.
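
Running the tests in batches could be scripted roughly as follows.
This is only a sketch: batches and batch_commands are hypothetical
helper names, the batch size is arbitrary, and "./python -m test" is
assumed to be how the test suite is started.

```python
def batches(names, size):
    """Split a list of test names into consecutive groups of at most size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [names[i:i + size] for i in range(0, len(names), size)]


def batch_commands(names, size, python="./python"):
    """Build one Valgrind argument list per batch of test names."""
    prefix = ["valgrind", python, "-m", "test"]
    return [prefix + group for group in batches(names, size)]


# Each returned list can be run as a separate Valgrind process, so the
# memory used by one batch is released before the next batch starts.
commands = batch_commands(["test_dict", "test_list", "test_set"], 2)
```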

Apr 15, 2006:
  test_ctypes causes Valgrind 3.1.1 to fail (crash).
  test_socket_ssl should be skipped when running under Valgrind.
        The reason is that it purposely uses uninitialized memory.
        This causes many spurious warnings, so it's easier to just skip it.


Details:
--------
Python uses its own small-object allocation scheme on top of malloc,
called PyMalloc.

Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default.  You can disable
PyMalloc when configuring Python by adding the --without-pymalloc option.
If you disable PyMalloc, most of the information in this document and
the supplied suppressions file will not be useful.  As discussed above,
disabling PyMalloc can catch more problems.

PyMalloc uses 256KB chunks of memory, so Valgrind can't detect anything
wrong within these blocks.  For that reason, compiling Python
--without-pymalloc usually increases the usefulness of other tools.

If you use Valgrind on a default build of Python, you will see
many errors like:

        ==6399== Use of uninitialised value of size 4
        ==6399== at 0x4A9BDE7E: PyObject_Free (obmalloc.c:711)
        ==6399== by 0x4A9B8198: dictresize (dictobject.c:477)

These are expected and not a problem.  Tim Peters explains
the situation:

        PyMalloc needs to know whether an arbitrary address is one
        that's managed by it, or is managed by the system malloc.
        The current scheme allows this to be determined in constant
        time, regardless of how many memory areas are under pymalloc's
        control.

        The memory pymalloc manages itself is in one or more "arenas",
        each a large contiguous memory area obtained from malloc.
        The base address of each arena is saved by pymalloc
        in a vector.  Each arena is carved into "pools", and a field at
        the start of each pool contains the index of that pool's arena's
        base address in that vector.

        Given an arbitrary address, pymalloc computes the pool base
        address corresponding to it, then looks at "the index" stored
        near there.  If the index read up is out of bounds for the
        vector of arena base addresses pymalloc maintains, then
        pymalloc knows for certain that this address is not under
        pymalloc's control.  Otherwise the index is in bounds, and
        pymalloc compares

            the arena base address stored at that index in the vector

        to

            the arbitrary address pymalloc is investigating

        pymalloc controls this arbitrary address if and only if it lies
        in the arena the address's pool's index claims it lies in.

        It doesn't matter whether the memory pymalloc reads up ("the
        index") is initialized.  If it's not initialized, then
        whatever trash gets read up will lead pymalloc to conclude
        (correctly) that the address isn't controlled by it, either
        because the index is out of bounds, or the index is in bounds
        but the arena it represents doesn't contain the address.

        This determination has to be made on every call to one of
        pymalloc's free/realloc entry points, so its speed is critical
        (Python allocates and frees dynamic memory at a ferocious rate
        -- everything in Python, from integers to "stack frames",
        lives in the heap).
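
The check described above can be modelled in a few lines of Python.
This is an illustrative sketch, not the code in obmalloc.c: the class
and method names are hypothetical, the 4KB pool size and pool-aligned
arena bases are assumptions made to keep the model simple, and the
"garbage" argument stands in for whatever uninitialized value would be
read from memory that pymalloc never wrote.

```python
ARENA_SIZE = 256 * 1024   # from the text: 256KB chunks
POOL_SIZE = 4 * 1024      # assumption for this model


class ArenaModel:
    """Toy model of the address check (hypothetical names)."""

    def __init__(self):
        self.arenas = []        # vector of arena base addresses
        self.pool_index = {}    # pool base address -> stored arena index

    def add_arena(self, base):
        """Register an arena and write its index at the start of each pool."""
        index = len(self.arenas)
        self.arenas.append(base)
        for pool in range(base, base + ARENA_SIZE, POOL_SIZE):
            self.pool_index[pool] = index
        return index

    def is_managed(self, address, garbage=0):
        """Return True if address lies inside an arena the model owns."""
        pool_base = address & ~(POOL_SIZE - 1)
        # For foreign memory, the value read is arbitrary ("garbage").
        index = self.pool_index.get(pool_base, garbage)
        if not 0 <= index < len(self.arenas):
            return False                      # index out of bounds
        base = self.arenas[index]
        return base <= address < base + ARENA_SIZE


model = ArenaModel()
model.add_arena(0x100000)
```

With this model, model.is_managed(0x100000 + 5000) is true, while an
address outside the arena is rejected whether the garbage value is out
of bounds (for example 12345) or happens to be a valid index (0).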