17db96d56Sopenharmony_ciDescription of the internal format of the line number table 27db96d56Sopenharmony_ci 37db96d56Sopenharmony_ciConceptually, the line number table consists of a sequence of triples: 47db96d56Sopenharmony_ci start-offset (inclusive), end-offset (exclusive), line-number. 57db96d56Sopenharmony_ci 67db96d56Sopenharmony_ciNote that not all byte codes have a line number so we need handle `None` for the line-number. 77db96d56Sopenharmony_ci 87db96d56Sopenharmony_ciHowever, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry. 97db96d56Sopenharmony_ci 107db96d56Sopenharmony_ciFirst, note that the end of one entry is the same as the start of the next, so we can overlap entries. 117db96d56Sopenharmony_ciSecond, we don't really need arbitrary access to the sequence, so we can store deltas. 127db96d56Sopenharmony_ci 137db96d56Sopenharmony_ciWe just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero. 147db96d56Sopenharmony_ci 157db96d56Sopenharmony_ciThird, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line. 167db96d56Sopenharmony_ci 177db96d56Sopenharmony_ciConsider the following table 187db96d56Sopenharmony_ci Start End Line 197db96d56Sopenharmony_ci 0 6 1 207db96d56Sopenharmony_ci 6 50 2 217db96d56Sopenharmony_ci 50 350 7 227db96d56Sopenharmony_ci 350 360 No line number 237db96d56Sopenharmony_ci 360 376 8 247db96d56Sopenharmony_ci 376 380 208 257db96d56Sopenharmony_ci 267db96d56Sopenharmony_ciStripping the redundant ends gives: 277db96d56Sopenharmony_ci 287db96d56Sopenharmony_ci End-Start Line-delta 297db96d56Sopenharmony_ci 6 +1 307db96d56Sopenharmony_ci 44 +1 317db96d56Sopenharmony_ci 300 +5 327db96d56Sopenharmony_ci 10 No line number 337db96d56Sopenharmony_ci 16 +1 347db96d56Sopenharmony_ci 4 +200 357db96d56Sopenharmony_ci 367db96d56Sopenharmony_ci 377db96d56Sopenharmony_ciNote that the end - start value is always positive. 387db96d56Sopenharmony_ci 397db96d56Sopenharmony_ciFinally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254, 407db96d56Sopenharmony_ciand line deltas to the range -127 <= delta <= 127. 417db96d56Sopenharmony_ciA line delta of -128 is used to indicate no line number. 427db96d56Sopenharmony_ciAlso note that a delta of zero indicates that there are no bytecodes in the given range, 437db96d56Sopenharmony_ciwhich means we can use an invalid line number for that range. 447db96d56Sopenharmony_ci 457db96d56Sopenharmony_ciFinal form: 467db96d56Sopenharmony_ci 477db96d56Sopenharmony_ci Start delta Line delta 487db96d56Sopenharmony_ci 6 +1 497db96d56Sopenharmony_ci 44 +1 507db96d56Sopenharmony_ci 254 +5 517db96d56Sopenharmony_ci 46 0 527db96d56Sopenharmony_ci 10 -128 (No line number, treated as a delta of zero) 537db96d56Sopenharmony_ci 16 +1 547db96d56Sopenharmony_ci 0 +127 (line 135, but the range is empty as no bytecodes are at line 135) 557db96d56Sopenharmony_ci 4 +73 567db96d56Sopenharmony_ci 577db96d56Sopenharmony_ciIterating over the table. 587db96d56Sopenharmony_ci------------------------- 597db96d56Sopenharmony_ci 607db96d56Sopenharmony_ciFor the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries. 617db96d56Sopenharmony_ci 627db96d56Sopenharmony_ciThe code is as follows: 637db96d56Sopenharmony_ci 647db96d56Sopenharmony_cidef co_lines(code): 657db96d56Sopenharmony_ci line = code.co_firstlineno 667db96d56Sopenharmony_ci end = 0 677db96d56Sopenharmony_ci table_iter = iter(code.internal_line_table): 687db96d56Sopenharmony_ci for sdelta, ldelta in table_iter: 697db96d56Sopenharmony_ci if ldelta == 0: # No change to line number, just accumulate changes to end 707db96d56Sopenharmony_ci end += sdelta 717db96d56Sopenharmony_ci continue 727db96d56Sopenharmony_ci start = end 737db96d56Sopenharmony_ci end = start + sdelta 747db96d56Sopenharmony_ci if ldelta == -128: # No valid line number -- skip entry 757db96d56Sopenharmony_ci continue 767db96d56Sopenharmony_ci line += ldelta 777db96d56Sopenharmony_ci if end == start: # Empty range, omit. 787db96d56Sopenharmony_ci continue 797db96d56Sopenharmony_ci yield start, end, line 807db96d56Sopenharmony_ci 817db96d56Sopenharmony_ci 827db96d56Sopenharmony_ci 837db96d56Sopenharmony_ci 847db96d56Sopenharmony_ciThe historical co_lnotab format 857db96d56Sopenharmony_ci------------------------------- 867db96d56Sopenharmony_ci 877db96d56Sopenharmony_ciprior to 3.10 code objects stored a field named co_lnotab. 887db96d56Sopenharmony_ciThis was an array of unsigned bytes disguised as a Python bytes object. 897db96d56Sopenharmony_ci 907db96d56Sopenharmony_ciThe old co_lnotab did not account for the presence of bytecodes without a line number, 917db96d56Sopenharmony_cinor was it well suited to tracing as a number of workarounds were required. 927db96d56Sopenharmony_ci 937db96d56Sopenharmony_ciThe old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format. 947db96d56Sopenharmony_ci 957db96d56Sopenharmony_ciBelow is the description of the old co_lnotab format: 967db96d56Sopenharmony_ci 977db96d56Sopenharmony_ci 987db96d56Sopenharmony_ciThe array is conceptually a compressed list of 997db96d56Sopenharmony_ci (bytecode offset increment, line number increment) 1007db96d56Sopenharmony_cipairs. The details are important and delicate, best illustrated by example: 1017db96d56Sopenharmony_ci 1027db96d56Sopenharmony_ci byte code offset source code line number 1037db96d56Sopenharmony_ci 0 1 1047db96d56Sopenharmony_ci 6 2 1057db96d56Sopenharmony_ci 50 7 1067db96d56Sopenharmony_ci 350 207 1077db96d56Sopenharmony_ci 361 208 1087db96d56Sopenharmony_ci 1097db96d56Sopenharmony_ciInstead of storing these numbers literally, we compress the list by storing only 1107db96d56Sopenharmony_cithe difference from one row to the next. Conceptually, the stored list might 1117db96d56Sopenharmony_cilook like: 1127db96d56Sopenharmony_ci 1137db96d56Sopenharmony_ci 0, 1, 6, 1, 44, 5, 300, 200, 11, 1 1147db96d56Sopenharmony_ci 1157db96d56Sopenharmony_ciThe above doesn't really work, but it's a start. An unsigned byte (byte code 1167db96d56Sopenharmony_cioffset) can't hold negative values, or values larger than 255, a signed byte 1177db96d56Sopenharmony_ci(line number) can't hold values larger than 127 or less than -128, and the 1187db96d56Sopenharmony_ciabove example contains two such values. (Note that before 3.6, line number 1197db96d56Sopenharmony_ciwas also encoded by an unsigned byte.) So we make two tweaks: 1207db96d56Sopenharmony_ci 1217db96d56Sopenharmony_ci (a) there's a deep assumption that byte code offsets increase monotonically, 1227db96d56Sopenharmony_ci and 1237db96d56Sopenharmony_ci (b) if byte code offset jumps by more than 255 from one row to the next, or if 1247db96d56Sopenharmony_ci source code line number jumps by more than 127 or less than -128 from one row 1257db96d56Sopenharmony_ci to the next, more than one pair is written to the table. In case #b, 1267db96d56Sopenharmony_ci there's no way to know from looking at the table later how many were written. 1277db96d56Sopenharmony_ci That's the delicate part. A user of co_lnotab desiring to find the source 1287db96d56Sopenharmony_ci line number corresponding to a bytecode address A should do something like 1297db96d56Sopenharmony_ci this: 1307db96d56Sopenharmony_ci 1317db96d56Sopenharmony_ci lineno = addr = 0 1327db96d56Sopenharmony_ci for addr_incr, line_incr in co_lnotab: 1337db96d56Sopenharmony_ci addr += addr_incr 1347db96d56Sopenharmony_ci if addr > A: 1357db96d56Sopenharmony_ci return lineno 1367db96d56Sopenharmony_ci if line_incr >= 0x80: 1377db96d56Sopenharmony_ci line_incr -= 0x100 1387db96d56Sopenharmony_ci lineno += line_incr 1397db96d56Sopenharmony_ci 1407db96d56Sopenharmony_ci(In C, this is implemented by PyCode_Addr2Line().) In order for this to work, 1417db96d56Sopenharmony_ciwhen the addr field increments by more than 255, the line # increment in each 1427db96d56Sopenharmony_cipair generated must be 0 until the remaining addr increment is < 256. So, in 1437db96d56Sopenharmony_cithe example above, assemble_lnotab in compile.c should not (as was actually done 1447db96d56Sopenharmony_ciuntil 2.2) expand 300, 200 to 1457db96d56Sopenharmony_ci 255, 255, 45, 45, 1467db96d56Sopenharmony_cibut to 1477db96d56Sopenharmony_ci 255, 0, 45, 127, 0, 73. 1487db96d56Sopenharmony_ci 1497db96d56Sopenharmony_ciThe above is sufficient to reconstruct line numbers for tracebacks, but not for 1507db96d56Sopenharmony_ciline tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c 1517db96d56Sopenharmony_ciand maybe_call_line_trace() in ceval.c. 1527db96d56Sopenharmony_ci 1537db96d56Sopenharmony_ci*** Tracing *** 1547db96d56Sopenharmony_ci 1557db96d56Sopenharmony_ciTo a first approximation, we want to call the tracing function when the line 1567db96d56Sopenharmony_cinumber of the current instruction changes. Re-computing the current line for 1577db96d56Sopenharmony_cievery instruction is a little slow, though, so each time we compute the line 1587db96d56Sopenharmony_cinumber we save the bytecode indices where it's valid: 1597db96d56Sopenharmony_ci 1607db96d56Sopenharmony_ci *instr_lb <= frame->f_lasti < *instr_ub 1617db96d56Sopenharmony_ci 1627db96d56Sopenharmony_ciis true so long as execution does not change lines. That is, *instr_lb holds 1637db96d56Sopenharmony_cithe first bytecode index of the current line, and *instr_ub holds the first 1647db96d56Sopenharmony_cibytecode index of the next line. As long as the above expression is true, 1657db96d56Sopenharmony_cimaybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note 1667db96d56Sopenharmony_cithat the same line may appear multiple times in the lnotab, either because the 1677db96d56Sopenharmony_cibytecode jumped more than 255 indices between line number changes or because 1687db96d56Sopenharmony_cithe compiler inserted the same line twice. Even in that case, *instr_ub holds 1697db96d56Sopenharmony_cithe first index of the next line. 1707db96d56Sopenharmony_ci 1717db96d56Sopenharmony_ciHowever, we don't *always* want to call the line trace function when the above 1727db96d56Sopenharmony_citest fails. 1737db96d56Sopenharmony_ci 1747db96d56Sopenharmony_ciConsider this code: 1757db96d56Sopenharmony_ci 1767db96d56Sopenharmony_ci1: def f(a): 1777db96d56Sopenharmony_ci2: while a: 1787db96d56Sopenharmony_ci3: print(1) 1797db96d56Sopenharmony_ci4: break 1807db96d56Sopenharmony_ci5: else: 1817db96d56Sopenharmony_ci6: print(2) 1827db96d56Sopenharmony_ci 1837db96d56Sopenharmony_ciwhich compiles to this: 1847db96d56Sopenharmony_ci 1857db96d56Sopenharmony_ci 2 0 SETUP_LOOP 26 (to 28) 1867db96d56Sopenharmony_ci >> 2 LOAD_FAST 0 (a) 1877db96d56Sopenharmony_ci 4 POP_JUMP_IF_FALSE 18 1887db96d56Sopenharmony_ci 1897db96d56Sopenharmony_ci 3 6 LOAD_GLOBAL 0 (print) 1907db96d56Sopenharmony_ci 8 LOAD_CONST 1 (1) 1917db96d56Sopenharmony_ci 10 CALL_NO_KW 1 1927db96d56Sopenharmony_ci 12 POP_TOP 1937db96d56Sopenharmony_ci 1947db96d56Sopenharmony_ci 4 14 BREAK_LOOP 1957db96d56Sopenharmony_ci 16 JUMP_ABSOLUTE 2 1967db96d56Sopenharmony_ci >> 18 POP_BLOCK 1977db96d56Sopenharmony_ci 1987db96d56Sopenharmony_ci 6 20 LOAD_GLOBAL 0 (print) 1997db96d56Sopenharmony_ci 22 LOAD_CONST 2 (2) 2007db96d56Sopenharmony_ci 24 CALL_NO_KW 1 2017db96d56Sopenharmony_ci 26 POP_TOP 2027db96d56Sopenharmony_ci >> 28 LOAD_CONST 0 (None) 2037db96d56Sopenharmony_ci 30 RETURN_VALUE 2047db96d56Sopenharmony_ci 2057db96d56Sopenharmony_ciIf 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18 2067db96d56Sopenharmony_ciand the co_lnotab will claim that execution has moved to line 4, which is wrong. 2077db96d56Sopenharmony_ciIn this case, we could instead associate the POP_BLOCK with line 5, but that 2087db96d56Sopenharmony_ciwould break jumps around loops without else clauses. 2097db96d56Sopenharmony_ci 2107db96d56Sopenharmony_ciWe fix this by only calling the line trace function for a forward jump if the 2117db96d56Sopenharmony_cico_lnotab indicates we have jumped to the *start* of a line, i.e. if the current 2127db96d56Sopenharmony_ciinstruction offset matches the offset given for the start of a line by the 2137db96d56Sopenharmony_cico_lnotab. For backward jumps, however, we always call the line trace function, 2147db96d56Sopenharmony_ciwhich lets a debugger stop on every evaluation of a loop guard (which usually 2157db96d56Sopenharmony_ciwon't be the first opcode in a line). 2167db96d56Sopenharmony_ci 2177db96d56Sopenharmony_ciWhy do we set f_lineno when tracing, and only just before calling the trace 2187db96d56Sopenharmony_cifunction? Well, consider the code above when 'a' is true. If stepping through 2197db96d56Sopenharmony_cithis with 'n' in pdb, you would stop at line 1 with a "call" type event, then 2207db96d56Sopenharmony_ciline events on lines 2, 3, and 4, then a "return" type event -- but because the 2217db96d56Sopenharmony_cicode for the return actually falls in the range of the "line 6" opcodes, you 2227db96d56Sopenharmony_ciwould be shown line 6 during this event. This is a change from the behaviour in 2237db96d56Sopenharmony_ci2.2 and before, and I've found it confusing in practice. By setting and using 2247db96d56Sopenharmony_cif_lineno when tracing, one can report a line number different from that 2257db96d56Sopenharmony_cisuggested by f_lasti on this one occasion where it's desirable. 226