17db96d56Sopenharmony_ciDescription of the internal format of the line number table
27db96d56Sopenharmony_ci
37db96d56Sopenharmony_ciConceptually, the line number table consists of a sequence of triples:
47db96d56Sopenharmony_ci    start-offset (inclusive), end-offset (exclusive), line-number.
57db96d56Sopenharmony_ci
67db96d56Sopenharmony_ciNote that not all byte codes have a line number so we need handle `None` for the line-number.
77db96d56Sopenharmony_ci
87db96d56Sopenharmony_ciHowever, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
97db96d56Sopenharmony_ci
107db96d56Sopenharmony_ciFirst, note that the end of one entry is the same as the start of the next, so we can overlap entries.
117db96d56Sopenharmony_ciSecond, we don't really need arbitrary access to the sequence, so we can store deltas.
127db96d56Sopenharmony_ci
137db96d56Sopenharmony_ciWe just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
147db96d56Sopenharmony_ci
157db96d56Sopenharmony_ciThird, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line.
167db96d56Sopenharmony_ci
177db96d56Sopenharmony_ciConsider the following table
187db96d56Sopenharmony_ci     Start    End     Line
197db96d56Sopenharmony_ci      0       6       1
207db96d56Sopenharmony_ci      6       50      2
217db96d56Sopenharmony_ci      50      350     7
227db96d56Sopenharmony_ci      350     360     No line number
237db96d56Sopenharmony_ci      360     376     8
247db96d56Sopenharmony_ci      376     380     208
257db96d56Sopenharmony_ci
267db96d56Sopenharmony_ciStripping the redundant ends gives:
277db96d56Sopenharmony_ci
287db96d56Sopenharmony_ci   End-Start  Line-delta
297db96d56Sopenharmony_ci      6         +1
307db96d56Sopenharmony_ci      44        +1
317db96d56Sopenharmony_ci      300       +5
327db96d56Sopenharmony_ci      10        No line number
337db96d56Sopenharmony_ci      16        +1
347db96d56Sopenharmony_ci      4         +200
357db96d56Sopenharmony_ci
367db96d56Sopenharmony_ci
377db96d56Sopenharmony_ciNote that the end - start value is always positive.
387db96d56Sopenharmony_ci
397db96d56Sopenharmony_ciFinally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254,
407db96d56Sopenharmony_ciand line deltas to the range -127  <= delta <= 127.
417db96d56Sopenharmony_ciA line delta of -128 is used to indicate no line number.
427db96d56Sopenharmony_ciAlso note that a delta of zero indicates that there are no bytecodes in the given range,
437db96d56Sopenharmony_ciwhich means we can use an invalid line number for that range.
447db96d56Sopenharmony_ci
457db96d56Sopenharmony_ciFinal form:
467db96d56Sopenharmony_ci
477db96d56Sopenharmony_ci   Start delta   Line delta
487db96d56Sopenharmony_ci    6               +1
497db96d56Sopenharmony_ci    44              +1
507db96d56Sopenharmony_ci    254             +5
517db96d56Sopenharmony_ci    46              0
527db96d56Sopenharmony_ci    10              -128 (No line number, treated as a delta of zero)
537db96d56Sopenharmony_ci    16              +1
547db96d56Sopenharmony_ci    0               +127 (line 135, but the range is empty as no bytecodes are at line 135)
557db96d56Sopenharmony_ci    4               +73
567db96d56Sopenharmony_ci
577db96d56Sopenharmony_ciIterating over the table.
587db96d56Sopenharmony_ci-------------------------
597db96d56Sopenharmony_ci
607db96d56Sopenharmony_ciFor the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
617db96d56Sopenharmony_ci
627db96d56Sopenharmony_ciThe code is as follows:
637db96d56Sopenharmony_ci
647db96d56Sopenharmony_cidef co_lines(code):
657db96d56Sopenharmony_ci    line = code.co_firstlineno
667db96d56Sopenharmony_ci    end = 0
677db96d56Sopenharmony_ci    table_iter = iter(code.internal_line_table):
687db96d56Sopenharmony_ci    for sdelta, ldelta in table_iter:
697db96d56Sopenharmony_ci        if ldelta == 0: # No change to line number, just accumulate changes to end
707db96d56Sopenharmony_ci            end += sdelta
717db96d56Sopenharmony_ci            continue
727db96d56Sopenharmony_ci        start = end
737db96d56Sopenharmony_ci        end = start + sdelta
747db96d56Sopenharmony_ci        if ldelta == -128: # No valid line number -- skip entry
757db96d56Sopenharmony_ci            continue
767db96d56Sopenharmony_ci        line += ldelta
777db96d56Sopenharmony_ci        if end == start: # Empty range, omit.
787db96d56Sopenharmony_ci            continue
797db96d56Sopenharmony_ci        yield start, end, line
807db96d56Sopenharmony_ci
817db96d56Sopenharmony_ci
827db96d56Sopenharmony_ci
837db96d56Sopenharmony_ci
847db96d56Sopenharmony_ciThe historical co_lnotab format
857db96d56Sopenharmony_ci-------------------------------
867db96d56Sopenharmony_ci
877db96d56Sopenharmony_ciprior to 3.10 code objects stored a field named co_lnotab.
887db96d56Sopenharmony_ciThis was an array of unsigned bytes disguised as a Python bytes object.
897db96d56Sopenharmony_ci
907db96d56Sopenharmony_ciThe old co_lnotab did not account for the presence of bytecodes without a line number,
917db96d56Sopenharmony_cinor was it well suited to tracing as a number of workarounds were required.
927db96d56Sopenharmony_ci
937db96d56Sopenharmony_ciThe old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
947db96d56Sopenharmony_ci
957db96d56Sopenharmony_ciBelow is the description of the old co_lnotab format:
967db96d56Sopenharmony_ci
977db96d56Sopenharmony_ci
987db96d56Sopenharmony_ciThe array is conceptually a compressed list of
997db96d56Sopenharmony_ci    (bytecode offset increment, line number increment)
1007db96d56Sopenharmony_cipairs.  The details are important and delicate, best illustrated by example:
1017db96d56Sopenharmony_ci
1027db96d56Sopenharmony_ci    byte code offset    source code line number
1037db96d56Sopenharmony_ci        0                   1
1047db96d56Sopenharmony_ci        6                   2
1057db96d56Sopenharmony_ci       50                   7
1067db96d56Sopenharmony_ci      350                 207
1077db96d56Sopenharmony_ci      361                 208
1087db96d56Sopenharmony_ci
1097db96d56Sopenharmony_ciInstead of storing these numbers literally, we compress the list by storing only
1107db96d56Sopenharmony_cithe difference from one row to the next.  Conceptually, the stored list might
1117db96d56Sopenharmony_cilook like:
1127db96d56Sopenharmony_ci
1137db96d56Sopenharmony_ci    0, 1,  6, 1,  44, 5,  300, 200,  11, 1
1147db96d56Sopenharmony_ci
1157db96d56Sopenharmony_ciThe above doesn't really work, but it's a start. An unsigned byte (byte code
1167db96d56Sopenharmony_cioffset) can't hold negative values, or values larger than 255, a signed byte
1177db96d56Sopenharmony_ci(line number) can't hold values larger than 127 or less than -128, and the
1187db96d56Sopenharmony_ciabove example contains two such values.  (Note that before 3.6, line number
1197db96d56Sopenharmony_ciwas also encoded by an unsigned byte.)  So we make two tweaks:
1207db96d56Sopenharmony_ci
1217db96d56Sopenharmony_ci (a) there's a deep assumption that byte code offsets increase monotonically,
1227db96d56Sopenharmony_ci and
1237db96d56Sopenharmony_ci (b) if byte code offset jumps by more than 255 from one row to the next, or if
1247db96d56Sopenharmony_ci source code line number jumps by more than 127 or less than -128 from one row
1257db96d56Sopenharmony_ci to the next, more than one pair is written to the table. In case #b,
1267db96d56Sopenharmony_ci there's no way to know from looking at the table later how many were written.
1277db96d56Sopenharmony_ci That's the delicate part.  A user of co_lnotab desiring to find the source
1287db96d56Sopenharmony_ci line number corresponding to a bytecode address A should do something like
1297db96d56Sopenharmony_ci this:
1307db96d56Sopenharmony_ci
1317db96d56Sopenharmony_ci    lineno = addr = 0
1327db96d56Sopenharmony_ci    for addr_incr, line_incr in co_lnotab:
1337db96d56Sopenharmony_ci        addr += addr_incr
1347db96d56Sopenharmony_ci        if addr > A:
1357db96d56Sopenharmony_ci            return lineno
1367db96d56Sopenharmony_ci        if line_incr >= 0x80:
1377db96d56Sopenharmony_ci            line_incr -= 0x100
1387db96d56Sopenharmony_ci        lineno += line_incr
1397db96d56Sopenharmony_ci
1407db96d56Sopenharmony_ci(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
1417db96d56Sopenharmony_ciwhen the addr field increments by more than 255, the line # increment in each
1427db96d56Sopenharmony_cipair generated must be 0 until the remaining addr increment is < 256.  So, in
1437db96d56Sopenharmony_cithe example above, assemble_lnotab in compile.c should not (as was actually done
1447db96d56Sopenharmony_ciuntil 2.2) expand 300, 200 to
1457db96d56Sopenharmony_ci    255, 255, 45, 45,
1467db96d56Sopenharmony_cibut to
1477db96d56Sopenharmony_ci    255, 0, 45, 127, 0, 73.
1487db96d56Sopenharmony_ci
1497db96d56Sopenharmony_ciThe above is sufficient to reconstruct line numbers for tracebacks, but not for
1507db96d56Sopenharmony_ciline tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
1517db96d56Sopenharmony_ciand maybe_call_line_trace() in ceval.c.
1527db96d56Sopenharmony_ci
1537db96d56Sopenharmony_ci*** Tracing ***
1547db96d56Sopenharmony_ci
1557db96d56Sopenharmony_ciTo a first approximation, we want to call the tracing function when the line
1567db96d56Sopenharmony_cinumber of the current instruction changes.  Re-computing the current line for
1577db96d56Sopenharmony_cievery instruction is a little slow, though, so each time we compute the line
1587db96d56Sopenharmony_cinumber we save the bytecode indices where it's valid:
1597db96d56Sopenharmony_ci
1607db96d56Sopenharmony_ci     *instr_lb <= frame->f_lasti < *instr_ub
1617db96d56Sopenharmony_ci
1627db96d56Sopenharmony_ciis true so long as execution does not change lines.  That is, *instr_lb holds
1637db96d56Sopenharmony_cithe first bytecode index of the current line, and *instr_ub holds the first
1647db96d56Sopenharmony_cibytecode index of the next line.  As long as the above expression is true,
1657db96d56Sopenharmony_cimaybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
1667db96d56Sopenharmony_cithat the same line may appear multiple times in the lnotab, either because the
1677db96d56Sopenharmony_cibytecode jumped more than 255 indices between line number changes or because
1687db96d56Sopenharmony_cithe compiler inserted the same line twice.  Even in that case, *instr_ub holds
1697db96d56Sopenharmony_cithe first index of the next line.
1707db96d56Sopenharmony_ci
1717db96d56Sopenharmony_ciHowever, we don't *always* want to call the line trace function when the above
1727db96d56Sopenharmony_citest fails.
1737db96d56Sopenharmony_ci
1747db96d56Sopenharmony_ciConsider this code:
1757db96d56Sopenharmony_ci
1767db96d56Sopenharmony_ci1: def f(a):
1777db96d56Sopenharmony_ci2:    while a:
1787db96d56Sopenharmony_ci3:       print(1)
1797db96d56Sopenharmony_ci4:       break
1807db96d56Sopenharmony_ci5:    else:
1817db96d56Sopenharmony_ci6:       print(2)
1827db96d56Sopenharmony_ci
1837db96d56Sopenharmony_ciwhich compiles to this:
1847db96d56Sopenharmony_ci
1857db96d56Sopenharmony_ci  2           0 SETUP_LOOP              26 (to 28)
1867db96d56Sopenharmony_ci        >>    2 LOAD_FAST                0 (a)
1877db96d56Sopenharmony_ci              4 POP_JUMP_IF_FALSE       18
1887db96d56Sopenharmony_ci
1897db96d56Sopenharmony_ci  3           6 LOAD_GLOBAL              0 (print)
1907db96d56Sopenharmony_ci              8 LOAD_CONST               1 (1)
1917db96d56Sopenharmony_ci             10 CALL_NO_KW               1
1927db96d56Sopenharmony_ci             12 POP_TOP
1937db96d56Sopenharmony_ci
1947db96d56Sopenharmony_ci  4          14 BREAK_LOOP
1957db96d56Sopenharmony_ci             16 JUMP_ABSOLUTE            2
1967db96d56Sopenharmony_ci        >>   18 POP_BLOCK
1977db96d56Sopenharmony_ci
1987db96d56Sopenharmony_ci  6          20 LOAD_GLOBAL              0 (print)
1997db96d56Sopenharmony_ci             22 LOAD_CONST               2 (2)
2007db96d56Sopenharmony_ci             24 CALL_NO_KW               1
2017db96d56Sopenharmony_ci             26 POP_TOP
2027db96d56Sopenharmony_ci        >>   28 LOAD_CONST               0 (None)
2037db96d56Sopenharmony_ci             30 RETURN_VALUE
2047db96d56Sopenharmony_ci
2057db96d56Sopenharmony_ciIf 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
2067db96d56Sopenharmony_ciand the co_lnotab will claim that execution has moved to line 4, which is wrong.
2077db96d56Sopenharmony_ciIn this case, we could instead associate the POP_BLOCK with line 5, but that
2087db96d56Sopenharmony_ciwould break jumps around loops without else clauses.
2097db96d56Sopenharmony_ci
2107db96d56Sopenharmony_ciWe fix this by only calling the line trace function for a forward jump if the
2117db96d56Sopenharmony_cico_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
2127db96d56Sopenharmony_ciinstruction offset matches the offset given for the start of a line by the
2137db96d56Sopenharmony_cico_lnotab.  For backward jumps, however, we always call the line trace function,
2147db96d56Sopenharmony_ciwhich lets a debugger stop on every evaluation of a loop guard (which usually
2157db96d56Sopenharmony_ciwon't be the first opcode in a line).
2167db96d56Sopenharmony_ci
2177db96d56Sopenharmony_ciWhy do we set f_lineno when tracing, and only just before calling the trace
2187db96d56Sopenharmony_cifunction?  Well, consider the code above when 'a' is true.  If stepping through
2197db96d56Sopenharmony_cithis with 'n' in pdb, you would stop at line 1 with a "call" type event, then
2207db96d56Sopenharmony_ciline events on lines 2, 3, and 4, then a "return" type event -- but because the
2217db96d56Sopenharmony_cicode for the return actually falls in the range of the "line 6" opcodes, you
2227db96d56Sopenharmony_ciwould be shown line 6 during this event.  This is a change from the behaviour in
2237db96d56Sopenharmony_ci2.2 and before, and I've found it confusing in practice.  By setting and using
2247db96d56Sopenharmony_cif_lineno when tracing, one can report a line number different from that
2257db96d56Sopenharmony_cisuggested by f_lasti on this one occasion where it's desirable.
226