Name

    AMD_performance_monitor

Name Strings

    GL_AMD_performance_monitor

Contributors

    Dan Ginsburg
    Aaftab Munshi
    Dave Oldcorn
    Maurice Ribble
    Jonathan Zarge

Contact

    Dan Ginsburg (dan.ginsburg 'at' amd.com)

Status

    ???

Version

    Last Modified Date: 11/29/2007

Number

    OpenGL Extension #360
    OpenGL ES Extension #50

Dependencies

    None

Overview

    This extension enables the capture and reporting of performance monitors.
    Performance monitors contain groups of counters which hold arbitrary counted
    data.  Typically, the counters hold information on performance-related
    counters in the underlying hardware.  The extension is general enough to
    allow the implementation to choose which counters to expose and pick the
    data type and range of the counters.  The extension also allows counting to
    start and end on arbitrary boundaries during rendering.

Issues

    1.  Should this be an EGL or OpenGL/OpenGL ES extension?

        Decision - Make this an OpenGL/OpenGL ES extension.

        Reason - We would like to expose this extension in both OpenGL and
        OpenGL ES, which makes EGL an unsuitable choice.  Further, support for
        EGL is not a requirement and there are platforms that support OpenGL ES
        but not EGL, making it difficult to make this an EGL extension.

    2.  Should the API support multipassing?

        Decision - No.

        Reason - Multipassing should be left to the application.  Supporting
        it would make the API unnecessarily complicated.  A major issue is
        that, depending on which counters are to be sampled, the number of
        passes and the counters selected in each pass can be difficult to
        determine.  It is much easier to give a list of counters categorized
        by groups, with specific information on the number of counters that
        can be selected from each group.

    3.  Should we define a 64-bit data type for UNSIGNED_INT64_AMD?

        Decision - No.

        Reason - While counters can be returned as 64-bit unsigned integers, the
        data is passed back to the application inside of a void*.  Therefore,
        there is no need in this extension to define a 64-bit data type (e.g.,
        GLuint64).  It will be up to the application to declare a native 64-bit
        unsigned integer and cast the returned data to that type.
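
        As an illustration of Issue 3, the following is a minimal sketch of
        how an application might read a 64-bit counter value out of the
        returned buffer.  The helper name readCounter64 is hypothetical and
        not part of this extension, and uint64_t from <stdint.h> stands in
        for whatever native 64-bit unsigned type the platform provides.
        The sketch copies the value with memcpy instead of casting a pointer,
        so it does not depend on the alignment of the value in the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the extension: copies the 64-bit
 * counter value that starts at word index wordOffset of a buffer of
 * 32-bit words into a native 64-bit unsigned integer. */
static uint64_t readCounter64(const uint32_t *data, size_t wordOffset)
{
    uint64_t value;

    memcpy(&value, &data[wordOffset], sizeof(value));
    return value;
}
```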

New Procedures and Functions

    void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
                                 uint *groups)

    void GetPerfMonitorCountersAMD(uint group, int *numCounters,
                                   int *maxActiveCounters, sizei countersSize,
                                   uint *counters)

    void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, sizei *length,
                                      char *groupString)

    void GetPerfMonitorCounterStringAMD(uint group, uint counter, sizei bufSize,
                                        sizei *length, char *counterString)

    void GetPerfMonitorCounterInfoAMD(uint group, uint counter,
                                      enum pname, void *data)

    void GenPerfMonitorsAMD(sizei n, uint *monitors)

    void DeletePerfMonitorsAMD(sizei n, uint *monitors)

    void SelectPerfMonitorCountersAMD(uint monitor, boolean enable,
                                      uint group, int numCounters,
                                      uint *counterList)

    void BeginPerfMonitorAMD(uint monitor)

    void EndPerfMonitorAMD(uint monitor)

    void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, sizei dataSize,
                                      uint *data, int *bytesWritten)


New Tokens

    Accepted by the <pname> parameter of GetPerfMonitorCounterInfoAMD

        COUNTER_TYPE_AMD                           0x8BC0
        COUNTER_RANGE_AMD                          0x8BC1

    Returned as a valid value in <data> parameter of
    GetPerfMonitorCounterInfoAMD if <pname> = COUNTER_TYPE_AMD

        UNSIGNED_INT                               0x1405
        FLOAT                                      0x1406
        UNSIGNED_INT64_AMD                         0x8BC2
        PERCENTAGE_AMD                             0x8BC3

    Accepted by the <pname> parameter of GetPerfMonitorCounterDataAMD

        PERFMON_RESULT_AVAILABLE_AMD               0x8BC4
        PERFMON_RESULT_SIZE_AMD                    0x8BC5
        PERFMON_RESULT_AMD                         0x8BC6

Addition to the GL specification

    Add a new section called Performance Monitoring

    A performance monitor consists of a number of hardware and software counters
    that can be sampled by the GPU and reported back to the application.
    Performance counters are organized as a single hierarchy where counters are
    categorized into groups.  Each group has a list of counters that belong to
    the group and can be sampled, and a maximum number of counters that can be
    sampled.

    The command

        void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
                                     uint *groups);

    returns the number of available groups in <numGroups>, if <numGroups> is
    not NULL.  If <groupsSize> is not 0 and <groups> is not NULL, then the list
    of available groups is returned.  The number of entries that will be
    returned in <groups> is determined by <groupsSize>.  If <groupsSize> is 0,
    no information is copied.  Each group is identified by a unique unsigned int
    identifier.
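
    The two-call pattern implied above (query the count, then fetch the
    list) can be sketched as follows.  This is an illustration under stated
    assumptions, not a reference implementation: the typedefs are local
    stand-ins so the sketch compiles without GL headers, and the helper name
    queryGroupIds and the function-pointer type GetPerfMonitorGroupsFn are
    hypothetical.  The entry point is passed in as a pointer because
    extension functions are typically resolved at run time and may be NULL.

```c
#include <stdlib.h>

/* Local stand-ins for the GL types and the entry-point signature; a real
 * application would use the definitions from its GL headers. */
typedef unsigned int GLuint;
typedef int          GLint;
typedef int          GLsizei;
typedef void (*GetPerfMonitorGroupsFn)(GLint *numGroups, GLsizei groupsSize,
                                       GLuint *groups);

/* Hypothetical helper: returns a malloc'd array of group IDs and stores its
 * length in *count, or returns NULL (with *count == 0) on failure.  The
 * caller frees the array. */
static GLuint *queryGroupIds(GetPerfMonitorGroupsFn getGroups, GLint *count)
{
    GLint   n = 0;
    GLuint *groups;

    *count = 0;
    if (getGroups == NULL)          /* entry point was not resolved */
        return NULL;

    getGroups(&n, 0, NULL);         /* first call: number of groups only */
    if (n <= 0)
        return NULL;

    groups = (GLuint *)malloc((size_t)n * sizeof(GLuint));
    if (groups == NULL)
        return NULL;

    getGroups(NULL, n, groups);     /* second call: fill the ID array */
    *count = n;
    return groups;
}
```

    The same count-then-fill approach applies to GetPerfMonitorCountersAMD.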

    The command

        void GetPerfMonitorCountersAMD(uint group, int *numCounters,
                                       int *maxActiveCounters,
                                       sizei countersSize,
                                       uint *counters);

    returns the following information.  For each group, it returns the number of
    available counters in <numCounters>, the max number of counters that can be
    active at any time in <maxActiveCounters>, and the list of counters in
    <counters>.  The number of entries that can be returned in <counters> is
    determined by <countersSize>.  If <countersSize> is 0, no information is
    copied. Each counter in a group is identified by a unique unsigned int
    identifier.  If <group> does not reference a valid group ID, an
    INVALID_VALUE error is generated.

    The command

        void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize,
                                          sizei *length, char *groupString)

    returns the string that describes the group name identified by <group> in
    <groupString>.  The actual number of characters written to <groupString>,
    excluding the null terminator, is returned in <length>.  If <length> is
    NULL, then no length is returned.  The maximum number of characters that
    may be written into <groupString>, including the null terminator, is
    specified by <bufSize>.  If <bufSize> is 0 and <groupString> is NULL, the
    number of characters that would be required to hold the group string,
    excluding the null terminator, is returned in <length>.  If <group>
    does not reference a valid group ID, an INVALID_VALUE error is generated.

    The command

        void GetPerfMonitorCounterStringAMD(uint group, uint counter,
                                            sizei bufSize, sizei *length,
                                            char *counterString);

    returns the string that describes the counter name identified by <group>
    and <counter> in <counterString>.  The actual number of characters written
    to <counterString>, excluding the null terminator, is returned in <length>.
    If <length> is NULL, then no length is returned.  The maximum number of
    characters that may be written into <counterString>, including the null
    terminator, is specified by <bufSize>.  If <bufSize> is 0 and
    <counterString> is NULL, the number of characters that would be required to
    hold the counter string, excluding the null terminator, is returned in
    <length>.  If <group> does not reference a valid group ID, or <counter>
    does not reference a valid counter within the group ID, an INVALID_VALUE
    error is generated.

    The command

        void GetPerfMonitorCounterInfoAMD(uint group, uint counter,
                                          enum pname, void *data);

    returns the following information about a counter.  For a <counter>
    belonging to <group>, we can query the counter type and counter range.  If
    <pname> is COUNTER_TYPE_AMD, then <data> returns the type.  Valid type
    values returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
    If type value returned is PERCENTAGE_AMD, then this describes a float
    value that is in the range [0.0 .. 100.0].  If <pname> is COUNTER_RANGE_AMD,
    <data> returns two values representing a minimum and a maximum. The
    counter's type is used to determine the format in which the range values
    are returned.  If <group> does not reference a valid group ID, or <counter>
    does not reference a valid counter within the group ID, an INVALID_VALUE
    error is generated.
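
    Because the format of the COUNTER_RANGE_AMD values depends on the counter
    type, an application has to branch on the type before interpreting them.
    The sketch below shows one way this might look.  It assumes the two
    values are stored back to back, minimum first, with 32-bit unsigned
    integers and 32-bit floats; the text above does not spell out that
    layout, so treat it as an assumption.  The helper name decodeCounterRange
    is hypothetical, and very large 64-bit values lose precision when
    converted to double.

```c
#include <stdint.h>
#include <string.h>

/* Token values as listed under New Tokens. */
#define GL_UNSIGNED_INT        0x1405
#define GL_FLOAT               0x1406
#define GL_UNSIGNED_INT64_AMD  0x8BC2
#define GL_PERCENTAGE_AMD      0x8BC3

/* Hypothetical helper: converts the two range values written by a
 * COUNTER_RANGE_AMD query into doubles.  rangeData points at the raw bytes
 * returned in <data>.  Returns 0 on success, -1 for an unknown type. */
static int decodeCounterRange(unsigned int counterType, const void *rangeData,
                              double *minOut, double *maxOut)
{
    switch (counterType) {
    case GL_UNSIGNED_INT: {
        uint32_t v[2];
        memcpy(v, rangeData, sizeof(v));
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
        return 0;
    }
    case GL_UNSIGNED_INT64_AMD: {
        uint64_t v[2];
        memcpy(v, rangeData, sizeof(v));
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
        return 0;
    }
    case GL_FLOAT:
    case GL_PERCENTAGE_AMD: {       /* percentage is reported as a float */
        float v[2];
        memcpy(v, rangeData, sizeof(v));
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
        return 0;
    }
    default:
        return -1;
    }
}
```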

    The command

        void GenPerfMonitorsAMD(sizei n, uint *monitors)

    returns a list of <n> monitors in <monitors>.  These monitors can then be
    used to select groups/counters to be sampled, to start multiple monitoring
    sessions and to return counter information sampled by the GPU.  At creation
    time, the performance monitor object has all counters disabled.  The value
    of the PERFMON_RESULT_AVAILABLE_AMD, PERFMON_RESULT_AMD, and
    PERFMON_RESULT_SIZE_AMD queries will all initially be 0.

    The command

        void DeletePerfMonitorsAMD(sizei n, uint *monitors)

    is used to delete the list of monitors created by a previous call to
    GenPerfMonitorsAMD.  If a monitor ID in the list <monitors> does not
    reference a previously generated performance monitor, an INVALID_VALUE
    error is generated.

    The command

        void SelectPerfMonitorCountersAMD(uint monitor, boolean enable,
                                          uint group, int numCounters,
                                          uint *counterList);

    is used to enable or disable a list of counters from a group to be monitored
    as identified by <monitor>.  The <enable> argument determines whether the
    counters should be enabled or disabled.  <group> specifies the group
    ID under which counters will be enabled or disabled.  The <numCounters>
    argument gives the number of counters to be selected from the list
    <counterList>.  If <monitor> is not a valid monitor created by
    GenPerfMonitorsAMD, an INVALID_VALUE error will be generated.  If <group>
    is not a valid group, an INVALID_VALUE error will be generated.  If
    <numCounters> is less than 0, an INVALID_VALUE error will be generated.

    When SelectPerfMonitorCountersAMD is called on a monitor, any outstanding
    results for that monitor become invalidated and the result queries
    PERFMON_RESULT_SIZE_AMD and PERFMON_RESULT_AVAILABLE_AMD are reset to 0.

    The command

        void BeginPerfMonitorAMD(uint monitor);

    is used to start a monitor session.  Note that BeginPerfMonitorAMD calls
    cannot be nested.  In addition, given the groups and counters enabled for
    a monitor, the implementation may not be able to sample the selected
    counters, in which case the monitor session fails and an INVALID_OPERATION
    error is generated.

    While BeginPerfMonitorAMD marks the beginning of performance counter
    collection, the counters do not begin collecting immediately.  The API is
    asynchronous: collection begins only when the graphics hardware processes
    the BeginPerfMonitorAMD command.

    The command

        void EndPerfMonitorAMD(uint monitor);

    ends a monitor session started by BeginPerfMonitorAMD.  If a performance
    monitor is not currently started, an INVALID_OPERATION error will be
    generated.

    Note that there is an implied overhead to collecting performance counters
    that may or may not distort performance depending on the implementation.
    For example, some counters may require a pipeline flush thereby causing a
    change in the performance of the application.  Further, the frequency at
    which an application samples may distort the accuracy of counters which are
    variant (e.g., non-deterministic based on the input).  While the effects
    of sampling frequency are implementation dependent, general guidance can
    be given that sampling at a high frequency may distort both performance
    of the application and the accuracy of variant counters.

    The command

        void GetPerfMonitorCounterDataAMD(uint monitor, enum pname,
                                          sizei dataSize,
                                          uint *data, int *bytesWritten);

    is used to return counter values that have been sampled for a monitor
    session.  If <pname> is PERFMON_RESULT_AVAILABLE_AMD, then <data> will
    indicate whether the result is available or not.  If <pname> is
    PERFMON_RESULT_SIZE_AMD, <data> will contain the actual size of all counter
    results being sampled.  If <pname> is PERFMON_RESULT_AMD, <data> will
    contain results.  For each counter of a group that was selected to be
    sampled, the information is returned as group ID, followed by counter ID,
    followed by counter value.  The size of counter value returned will depend
    on the counter value type.  The argument <dataSize> specifies the number of
    bytes available in the <data> buffer for writing.  If <bytesWritten> is not
    NULL, it gives the number of bytes written into the <data> buffer.  It is an
    INVALID_OPERATION error for <data> to be NULL.  If <pname> is
    PERFMON_RESULT_AMD and <dataSize> is less than the number of bytes required
    to store the results as reported by a PERFMON_RESULT_SIZE_AMD query, then
    results will be written only up to the number of bytes specified by
    <dataSize>.
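
    Since the size of each result record depends on the counter type, code
    that walks the PERFMON_RESULT_AMD buffer needs a per-type stride.  The
    sketch below computes it in 32-bit words, assuming 32-bit group and
    counter IDs, 32-bit UNSIGNED_INT and FLOAT values, and PERCENTAGE_AMD
    stored as a float; these sizes match the strides used in the Sample
    Usage section but are assumptions of this sketch.  The helper name
    resultRecordWords is hypothetical.

```c
/* Token values as listed under New Tokens. */
#define GL_UNSIGNED_INT        0x1405
#define GL_FLOAT               0x1406
#define GL_UNSIGNED_INT64_AMD  0x8BC2
#define GL_PERCENTAGE_AMD      0x8BC3

/* Hypothetical helper: number of 32-bit words occupied by one result record
 * (group ID, counter ID, counter value) for a counter of the given type.
 * Returns 0 if the type is not recognized, so a caller can stop walking the
 * buffer instead of advancing by a wrong stride. */
static unsigned int resultRecordWords(unsigned int counterType)
{
    switch (counterType) {
    case GL_UNSIGNED_INT64_AMD:
        return 4;               /* two ID words + one 64-bit value */
    case GL_UNSIGNED_INT:
    case GL_FLOAT:
    case GL_PERCENTAGE_AMD:
        return 3;               /* two ID words + one 32-bit value */
    default:
        return 0;
    }
}
```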

    If no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for a monitor,
    then the result of querying for PERFMON_RESULT_AVAILABLE_AMD and
    PERFMON_RESULT_SIZE_AMD will be 0.  When SelectPerfMonitorCountersAMD is
    called on a monitor, the results stored for the monitor become invalidated
    and the PERFMON_RESULT_AVAILABLE_AMD and PERFMON_RESULT_SIZE_AMD queries
    behave as if no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for
    the monitor.
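
    Because the API is asynchronous, an application will usually check
    PERFMON_RESULT_AVAILABLE_AMD before reading results.  The following is a
    sketch of a bounded polling loop, not a definitive implementation.  The
    typedefs are local stand-ins so it compiles without GL headers, the
    helper name waitForPerfMonitorResult and the function-pointer type are
    hypothetical, and it assumes that a nonzero value written to <data>
    means the result is available.

```c
#include <stddef.h>

/* Local stand-ins for the GL types and the entry-point signature; a real
 * application would use the definitions from its GL headers. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int          GLint;
typedef int          GLsizei;

#define GL_PERFMON_RESULT_AVAILABLE_AMD 0x8BC4

typedef void (*GetPerfMonitorCounterDataFn)(GLuint monitor, GLenum pname,
                                            GLsizei dataSize, GLuint *data,
                                            GLint *bytesWritten);

/* Hypothetical helper: polls at most maxPolls times.  Returns 1 as soon as
 * the result is reported available, 0 if it never is (or if the entry point
 * was not resolved). */
static int waitForPerfMonitorResult(GetPerfMonitorCounterDataFn getData,
                                    GLuint monitor, int maxPolls)
{
    int i;

    if (getData == NULL)
        return 0;

    for (i = 0; i < maxPolls; i++) {
        GLuint available = 0;

        getData(monitor, GL_PERFMON_RESULT_AVAILABLE_AMD,
                (GLsizei)sizeof(available), &available, NULL);
        if (available != 0)
            return 1;
        /* A real application would do other work or yield here. */
    }
    return 0;
}
```

    Bounding the number of polls avoids spinning forever on a monitor for
    which no BeginPerfMonitorAMD/EndPerfMonitorAMD pair was issued, since
    the availability query stays 0 in that case.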

Errors

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is unable
    to begin monitoring with the currently selected counters.

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is called
    when a performance monitor is already active.

    INVALID_OPERATION error will be generated if EndPerfMonitorAMD is called
    when a performance monitor is not currently started.

    INVALID_VALUE error will be generated if the <group> parameter to
    GetPerfMonitorCountersAMD, GetPerfMonitorGroupStringAMD,
    GetPerfMonitorCounterStringAMD, GetPerfMonitorCounterInfoAMD, or
    SelectPerfMonitorCountersAMD does not reference a valid group ID.

    INVALID_VALUE error will be generated if the <counter> parameter to
    GetPerfMonitorCounterStringAMD or GetPerfMonitorCounterInfoAMD does not
    reference a valid counter ID in the group specified by <group>.

    INVALID_VALUE error will be generated if any of the monitor IDs
    in the <monitors> parameter to DeletePerfMonitorsAMD do not reference
    a valid generated monitor ID.

    INVALID_VALUE error will be generated if the <monitor> parameter to
    SelectPerfMonitorCountersAMD does not reference a monitor created by
    GenPerfMonitorsAMD.

    INVALID_VALUE error will be generated if the <numCounters> parameter to
    SelectPerfMonitorCountersAMD is less than 0.

New State

Sample Usage

    #include <stdlib.h>   // malloc, free
    #include <string.h>   // strcmp

    // The platform's GL headers and the extension entry points are assumed
    // to be available to the application.

    typedef struct
    {
        GLuint *counterList;
        int     numCounters;
        int     maxActiveCounters;
    } CounterInfo;

    // Queries all group IDs and, for each group, its list of counter IDs.
    void
    getGroupAndCounterList(GLuint **groupsList, int *numGroups,
                           CounterInfo **counterInfo)
    {
        GLint          n;
        GLuint        *groups;
        CounterInfo   *counters;

        // First call retrieves the group count, second call the group IDs.
        glGetPerfMonitorGroupsAMD(&n, 0, NULL);
        groups = (GLuint*) malloc(n * sizeof(GLuint));
        glGetPerfMonitorGroupsAMD(NULL, n, groups);
        *numGroups = n;

        *groupsList = groups;
        counters = (CounterInfo*) malloc(sizeof(CounterInfo) * n);
        for (int i = 0 ; i < n; i++ )
        {
            // Same pattern per group: counter count first, then the IDs.
            glGetPerfMonitorCountersAMD(groups[i], &counters[i].numCounters,
                                        &counters[i].maxActiveCounters, 0, NULL);

            counters[i].counterList = (GLuint*)malloc(counters[i].numCounters *
                                                      sizeof(GLuint));

            glGetPerfMonitorCountersAMD(groups[i], NULL, NULL,
                                        counters[i].numCounters,
                                        counters[i].counterList);
        }

        *counterInfo = counters;
    }

    static int  countersInitialized = 0;

    // Looks up a group and a counter by name.  Returns 0 on success, or -1
    // if either name cannot be found.
    int
    getCounterByName(const char *groupName, const char *counterName,
                     GLuint *groupID, GLuint *counterID)
    {
        // Static so the lists queried on the first call remain valid on
        // later calls.
        static int          numGroups;
        static GLuint       *groups;
        static CounterInfo  *counters;
        int                 i = 0;

        if (!countersInitialized)
        {
            getGroupAndCounterList(&groups, &numGroups, &counters);
            countersInitialized = 1;
        }

        for ( i = 0; i < numGroups; i++ )
        {
           char curGroupName[256];
           glGetPerfMonitorGroupStringAMD(groups[i], 256, NULL, curGroupName);
           if (strcmp(groupName, curGroupName) == 0)
           {
               *groupID = groups[i];
               break;
           }
        }

        if ( i == numGroups )
            return -1;           // error - could not find the group name

        for ( int j = 0; j < counters[i].numCounters; j++ )
        {
            char curCounterName[256];

            glGetPerfMonitorCounterStringAMD(groups[i],
                                             counters[i].counterList[j],
                                             256, NULL, curCounterName);
            if (strcmp(counterName, curCounterName) == 0)
            {
                *counterID = counters[i].counterList[j];
                return 0;
            }
        }

        return -1;           // error - could not find the counter name
    }

    void
    drawFrameWithCounters(void)
    {
        GLuint group[2];
        GLuint counter[2];
        GLuint monitor;
        GLuint *counterData;

        // Get group/counter IDs by name.  Note that normally the
        // counter and group names need to be queried for because
        // each implementation of this extension on different hardware
        // could define different names and groups.  This is just provided
        // to demonstrate the API.
        getCounterByName("HW", "Hardware Busy", &group[0],
                         &counter[0]);
        getCounterByName("API", "Draw Calls", &group[1],
                         &counter[1]);

        // create perf monitor ID
        glGenPerfMonitorsAMD(1, &monitor);

        // enable the counters
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[0], 1,
                                       &counter[0]);
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[1], 1,
                                       &counter[1]);

        glBeginPerfMonitorAMD(monitor);

        // RENDER FRAME HERE
        // ...

        glEndPerfMonitorAMD(monitor);

        // Results may not be available immediately; an application would
        // normally query GL_PERFMON_RESULT_AVAILABLE_AMD before reading them.

        // read the counters
        GLuint resultSize;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_SIZE_AMD,
                                       sizeof(GLuint), &resultSize, NULL);

        counterData = (GLuint*) malloc(resultSize);

        GLint bytesWritten;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AMD,
                                       resultSize, counterData, &bytesWritten);

        // display or log counter info; counterData is walked in 32-bit words
        GLsizei wordCount = 0;

        while ( (4 * wordCount) < bytesWritten )
        {
            GLuint groupId = counterData[wordCount];
            GLuint counterId = counterData[wordCount + 1];

            // Determine the counter type
            GLuint counterType;
            glGetPerfMonitorCounterInfoAMD(groupId, counterId,
                                           GL_COUNTER_TYPE_AMD, &counterType);

            if ( counterType == GL_UNSIGNED_INT64_AMD )
            {
                // native 64-bit unsigned integer (see Issue 3)
                unsigned long long counterResult =
                       *(unsigned long long*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 4;
            }
            else if ( counterType == GL_FLOAT ||
                      counterType == GL_PERCENTAGE_AMD )
            {
                float counterResult = *(float*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 3;
            }
            else if ( counterType == GL_UNSIGNED_INT )
            {
                GLuint counterResult = counterData[wordCount + 2];

                // Print counter result

                wordCount += 3;
            }
            else
            {
                break;   // unknown counter type; stop instead of looping forever
            }
        }

        free(counterData);
        glDeletePerfMonitorsAMD(1, &monitor);
    }

Revision History
    11/29/2007 - dginsburg
       + Clarified the default state of a performance monitor object on creation

    11/09/2007 - dginsbur
       + Clarify what happens if SelectPerfMonitorCountersAMD is called on
         a monitor with outstanding query results.
       + Rename counterSize to countersSize
       + Remove some ';' typos

    06/13/2007 - dginsbur
       + Add language on the asynchronous nature of the API and
         counter accuracy/performance distortion.
       + Add myself as the contact
       + Remove INVALID_OPERATION error when countersList is NULL
       + Clarify 64-bit issue
       + Make PERCENTAGE_AMD counters float rather than uint
       + Clarify accuracy distortion on variant counters only
       + Tweak to overview language

    06/09/2007 - dginsbur
       + Fill in errors section and make many more errors explicit
       + Fix the example code so it compiles

    06/08/2007 - dginsbur
       + Modified GetPerfMonitorGroupString and GetPerfMonitorCounterString to
         be more client/server friendly.
       + Modified example.
       + Renamed parameters/variables to follow GL conventions.
       + Modified several 'int' param types to 'sizei'
       + Modified counters type from 'int' to 'uint'
       + Renamed argument 'cb' and 'cbret'
       + Better documented GetPerfMonitorCounterData
       + Add AMD adornment in many places that were missing it

    06/07/2007 - dginsbur
       + Cleanup formatting, remove tabs, make fit in proper page width
       + Add FLOAT and UNSIGNED_INT to list of COUNTER_TYPEs
       + Fix some bugs in the example code
       + Rewrite introduction
       + Clarified Issue 1 reasoning
       + Added Issue 3 regarding use of 64-bit data types
       + Added revision history

    03/21/2007 - Initial version written.  Written by amunshi.