=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses. The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. A status code
returned indicates if the request was submitted successfully or if
there was an error. One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status.
No interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later. The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it. The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical. The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs. When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use.
This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests. When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources. No
further status information is returned, so the user should not
subsequently call read().
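
The return-value convention for immediate commands (a write() result
equal to the requested byte count means success, otherwise -1 with
errno set) can be wrapped in a small helper. The following is only a
sketch: the name dax_write_command is hypothetical and not part of the
driver or of libdax, and treating a short count as -EIO is an
assumption, since the text above does not describe that case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the driver API): issue one
 * immediate command and map the write() result onto 0 on success
 * or a negative errno value on failure.
 */
static int dax_write_command(int fd, const void *cmd, size_t len)
{
    ssize_t n;

    assert(cmd != NULL && len > 0);

    n = write(fd, cmd, len);
    if (n == (ssize_t)len)
        return 0;        /* success: every byte was accepted */
    if (n < 0)
        return -errno;   /* failure: errno was set by write() */
    return -EIO;         /* short count: treated as an error (assumption) */
}
```

A caller would pass the opened DAX file descriptor and a filled-in
command structure, and call read() afterwards only for the commands
that are documented to return a result.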

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read().
Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information. The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.

MMAP
----

The mmap() function provides access to the completion area allocated
in the driver. Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY). Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates.
This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
==========================================

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
==================

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.
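
Because every buffer must be physically contiguous and, as the rest of
this section explains, a DAX operation never crosses a page boundary,
an application may want to verify a buffer before building a CCB. The
following is a minimal sketch of such a check. The function name
buffer_fits_in_page is hypothetical, and the caller is assumed to know
the page size that backs the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if the len bytes starting at addr
 * lie entirely within one page of page_size bytes. page_size must be
 * a power of two.
 */
static bool buffer_fits_in_page(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t offset;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Offset of the first byte within its page. */
    offset = addr & (uintptr_t)(page_size - 1);

    /* The buffer fits if it ends at or before the end of that page. */
    return len <= page_size - offset;
}
```

For a buffer on an 8k page, page_size would be 8192. For the
synthesized huge page size discussed later in this section, the
hardware page size is the value that matters.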

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page. A
major caveat is that Linux on SPARC presents 8 MB as one of the huge
page sizes. SPARC does not actually provide an 8 MB hardware page size,
and this size is synthesized by pasting together two 4 MB pages.
The
reasons for this are historical, and it creates an issue because only
half of this 8 MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4 MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.


CCB Structure
-------------

A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

    struct ccb {
        u64   control;
        u64   completion;
        u64   input0;
        u64   access;
        u64   input1;
        u64   op_data;
        u64   output;
        u64   table;
    };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., Linux kernel).
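
Since a submit is recognized by a write() length that is a multiple of
the CCB size, user code that declares its own copy of the structure
may want to confirm the layout at compile time. The sketch below
mirrors the structure above, using uint64_t in place of the kernel's
u64. The helper ccb_array_bytes is hypothetical and is shown only to
illustrate how the write() length relates to the number of CCBs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the CCB layout shown above. */
struct ccb {
    uint64_t control;
    uint64_t completion;
    uint64_t input0;
    uint64_t access;
    uint64_t input1;
    uint64_t op_data;
    uint64_t output;
    uint64_t table;
};

/* A CCB is 8 64-bit words, so it must occupy exactly 64 bytes. */
_Static_assert(sizeof(struct ccb) == 64, "CCB must be 64 bytes");

/* Hypothetical helper: byte length to pass to write() for n CCBs. */
static size_t ccb_array_bytes(size_t n)
{
    return n * sizeof(struct ccb);
}
```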

The first word (control) is examined by the driver for the following:

 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code. The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2.
The simplest
procedure is to attempt to open both, as only one will succeed::

    fd = open("/dev/oradax1", O_RDWR);
    if (fd < 0)
        fd = open("/dev/oradax2", O_RDWR);
    if (fd < 0)
        /* No DAX found */

Next, the completion area must be mapped::

    completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries. In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
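
To make the expected result concrete, the following is a software
model of this particular scan, useful for checking the coprocessor's
output in a test. It is only a sketch and does not use the DAX. The
function names are hypothetical, the most-significant-bit-first
ordering within each byte is an assumption, and rounding the output
size up to whole 64-byte cache lines is one way to satisfy the size
rule stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed for an nbits-wide bitmap, rounded up to 64-byte lines. */
static size_t scan_output_bytes(size_t nbits)
{
    size_t bytes = (nbits + 7) / 8;

    return (bytes + 63) / 64 * 64;
}

/*
 * Software model of a 1-bit-element scan: output bit i is set when
 * input bit i equals match. Bits are taken MSB first within each
 * byte (assumption). out must hold scan_output_bytes(nbits) bytes.
 */
static void scan_bits_model(const uint8_t *in, size_t nbits,
                            unsigned match, uint8_t *out)
{
    size_t i;

    assert(match <= 1);
    memset(out, 0, scan_output_bytes(nbits));

    for (i = 0; i < nbits; i++) {
        unsigned bit = (in[i / 8] >> (7 - i % 8)) & 1u;

        if (bit == match)
            out[i / 8] |= (uint8_t)(0x80u >> (i % 8));
    }
}
```

With a match value of 0, the first nbits of the output are the
bitwise inverse of the input, and any padding bits stay clear.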

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

    ccb->control =       /* Table 36.1, CCB Header Format */
          (2L << 48)     /* command = Scan Value */
        | (3L << 40)     /* output address type = primary virtual */
        | (3L << 34)     /* primary input address type = primary virtual */
                         /* Section 36.2.1, Query CCB Command Formats */
        | (1 << 28)      /* 36.2.1.1.1 primary input format = fixed width bit packed */
        | (0 << 23)      /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
        | (8 << 10)      /* 36.2.1.1.6 output format = bit vector */
        | (0 << 5)       /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
        | (31 << 0);     /* 36.2.1.3 Disable second scan criteria */

    ccb->completion = 0;    /* Completion area address, to be filled in by driver */

    ccb->input0 = (unsigned long) input; /* primary input address */

    ccb->access =       /* Section 36.2.1.2, Data Access Control */
          (2 << 24)     /* Primary input length format = bits */
        | (nbits - 1);  /* number of bits in primary input stream, minus 1 */

    ccb->input1 = 0;       /* secondary input address, unused */

    ccb->op_data = 0;      /* scan criteria (value to be matched) */

    ccb->output = (unsigned long) output;    /* output address */

    ccb->table = 0;        /* table address, unused */

The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

    if (pwrite(fd, ccb, 64, 0) != 64) {
        struct ccb_exec_result status;
        read(fd, &status, sizeof(status));
        /* bail out */
    }

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)      /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

    struct dax_command cmd;
    cmd.command = CCB_DEQUEUE;
    if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
        /* bail out */
    }

Finally, normal program cleanup should be done, i.e., unmapping
completion area, closing the dax device, freeing memory etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area.
Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

    ccb->control |=      /* Table 36.1, CCB Header Format */
        (3L << 32);      /* completion area address type = primary virtual */

    ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1::

    #include <asm/hypervisor.h>

    hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
                             HV_CCB_QUERY_CMD |
                             HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
                             HV_CCB_VA_PRIVILEGED,
                             0, &bytes_accepted, &status_data);

    if (hv_rv != HV_EOK) {
        /* hv_rv is an error code, status_data contains */
        /* potential additional status, see 36.3.1.1 */
    }

After the submission, the completion area polling code is identical to
that in user land::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)      /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

 .. include:: dax-hv-api.txt
    :literal: