=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats.  A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses.  The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. A status code
returned indicates if the request was submitted successfully or if
there was an error.  One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status. No
interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later.  The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it.  The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical.  The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs.  When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use.  This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests.  When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources.  No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information.  The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.
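
The three outcomes described above can be told apart with a small helper.
The following is only a sketch: the names ``classify_submit``,
``submit_outcome`` and ``CCB_SIZE`` are hypothetical and not part of the
driver API, and it assumes the return value is a byte count, as in the
pwrite() example later in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define CCB_SIZE 64	/* assumed: 8 words of 64 bits per CCB */

enum submit_outcome {
	SUBMIT_ERROR,		/* write() returned -1; consult errno */
	SUBMIT_COMPLETE,	/* every CCB accepted; do not call read() */
	SUBMIT_PARTIAL		/* some CCBs not accepted; call read() */
};

/* Classify the return value of a submit write() of `requested` bytes. */
static enum submit_outcome classify_submit(ssize_t ret, size_t requested,
					   size_t *ccbs_accepted)
{
	assert(requested % CCB_SIZE == 0);
	*ccbs_accepted = 0;
	if (ret < 0)
		return SUBMIT_ERROR;
	*ccbs_accepted = (size_t)ret / CCB_SIZE;
	return (size_t)ret == requested ? SUBMIT_COMPLETE : SUBMIT_PARTIAL;
}
```

A caller would pass the value returned by write() or pwrite() together with
the length it requested, and call read() only for ``SUBMIT_PARTIAL``.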

MMAP
----

The mmap() function provides access to the completion area allocated
in the driver.  Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.
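
Once the buffer is mapped, the status byte of a particular request has to
be located inside it. The sketch below assumes that the mapping is laid
out as consecutive 128 byte completion areas and that the index is the
file offset used at submission; the layout is inferred from the
descriptions in this document, and ``completion_status`` and
``COMPLETION_AREA_SIZE`` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define COMPLETION_AREA_SIZE 128	/* bytes per completion area */

/* Return a pointer to the status byte of completion area `index`. */
static const volatile unsigned char *
completion_status(const volatile void *mapped_base, size_t index)
{
	assert(mapped_base != NULL);
	return (const volatile unsigned char *)mapped_base +
	       index * COMPLETION_AREA_SIZE;
}
```

The pointer is const because the mapping is read-only, and volatile because
the coprocessor updates the byte behind the compiler's back.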


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).  Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.
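
For comparison, the following sketch polls the status byte with ordinary
loads and gives up after a fixed number of reads. It is not the monitored
load and mwait sequence described above, it simply busy-waits, and
``poll_status`` is a hypothetical helper; it is shown only to illustrate
the "0 means in progress" convention and how a caller might bound the wait
before resorting to CCB_KILL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read the status byte until it becomes non-zero or max_polls reads have
 * been made. Returns the status, or 0 if the request is still in progress.
 */
static unsigned char poll_status(const volatile unsigned char *status,
				 unsigned long max_polls)
{
	assert(status != NULL);
	for (unsigned long i = 0; i < max_polls; i++) {
		unsigned char s = *status;	/* plain load, not monitored */

		if (s != 0)
			return s;
	}
	return 0;
}
```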


Application Life Cycle of a DAX Submission
==========================================

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
==================

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.
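
An application can check this constraint before building a CCB. The sketch
below assumes the caller already knows the size of the page backing the
buffer and that it is a power of two; ``bytes_left_in_page`` and
``fits_in_page`` are hypothetical helpers, not part of the driver or of
libdax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes from addr to the end of the virtual page that contains it. */
static size_t bytes_left_in_page(uintptr_t addr, size_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return page_size - (size_t)(addr & (page_size - 1));
}

/* True if [addr, addr + len) lies within a single page. */
static int fits_in_page(uintptr_t addr, size_t len, size_t page_size)
{
	return len <= bytes_left_in_page(addr, page_size);
}
```

For an 8k page, a buffer that starts on the page boundary may be at most
8192 bytes long, and one that starts 16 bytes into the page at most 8176.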

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even that of a huge
page.  A major caveat is that Linux on SPARC presents 8 MB as one of
the huge page sizes. SPARC does not actually provide an 8 MB hardware
page size, and this size is synthesized by pasting together two 4 MB
pages. The reasons for this are historical, and it creates an issue
because only half of this 8 MB page can actually be used for any given
buffer in a DAX request, and it must be either the first half or the
second half; it cannot be a 4 MB chunk in the middle, since that
crosses a (hardware) page boundary. Note that this entire issue may be
hidden by higher level libraries.
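
The "first half or second half" rule can be expressed as a check on the
buffer's offset within the 8 MB page. This is a sketch only;
``huge_page_half`` and the two size macros are hypothetical names, and the
caller is assumed to know the buffer's offset from the start of the huge
page.

```c
#include <assert.h>
#include <stddef.h>

#define HW_PAGE_SIZE	((size_t)4 << 20)	/* 4 MB hardware page */
#define HUGE_PAGE_SIZE	((size_t)8 << 20)	/* 8 MB page seen by Linux */

static_assert(HUGE_PAGE_SIZE == 2 * HW_PAGE_SIZE,
	      "the 8 MB page is two 4 MB hardware pages");

/*
 * For a buffer `offset` bytes into an 8 MB huge page, return 0 if it lies
 * entirely in the first 4 MB half, 1 if entirely in the second half, or
 * -1 if it is empty, overruns the page, or straddles the two halves.
 */
static int huge_page_half(size_t offset, size_t len)
{
	size_t first, last;

	if (len == 0 || offset >= HUGE_PAGE_SIZE ||
	    len > HUGE_PAGE_SIZE - offset)
		return -1;
	first = offset / HW_PAGE_SIZE;
	last = (offset + len - 1) / HW_PAGE_SIZE;
	return first == last ? (int)first : -1;
}
```

A 4 MB buffer starting 2 MB into the page yields -1, which is the "chunk in
the middle" case that cannot be used.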


CCB Structure
-------------

A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

   struct ccb {
       u64   control;
       u64   completion;
       u64   input0;
       u64   access;
       u64   input1;
       u64   op_data;
       u64   output;
       u64   table;
   };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., Linux kernel).
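
In user space the same layout can be declared with the fixed-width types
from <stdint.h>, and the 64 byte size can be checked at compile time. The
sketch below mirrors the structure shown above; ``ccb_array_bytes`` is a
hypothetical helper for computing the length to pass to write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the CCB layout shown above. */
struct ccb {
	uint64_t control;
	uint64_t completion;
	uint64_t input0;
	uint64_t access;
	uint64_t input1;
	uint64_t op_data;
	uint64_t output;
	uint64_t table;
};

/* 8 words of 64 bits: the size written for a single CCB. */
static_assert(sizeof(struct ccb) == 64, "a CCB must be exactly 64 bytes");

/* Byte length to pass to write() when submitting `count` CCBs. */
static size_t ccb_array_bytes(size_t count)
{
	assert(count <= SIZE_MAX / sizeof(struct ccb));
	return count * sizeof(struct ccb);
}
```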

The first word (control) is examined by the driver for the following:

 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code.  The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
procedure is to attempt to open both, as only one will succeed::

	fd = open("/dev/oradax1", O_RDWR);
	if (fd < 0)
		fd = open("/dev/oradax2", O_RDWR);
	if (fd < 0) {
		/* No DAX found */
	}

Next, the completion area must be mapped::

	completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries.  In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.
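
One way to satisfy the alignment and size rules for the output buffer is
to round the size up and use C11 aligned_alloc(). This is a sketch with
hypothetical helper names (``round_up_64``, ``alloc_output_bitmap``). Note
that 64-byte alignment alone does not guarantee that the buffer stays
within one page; that must still be ensured separately, for example by
allocating with page alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DAX_OUTPUT_ALIGN 64	/* output is written in 64-byte units */

/* Round a byte count up to the next multiple of 64. */
static size_t round_up_64(size_t nbytes)
{
	return (nbytes + DAX_OUTPUT_ALIGN - 1) &
	       ~(size_t)(DAX_OUTPUT_ALIGN - 1);
}

/*
 * Allocate a zeroed, 64-byte aligned buffer large enough for `nbits`
 * output bits. The size actually allocated is stored in *size_out.
 * Returns NULL on failure; release the buffer with free().
 */
static unsigned char *alloc_output_bitmap(size_t nbits, size_t *size_out)
{
	size_t size = round_up_64((nbits + 7) / 8);
	unsigned char *buf;

	assert(size_out != NULL);
	if (size == 0)
		size = DAX_OUTPUT_ALIGN;
	buf = aligned_alloc(DAX_OUTPUT_ALIGN, size);
	if (buf != NULL)
		memset(buf, 0, size);
	*size_out = buf != NULL ? size : 0;
	return buf;
}
```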

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
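
Because the expected result is simply the inverted input, a software
reference is easy to write and can be used to check the coprocessor's
output. The sketch below uses hypothetical helper names; it inverts every
byte that holds input bits and compares only whole bytes, since the
contents of unused bits in a final partial byte are not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compute the bitmap expected from a 1-bit scan for the value 0. */
static void expected_scan_zero_match(const unsigned char *in,
				     unsigned char *out, size_t nbits)
{
	size_t nbytes = (nbits + 7) / 8;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < nbytes; i++)
		out[i] = (unsigned char)~in[i];
}

/* Compare the whole bytes covered by `nbits` in two bitmaps. */
static int bitmaps_match(const unsigned char *a, const unsigned char *b,
			 size_t nbits)
{
	return memcmp(a, b, nbits / 8) == 0;
}
```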

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

	ccb->control =       /* Table 36.1, CCB Header Format */
		  (2L << 48)     /* command = Scan Value */
		| (3L << 40)     /* output address type = primary virtual */
		| (3L << 34)     /* primary input address type = primary virtual */
		             /* Section 36.2.1, Query CCB Command Formats */
		| (1 << 28)     /* 36.2.1.1.1 primary input format = fixed width bit packed */
		| (0 << 23)     /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
		| (8 << 10)     /* 36.2.1.1.6 output format = bit vector */
		| (0 <<  5)	/* 36.2.1.3 First scan criteria size = 0 (1 byte) */
		| (31 << 0);	/* 36.2.1.3 Disable second scan criteria */

	ccb->completion = 0;    /* Completion area address, to be filled in by driver */

	ccb->input0 = (unsigned long) input; /* primary input address */

	ccb->access =       /* Section 36.2.1.2, Data Access Control */
		  (2 << 24)    /* Primary input length format = bits */
		| (nbits - 1); /* number of bits in primary input stream, minus 1 */

	ccb->input1 = 0;       /* secondary input address, unused */

	ccb->op_data = 0;      /* scan criteria (value to be matched) */

	ccb->output = (unsigned long) output;	/* output address */

	ccb->table = 0;	       /* table address, unused */

The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

	if (pwrite(fd, ccb, 64, 0) != 64) {
		struct ccb_exec_result status;
		read(fd, &status, sizeof(status));
		/* bail out */
	}

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

	struct dax_command cmd;
	cmd.command = CCB_DEQUEUE;
	if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
		/* bail out */
	}

Finally, normal program cleanup should be done, i.e., unmapping
completion area, closing the dax device, freeing memory etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area. Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

	ccb->control |=      /* Table 36.1, CCB Header Format */
	        (3L << 32);     /* completion area address type = primary virtual */

	ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

::

	#include <asm/hypervisor.h>

	hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
				 HV_CCB_QUERY_CMD |
				 HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
				 HV_CCB_VA_PRIVILEGED,
				 0, &bytes_accepted, &status_data);

	if (hv_rv != HV_EOK) {
		/* hv_rv is an error code, status_data contains */
		/* potential additional status, see 36.3.1.1 */
	}

After the submission, the completion area polling code is identical to
that in user land::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

 .. include:: dax-hv-api.txt
    :literal: