==================================
VDUSE - "vDPA Device in Userspace"
==================================

A vDPA (virtio data path acceleration) device is a device that uses a
datapath which complies with the virtio specifications together with a
vendor-specific control path. vDPA devices can either be physically located
on the hardware or be emulated by software. VDUSE is a framework that makes
it possible to implement software-emulated vDPA devices in userspace. To
make the device emulation more secure, the emulated vDPA device's control
path is handled in the kernel and only the data path is implemented in
userspace.

Note that only the virtio block device is currently supported by the VDUSE
framework, which reduces security risks when the userspace process that
implements the data path is run by an unprivileged user. Support for other
device types can be added once the security issues of the corresponding
device drivers are clarified or fixed.

Create/Destroy VDUSE devices
----------------------------

VDUSE devices are created as follows:

1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
   /dev/vduse/control.

2. Set up each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME.

3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
   messages will arrive while attaching the VDUSE instance to the vDPA bus.

4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
   instance to the vDPA bus.

VDUSE devices are destroyed as follows:

1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
   instance from the vDPA bus.

2. Close the file descriptor referring to /dev/vduse/$NAME.

3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
   /dev/vduse/control.

The netlink messages can be sent via the vdpa tool in iproute2 or with the
sample code below:

.. code-block:: c

	/* cmd is either VDPA_CMD_DEV_NEW (attach) or VDPA_CMD_DEV_DEL (detach). */
	static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
	{
		struct nl_sock *nlsock;
		struct nl_msg *msg;
		int famid;

		nlsock = nl_socket_alloc();
		if (!nlsock)
			return -ENOMEM;

		if (genl_connect(nlsock))
			goto free_sock;

		/* Look up the generic netlink family ID of the vDPA subsystem. */
		famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
		if (famid < 0)
			goto close_sock;

		msg = nlmsg_alloc();
		if (!msg)
			goto close_sock;

		if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
			goto nla_put_failure;

		/* NLA_PUT_STRING() jumps to nla_put_failure on error. */
		NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
		if (cmd == VDPA_CMD_DEV_NEW)
			NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");

		/* nl_send_sync() frees msg, so it must not be freed again here. */
		if (nl_send_sync(nlsock, msg))
			goto close_sock;

		nl_close(nlsock);
		nl_socket_free(nlsock);

		return 0;
	nla_put_failure:
		nlmsg_free(msg);
	close_sock:
		nl_close(nlsock);
	free_sock:
		nl_socket_free(nlsock);
		return -1;
	}
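
The ioctl-based steps of the two sequences above (creation steps 1 and 2,
destruction steps 2 and 3) can be sketched as follows. This is a minimal
sketch, not a reference implementation: the helper names
(vduse_config_new(), vduse_dev_create(), vduse_dev_destroy()) are
hypothetical, the page-sized vq_align value is an assumption, and the
structure layouts are taken from include/uapi/linux/vduse.h. Error handling
is deliberately simple; for example, a failure after VDUSE_CREATE_DEV does
not destroy the half-initialized instance.

.. code-block:: c

	#include <assert.h>
	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/vduse.h>

	/* Hypothetical helper: build the argument for VDUSE_CREATE_DEV. */
	static struct vduse_dev_config *vduse_config_new(const char *name,
			uint32_t device_id, uint64_t features, uint32_t vq_num,
			const void *config, uint32_t config_size)
	{
		struct vduse_dev_config *cfg;

		assert(name && (config || !config_size));
		if (strlen(name) >= VDUSE_NAME_MAX)
			return NULL;

		/* calloc() also zeroes the reserved fields. */
		cfg = calloc(1, sizeof(*cfg) + config_size);
		if (!cfg)
			return NULL;

		strcpy(cfg->name, name);
		cfg->device_id = device_id;
		cfg->features = features;
		cfg->vq_num = vq_num;
		cfg->vq_align = 4096;	/* assumption: page-sized alignment */
		cfg->config_size = config_size;
		if (config_size)
			memcpy(cfg->config, config, config_size);
		return cfg;
	}

	/* Creation steps 1 and 2: returns an fd for /dev/vduse/$NAME, or -1. */
	static int vduse_dev_create(struct vduse_dev_config *cfg, uint16_t vq_size)
	{
		char path[VDUSE_NAME_MAX + 16];
		int ctrl_fd, dev_fd, ret;
		uint32_t i;

		ctrl_fd = open("/dev/vduse/control", O_RDWR);
		if (ctrl_fd < 0)
			return -1;
		ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, cfg);
		close(ctrl_fd);
		if (ret < 0)
			return -1;

		snprintf(path, sizeof(path), "/dev/vduse/%s", cfg->name);
		dev_fd = open(path, O_RDWR);
		if (dev_fd < 0)
			return -1;

		for (i = 0; i < cfg->vq_num; i++) {
			struct vduse_vq_config vq = { .index = i, .max_size = vq_size };

			if (ioctl(dev_fd, VDUSE_VQ_SETUP, &vq) < 0) {
				close(dev_fd);
				return -1;
			}
		}
		return dev_fd;
	}

	/* Destruction steps 2 and 3, once the device is detached from the bus. */
	static int vduse_dev_destroy(int dev_fd, const char *name)
	{
		char buf[VDUSE_NAME_MAX] = { 0 };
		int ctrl_fd, ret;

		close(dev_fd);
		ctrl_fd = open("/dev/vduse/control", O_RDWR);
		if (ctrl_fd < 0)
			return -1;
		strncpy(buf, name, sizeof(buf) - 1);
		ret = ioctl(ctrl_fd, VDUSE_DESTROY_DEV, buf);
		close(ctrl_fd);
		return ret;
	}

A caller would typically build the configuration with vduse_config_new(),
pass it to vduse_dev_create(), start handling messages on the returned file
descriptor, and only then send VDPA_CMD_DEV_NEW.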

How VDUSE works
---------------

As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV) on
/dev/vduse/control. With this ioctl, userspace can specify basic configuration
for the emulated device, such as the device name (which uniquely identifies a
VDUSE device), the virtio features, the virtio configuration space and the
number of virtqueues. A char device interface (/dev/vduse/$NAME) is then
exported to userspace for device emulation. Userspace can use the
VDUSE_VQ_SETUP ioctl on /dev/vduse/$NAME to add per-virtqueue configuration,
such as the maximum size of the virtqueue, to the device.

After the initialization, the VDUSE device can be attached to the vDPA bus via
the VDPA_CMD_DEV_NEW netlink message. Userspace needs to read() from and
write() to /dev/vduse/$NAME to receive control messages from, and reply to,
the VDUSE kernel module as follows:

.. code-block:: c

	static int vduse_message_handler(int dev_fd)
	{
		int len;
		struct vduse_dev_request req;
		/* Zero the response so that no uninitialized data is sent back. */
		struct vduse_dev_response resp = { 0 };

		len = read(dev_fd, &req, sizeof(req));
		if (len != sizeof(req))
			return -1;

		/* The reply is matched to the request by its ID. */
		resp.request_id = req.request_id;

		switch (req.type) {

		/* handle different types of messages */

		}

		len = write(dev_fd, &resp, sizeof(resp));
		if (len != sizeof(resp))
			return -1;

		return 0;
	}

The VDUSE framework currently defines three types of messages:

- VDUSE_GET_VQ_STATE: Get the state of a virtqueue. Userspace should return
  the avail index for a split virtqueue, or the device/driver ring wrap
  counters and the avail and used indexes for a packed virtqueue.

- VDUSE_SET_STATUS: Set the device status. Userspace should follow
  the virtio spec: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
  to process this message. For example, it should fail to set the FEATURES_OK
  device status bit if the device cannot accept the negotiated virtio features
  obtained from the VDUSE_DEV_GET_FEATURES ioctl.

- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for the
  specified IOVA range. Userspace should first remove the old mapping, then
  set up the new mapping via the VDUSE_IOTLB_GET_FD ioctl.

After the DRIVER_OK status bit is set via the VDUSE_SET_STATUS message,
userspace is able to start the dataplane processing as follows:

1. Get the specified virtqueue's information with the VDUSE_VQ_GET_INFO ioctl,
   including the size, the IOVAs of the descriptor table, available ring and
   used ring, the state and the ready status.

2. Pass the above IOVAs to the VDUSE_IOTLB_GET_FD ioctl so that those IOVA
   regions can be mapped into userspace. Sample code is shown below:

.. code-block:: c

	static int perm_to_prot(uint8_t perm)
	{
		int prot = 0;

		switch (perm) {
		case VDUSE_ACCESS_WO:
			prot |= PROT_WRITE;
			break;
		case VDUSE_ACCESS_RO:
			prot |= PROT_READ;
			break;
		case VDUSE_ACCESS_RW:
			prot |= PROT_READ | PROT_WRITE;
			break;
		}

		return prot;
	}

	static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
	{
		int fd;
		void *addr;
		size_t size;
		struct vduse_iotlb_entry entry;

		entry.start = iova;
		entry.last = iova;

		/*
		 * Find the first IOVA region that overlaps with the specified
		 * range [start, last] and return the corresponding file descriptor.
		 */
		fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
		if (fd < 0)
			return NULL;

		/* On return, entry describes the whole region containing iova. */
		size = entry.last - entry.start + 1;
		*len = entry.last - iova + 1;
		addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
			    fd, entry.offset);
		close(fd);
		if (addr == MAP_FAILED)
			return NULL;

		/*
		 * Use a data structure such as a linked list to cache
		 * the IOTLB mapping. munmap(2) should be called for the
		 * cached mapping when the corresponding VDUSE_UPDATE_IOTLB
		 * message is received or the device is reset.
		 */

		return addr + iova - entry.start;
	}

3. Set up the kick eventfd for the specified virtqueues with the
   VDUSE_VQ_SETUP_KICKFD ioctl. The kick eventfd is used by the VDUSE kernel
   module to notify userspace to consume the available ring. This is optional
   since userspace can choose to poll the available ring instead.

4. Listen to the kick eventfd (optional) and consume the available ring. The
   buffers described by the descriptors in the descriptor table should also be
   mapped into userspace via the VDUSE_IOTLB_GET_FD ioctl before they are
   accessed.

5. Inject an interrupt for the specific virtqueue with the VDUSE_VQ_INJECT_IRQ
   ioctl after the used ring is filled.

For more details on the uAPI, please see include/uapi/linux/vduse.h.
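
The comment in iova_to_va() suggests caching the IOTLB mappings in a data
structure such as a linked list. One possible shape for such a cache is
sketched below. All names here (struct iotlb_map and the iotlb_cache_*()
functions) are hypothetical and not part of the VDUSE uAPI. The sketch
assumes that each cached entry covers a whole region returned by
VDUSE_IOTLB_GET_FD, and that iotlb_cache_invalidate() is called with the
range carried by a VDUSE_UPDATE_IOTLB message, or with the full IOVA range
on device reset.

.. code-block:: c

	#include <assert.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	/* One cached mapping of the IOVA range [start, last]. */
	struct iotlb_map {
		uint64_t start;
		uint64_t last;
		void *addr;		/* base address returned by mmap(2) */
		size_t size;
		struct iotlb_map *next;
	};

	static struct iotlb_map *iotlb_head;

	/* Remember a mapping; returns 0 on success, -1 if allocation fails. */
	static int iotlb_cache_add(uint64_t start, uint64_t last, void *addr)
	{
		struct iotlb_map *m;

		assert(start <= last);
		m = malloc(sizeof(*m));
		if (!m)
			return -1;

		m->start = start;
		m->last = last;
		m->addr = addr;
		m->size = last - start + 1;
		m->next = iotlb_head;
		iotlb_head = m;
		return 0;
	}

	/* Translate an IOVA through the cache; NULL means it is not cached. */
	static void *iotlb_cache_lookup(uint64_t iova)
	{
		struct iotlb_map *m;

		for (m = iotlb_head; m; m = m->next)
			if (iova >= m->start && iova <= m->last)
				return (char *)m->addr + (iova - m->start);
		return NULL;
	}

	/* Unmap and drop all entries overlapping [start, last]; returns the count. */
	static int iotlb_cache_invalidate(uint64_t start, uint64_t last)
	{
		struct iotlb_map **pp = &iotlb_head;
		int dropped = 0;

		while (*pp) {
			struct iotlb_map *m = *pp;

			if (m->start <= last && m->last >= start) {
				*pp = m->next;
				munmap(m->addr, m->size);
				free(m);
				dropped++;
			} else {
				pp = &m->next;
			}
		}
		return dropped;
	}

With such a cache, the data path would call iotlb_cache_lookup() first and
fall back to iova_to_va() followed by iotlb_cache_add() only on a miss, so
that each region is mapped once rather than on every access.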