==================================
VDUSE - "vDPA Device in Userspace"
==================================

A vDPA (virtio data path acceleration) device is a device whose datapath
complies with the virtio specifications but whose control path is vendor
specific. vDPA devices can be either physically located on the hardware or
emulated by software. VDUSE is a framework that makes it possible to
implement software-emulated vDPA devices in userspace. To make the device
emulation more secure, the emulated vDPA device's control path is handled
in the kernel and only the data path is implemented in userspace.

Note that only the virtio block device is currently supported by the VDUSE
framework, which reduces security risks when the userspace process that
implements the data path is run by an unprivileged user. Support for other
device types can be added once the security issues of the corresponding
device drivers have been clarified or fixed.

Create/Destroy VDUSE devices
----------------------------

VDUSE devices are created as follows:

1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
   /dev/vduse/control.

2. Set up each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME.

3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
   messages will arrive while attaching the VDUSE instance to the vDPA bus.

4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
   instance to the vDPA bus.

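Steps 1 and 2 of the creation sequence could be sketched roughly as follows.
This is a minimal sketch rather than a reference implementation:
vduse_alloc_config() and vduse_create_dev() are hypothetical helper names, the
4096-byte vq_align and the single vq_size shared by all virtqueues are
assumptions, and the feature bits and configuration space must be supplied by
the caller for the device being emulated. Error handling is reduced to
returning NULL or -1.

.. code-block:: c

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/vduse.h>

	/*
	 * Hypothetical helper: build the argument for ioctl(VDUSE_CREATE_DEV).
	 * The virtio configuration space is appended to the structure.
	 * The caller frees the result with free().
	 */
	static struct vduse_dev_config *vduse_alloc_config(const char *name,
			uint32_t device_id, uint64_t features, uint32_t vq_num,
			const void *config, uint32_t config_size)
	{
		struct vduse_dev_config *cfg;

		if (strlen(name) >= VDUSE_NAME_MAX)
			return NULL;

		/* calloc() also zeroes the reserved fields. */
		cfg = calloc(1, sizeof(*cfg) + config_size);
		if (!cfg)
			return NULL;

		strcpy(cfg->name, name);
		cfg->device_id = device_id;
		cfg->features = features;
		cfg->vq_num = vq_num;
		cfg->vq_align = 4096;	/* assumption: page-sized alignment */
		cfg->config_size = config_size;
		if (config_size)
			memcpy(cfg->config, config, config_size);

		return cfg;
	}

	/*
	 * Hypothetical helper: create the instance and set up its virtqueues.
	 * Returns a file descriptor for /dev/vduse/$NAME, or -1 on failure.
	 */
	static int vduse_create_dev(const struct vduse_dev_config *cfg,
				    uint16_t vq_size)
	{
		char path[sizeof("/dev/vduse/") + VDUSE_NAME_MAX];
		struct vduse_vq_config vq;
		int ctrl_fd, dev_fd, ret;
		uint32_t i;

		ctrl_fd = open("/dev/vduse/control", O_RDWR);
		if (ctrl_fd < 0)
			return -1;

		/* Step 1: create the VDUSE instance. */
		ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, cfg);
		close(ctrl_fd);
		if (ret)
			return -1;

		snprintf(path, sizeof(path), "/dev/vduse/%s", cfg->name);
		dev_fd = open(path, O_RDWR);
		if (dev_fd < 0)
			return -1;

		/* Step 2: configure each virtqueue. */
		for (i = 0; i < cfg->vq_num; i++) {
			memset(&vq, 0, sizeof(vq));
			vq.index = i;
			vq.max_size = vq_size;
			if (ioctl(dev_fd, VDUSE_VQ_SETUP, &vq)) {
				close(dev_fd);
				return -1;
			}
		}

		return dev_fd;
	}

Note that this sketch does not destroy the instance when a later step fails; a
real implementation would probably issue ioctl(VDUSE_DESTROY_DEV) on
/dev/vduse/control in that case.
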
VDUSE devices are destroyed as follows:

1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
   instance from the vDPA bus.

2. Close the file descriptor referring to /dev/vduse/$NAME.

3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
   /dev/vduse/control.

The netlink messages can be sent with the vdpa tool in iproute2 or with
the sample code below:

.. code-block:: c

	static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
	{
		struct nl_sock *nlsock;
		struct nl_msg *msg;
		int famid;

		nlsock = nl_socket_alloc();
		if (!nlsock)
			return -ENOMEM;

		if (genl_connect(nlsock))
			goto free_sock;

		/* Look up the numeric ID of the vDPA generic netlink family. */
		famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
		if (famid < 0)
			goto close_sock;

		msg = nlmsg_alloc();
		if (!msg)
			goto close_sock;

		if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
			goto nla_put_failure;

		/* The NLA_PUT_STRING() macro jumps to nla_put_failure on error. */
		NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
		if (cmd == VDPA_CMD_DEV_NEW)
			NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");

		/* nl_send_sync() frees msg itself, so skip nlmsg_free() on error. */
		if (nl_send_sync(nlsock, msg))
			goto close_sock;

		nl_close(nlsock);
		nl_socket_free(nlsock);

		return 0;
	nla_put_failure:
		nlmsg_free(msg);
	close_sock:
		nl_close(nlsock);
	free_sock:
		nl_socket_free(nlsock);
		return -1;
	}

How VDUSE works
---------------

As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV) on
/dev/vduse/control. With this ioctl, userspace can specify some basic configuration
for the emulated device, such as the device name (which uniquely identifies a VDUSE
device), the virtio features, the virtio configuration space and the number of
virtqueues. Then a char device interface (/dev/vduse/$NAME) is exported to userspace
for device emulation. Userspace can use the VDUSE_VQ_SETUP ioctl on /dev/vduse/$NAME
to add per-virtqueue configuration, such as the maximum size of the virtqueue, to
the device.

After the initialization, the VDUSE device can be attached to the vDPA bus via
the VDPA_CMD_DEV_NEW netlink message. Userspace needs to read()/write() on
/dev/vduse/$NAME to receive control messages from, and reply to, the VDUSE kernel
module as follows:

.. code-block:: c

	static int vduse_message_handler(int dev_fd)
	{
		int len;
		struct vduse_dev_request req;
		struct vduse_dev_response resp;

		len = read(dev_fd, &req, sizeof(req));
		if (len != sizeof(req))
			return -1;

		/* Zero the reply so that no uninitialized stack data is written back. */
		memset(&resp, 0, sizeof(resp));
		resp.request_id = req.request_id;

		switch (req.type) {

		/* handle different types of messages */

		}

		len = write(dev_fd, &resp, sizeof(resp));
		if (len != sizeof(resp))
			return -1;

		return 0;
	}

The VDUSE framework currently defines three types of messages:

- VDUSE_GET_VQ_STATE: Get the state of a virtqueue. Userspace should return
  the avail index for a split virtqueue, or the device/driver ring wrap counters
  and the avail and used index for a packed virtqueue.

- VDUSE_SET_STATUS: Set the device status. Userspace should follow
  the virtio spec: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
  to process this message. For example, fail to set the FEATURES_OK device
  status bit if the device cannot accept the negotiated virtio features
  obtained from the VDUSE_DEV_GET_FEATURES ioctl.

- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for the
  specified IOVA range. Userspace should first remove the old mapping, then set
  up the new mapping via the VDUSE_IOTLB_GET_FD ioctl.

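The FEATURES_OK check described for VDUSE_SET_STATUS might look like the
sketch below. The helper names and the STATUS_* macros are hypothetical; the
bit values are the device status bits from the virtio spec. "negotiated" is
assumed to be the value read with the VDUSE_DEV_GET_FEATURES ioctl and
"supported" the set of features the emulation actually implements.

.. code-block:: c

	#include <stdbool.h>
	#include <stdint.h>

	/* Device status bits, with the values given in the virtio spec. */
	#define STATUS_DRIVER_OK	0x04
	#define STATUS_FEATURES_OK	0x08

	/*
	 * Hypothetical helper: decide whether the status carried by a
	 * VDUSE_SET_STATUS message can be accepted.
	 */
	static bool status_is_acceptable(uint8_t status, uint64_t negotiated,
					 uint64_t supported)
	{
		/* Refuse FEATURES_OK if the driver chose a feature we lack. */
		if ((status & STATUS_FEATURES_OK) && (negotiated & ~supported))
			return false;

		return true;
	}

	/* Hypothetical helper: the dataplane may run only after DRIVER_OK. */
	static bool dataplane_may_run(uint8_t status)
	{
		return (status & STATUS_DRIVER_OK) != 0;
	}

When status_is_acceptable() returns false, the message handler would report a
failure in its reply; the uAPI header appears to define VDUSE_REQ_RESULT_OK and
VDUSE_REQ_RESULT_FAILED for this purpose, but check include/uapi/linux/vduse.h
for the exact names.
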
After the DRIVER_OK status bit is set via the VDUSE_SET_STATUS message, userspace
can start the dataplane processing as follows:

1. Get the specified virtqueue's information with the VDUSE_VQ_GET_INFO ioctl,
   including the size, the IOVAs of the descriptor table, available ring and used
   ring, the state and the ready status.

2. Pass the above IOVAs to the VDUSE_IOTLB_GET_FD ioctl so that those IOVA regions
   can be mapped into userspace. Some sample code is shown below:

.. code-block:: c

	static int perm_to_prot(uint8_t perm)
	{
		int prot = 0;

		switch (perm) {
		case VDUSE_ACCESS_WO:
			prot |= PROT_WRITE;
			break;
		case VDUSE_ACCESS_RO:
			prot |= PROT_READ;
			break;
		case VDUSE_ACCESS_RW:
			prot |= PROT_READ | PROT_WRITE;
			break;
		}

		return prot;
	}

	static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
	{
		int fd;
		void *addr;
		size_t size;
		struct vduse_iotlb_entry entry;

		entry.start = iova;
		entry.last = iova;

		/*
		 * Find the first IOVA region that overlaps with the specified
		 * range [start, last] and return the corresponding file descriptor.
		 */
		fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
		if (fd < 0)
			return NULL;

		size = entry.last - entry.start + 1;
		*len = entry.last - iova + 1;
		addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
			    fd, entry.offset);
		close(fd);
		if (addr == MAP_FAILED)
			return NULL;

		/*
		 * Use a data structure such as a linked list to store the
		 * iotlb mapping. munmap(2) should be called for the cached
		 * mapping when the corresponding VDUSE_UPDATE_IOTLB message
		 * is received or the device is reset.
		 */

		return addr + iova - entry.start;
	}

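The mapping cache mentioned in the comment inside iova_to_va() could be kept
in a simple linked list, as in the sketch below. All names here are
hypothetical and the list is only one possible data structure. It assumes
that each entry records the whole region returned by mmap(), that is the
range [entry.start, entry.last] together with the mapped address.

.. code-block:: c

	#include <stddef.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	struct iotlb_map {			/* hypothetical cache entry */
		uint64_t start;			/* first IOVA of the region */
		uint64_t last;			/* last IOVA of the region */
		void *addr;			/* address returned by mmap() */
		struct iotlb_map *next;
	};

	static struct iotlb_map *iotlb_cache;

	/* Remember a mapping created for the IOVA range [start, last]. */
	static int iotlb_cache_add(uint64_t start, uint64_t last, void *addr)
	{
		struct iotlb_map *m = malloc(sizeof(*m));

		if (!m)
			return -1;

		m->start = start;
		m->last = last;
		m->addr = addr;
		m->next = iotlb_cache;
		iotlb_cache = m;

		return 0;
	}

	/* Translate an IOVA using the cache; NULL means "not mapped yet". */
	static void *iotlb_cache_lookup(uint64_t iova)
	{
		struct iotlb_map *m;

		for (m = iotlb_cache; m; m = m->next) {
			if (iova >= m->start && iova <= m->last)
				return (char *)m->addr + (iova - m->start);
		}

		return NULL;
	}

	/*
	 * Unmap and forget every cached region overlapping [start, last].
	 * Returns the number of regions dropped.
	 */
	static int iotlb_cache_invalidate(uint64_t start, uint64_t last)
	{
		struct iotlb_map **p = &iotlb_cache;
		struct iotlb_map *m;
		int dropped = 0;

		while ((m = *p) != NULL) {
			if (m->start <= last && m->last >= start) {
				*p = m->next;
				munmap(m->addr, m->last - m->start + 1);
				free(m);
				dropped++;
			} else {
				p = &m->next;
			}
		}

		return dropped;
	}

On a VDUSE_UPDATE_IOTLB message, the handler would call
iotlb_cache_invalidate() with the IOVA range carried by the request; on a
device reset it could pass the range [0, UINT64_MAX] to drop everything.
Later accesses then miss in iotlb_cache_lookup() and fall back to the
VDUSE_IOTLB_GET_FD ioctl.
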
3. Set up the kick eventfd for the specified virtqueues with the VDUSE_VQ_SETUP_KICKFD
   ioctl. The kick eventfd is used by the VDUSE kernel module to notify userspace to
   consume the available ring. This is optional since userspace can choose to poll the
   available ring instead.

4. Listen to the kick eventfd (optional) and consume the available ring. The buffers
   described by the descriptors in the descriptor table should also be mapped into
   userspace via the VDUSE_IOTLB_GET_FD ioctl before they are accessed.

5. Inject an interrupt for a specific virtqueue with the VDUSE_INJECT_VQ_IRQ ioctl
   after the used ring is filled.

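Consuming the available ring of a split virtqueue (step 4) could be sketched
as below. The structure and function names are hypothetical. The ring layout
follows the split virtqueue format in the virtio spec, and the sketch assumes
a little-endian host, a ring already mapped into userspace, and a compiler
that provides the GCC/Clang __atomic builtins.

.. code-block:: c

	#include <stdint.h>

	/* Available ring of a split virtqueue, as laid out in the virtio spec. */
	struct avail_ring {
		uint16_t flags;
		uint16_t idx;		/* where the driver puts the next entry */
		uint16_t ring[];	/* heads of available descriptor chains */
	};

	struct vq_state {			/* hypothetical bookkeeping */
		struct avail_ring *avail;	/* mapped available ring */
		uint16_t size;			/* virtqueue size */
		uint16_t last_avail_idx;	/* next entry to consume */
	};

	/*
	 * Return the head of the next available descriptor chain, or -1 if
	 * the driver has not published anything new.
	 */
	static int vq_pop_avail(struct vq_state *vq)
	{
		/* Acquire: read ring entries only after observing the index. */
		uint16_t idx = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);

		if (idx == vq->last_avail_idx)
			return -1;

		/* The 16-bit index wraps naturally; the ring slot is idx % size. */
		return vq->avail->ring[vq->last_avail_idx++ % vq->size];
	}

In this sketch last_avail_idx is also the value a device would report as the
avail index of a split virtqueue when it receives VDUSE_GET_VQ_STATE. After
processing the returned descriptor chain and filling the used ring, the device
injects an interrupt as described in step 5.
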
For more details on the uAPI, please see include/uapi/linux/vduse.h.