Writing a NetBSD Kernel Module
Kernel modules are object files used to extend an operating system’s kernel functionality at run time.
In this post, we’ll look at implementing a simple character device driver as a kernel module in NetBSD.
Once it is loaded, userspace processes will be able to write an arbitrary byte string to the device, and on every successive read expect a cryptographically-secure pseudorandom permutation of the original byte string.
Before we begin, note that compiling a kernel module requires the NetBSD source code to live in /usr/src.
This explains how to get it.
Traditionally, most userspace interfaces to character or block devices are through special files that live in /dev. We’ll create one such special file through the command
$ mknod /dev/rperm c 420 0
The c indicates that this file is an interface to a character device, 420 indicates this device’s major number, and 0 indicates this device’s minor number. The major number is used by the kernel to uniquely identify each device, and the minor number is sometimes used internally by device drivers, but we won’t be bothering with it.
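To make the major/minor split concrete, here is a small userspace sketch (not part of the driver). It assumes the makedev, major, and minor macros, which NetBSD exposes through <sys/types.h> and glibc through <sys/sysmacros.h>; dev_matches is a hypothetical helper name used only for illustration.

```c
#include <sys/types.h>
#if defined(__linux__)
#include <sys/sysmacros.h>  /* glibc keeps makedev/major/minor here */
#endif

/* Return 1 if dev carries the expected major and minor numbers, else 0. */
static int
dev_matches(dev_t dev, unsigned int maj, unsigned int min)
{
    return (unsigned int)major(dev) == maj &&
        (unsigned int)minor(dev) == min;
}
```

For instance, dev_matches(makedev(420, 0), 420, 0) is true. Applying the same check to the st_rdev field that stat(2) reports for /dev/rperm would confirm that the node was created with the intended numbers.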
Our device driver will specifically implement the open, read, write, and close I/O methods. To register our implementations of those methods with the kernel, we first prototype them in a way that makes the compiler happy using the dev_type_* set of macros, and then put them into a struct cdevsw.
dev_type_open(rperm_open);
dev_type_close(rperm_close);
dev_type_write(rperm_write);
dev_type_read(rperm_read);
static struct cdevsw rperm_cdevsw = {
    .d_open = rperm_open,
    .d_close = rperm_close,
    .d_read = rperm_read,
    .d_write = rperm_write,
    .d_ioctl = noioctl,
    .d_stop = nostop,
    .d_tty = notty,
    .d_poll = nopoll,
    .d_mmap = nommap,
    .d_kqfilter = nokqfilter,
    .d_discard = nodiscard,
    .d_flag = D_OTHER
};
As we can see, there are a number of methods we won’t be implementing. devsw stands for device switch.
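The idea behind a device switch can be sketched in plain userspace C as a table of function pointers indexed by major number. Everything below (toy_devsw, toy_attach, toy_dev_open, the table size) is a hypothetical simplification for illustration, not the kernel's actual data structures.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct cdevsw. */
struct toy_devsw {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
};

static int toy_open(int minor)  { (void)minor; return 0; }
static int toy_close(int minor) { (void)minor; return 0; }

static const struct toy_devsw toy_rperm_devsw = {
    .d_open = toy_open,
    .d_close = toy_close,
};

#define TOY_NMAJOR 512
static const struct toy_devsw *toy_table[TOY_NMAJOR];

/* Register a switch under a major number; -1 if out of range or taken. */
static int
toy_attach(int major, const struct toy_devsw *sw)
{
    if (major < 0 || major >= TOY_NMAJOR || toy_table[major] != NULL)
        return -1;
    toy_table[major] = sw;
    return 0;
}

/* Route an open on (major, minor) to its driver; -1 if none is registered. */
static int
toy_dev_open(int major, int minor)
{
    if (major < 0 || major >= TOY_NMAJOR || toy_table[major] == NULL)
        return -1;
    return toy_table[major]->d_open(minor);
}
```

After toy_attach(420, &toy_rperm_devsw), a call to toy_dev_open(420, 0) reaches toy_open. This is roughly the kind of indirection that lets an open on a special file end up in a driver's d_open.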
Each kernel module is required to define its metadata through the C macro MODULE(class, name, required). Since our module is a device driver, named rperm, and won’t require another module being pre-loaded, we write
MODULE(MODULE_CLASS_DRIVER, rperm, NULL);
Each module is also required to implement a MODNAME_modcmd function, which the kernel calls to report important module-related events, like when the module loads or unloads. This is where we’ll register our struct cdevsw.
#define CMAJOR 420
static int
rperm_modcmd(modcmd_t cmd, void *args)
{
    devmajor_t bmajor, cmajor;
    bmajor = -1;        /* no block device of our own */
    cmajor = CMAJOR;    /* must match the major number given to mknod */
    switch (cmd) {
    case MODULE_CMD_INIT:
        devsw_attach("rperm", NULL, &bmajor, &rperm_cdevsw, &cmajor);
        break;
    case MODULE_CMD_FINI:
        devsw_detach(NULL, &rperm_cdevsw);
        break;
    default:
        break;
    }
    return 0;
}
The NULL argument to the devsw_* functions is for a block device switch structure, which we aren’t concerned with. The same goes for bmajor, but the kernel ends up assigning an unused block device number for our driver anyway.
Now we turn to actually implementing the four device I/O methods.
On every write, we need to store the byte string somewhere. We use a static structure for that.
static struct rperm_softc {
    char *buf;
    int buf_len;
} sc;
sc.buf will end up pointing to a location in the kernel’s heap that contains the byte string. sc and softc stand for software context, which is just a convention followed in the NetBSD kernel for naming static structures in device driver code.
open is a required implementation, as it is always the first syscall in Unix I/O. However, there is nothing meaningful for us to do there, so we simply write a stub.
int
rperm_open(dev_t self, int flag, int mod, struct lwp *l)
{
    return 0;
}
In write, we allocate enough memory in the kernel’s heap to store the byte string, and then move the byte string from userspace to kernelspace.
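int
rperm_write(dev_t self, struct uio *uio, int flags)
{
    /* Drop any byte string left over from a previous write. */
    if (sc.buf != NULL) {
        kmem_free(sc.buf, sc.buf_len);
        sc.buf = NULL;
    }
    /* Skip empty writes; there is nothing to store. */
    if (uio->uio_iov->iov_len == 0)
        return 0;
    sc.buf_len = uio->uio_iov->iov_len;
    sc.buf = (char *)kmem_alloc(sc.buf_len, KM_SLEEP);
    /* Copy the byte string from userspace into sc.buf. */
    return uiomove(sc.buf, sc.buf_len, uio);
}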
First, let’s talk about the allocations.
kmem_alloc is similar to userspace malloc, in that it allocates some number of bytes of memory in the heap. Notably, this memory is wired, which means that under physical memory pressure, it is not paged out to a swap disk like userspace memory is. The KM_SLEEP flag to kmem_alloc tells the kernel that the current kernel thread should sleep until enough physical memory is available for the request, if it isn’t already, as opposed to kmem_alloc simply returning NULL in such a situation. Hence, our allocation request never fails, and we don’t need to check for sc.buf == NULL.
kmem_free is similar to userspace free, except for a second argument that has to be the number of bytes allocated using kmem_alloc.
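The need to remember the allocation size can be illustrated with a userspace sketch. It uses malloc and free as stand-ins for kmem_alloc and kmem_free, and the names toy_alloc, toy_free, toy_buf, and toy_outstanding are hypothetical. The byte counter only returns to zero if every free is given the size that was originally requested, which is why the driver keeps buf_len next to buf.

```c
#include <stdlib.h>
#include <string.h>

static size_t toy_outstanding;  /* bytes currently allocated */

/* malloc-backed stand-in for a sized allocator. */
static void *
toy_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        toy_outstanding += n;
    return p;
}

/* Sized free: the caller must pass the size used at allocation time. */
static void
toy_free(void *p, size_t n)
{
    toy_outstanding -= n;
    free(p);
}

struct toy_buf {
    char *data;
    size_t len;  /* kept alongside data so it can be freed later */
};

/* Replace the stored bytes; returns 0 on success, -1 on failure. */
static int
toy_buf_set(struct toy_buf *b, const char *src, size_t len)
{
    char *p;

    if (len == 0 || (p = toy_alloc(len)) == NULL)
        return -1;
    memcpy(p, src, len);
    if (b->data != NULL)
        toy_free(b->data, b->len);
    b->data = p;
    b->len = len;
    return 0;
}

/* Release the stored bytes, if any. */
static void
toy_buf_clear(struct toy_buf *b)
{
    if (b->data != NULL)
        toy_free(b->data, b->len);
    b->data = NULL;
    b->len = 0;
}
```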
Next, we come to the transfer of the byte string from userspace to kernelspace.
In general, memory to be transferred, in either direction, comes in one or more non-contiguous chunks of memory (think scatter-gather I/O) along with some extra state variables like the amount of data remaining to be transferred in the current session, an offset into a block device, and some flags. All that information is encapsulated in a struct uio data type, and uiomove performs the actual transfer by using that information. For example, here uio->uio_rw is set to UIO_WRITE, telling uiomove that data from uio has to be transferred to sc.buf. uiomove also ends up updating uio->uio_resid, which is the total number of bytes left to transfer.
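The bookkeeping described above can be sketched in userspace. This is a hypothetical, heavily simplified model: toy_uio and toy_uiomove are invented names, the real struct uio has more fields, and the real uiomove copies across the user/kernel boundary rather than calling memcpy. Only struct iovec from <sys/uio.h> is a real type here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>  /* struct iovec */

/* Hypothetical, simplified stand-in for the kernel's struct uio. */
struct toy_uio {
    struct iovec *iov;  /* remaining chunks */
    int iovcnt;         /* number of remaining chunks */
    size_t resid;       /* bytes left to transfer */
    int to_chunks;      /* 1: buf -> chunks (read), 0: chunks -> buf (write) */
};

/* Move up to n bytes between buf and the chunks, updating the bookkeeping. */
static void
toy_uiomove(void *buf, size_t n, struct toy_uio *uio)
{
    char *p = buf;

    while (n > 0 && uio->resid > 0 && uio->iovcnt > 0) {
        struct iovec *iov = uio->iov;
        size_t cnt = iov->iov_len < n ? iov->iov_len : n;

        if (cnt > uio->resid)
            cnt = uio->resid;
        if (cnt == 0) {  /* current chunk is used up, go to the next */
            uio->iov++;
            uio->iovcnt--;
            continue;
        }
        if (uio->to_chunks)
            memcpy(iov->iov_base, p, cnt);
        else
            memcpy(p, iov->iov_base, cnt);
        iov->iov_base = (char *)iov->iov_base + cnt;
        iov->iov_len -= cnt;
        uio->resid -= cnt;
        p += cnt;
        n -= cnt;
    }
}
```

Gathering the two chunks "blo" and "op" into one five-byte buffer with this function leaves resid at 0, which mirrors how uio_resid shrinks as a transfer proceeds.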
Next we come to read.
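int
rperm_read(dev_t self, struct uio *uio, int flags)
{
    /* Fail if nothing was written yet or the reader's buffer is too small. */
    if (sc.buf == NULL || uio->uio_resid < sc.buf_len)
        return EINVAL;
    char c;
    uint32_t i, r;
    /* Fisher-Yates: swap position i with a random position in [i, buf_len). */
    for (i = 0; i < sc.buf_len-1; i++) {
        r = rand_n(i, sc.buf_len);
        c = sc.buf[r];
        sc.buf[r] = sc.buf[i];
        sc.buf[i] = c;
    }
    /* Copy the permuted byte string out to userspace. */
    return uiomove(sc.buf, sc.buf_len, uio);
}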
We first check if there is enough space in uio to transfer a permuted byte string, then use the Fisher–Yates shuffle to permute the original byte string using a random number generated by rand_n, and then copy the permuted string back to userspace. In this case, uio->uio_rw would be set to UIO_READ, telling uiomove that data from sc.buf has to be transferred to uio.
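The shuffle itself can be tried outside the kernel. The sketch below is a userspace illustration under stated assumptions: toy_rand32 is a xorshift32 generator standing in for the kernel's random source and is not cryptographically secure, and toy_pick uses a plain modulo, so it carries a slight bias that a real driver should avoid. All toy_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in generator (xorshift32). It is NOT cryptographically secure. */
static uint32_t toy_state = 2463534242u;

static uint32_t
toy_rand32(void)
{
    toy_state ^= toy_state << 13;
    toy_state ^= toy_state >> 17;
    toy_state ^= toy_state << 5;
    return toy_state;
}

/* Index in [low, high) via plain modulo (slightly biased); needs low < high. */
static size_t
toy_pick(size_t low, size_t high)
{
    return low + toy_rand32() % (high - low);
}

/* Fisher-Yates: swap position i with a random position in [i, len). */
static void
toy_shuffle(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        size_t r = toy_pick(i, len);
        char c = buf[r];

        buf[r] = buf[i];
        buf[i] = c;
    }
}
```

Whatever the generator returns, the result is always a rearrangement of the input bytes, because each step only swaps two positions. How evenly the permutations are distributed depends entirely on the quality of the index picker.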
The rand_n function, which we need to implement, returns a random integer n uniformly distributed over the range [low, high).
#define R32MAX 4294967295U
uint32_t rand_n(uint32_t low, uint32_t high) {
    uint32_t limit, diff, r;
    diff = high - low;
    /* Largest multiple of diff that does not exceed R32MAX. */
    limit = diff * (R32MAX/diff);
    /* Reject draws at or above limit so that r % diff stays uniform. */
    do {
        r = cprng_strong32();
    } while (r >= limit);
    return (r % diff) + low;
}
For a source of randomness, we use cprng_strong32. The cprng_* family of functions supplies cryptographically secure pseudorandom bytes (in this case, 4) to callers within the kernel.
Once we have it, we transform the range of our random number from [0, 2^32) to [low, high) by an iterative test that discards those values of r that are at or above the largest multiple of high - low not exceeding 2^32 - 1, as using those numbers would result in numbers in a certain subrange of [low, high) being more likely to occur than those not.
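The bias is easier to see on a small scale. The sketch below replaces the 32-bit generator with an imaginary 8-bit one and simply enumerates all 256 possible outputs; toy_count_residues is a hypothetical helper written for this illustration. With a range of size 6, the plain modulo gives residues 0 to 3 one extra hit each (43 versus 42), while rejecting outputs at or above 252 leaves every residue with exactly 42.

```c
/*
 * Count how often each residue in [0, diff) occurs over all 8-bit values.
 * With reject != 0, values at or above limit are skipped, as in rejection
 * sampling. counts must have room for diff entries; requires 0 < diff <= 256.
 */
static void
toy_count_residues(unsigned diff, int reject, unsigned counts[])
{
    unsigned limit = diff * (256 / diff);  /* largest multiple of diff <= 256 */

    for (unsigned i = 0; i < diff; i++)
        counts[i] = 0;
    for (unsigned r = 0; r < 256; r++) {
        if (reject && r >= limit)
            continue;
        counts[r % diff]++;
    }
}
```

The same reasoning scales up to 32 bits, where the unevenness is far smaller per value but still present without the rejection step.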
In close
, we free sc.buf
if it was allocated before.
int
rperm_close(dev_t self, int flag, int mod, struct lwp *l)
{
    if (sc.buf != NULL) {
        kmem_free(sc.buf, sc.buf_len);
        sc.buf = NULL;
    }
    return 0;
}
Lastly, we write a three line Makefile
to build our module.
KMOD= rperm
SRCS= rperm.c
.include <bsd.kmod.mk>
Now we compile and load the module.
$ make
$ modload ./rperm.kmod
The ./
has to be present for modload
.
Unfortunately, we can’t simply do
$ echo 'bloop' > /dev/rperm
$ cat /dev/rperm
as that would end up copying the