Writing a NetBSD Kernel Module

Kernel modules are object files used to extend an operating system’s
kernel functionality at run time.

In this post, we’ll look at implementing a simple character device driver as a
kernel module in NetBSD.
Once it is loaded, userspace processes will be able to write an arbitrary byte string to
the device,
and on every successive read expect a cryptographically-secure pseudorandom permutation of
the original byte string.

Before we begin, note that compiling a kernel module requires the NetBSD source code to live in /usr/src.
This explains how to
get it.

Basically, most userspace interfaces to character or block devices are through
special files that live in /dev. We’ll create one such special file through
the command

$ mknod /dev/rperm c 420 0

The c indicates that this file is an interface
to a character device, 420 indicates this device’s major number, and 0
indicates this device’s minor number. The major number is used by the kernel
to uniquely identify each device, and the minor number is sometimes used
internally by device drivers but we won’t be bothering with it.

Our device driver will specifically implement the open, read, write, and close I/O methods. To
register our implementations of these methods with the kernel, we first
prototype them in a way that makes the compiler happy using the
dev_type_* set of macros, and then put them into a struct cdevsw.


dev_type_open(rperm_open);
dev_type_close(rperm_close);
dev_type_read(rperm_read);
dev_type_write(rperm_write);

static struct cdevsw rperm_cdevsw = {
    .d_open = rperm_open,
    .d_close = rperm_close,
    .d_read = rperm_read,
    .d_write = rperm_write,

    .d_ioctl = noioctl,
    .d_stop = nostop,
    .d_tty = notty,
    .d_poll = nopoll,
    .d_mmap = nommap,
    .d_kqfilter = nokqfilter,
    .d_discard = nodiscard,
    .d_flag = D_OTHER
};

As we can see, there are a number of methods we won’t be implementing.
devsw stands for device switch.
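
To illustrate the idea of a device switch, here is a small userspace sketch of a
table of function pointers indexed by major number. It is only a toy model, not
how the NetBSD kernel stores its switch tables: struct toy_cdevsw, toy_switch,
toy_open, the demo_* entry points and major number 3 are all hypothetical names
and values made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model of a device switch: one table of entry points per driver. */
struct toy_cdevsw {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
};

/* Example driver whose entry points do nothing and report success. */
static int demo_open(int minor)  { (void)minor; return 0; }
static int demo_close(int minor) { (void)minor; return 0; }

static const struct toy_cdevsw demo_cdevsw = {
    .d_open = demo_open,
    .d_close = demo_close,
};

#define TOY_NMAJOR 8

/* The "switch": slot i holds the table of the driver with major number i. */
static const struct toy_cdevsw *toy_switch[TOY_NMAJOR] = {
    [3] = &demo_cdevsw,   /* hypothetical major number 3 */
};

/* Route an open request to whichever driver owns the major number. */
int
toy_open(int major, int minor)
{
    assert(minor >= 0);
    if (major < 0 || major >= TOY_NMAJOR || toy_switch[major] == NULL)
        return ENXIO;     /* no driver attached for this major number */
    return toy_switch[major]->d_open(minor);
}
```

In this model, toy_open(3, 0) reaches demo_open and returns 0, while any major
number without an attached table yields ENXIO.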

Every kernel module is required to define its metadata through the C macro
MODULE(class, name, required). Since our module is a device driver, named
rperm, and won’t require another module being pre-loaded, we write

MODULE(MODULE_CLASS_DRIVER, rperm, NULL);

Every module is also required to implement a MODNAME_modcmd function, which the kernel
calls to report important module-related events, like when the module loads
or unloads. This is where we’ll register our struct cdevsw.

#define CMAJOR 420

static int
rperm_modcmd(modcmd_t cmd, void *args)
{
    devmajor_t bmajor, cmajor;

    bmajor = -1;
    cmajor = CMAJOR;
    switch(cmd) {
        case MODULE_CMD_INIT:
            /* Register our character device switch; report failure to the kernel. */
            return devsw_attach("rperm", NULL, &bmajor, &rperm_cdevsw, &cmajor);
        case MODULE_CMD_FINI:
            devsw_detach(NULL, &rperm_cdevsw);
            break;
        default:
            break;
    }
    return 0;
}

The NULL argument to the devsw_* functions is for a block device switch structure,
which we aren’t concerned with. Similarly for bmajor, but the kernel ends up
assigning an unused block device number for our driver anyway.

Now we turn to actually implementing the four device I/O methods.

On every write, we need to store the byte string somewhere. We use a static
structure for that.

static struct rperm_softc {
    char *buf;
    int buf_len;
} sc;

sc.buf will end up pointing to a location in the kernel’s heap that contains
the byte string. sc and softc stand for software context,
which is just a convention followed in the NetBSD kernel for naming static structures in
device driver code.

open is a required implementation, as it is always the first
syscall in Unix I/O. However, there is nothing meaningful for us to do there, so we
simply write a stub.

int rperm_open(dev_t self, int flag, int mod, struct lwp *l)
{
    return 0;
}

In write, we allocate enough memory in the kernel’s heap to store the
byte string, and then transfer the byte string from userspace to kernelspace.

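int rperm_write(dev_t self, struct uio *uio, int flags)
{
    if (sc.buf) {
        kmem_free(sc.buf, sc.buf_len);
        sc.buf = NULL;
    }
    sc.buf_len = uio->uio_iov->iov_len;
    if (sc.buf_len == 0)
        return 0;   /* nothing to store for an empty write */
    sc.buf = (char *)kmem_alloc(sc.buf_len, KM_SLEEP);
    /* Propagate any copy error (e.g. a bad user address) to the caller. */
    return uiomove(sc.buf, sc.buf_len, uio);
}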

First, let’s talk about the allocations.

kmem_alloc is similar to userspace malloc,
in that it allocates some number of bytes of memory in the heap. Interestingly, this memory
is wired, which means that during physical memory pressure, it is not paged
out to a swap disk like userspace memory is. The KM_SLEEP flag to kmem_alloc tells the kernel that
the current kernel thread should sleep until enough physical memory is available for the request, if it
already isn’t, as opposed to kmem_alloc simply returning NULL in such a situation. Hence,
our allocation request never fails, and we don’t need to check for sc.buf == NULL.

kmem_free is similar to userspace free, except for a second argument that has to be the number of bytes allocated using kmem_alloc.
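
The pattern of remembering the allocation size so that it can be handed back on
release can be tried out in userspace. The sketch below is only an analogy:
sized_alloc and sized_free are hypothetical stand-ins built on malloc and free,
and struct byte_store and byte_store_set are names invented for this example.
Unlike a KM_SLEEP allocation, malloc can fail, so the sketch checks for NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A buffer together with the length it was allocated with. */
struct byte_store {
    char *buf;
    size_t len;
};

/* Stand-in allocator for this sketch; rejects zero-sized requests. */
static void *
sized_alloc(size_t size)
{
    assert(size > 0);
    return malloc(size);
}

/* Stand-in release that, like kmem_free, is told the allocation size. */
static void
sized_free(void *p, size_t size)
{
    assert(p != NULL && size > 0);
    free(p);   /* free() itself has no use for the size */
}

/* Replace the stored bytes, releasing the old buffer with its recorded length. */
int
byte_store_set(struct byte_store *bs, const void *src, size_t len)
{
    if (bs->buf != NULL) {
        sized_free(bs->buf, bs->len);
        bs->buf = NULL;
        bs->len = 0;
    }
    if (len == 0)
        return 0;
    bs->buf = sized_alloc(len);
    if (bs->buf == NULL)
        return -1;   /* userspace malloc may fail */
    memcpy(bs->buf, src, len);
    bs->len = len;
    return 0;
}
```

Keeping the length next to the pointer, as sc.buf_len does for sc.buf, is what
makes the sized release possible.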

Next, we come to the transfer of the byte string from userspace to kernelspace.
Generally, memory to be transferred, in either direction, comes in one or more
non-contiguous chunks of memory (think scatter-gather I/O) along with some additional
state variables like the amount of data remaining to be transferred in the current
session, an offset into a block device, and some flags. All that information
is encapsulated in a struct uio data type. And uiomove performs the actual transfer
by using that information. For example, here uio->uio_rw is set to UIO_WRITE,
telling uiomove that data
from uio has to be transferred to sc.buf. uiomove also ends up updating
uio->uio_resid, which is the total number of bytes left to transfer.
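
To make the bookkeeping concrete, here is a userspace toy model of a
scatter-gather transfer. It is not the kernel’s implementation: struct toy_uio,
its field names and toy_uiomove are hypothetical and only mimic the behaviour
described above, using the standard struct iovec from <sys/uio.h> for the chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Toy stand-in for the kernel's struct uio; field names are illustrative. */
struct toy_uio {
    struct iovec *iov;   /* remaining chunks */
    int iovcnt;          /* number of remaining chunks */
    size_t resid;        /* bytes left to transfer */
    int is_write;        /* nonzero: chunks -> buf; zero: buf -> chunks */
};

/* Move up to n bytes between buf and the chunks, updating the bookkeeping. */
int
toy_uiomove(void *buf, size_t n, struct toy_uio *uio)
{
    char *p = buf;
    struct iovec *v;
    size_t cnt;

    assert(buf != NULL && uio != NULL);
    while (n > 0 && uio->resid > 0 && uio->iovcnt > 0) {
        v = uio->iov;
        cnt = v->iov_len < n ? v->iov_len : n;
        if (cnt == 0) {          /* chunk exhausted, step to the next one */
            uio->iov++;
            uio->iovcnt--;
            continue;
        }
        if (uio->is_write)
            memcpy(p, v->iov_base, cnt);
        else
            memcpy(v->iov_base, p, cnt);
        v->iov_base = (char *)v->iov_base + cnt;
        v->iov_len -= cnt;
        uio->resid -= cnt;
        p += cnt;
        n -= cnt;
    }
    return 0;
}
```

With two chunks holding "Hel" and "lo" and is_write set, a single call with
n = 5 gathers "Hello" into the destination buffer and leaves resid at 0.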

Next we come to read.

int rperm_read(dev_t self, struct uio *uio, int flags)
{
    if (sc.buf == NULL || uio->uio_resid < sc.buf_len)
        return EINVAL;

    char c;
    uint32_t i, r;

    /* Fisher-Yates: swap element i with a random element of [i, buf_len). */
    for (i = 0; i < sc.buf_len-1; i++) {
        r = rand_n(i, sc.buf_len);
        c = sc.buf[r];
        sc.buf[r] = sc.buf[i];
        sc.buf[i] = c;
    }
    return uiomove(sc.buf, sc.buf_len, uio);
}

We first check if there is enough space in uio to transfer a permuted byte
string, then use the Fisher–Yates shuffle
to permute the original byte string using random numbers generated by rand_n,
and then copy the permuted string back to userspace. In this case, uio->uio_rw
would be set to UIO_READ, telling uiomove that data
from sc.buf has to be transferred to uio.
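
The shuffle itself can be tried outside the kernel. The following is a minimal
userspace sketch of the same loop, assuming a caller-supplied picker that
returns an index in [low, high); the names shuffle_bytes, pick_fn and pick_last
are hypothetical, and pick_last is a deterministic stand-in used only so the
result is predictable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied source of integers in [low, high). */
typedef uint32_t (*pick_fn)(uint32_t low, uint32_t high);

/* Shuffle buf[0..len) in place with the Fisher-Yates algorithm. */
void
shuffle_bytes(char *buf, uint32_t len, pick_fn pick)
{
    uint32_t i, r;
    char c;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i++) {
        r = pick(i, len);   /* index drawn from the not-yet-fixed tail */
        assert(r >= i && r < len);
        c = buf[r];
        buf[r] = buf[i];
        buf[i] = c;
    }
}

/* Deterministic picker for demonstration only: always the last index. */
uint32_t
pick_last(uint32_t low, uint32_t high)
{
    (void)low;
    return high - 1;
}
```

With pick_last, shuffling "abcd" gives "dabc". Writing the loop bound as
i + 1 < len rather than i < len - 1 keeps it safe for an empty buffer when the
length is unsigned.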

The rand_n function, which we have to implement, returns a random integer n uniformly distributed
over the range [low, high).

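#define R32MAX 4294967295U

uint32_t rand_n(uint32_t low, uint32_t high) {
    uint32_t limit, diff, r;

    diff = high - low;
    /* Largest multiple of diff that does not exceed R32MAX. */
    limit = diff * (R32MAX/diff);
    do {
        r = cprng_strong32();
    } while (r >= limit);   /* reject the incomplete last block of values */
    return (r % diff) + low;
}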

For a source of randomness, we use cprng_strong32. The cprng_* family of
functions supply cryptographically secure pseudorandom
bytes (in this case, 4) to callers within the kernel.

Once we have it, we transform the range of our random number from [0, 2^32) to
[low, high) by an iterative test that discards those values of r that are
at or above limit, the largest multiple of diff = high - low that does not
exceed 2^32 - 1, as using those numbers
would result in numbers in a certain subrange of [low, high) being more
likely to occur than those not.
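
The bias is easy to see at a smaller scale. The sketch below assumes a toy
8-bit random source (values 0 to 255) instead of the 32-bit one, and simply
counts how often each residue shows up when every possible value is reduced
modulo diff, with and without the rejection step. R8MAX and count_residues are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define R8MAX 255u   /* largest value of the toy 8-bit random source */

/*
 * Count how often each residue appears when every 8-bit value is reduced
 * modulo diff. If reject is nonzero, values at or above the largest
 * multiple of diff not exceeding R8MAX are skipped first.
 * counts must have room for diff entries.
 */
void
count_residues(uint32_t diff, int reject, uint32_t counts[])
{
    uint32_t limit, r;

    assert(diff > 0 && diff <= R8MAX);
    limit = diff * (R8MAX / diff);
    for (r = 0; r < diff; r++)
        counts[r] = 0;
    for (r = 0; r <= R8MAX; r++) {
        if (reject && r >= limit)
            continue;
        counts[r % diff]++;
    }
}
```

For diff = 6, plain modulo gives residues 0 to 3 a count of 43 and residues 4
and 5 a count of 42, since 256 is not a multiple of 6. With rejection, only the
252 values below limit are used and every residue is counted exactly 42 times.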

In close, we free sc.buf if it was allocated before.

int rperm_close(dev_t self, int flag, int mod, struct lwp *l)
{
    if (sc.buf != NULL) {
        kmem_free(sc.buf, sc.buf_len);
        sc.buf = NULL;
    }
    return 0;
}

Lastly, we write a three-line Makefile to build our module.

KMOD=   rperm
SRCS=   rperm.c

.include <bsd.kmodule.mk>


Now we compile and load the module.

$ make
$ modload ./rperm.kmod

The ./ has to be present for modload.

Unfortunately, we can’t simply do

$ echo 'bloop' > /dev/rperm
$ cat /dev/rperm

as that would end up copying the trailing \n along with the rest of the string into the driver, and the driver
would end up shuffling the \n as well, which we don’t want.

So we settle for a simple test program.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUF_LEN  80
#define N_ITER   20

int main(void) {
    char buf[BUF_LEN] = "Hello NetBSD!";
    int i, fd, str_len;

    str_len = strlen(buf);
    fd = open("/dev/rperm", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, buf, str_len);
    for (i = 0; i < N_ITER; i++) {
        read(fd, buf, BUF_LEN);
        printf("%s\n", buf);
    }
    close(fd);
    return 0;
}

$ ./test
NDe eBlS!tHol
Hl!elDSNeo Bt
N!llD eSoHetB
tHelS!BlNeo D

All the above code is available in its entirety on GitHub.
More examples of kernel modules can be found in the NetBSD source tree
at src/sys/modules/examples/.

posted on Feb 2, 2017 by Saurav Sachidanand
