Having fun with signal handlers

Posted on Sat 21 November 2020 in programming

As every C and C++ programmer knows far too well, if you dereference a pointer that points outside the memory mapped into your process, you get a segmentation fault and your program crashes. As far as the language itself is concerned, you don't get a second chance, and you cannot know in advance whether that dereference is going to set off a bomb or not. In technical terms, you are invoking undefined behaviour, and you should never do that: you are responsible for knowing in advance whether your pointers are valid, and if they are not, you get to keep the pieces.

However, it turns out that most real operating systems do give you a second chance, although with a lot of fine print attached. So I tried to implement a function that tries to dereference a pointer: if it can, it gives you the value; if it can't, it tells you so. Again, I stress that this should never happen in a real program, except possibly for debugging (or for having fun).

The prototype is

word_t peek(word_t *addr, int *success);

The function is basically equivalent to return *addr, except that it doesn't crash if addr is not mapped. If success is not NULL, it is set to 1 if addr was mapped and to 0 if it was not. If addr was not mapped, the return value is meaningless.

I won't explain it in detail, to leave you some fun. The idea is to install a handler for SIGSEGV: if the address is invalid, the handler is called and fixes everything up by advancing the instruction pointer just far enough to skip the faulting load and the increment that follows it (the increment is what flags success). The dereferencing code is written as hardcoded assembly bytes, so that I know exactly how many bytes I need to skip.

Of course this is very architecture-dependent: I wrote the i386 and amd64 variants (no x32). And I don't guarantee there are no bugs or subtleties!

Another solution would have been to just parse /proc/self/maps before dereferencing and check whether the pointer lies in a mapped area, but that would have suffered from a TOCTTOU (time-of-check-to-time-of-use) problem: another thread might change the mappings between the time /proc/self/maps is parsed and the time the pointer is dereferenced (also, parsing that file can take a relatively long time). Another approach, less architecture-dependent but still not pure C, would have been to call setjmp before attempting the dereference and longjmp back from the signal handler (but again you would need separate setjmp contexts in different threads to avoid race conditions).
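
For the curious, here is a rough sketch of what those two alternatives might look like. It is not the code from this post: the names is_mapped_readable, peek_sj, peek_sj_handler and peek_env are made up for illustration, it assumes Linux with a C11 compiler (for _Thread_local), and it uses sigsetjmp/siglongjmp rather than plain setjmp/longjmp so that the signal mask is restored after the jump.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All names in this sketch are hypothetical, not taken from the post. */

/* Alternative 1: scan /proc/self/maps (Linux-specific) for a readable
 * mapping that contains [addr, addr + len). Inherently racy (TOCTTOU),
 * and conservative: a range spanning two mappings is rejected. */
int is_mapped_readable(const void *addr, size_t len) {
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    uintptr_t a = (uintptr_t) addr;
    int found = 0;

    if (!f) return 0;
    while (!found && fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        /* Each entry starts with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3)
            found = (perms[0] == 'r' && a >= start && a < end
                     && len <= end - a);
    }
    fclose(f);
    return found;
}

/* Alternative 2: save a jump context, attempt the load, and jump back
 * from the SIGSEGV handler if the load faults. */
static _Thread_local sigjmp_buf peek_env;  /* one context per thread */

static void peek_sj_handler(int sig) {
    (void) sig;
    /* Assumption: reading thread-local storage here works in practice. */
    siglongjmp(peek_env, 1);
}

uintptr_t peek_sj(const volatile uintptr_t *addr, int *success) {
    struct sigaction act, prev;
    /* volatile keeps the stores ordered after the load and the values
     * well-defined after a siglongjmp */
    volatile uintptr_t value = 0;
    volatile int ok = 0;
    int res;

    memset(&act, 0, sizeof act);
    act.sa_handler = peek_sj_handler;
    sigemptyset(&act.sa_mask);
    res = sigaction(SIGSEGV, &act, &prev);
    assert(res == 0);

    /* savemask = 1: siglongjmp restores the mask, unblocking SIGSEGV */
    if (sigsetjmp(peek_env, 1) == 0) {
        value = *addr;  /* may fault */
        ok = 1;
    }

    res = sigaction(SIGSEGV, &prev, NULL);
    assert(res == 0);
    (void) res;

    if (success) *success = ok;
    return value;
}
```

Calling peek_sj(&x, &ok) on an ordinary variable should set ok to 1, while an address such as 0x1000 (assumed to be unmapped) should set it to 0. The jump context is per-thread here, but the handler is still installed with sigaction, which is process-wide, so this sketch is not safe with multiple threads either.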

Have fun! (and again, don't try this in real programs)

EDIT I realized I should specify the language for source code highlighting to work decently. Now it's better!

EDIT 2 I also realized that my version of peek has problems when there are other threads, because signal actions are per-process, not per-thread (as I initially thought). See the comments for a better version (though not perfect).

#define _GNU_SOURCE
#include <stdint.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <ucontext.h>

#ifdef __i386__
typedef uint32_t word_t;
#define IP_REG REG_EIP
#define IP_REG_SKIP 3  /* 2-byte mov + 1-byte inc */
#define READ_CODE __asm__ __volatile__(".byte 0x8b, 0x03\n"  /* mov (%ebx), %eax */ \
                                       ".byte 0x41\n"        /* inc %ecx */ \
                                       : "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif

#ifdef __x86_64__
typedef uint64_t word_t;
#define IP_REG REG_RIP
#define IP_REG_SKIP 6  /* 3-byte mov + 3-byte inc */
#define READ_CODE __asm__ __volatile__(".byte 0x48, 0x8b, 0x03\n"  /* mov (%rbx), %rax */ \
                                       ".byte 0x48, 0xff, 0xc1\n"  /* inc %rcx */ \
                                       : "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif

/* Called on SIGSEGV: resume execution just past the hardcoded mov + inc. */
static void segv_action(int sig, siginfo_t *info, void *ucontext) {
    (void) sig;
    (void) info;
    ucontext_t *uctx = (ucontext_t*) ucontext;
    uctx->uc_mcontext.gregs[IP_REG] += IP_REG_SKIP;
}

struct sigaction peek_sigaction = {
    .sa_sigaction = segv_action,
    .sa_flags = SA_SIGINFO,
    /* .sa_mask is left zero-initialized (an empty set on glibc) */
};

word_t peek(word_t *addr, int *success) {
    word_t ret;
    int tmp, res;
    struct sigaction prev_act;

    res = sigaction(SIGSEGV, &peek_sigaction, &prev_act);
    assert(res == 0);

    tmp = 0;
    /* On success ret = *addr and tmp becomes 1; on a fault both instructions are skipped. */
    READ_CODE

    res = sigaction(SIGSEGV, &prev_act, NULL);
    assert(res == 0);

    if (success) {
        *success = tmp;
    }

    return ret;
}

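int main(void) {
    int success;
    word_t number = 22;
    word_t value;

    /* Valid address: success is 1 and value is the content of number. */
    value = peek(&number, &success);
    printf("%d %llu\n", success, (unsigned long long) value);

    /* Addresses expected to be unmapped: success is 0, value is meaningless. */
    value = peek(NULL, &success);
    printf("%d %llu\n", success, (unsigned long long) value);

    value = peek((word_t*)0x1234, &success);
    printf("%d %llu\n", success, (unsigned long long) value);

    return 0;
}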
