What about allocating one kernel thread to act as a watchdog to detect if a user-space thread has not performed the preemption check in time, and only send an interrupt if that's the case.
The preemption check would simply write some data to memory (the actual data wouldn't matter), and the watchdog would use the MMU modification flag to check if the memory page was updated. That way you'd avoid synchronisation problems.
That is essentially what is happening anyhow. Your runtime (e.g. GO) usually has multiple worker threads executing fibres / coroutines (one OS thread for each CPU core, usually). These worker threads have a userspace scheduler which performs userspace (cooperative) context switches. In addition, every so often the OS would also interrupt the worker threads and does an OS-level context switch (so that other processes can run).
The problem is 2-fold; (1) is that in cooperative (userspace) multithreading, context switches can only happen at a safepoint, where the runtime knows what's going on, and what the current state is; that's necessary for any kind of userspace service provided by the runtime, such as GC, synchronisation, JIT, etc. I think JVM people (or maybe Go? edit: found the Go issue [0]) are trying to generalize this so that every point of the program is a safepoint; not sure how it's going. (2) is that userspace has very limited capacity to even observe OS-level context switches, let alone modify them... E.g. ideally you'd be able to take a look where (at which instruction) the context switch happened, what the current state is (e.g. values of registers), and maybe modify it (e.g. switch to another coroutine). AFAIK there's no portable way to do this, and some OSs/platforms don't support any way to do this. Basically pretty much the only thing that can happen after a thread is interrupted by the OS, is that the thread resumes exactly when it was.
You can implement something similar without needing any special kernel features:
- Userspaces cooperative scheduling for all the tasks, in multiple threads if multi-core.
- A periodic timer signal sent to all the userspace threads. (Or one thread, but you might have to give it high scheduling priority; not all kernels offer this)
- The key to keeping kernel-userspace transition overhead down is the periodic timer doesn't run too often.
- The timer signal handler checks the per-thread "context-switched since last check" flag. If set, clear it. If not set, either pre-empt the active co-operative task in that thread (if that's possible; see another comment about safepoints), set a "pre-empt soon" flag for the co-operative task to detect (e.g. if it's looping but checks the flag in the loop), or move the inactive tasks in that thread to another thread (work-stealing), or a new thread (using pre-allocated idle threads because you can't make a new thread in a signal handler).
- If the timing of the timer signals is inconvenient, for example if you want to give the active task about 15ms more execution time, the signal handler may start a second timer instead of immediate action.
- Checking a per-thread flag from a signal handler may prove entertaining to do in a standard async-signal-safe way. (This matters more on older platforms where threads are themselves implemented in all sorts of ways.)
- Waking an idle "monitor" thread with a timer would be a cleaner way to do some of this, but there's no guarantee about when the monitor thread will run, and it may be a while if the CPUs are already busy. RT priority helps, if available.
would this monitor thread run on the same core? If so, a timer interrupt or other kernel preemption event need to fire to get it scheduled. At that point might as well handle the preemption directly.
On the other hand, dedicating a core as a preemption supervisor seems wasteful. I guess you could run the supervisor on an hyper thread to reduce wastage.
The preemption check would simply write some data to memory (the actual data wouldn't matter), and the watchdog would use the MMU modification flag to check if the memory page was updated. That way you'd avoid synchronisation problems.