What about allocating one kernel thread to act as a watchdog to detect if a user...

tomp · on Nov 13, 2019

That is essentially what is happening anyhow. Your runtime (e.g. Go) usually has multiple worker threads executing fibres / coroutines (usually one OS thread per CPU core). These worker threads have a userspace scheduler which performs userspace (cooperative) context switches. In addition, every so often the OS interrupts the worker threads and performs an OS-level context switch (so that other processes can run).

The problem is 2-fold. (1) In cooperative (userspace) multithreading, context switches can only happen at a safepoint, where the runtime knows what's going on and what the current state is; that's necessary for any kind of userspace service provided by the runtime, such as GC, synchronisation, JIT, etc. I think JVM people (or maybe Go? edit: found the Go issue [0]) are trying to generalize this so that every point of the program is a safepoint; not sure how it's going. (2) Userspace has very limited capacity to even observe OS-level context switches, let alone modify them... E.g. ideally you'd be able to look at where (at which instruction) the context switch happened, what the current state is (e.g. values of registers), and maybe modify it (e.g. switch to another coroutine). AFAIK there's no portable way to do this, and some OSs/platforms don't support any way to do this. Basically, pretty much the only thing that can happen after a thread is interrupted by the OS is that the thread resumes exactly where it left off.

[0] https://github.com/golang/go/issues/24543
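To make the safepoint idea concrete, here is a minimal sketch in Python, assuming generators stand in for coroutines. The names `task` and `run` are made up for illustration and are not from any real runtime. Each `yield` plays the role of a safepoint: the scheduler can only switch there, never in the middle of the work between two yields.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task; each `yield` is a safepoint where a switch may happen."""
    for i in range(steps):
        log.append(f"{name}:{i}")  # work between safepoints is never interrupted
        yield                      # safepoint: task state is well-defined here

def run(tasks):
    """Round-robin userspace scheduler; it only regains control when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task until its next safepoint
            ready.append(current)  # still runnable, so queue it again
        except StopIteration:
            pass                   # task finished, drop it

log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)
```

A task that never reaches a `yield` would starve every other task on that scheduler, which is exactly the case a watchdog is meant to detect.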

jlokier · on Nov 13, 2019

You can implement something similar without needing any special kernel features:

- Userspace cooperative scheduling for all the tasks, in multiple threads if multi-core.

- A periodic timer signal sent to all the userspace threads. (Or to one thread, but you might have to give it high scheduling priority; not all kernels offer this.)

- The key to keeping kernel-userspace transition overhead down is the periodic timer doesn't run too often.

- The timer signal handler checks the per-thread "context-switched since last check" flag. If set, clear it. If not set, do one of the following: pre-empt the active co-operative task in that thread (if that's possible; see another comment about safepoints); set a "pre-empt soon" flag for the co-operative task to detect (e.g. if it's looping but checks the flag in the loop); or move the inactive tasks in that thread to another thread (work-stealing) or to a new thread (using pre-allocated idle threads, because you can't make a new thread in a signal handler).

- If the timing of the timer signals is inconvenient, for example if you want to give the active task about 15ms more execution time, the signal handler may start a second timer instead of taking immediate action.

- Checking a per-thread flag from a signal handler may prove entertaining to do in a standard async-signal-safe way. (This matters more on older platforms where threads are themselves implemented in all sorts of ways.)

- Waking an idle "monitor" thread with a timer would be a cleaner way to do some of this, but there's no guarantee about when the monitor thread will run, and it may be a while if the CPUs are already busy. RT priority helps, if available.
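The flag-checking step in the list above can be sketched roughly as follows. This is an illustration under assumptions, not a production design: `WorkerState`, `on_tick`, `install_timer` and `busy_task` are hypothetical names, only the "set a pre-empt soon flag" option is shown, and Python always runs signal handlers in the main thread, so the per-thread delivery a C runtime would need is not modelled here. `signal.setitimer` is Unix-only.

```python
import signal

class WorkerState:
    """Hypothetical per-worker bookkeeping (names are illustrative)."""
    def __init__(self):
        self.switched_since_check = False  # set by the scheduler on each cooperative switch
        self.preempt_soon = False          # polled by long-running tasks

def on_tick(state):
    """Body of the timer handler: has the active task overrun its slice?"""
    if state.switched_since_check:
        state.switched_since_check = False  # a switch happened; clear and check again next tick
    else:
        state.preempt_soon = True           # no switch since last tick; ask the task to yield

def install_timer(state, interval_s=0.05):
    """Deliver SIGALRM every interval_s seconds (Unix only, call from the main thread)."""
    signal.signal(signal.SIGALRM, lambda signum, frame: on_tick(state))
    signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)

def busy_task(state, max_iters=10_000_000):
    """A looping task that checks the flag in its loop; returns iterations completed."""
    for i in range(max_iters):
        if state.preempt_soon:
            return i
    return max_iters
```

To try it, call `install_timer(state)` from the main thread, run `busy_task(state)`, and cancel the timer afterwards with `signal.setitimer(signal.ITIMER_REAL, 0)`. A longer interval means fewer kernel-userspace transitions at the cost of slower detection.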

gpderetta · on Nov 13, 2019

Would this monitor thread run on the same core? If so, a timer interrupt or other kernel preemption event needs to fire to get it scheduled. At that point you might as well handle the preemption directly.

On the other hand, dedicating a core as a preemption supervisor seems wasteful. I guess you could run the supervisor on a hyperthread to reduce wastage.

lokedhs · on Nov 13, 2019

I was thinking it would be running on its own core. Cores tend to be quite cheap these days.
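For comparison, the dedicated monitor-thread variant discussed in this subthread might look roughly like the sketch below. The `Monitor` class and `new_worker` helper are hypothetical, and the sketch does not pin the monitor to its own core, so when it actually gets to run is still up to the OS scheduler, as noted earlier in the thread.

```python
import threading

def new_worker():
    """Hypothetical per-worker flags, shared between a worker and the monitor."""
    return {"switched": threading.Event(), "preempt": threading.Event()}

class Monitor:
    """Watchdog thread that periodically checks each worker's 'switched' flag."""
    def __init__(self, workers, period_s):
        self.workers = workers
        self.period_s = period_s
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop_event.wait(self.period_s):
            for w in self.workers:
                if w["switched"].is_set():
                    w["switched"].clear()  # worker switched tasks since the last check
                else:
                    w["preempt"].set()     # no switch seen; request pre-emption
```

A worker would set `w["switched"]` on every cooperative context switch and poll `w["preempt"]` at its safepoints.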