It depends on how much performance your high-performance context actually requires. With BQN being an array language, traditional interpreter overhead ideally doesn't matter if you write "array-y" code, since most of the time is then spent in loops inside the interpreter's primitives.

But there currently isn't any loop fusion in CBQN (though I'd definitely like to add it at some point). So even though the native ops are all nice SIMD where possible, it can still lose to autovectorized C or similar, or even to scalar code, due to memory overhead: each primitive makes its own pass over the data and writes a full intermediate array.
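To make the memory-overhead point concrete, here is a rough C sketch of what "no fusion" means for something like `(a + b) * c`. This is not CBQN's actual code; the function names are made up for illustration. The unfused version mimics an array interpreter running one primitive at a time, and the fused version is what a C compiler gets to autovectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Unfused: one loop per primitive, with a full-size temporary in between.
 * Hypothetical illustration of an interpreter evaluating (a + b) * c.
 * Returns 0 on success, -1 if the temporary cannot be allocated. */
int scale_sum_unfused(const double *a, const double *b, double c,
                      double *out, size_t n) {
    assert(n == 0 || (a && b && out));
    if (n == 0) return 0;

    double *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;

    /* Pass 1: tmp = a + b (reads 2n values, writes n values). */
    for (size_t i = 0; i < n; i++) tmp[i] = a[i] + b[i];

    /* Pass 2: out = tmp * c (reads n values again, writes n values). */
    for (size_t i = 0; i < n; i++) out[i] = tmp[i] * c;

    free(tmp);
    return 0;
}

/* Fused: a single pass, no temporary; the sum stays in a register. */
void scale_sum_fused(const double *a, const double *b, double c,
                     double *out, size_t n) {
    assert(n == 0 || (a && b && out));
    for (size_t i = 0; i < n; i++) out[i] = (a[i] + b[i]) * c;
}
```

Both loops in the unfused version can be perfectly good SIMD, but once the arrays no longer fit in cache, the extra write and re-read of `tmp` is memory traffic the fused loop never generates.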

FFI currently goes through libffi (with non-JITted call preparation), so the overhead is on the order of 100 ns per call.
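In practice that per-call cost mostly affects how you'd want to shape the C side. A minimal sketch, assuming you control the library being called; both function names are hypothetical and nothing here is CBQN-specific:

```c
#include <stddef.h>

/* Hypothetical per-element entry point: calling this once per value
 * pays the FFI call overhead for every element. */
double square_one(double x) {
    return x * x;
}

/* Hypothetical batched entry point: one FFI call covers the whole
 * array, so the call overhead is paid once. */
void square_many(const double *in, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] * in[i];
}
```

At roughly 100 ns per call, a million calls to something like `square_one` would cost about 0.1 s in call overhead alone, while a single call to `square_many` pays that overhead once.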