The size of P programs verifiable by a model checker is similar to that of TLA+ specifications checkable by TLA+, but because TLA+ can specify systems at a non-compilable level of detail, it also allows specifying and (perhaps non-soundly) verifying much larger systems than those you can soundly verify in P. There are many programming languages with model-checkers, but they are commonly used in niche areas -- like hardware or embedded software -- where programs are relatively small and simple compared to most "ordinary" software.
This is a fair call-out. I'll like to share a counter-example where we 'model check' our real-world service written in C# with pretty decent effectiveness. As you rightly call out, the number of execution points around which the model checker explores interleavings in a concurrent situation cannot be too large. If you look at _most_ micro-services, the really interesting points around which interleavings have to be explored are to external storage systems, say, Cosmos DB or Azure Storage etc. This is because most micro-services maintain their state in these external systems and are otherwise stateless. This insight allows us to only explore interleavings around calls to these external services as that's where a ton of concurrency bugs reside and ignore the interleavings at arbitrary points _within_ the services. Leveraging this insight allows us to utilize concurrency testing tools like Coyote on real world code and services with the testing scaling decently with the size of the program.