Yes, you can pre-fill the assistant's response with "```json {" or even just "{", and that does increase the likelihood of getting valid JSON back, but it's still not guaranteed. It's nowhere near reliable enough for a production use case, even on a bigger (8B) model.
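For illustration, here's a minimal sketch of the prefill trick against a raw completions-style endpoint, where you append the opening brace to the prompt yourself. The URL, model name, and `<|user|>`/`<|assistant|>` template tokens are all placeholders; you'd swap in whatever your server and model actually use:

```python
import json
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (e.g. vLLM or
# llama.cpp's server) exposing the legacy completions endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

PREFILL = "{"  # seed the assistant turn with the opening brace

# Assumption: the model's chat template looks roughly like this.
prompt = (
    "<|user|>\nList three primary colors as JSON with a 'colors' key.\n"
    "<|assistant|>\n" + PREFILL  # model continues from the prefill
)

resp = client.completions.create(
    model="my-local-model",  # placeholder model name
    prompt=prompt,
    max_tokens=200,
)

raw = PREFILL + resp.choices[0].text  # re-attach the prefilled brace
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    data = None  # still possible: the model can drift out of JSON
print(data)
```

Note the `try/except` at the end: even with the prefill, you have to be prepared for unparseable output, which is exactly the reliability problem.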
I'd recommend the Ollama or vLLM inference servers instead. Both support constrained JSON output (Ollama via a `format: "json"` parameter, vLLM via the OpenAI-compatible `response_format`), implemented as a grammar applied on top of the base model. That makes the output shape reliable enough for production use, though in my experience the response quality drops slightly when a grammar is applied.
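Here's a short sketch of the Ollama variant, assuming a local Ollama server on its default port with the `llama3` model pulled (swap in whatever model you actually run):

```python
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # placeholder: any pulled model
        "messages": [
            {
                "role": "user",
                "content": "List three primary colors as JSON "
                           "with a 'colors' key.",
            }
        ],
        "format": "json",  # grammar-constrained decoding: valid JSON out
        "stream": False,
    },
    timeout=120,
)
data = json.loads(resp.json()["message"]["content"])
print(data)
```

One thing that helps with the quality drop: still ask for JSON explicitly in the prompt and describe the shape you want, so the grammar is only enforcing what the model was already trying to produce rather than fighting it.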