
On my LinkedIn post about this topic, someone actually replied with a method of steering LLM output superior to anything else I've heard of, so I've decided that until I find time to implement their method, I'm not going to worry about it.

tl;dr: you put into the prompt all the JSON up to the point where you want the LLM to generate, and you set the stop token to whatever ends the current JSON item (',' or '}' or ']', whatever). Then your code fills out the rest of the JSON syntax itself, up until another LLM-generated value is needed.

I hope that makes sense.
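Here's a rough sketch in Python of what I mean. The `fake_llm` stub stands in for a real completion call that honors a stop string (all the names here are made up, and a real template would be derived from your schema):

```python
import json

# Walk a fixed template: emit the JSON syntax ourselves, and call the
# model only for the values. `generate(prompt, stop)` stands in for an
# LLM completion call that stops at (and excludes) the stop string.
def fill_template(segments, generate):
    out = []
    for literal, stop in segments:
        out.append(literal)  # JSON syntax we write ourselves
        if stop is not None:
            out.append(generate("".join(out), stop))  # model fills the value
    return "".join(out)

# Template for {"name": ..., "age": ...} as (literal_prefix, stop_token):
segments = [
    ('{"name": "', '"'),   # model generates until the closing quote
    ('", "age": ', '}'),   # model generates until it would close the object
    ('}', None),           # we close the object ourselves, no model call
]

def fake_llm(prompt, stop):
    # stand-in for the real completion call
    return {'"': "Ada", '}': "36"}[stop]

result = fill_template(segments, fake_llm)
print(result)  # {"name": "Ada", "age": 36}
```

Because the prompt only ever grows by appending, every model call hits the same prefix, which is why context caching makes this cheap.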

It is super cool, and I am pretty sure there is a way to make a generator that takes in an arbitrary JSON schema and builds a state machine to do the above.

The performance should be super fast on locally hosted models that are using context caching.

Eh, I should write this up as a blog post, hope someone else implements it, and if not, just do it myself.



There are many solutions for constrained/structured generation with LLMs these days. Here is a blog post my employer published about this a while back: https://monadical.com/posts/how-to-make-llms-speak-your-lang...

I'm partial to Outlines lately, but they all have various upsides and downsides.

OpenAI even natively added support for this on their platform recently: https://openai.com/index/introducing-structured-outputs-in-t...


This is a really good post. I did find one error: Instructor works well with at least one other backend (Ollama).

Outlines looks quite interesting but I wasn't able to get it to work reliably.


With mixlayer, because the round trip time to the model is so short, you can alternate between appending known tokens of the JSON output and values you want the model to generate. I think this works better than constraining the sampling in a lot of cases.

We haven’t built a state machine over JSON schema that uses this approach yet but it’s on the way.


> With mixlayer, because the round trip time to the model is so short, you can alternate between appending known tokens of the JSON output and values you want the model to generate. I think this works better than constraining the sampling in a lot of cases.

Wow, that is a much more succinct way of describing it!

> We haven’t built a state machine over JSON schema that uses this approach yet but it’s on the way.

Really this should just be a simple library in JS and Python. Schema goes in, state machine pops out.

The complications will be around optional fields; I'm not sure offhand how to solve that!
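A toy sketch of the "schema goes in, state machine pops out" part, under heavy assumptions: a flat schema of required properties, only "string" and "number" types, no nesting. Optional fields are exactly what this doesn't handle. All names are hypothetical:

```python
import json

# Compile a flat JSON schema into (literal_prefix, stop_token) segments
# that a fill-in loop can consume. stop_token is None where no model
# call is needed (trailing syntax we emit ourselves).
def schema_to_segments(schema):
    segments = []
    literal = "{"
    props = list(schema["properties"].items())
    for i, (name, spec) in enumerate(props):
        last = i == len(props) - 1
        if spec["type"] == "string":
            literal += f'"{name}": "'
            segments.append((literal, '"'))  # model stops at closing quote
            literal = '"'
        else:  # "number": the next delimiter itself is the stop token
            literal += f'"{name}": '
            segments.append((literal, "}" if last else ","))
            literal = ""
        literal += "}" if last else ", "
    segments.append((literal, None))
    return segments

schema = {"properties": {"name": {"type": "string"},
                         "age": {"type": "number"}}}
segments = schema_to_segments(schema)

# Drive it with a stand-in for the LLM completion call:
def fake_llm(prompt, stop):
    return {'"': "Ada", "}": "36"}[stop]

out = ""
for literal, stop in segments:
    out += literal
    if stop is not None:
        out += fake_llm(out, stop)
print(out)  # {"name": "Ada", "age": 36}
```

For optional fields you'd presumably need branching states (generate a token, inspect it, pick the next segment), which is where it stops being a simple linear list.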


I'd love it if you checked out what we've been working on.

It's still in early stages, but might be usable for something you're trying to build. Here's an example (this buffers the entire JSON object, but you can also gen as you go): https://docs.mixlayer.com/examples/json-output



