
On my LinkedIn post about this topic, someone actually replied with a method of steering LLM output superior to anything else I've heard of, so I've decided that until I find time to implement their method, I'm not going to worry about it.

tl;dr: you put into the prompt all the JSON up to the point where you want the LLM to generate, and you set the stop token to whatever ends the current JSON item (',' or '}' or ']', whatever). Then your code fills out the rest of the JSON syntax itself, up until another LLM-generated value is needed.

I hope that makes sense.
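Here's a rough sketch in Python of what I mean. The `fake_llm` stub stands in for a real completion call that honors a stop string (all the names here are made up, and a real template would be derived from your schema):

```python
import json

# Walk a fixed template: emit the JSON syntax ourselves, and call the
# model only for the values. `generate(prompt, stop)` stands in for an
# LLM completion call that stops at (and excludes) the stop string.
def fill_template(segments, generate):
    out = []
    for literal, stop in segments:
        out.append(literal)  # JSON syntax we write ourselves
        if stop is not None:
            out.append(generate("".join(out), stop))  # model fills the value
    return "".join(out)

# Template for {"name": ..., "age": ...} as (literal_prefix, stop_token):
segments = [
    ('{"name": "', '"'),   # model generates until the closing quote
    ('", "age": ', '}'),   # model generates until it would close the object
    ('}', None),           # we close the object ourselves, no model call
]

def fake_llm(prompt, stop):
    # stand-in for the real completion call
    return {'"': "Ada", '}': "36"}[stop]

result = fill_template(segments, fake_llm)
print(result)  # {"name": "Ada", "age": 36}
```

Because the prompt only ever grows by appending, every model call hits the same prefix, which is why context caching makes this cheap.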

It is super cool, and I am pretty sure there is a way to make a generator that takes in an arbitrary JSON schema and builds a state machine to do the above.

The performance should be super fast on locally hosted models that are using context caching.

Eh, I should write this up as a blog post, hope someone else implements it, and if not, just do it myself.



There are many solutions for constrained/structured generation with LLMs these days. Here is a blog post my employer published about this a while back: https://monadical.com/posts/how-to-make-llms-speak-your-lang...

I'm partial to Outlines lately, but they all have various upsides and downsides.

OpenAI even natively added support for this on their platform recently: https://openai.com/index/introducing-structured-outputs-in-t...


This is a really good post. I did find one error: Instructor works well with at least one other backend (Ollama).

Outlines looks quite interesting but I wasn't able to get it to work reliably.


With mixlayer, because the round trip time to the model is so short, you can alternate between appending known tokens of the JSON output and values you want the model to generate. I think this works better than constraining the sampling in a lot of cases.

We haven’t built a state machine over JSON schema that uses this approach yet but it’s on the way.


> With mixlayer, because the round trip time to the model is so short, you can alternate between appending known tokens of the JSON output and values you want the model to generate. I think this works better than constraining the sampling in a lot of cases.

Wow, that is a much more succinct way of describing it!

> We haven’t built a state machine over JSON schema that uses this approach yet but it’s on the way.

Really this should just be a simple library in JS and Python. Schema goes in, state machine pops out.

The complications will be around optional fields; I'm not sure offhand how to solve that!
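A toy sketch of the "schema goes in, state machine pops out" part, under heavy assumptions: a flat schema of required properties, only "string" and "number" types, no nesting. Optional fields are exactly what this doesn't handle. All names are hypothetical:

```python
import json

# Compile a flat JSON schema into (literal_prefix, stop_token) segments
# that a fill-in loop can consume. stop_token is None where no model
# call is needed (trailing syntax we emit ourselves).
def schema_to_segments(schema):
    segments = []
    literal = "{"
    props = list(schema["properties"].items())
    for i, (name, spec) in enumerate(props):
        last = i == len(props) - 1
        if spec["type"] == "string":
            literal += f'"{name}": "'
            segments.append((literal, '"'))  # model stops at closing quote
            literal = '"'
        else:  # "number": the next delimiter itself is the stop token
            literal += f'"{name}": '
            segments.append((literal, "}" if last else ","))
            literal = ""
        literal += "}" if last else ", "
    segments.append((literal, None))
    return segments

schema = {"properties": {"name": {"type": "string"},
                         "age": {"type": "number"}}}
segments = schema_to_segments(schema)

# Drive it with a stand-in for the LLM completion call:
def fake_llm(prompt, stop):
    return {'"': "Ada", "}": "36"}[stop]

out = ""
for literal, stop in segments:
    out += literal
    if stop is not None:
        out += fake_llm(out, stop)
print(out)  # {"name": "Ada", "age": 36}
```

For optional fields you'd presumably need branching states (generate a token, inspect it, pick the next segment), which is where it stops being a simple linear list.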


I'd love it if you checked out what we've been working on.

It's still in early stages, but might be usable for something you're trying to build. Here's an example (this buffers the entire JSON object, but you can also gen as you go): https://docs.mixlayer.com/examples/json-output



