
Thanks :). It's just a lot of prompting and string parsing. There are models like "Hermes-2-Pro-Mistral" (the one from the video) that are trained to work with function signatures and to output structured text. But in the end it's just strings in > strings out, haha. Still, it's fun (and sometimes frustrating) to use LLMs for flow control (conditions, loops...) inside your programs.
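
To give a rough idea, here's a minimal sketch of that pattern (assuming a local OpenAI-compatible server, e.g. llama.cpp's llama-server or Ollama; `ask_llm` is just a hypothetical helper, swap in whatever client your backend uses):

    import requests

    # Assumes a local OpenAI-compatible endpoint; adjust URL/payload
    # to whatever backend you actually run.
    API_URL = "http://localhost:8080/v1/chat/completions"

    def ask_llm(prompt: str) -> str:
        resp = requests.post(API_URL, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def llm_condition(question: str) -> bool:
        # Strings in -> bool out: parse the model's text into a branch.
        answer = ask_llm(question + "\nAnswer with exactly one word: YES or NO.")
        return answer.strip().upper().startswith("YES")

    for subject in ["Meeting moved to 3pm", "You WON a FREE cruise!!!"]:
        if llm_condition("Is this email subject spam? " + subject):
            print("spam:", subject)
        else:
            print("ok:", subject)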


Got a link for that one? I have found a few with Hermes-2-Mistral in the name.


Wow, I didn't know about "Hermes 2 Pro - Mistral 7B", cheers!


It's my go-to "structured text model" atm. Try "Starling-LM-7B-beta" for some very impressive chat capabilities. I honestly think it outperforms GPT-3 half the time.


Sorry to repeat the same question I just asked the other commenter in this thread, but could you link the model page and recommend a specific level of quantization for the models you've referenced? I'd love to play with these models and see what you're talking about.



Thank you — at the bottom of that page I found this link to what I think are the quantized versions:

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-...

If you have the time, could you explain what you mean by "Q5 is minimum"? Did you determine that by trying the different models and finding this one is best, or did someone else do that evaluation, or is that just generally accepted knowledge? Sorry, I find this whole ecosystem quite confusing still, but I'm very new and that's not your problem.


Talking GGUF: usually, the higher you can afford to go with quantization (e.g. Q5 is better than Q4, etc.), the better. A Q6_K has minimal performance loss compared to the Q8, so in most cases, if you can fit a Q6_K, it's recommended to just use that. TheBloke's READMEs[0] usually have a good table summarizing each quantization level.

If you're RAM-constrained, you'll also have to make trade-offs on context length: e.g. with 8 GB of RAM you could run a Q5 quant with a shorter context vs. a Q3 with a longer one, etc. (rough numbers in the sketch below).

[0]: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
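
For a back-of-envelope feel of that trade-off (all figures assumed: ~7B params, Mistral-style GQA with 32 layers, 8 KV heads, head dim 128, fp16 KV cache, and approximate bits-per-weight for the quants; real usage adds runtime overhead):

    PARAMS = 7e9

    def weight_gb(bits_per_weight):
        return PARAMS * bits_per_weight / 8 / 1e9

    def kv_cache_gb(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128):
        # one K and one V per layer per token, 2 bytes each (fp16)
        return 2 * n_layers * n_kv_heads * head_dim * 2 * n_ctx / 1e9

    for label, bpw, ctx in [("Q5_K_M", 5.7, 4096), ("Q3_K_M", 3.9, 16384)]:
        print(f"{label} @ {ctx} ctx: ~{weight_gb(bpw) + kv_cache_gb(ctx):.1f} GB")

Both configurations land around ~5.5 GB here, which is exactly the kind of trade you end up making on an 8 GB machine.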


Thank you!


It's the best balance if you have limited compute.


Thank you


Have you considered using grammar sampling?
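
E.g. with llama-cpp-python you can pass a GBNF grammar so decoding can only emit tokens the grammar allows, which removes most of the string-parsing guesswork. A minimal sketch (the model path is a placeholder):

    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path="hermes-2-pro-mistral-7b.Q5_K_M.gguf")
    grammar = LlamaGrammar.from_string('root ::= "YES" | "NO"')

    out = llm(
        "Is Paris the capital of France? Answer YES or NO.\n",
        grammar=grammar,
        max_tokens=4,
    )
    print(out["choices"][0]["text"])  # constrained to exactly YES or NO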



