
Thanks for sharing. TIL about rerankers.

Chunking strategy is a big issue. I found acceptable results by shoving large texts into Gemini Flash and having it summarize and extract chunks, instead of using whatever text splitter I tried. I use the method published by Anthropic (https://www.anthropic.com/engineering/contextual-retrieval), i.e. include the full summary along with the chunks for each embedding.
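A minimal sketch of the contextual-retrieval idea described above: prepend a document-level summary to each chunk before embedding, so every vector carries the surrounding context. The summarization step is done by an LLM (the commenter uses Gemini Flash); here it is a plain string, and all names are illustrative, not the commenter's actual code (which is in Clojure).

```python
def contextualize_chunks(doc_summary: str, chunks: list[str]) -> list[str]:
    """Return the texts that would actually be sent to the embedding API:
    each chunk prefixed with the whole document's summary."""
    return [f"Document summary: {doc_summary}\n\nChunk: {c}" for c in chunks]

# Toy example: the summary would normally come from an LLM call.
summary = "A contract describing payment terms between two companies."
chunks = ["Payment is due within 30 days.", "Late fees accrue monthly."]
for text in contextualize_chunks(summary, chunks):
    print(text)
```

Each contextualized text is then embedded as one unit, so a chunk like "Payment is due within 30 days" stays retrievable even when the query only mentions the contract, not payments.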

I also created a tool that lets the LLM run vector searches on its own.
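A hedged sketch of what such a tool might look like: a tool schema of the shape most chat APIs accept for function calling, plus a local handler over a toy in-memory index. The schema fields, the `vector_search` name, and the stubbed `embed` function are all assumptions for illustration; the commenter's actual implementation is in Clojure against the providers' REST APIs.

```python
import math

# Illustrative tool schema the LLM would be offered (names are hypothetical).
SEARCH_TOOL = {
    "name": "vector_search",
    "description": "Search the document index; returns the top-k chunks.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "k": {"type": "integer", "default": 3},
        },
        "required": ["query"],
    },
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index of (embedding, chunk) pairs; real embeddings would come
# from an embedding API.
INDEX = [
    ([1.0, 0.0], "Chunk about payments."),
    ([0.0, 1.0], "Chunk about shipping."),
]

def embed(text: str) -> list[float]:
    # Stub standing in for a real embedding call.
    return [1.0, 0.0] if "pay" in text else [0.0, 1.0]

def vector_search(query: str, k: int = 3) -> list[str]:
    """Handler invoked when the LLM calls the tool."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda e: cosine(q, e[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

print(vector_search("payment terms", k=1))
```

The point of exposing search as a tool is that the model can decide when and how often to query, refining its search terms across turns instead of relying on one up-front retrieval pass.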

I do not use LangChain or Python; I use Clojure and the LLMs' REST APIs.



I made a startup, https://tokencrush.ai/, to do just this.

I've struggled to find a target market though. Would you mind sharing what your use case is? It would really help give me some direction.


Have you measured your latency, and how sensitive are you to it?


>> Have you measured your latency, and how sensitive are you to it?

Not sensitive to latency at all. My users would rather have well researched answers than poor answers.

Also, I use batch-mode APIs for chunking; it is so much cheaper.
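Batch-mode APIs typically take a file with one JSON request per line and return results asynchronously at a discount, which suits offline chunking well. A rough sketch of assembling such a file, using an OpenAI-style request shape; the field names and endpoint are illustrative and vary by provider, so check your provider's batch documentation.

```python
import json

def build_batch_file(docs: dict[str, str], model: str = "example-model") -> str:
    """Build a JSONL string: one summarize-and-chunk request per document.
    The request shape here is an assumption modeled on OpenAI-style batch
    input; adapt it to your provider."""
    lines = []
    for doc_id, text in docs.items():
        lines.append(json.dumps({
            "custom_id": doc_id,  # lets you match results back to documents
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{
                    "role": "user",
                    "content": ("Summarize this document and split it into "
                                "self-contained chunks:\n\n" + text),
                }],
            },
        }))
    return "\n".join(lines)

print(build_batch_file({"doc-1": "Some long document text."}))
```

You would upload the resulting file, create a batch job, and poll for completion; the `custom_id` on each line is what ties each asynchronous result back to its source document.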



