Nice idea; I do the same with Ollama and local models, except my client code is in Common Lisp, Clojure, and Racket. I have three books with Ollama examples for these languages, all of which can be read free online: https://leanpub.com/u/markwatson
I have been paid to do so-called “AI work” since 1982, lots of early work with neural networks and symbolic AI, then more recently deep learning. I have never been as excited about any technology in my life as I am about LLMs.
Commercial APIs from Anthropic, Mistral, and OpenAI are great tools, but I get off more on running smaller models locally myself.
I like mistral:7b-instruct, yi:34b, and wizard-vicuna-uncensored:30b. I think the so-called "uncensored" models tend to work better for general-purpose use, but Mistral and Yi aren't available uncensored.
I have an M2 Pro with 32 GB of memory, so I need to use 3-bit quantization to run Mixtral: dolphin-mixtral:8x7b-v2.5-q3_K_S. In general I don't like to go below 4-bit quantization.
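A rough back-of-envelope on why 32 GB pushes you to 3-bit for Mixtral (the bits-per-weight figures are approximate, since k-quants carry some overhead beyond their nominal bit count):

    46.7 \times 10^{9}\ \text{params} \times \tfrac{4.5\ \text{bits}}{8} \approx 26\ \text{GB} \quad (\text{4-bit quant})
    46.7 \times 10^{9}\ \text{params} \times \tfrac{3.5\ \text{bits}}{8} \approx 20\ \text{GB} \quad (\text{3-bit quant})

macOS also caps how much of the 32 GB of unified memory the GPU can address (roughly two thirds to three quarters), so only the ~20 GB quant fits comfortably.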
Wow. This actually "just worked" for me, as in I followed the instructions and got a result. Meanwhile, I've come to associate the words "jupyter notebook" with Python dependency hell.
To be fair, I work as a PM and rarely get more than about 60 minutes to play around with anything involving code, which has kept me from getting my hands dirty with anything AI-related.
As someone who just went through this, the process of getting Mixtral running in Python did "just work" (pip install the interface, download the model, run the sample).
The process for getting it running on the GPU wasn't there yet.
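For reference, a minimal sketch of that flow, assuming the interface in question was llama-cpp-python and a local GGUF build of Mixtral (the file name is illustrative):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Illustrative path; point this at whatever GGUF quant you downloaded.
    llm = Llama(
        model_path="./mixtral-8x7b-instruct.Q3_K_S.gguf",
        n_ctx=4096,       # context window
        n_gpu_layers=0,   # CPU only; raising this is the GPU-offload step that was fiddly
    )

    out = llm("[INST] Summarize what quantization does. [/INST]", max_tokens=128)
    print(out["choices"][0]["text"])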
I tried using LangChain's Ollama provider, but for my use case it was strictly worse than using Ollama directly. Ollama automatically builds a conversation context that LangChain provides no handle for, and the context LangChain encourages you to build is less useful because it forces Ollama to re-process the full context on every call, whereas the native Ollama context represents the current state of inference.
The other kinds of non-conversational context I needed were trivial to put together myself, so for my use case LangChain just got in the way. Ollama's API was already trivial to wrap myself.
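For what it's worth, here is a minimal sketch of that kind of thin wrapper over Ollama's /api/generate endpoint, which returns a context token array you can feed back in to continue from the current inference state (the model name is just an example):

    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def generate(prompt, context=None, model="mistral:7b-instruct"):
        """Single non-streaming call; pass the returned context back in to continue."""
        payload = {"model": model, "prompt": prompt, "stream": False}
        if context:
            payload["context"] = context
        resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
        resp.raise_for_status()
        data = resp.json()
        return data["response"], data.get("context")

    reply, ctx = generate("Name three Ruby web frameworks.")
    followup, ctx = generate("Which of those is the oldest?", context=ctx)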
There are also Nx and Bumblebee in Elixir land; they really change how one approaches running models in production. The fact that one can put together a service (or local process) running any model published to Hugging Face in a couple of lines of code is amazing.
How is the deployment story, though? Assuming a standard Phoenix-on-Fly.io process, I was under the impression that the Bumblebee models are downloaded at runtime. Or are they "built" as part of the CI pipeline and then shipped inside the Docker container as blobs?
That’s all configurable. You can choose to download and build at startup, bundle into the Docker image, or "prebuild" the cache in advance, separately from the main app. I think it’s quite alright for cloud, Docker, and VPS-y deployments alike.
Calling an HTTP API from a Ruby program doesn't really constitute running an "AI Model Locally with Ruby" for me. But if you want to get a little closer to that being true, you could also use the llama.cpp bindings for Ruby: https://github.com/yoshoku/llama_cpp.rb
This seems highly misleading to me. In no world are LLMs the kind of neural net you describe. You grossly misrepresent how they work by pretending they are built entirely of fully connected layers.