Called it. It's very unfortunate that the local inference community has aggregat...

tarruda · 2025-08-05T22:34:31 1754433271

Llama.cpp (library which ollama uses under the hoods) has its own server, and it is fully compatible with open-webui.

I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.

mchiang · 2025-08-05T22:42:34 1754433754

totally respect your choice, and it's a great project too. Of course as a maintainer of Ollama, my preference is to win you over with Ollama. If it doesn't meet your needs, it's okay. We are more energized than ever to keep improving Ollama. Hopefully one day we will win you back.

Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible for older models for when we used it. The team is great, we just have features we want to build, and want to implement the models directly in Ollama. (We do use GGML and ask partners to help it. This is a project that also powers llama.cpp and is maintained by that same team)

am17an · 2025-08-06T02:02:15 1754445735

I’ve never seen a PR on ggml from Ollama folks though. Could you mention one contribution you did?

kristjansson · 2025-08-06T00:33:17 1754440397

> Ollama does not use llama.cpp anymore;

> We do use GGML

Sorry, but this is kind of hiding the ball. You don't use llama.cpp, you just ... use their core library that implements all the difficult bits, and carry a patchset on top of it?

Why do you have to start with the first statement at all? "we use the core library from llama.cpp/ggml and implement what we think is a better interface and UX. we hope you like it and find it useful."

mchiang · 2025-08-06T01:34:24 1754444064

thanks, I'll take that feedback, but I do want to clarify that it's not from llama.cpp/ggml. It's from ggml-org/ggml. I supposed it's all interchangeable though, so thank you for it.

kristjansson · 2025-08-06T16:58:27 1754499507

  % diff -ru ggml/src llama.cpp/ggml/src | grep -E '^(\+|\-) .*' | wc -l
      1445

i.e. as of time of writing +/- 1445 lines between the two, on about 175k total lines. a lot of which is the recent MXFP4 stuff.

Ollama is great software. It's integral to the broader diffusion of LLMs. You guys should be incredibly proud of it and the impact its had. I understand the current environment rewards bold claims, but the sense I get from some of your communications is "what's the boldest, strongest claim we can make that's still mostly technically true". As a potential user, taking those claims as true until closer evaluation reveals the discrepancy feels pretty bad, and keeps me firmly in the 'potential' camp.

Have the confidence in your software and the respect for your users to advertise your system as it is.

benreesman · 2025-08-09T04:00:26 1754712026

I'm torn on this, I was a fan of the project from the very beginning and never sent any of my stuff upstream, so I'm less than a contributor but more than don't care, and it's still non-obvious how the split happened.

But the takeaway is pretty clearly that `llama.cpp`, `GGML`/`GGUF`, and generally `ggerganov`'s single-handedly Carmacking it when everyone thought it was impossible is all the value. I think a lot of people made Docker containers with `ggml`/`gguf` in them and one was like "we can make this a business if we realllllly push it".

Ollama as a hobby project or even a serious OSS project? With a cordial upstream relationship and massive attribution labels everywhere? Sure. Maybe even as a commercial thing that has a massive "Wouldn't Be Possible Without" page for it's OSS core upstream.

But like: startup company for making money that's (to all appearances) completely out of reach for the principles to ever do without totally `cp -r && git commit` repeatedly? It's complicated, a lot of stuff starts as a fork and goes off in a very different direction, and I got kinda nauseous and stopped paying attention at some point, but near as I can tell they're still just copying all the stuff they can't figure out how to do themselves on an ongoing basis without resolving the upstream drama?

It's like, in bounds barely I guess. I can't point to it being "this is strictly against the rules or norms", but it's bending everything to the absolute limit. It's not a zone I'd want to spend a lot of time in.

kristjansson · 2025-08-10T16:52:35 1754844755

To be clear I was comparing ggml-org/ggml to ggml-org/llama.cpp/ggml to respond to the earlier thing. Ollama carries an additional patchset on top of ggml-org/ggml.

> [ggml] is all the value

That’s what gets me about Ollama - they have real value too! Docker is just the kernel’s cgroups/chroots/iptables/… but it deserves a lot of credit for articulating and operating those on behalf of the user. Ollama deserves the same. But they’re consistently kinda weird about owning just that?

dcreater · 2025-08-09T03:37:43 1754710663

This is utterly damming.

cortesoft · 2025-08-06T06:00:23 1754460023

Why are you being so accusatory about a choice about which details are important?

tarruda · 2025-08-05T22:53:05 1754434385

> Ollama does not use llama.cpp anymore

That is interesting, did Ollama develop its own proprietary inference engine or did you move to something else?

Any specific reason why you moved away from llama.cpp?

mchiang · 2025-08-05T22:59:08 1754434748

it's all open, and specifically, the new models are implemented here: https://github.com/ollama/ollama/tree/main/model/models

daft_pink · 2025-08-06T02:09:03 1754446143

So I’m using turbo and just want to provide some feedback. I can’t figure out how to connect raycast and project goose to ollama turbo. The software that calls it essentially looks for the models via ollama but cannot find the turbo ones and the documentation is not clear yet. Just my two cents, the inference is very quick and I’m happy with the speed but not quite usable yet.

mchiang · 2025-08-06T03:39:23 1754451563

so sorry about this. We are learning. Possible to email, and we will first make it right while we improve Ollama's turbo mode. hello@ollama.com

daft_pink · 2025-08-06T14:17:09 1754489829

no worries. i totally understand that the first day something is released it doesn’t work perfectly with third party/community software.

thanks for the feedback address :)

halJordan · 2025-08-05T23:44:20 1754437460

Fully compatible is a stretch, it's important we dont fall into a celebrity "my guy is perfect" trap. They implement a few endpoints.

jychang · 2025-08-05T23:57:13 1754438233

They implement more openai-compatible endpoints than ollama at least

benreesman · 2025-08-09T04:06:53 1754712413

I won't use `ollama` on principle. I use `llama-cli` and `llama-server` if I'm not linking `ggml`/`gguf` directly. It's like, two extra commands to use the one by the genius that wrote it and not the one that the guys just jacked it.

The models are on HuggingFace and downloading them is `uvx huggingface-cli`, the `GGUF` quants were `TheBloke` (with a grant from pmarca IIRC) for ages and now everyone does them (`unsloth` does a bunch of them).

Maybe I've got it twisted, but it seems to be that the people who actually do `ggml` aren't happy about it, and I've got their back on this.

om8 · 2025-08-05T23:36:55 1754437015

It’s unfortunate that llama.cpp’s code is a mess. It’s impossible to make any meaningful contributions to it.

kristjansson · 2025-08-06T00:08:04 1754438884

I'm the first to admit I'm not a heavy C++ user, so I'm not a great judge of the quality looking at the code itself ... but ggml-org has 400 contributors on ggml, 1200 on llama.cpp and has kept pace with ~all major innovations in transformers over the last year and change. Clearly some people can and do make meaningful contributions.

A4ET8a8uTh0_v2 · 2025-08-05T22:42:12 1754433732

Interesting, admittedly, I am slowly getting to the point, where ollama's defaults get a little restrictive. If the setup is not too onerous, I would not mind trying. Where did you start?

tarruda · 2025-08-05T22:56:43 1754434603

Download llama-server from llama.cpp Github and install it some PATH directory. AFAIK they don't have an automated installer, so that can be intimidating to some people

Assuming you have llama-server installed, you can download + run a hugging face model with something like

    llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja

And access http://localhost:8080

theshrike79 · 2025-08-06T09:51:51 1754473911

Isn't the open-webui maintainer heavily against MCP support and tool calling?

mchiang · 2025-08-05T22:28:14 1754432894

hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.

Is it bad to fairly charge money for selling GPUs that cost us money too, and use that money to grow the core open-source project?

At one point, it just has to be reasonable. I'd like to believe by having a conscientious, we can create something great.

dcreater · 2025-08-06T06:19:14 1754461154

First, I must say I appreciate you taking the time to be engaged on this thread and responding to so many of us.

What I'm referring to is a broader pattern that I (and several) others have been seeing. Of the top of my head: not crediting llama.cpp previously, still not crediting llama.cpp now and saying you are using your own inference engine when you are still using ggml and the core of what Georgi made, most importantly why even create your own version - is it not better for the community to just contribute to llama.cpp?, making your own propreitary model storage platform disallowing using weights with other local engines requiring people to duplicate downloads and more.

I dont know how to regard these other than being largely motivated out of self interest.

I think what Jeff and you have built have been enormously helpful to us - Ollama is how I got started running models locally and have enjoyed using it for years now. For that, I think you guys should be paid millions. But what I fear is going to happen is you guys will go the way of the current dogma of capturing users (at least in mindshare) and then continually squeezing more. I would love to be wrong, but I am not going to stick around to find out as its risk I cannot take.

tomrod · 2025-08-06T00:35:12 1754440512

Everyone just wants to solarpunk this up.

dcreater · 2025-08-06T06:00:39 1754460039

In an ideal world yes - as we should - especially for us Californian/Bay Area people, that's literally our spirit animal. But I understand that is idle dreaming. What I believe certainly is within reach is a state that is much better than what we are in.

tomrod · 2025-08-06T12:43:04 1754484184

It needn't be idle dreaming? What fundamental law or societal agreement prevents solarpunk versus the current status quo of corporate anti-human cyberpunk?

dcreater · 2025-08-06T14:25:41 1754490341

Being realistic about economics and how money works in the current paradigm where it is concentrated

sitkack · 2025-08-05T23:38:01 1754437081

I believe that is what https://github.com/containers/ramalama set out to do.

janalsncm · 2025-08-05T22:38:58 1754433538

Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.

idiotsecant · 2025-08-05T22:30:32 1754433032

Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...

SV_BubbleTime · 2025-08-06T05:19:12 1754457552

They can’t keep getting away with this.

mrcwinn · 2025-08-05T22:32:53 1754433173

Yes, better to get free sh*t unsustainably. By the way, you're free to create an open source alternative and pour your time into that so we can all benefit. But when you don't — remember I called it!

rpdillon · 2025-08-05T22:40:54 1754433654

What? The obvious move is to never have switched to Ollama and just use Llama.cpp directly, which I've been doing for years. Llama.cpp was created first, is the foundation for this product, and is actually open source.

wkat4242 · 2025-08-06T03:09:20 1754449760

But there's much less that works with that. OpenWebUI for example.

vntok · 2025-08-06T12:53:19 1754484799

Open WebUI works perfectly fine with llama.cpp though.

They have very detailed quick start docs on it: https://docs.openwebui.com/getting-started/quick-start/start...

wkat4242 · 2025-08-06T17:50:25 1754502625

Oh thanks I didn't know that :O

I do also need an API server though. The one built into OpenWebUI is no good because it always reloads the model if you use it first from the web console and then run an API call using the same model (like literally the same model from the workspace). Very weird but I avoid it for that reason.

rpdillon · 2025-08-07T04:17:38 1754540258

llama.cpp is what you want. It offers both a web UI and an API on the same port. I use llama.cpp's webui with gpt-oss-20b, and I also leverage it as an OpenAI-compatible server with gptel for Emacs. Very good product.

Aurornis · 2025-08-06T00:45:10 1754441110

> Its imperative we move away ASAP

Why? If the tool works then use it. They’re not forcing you to use the cloud.

dcreater · 2025-08-06T06:03:20 1754460200

There are many, many FOSS apps that use Ollama as a dependency. If Ollama rugs, then all those projects suffer.

Its a tale we seen played out many times. Redis is the most recent example.

Hasnep · 2025-08-06T08:04:22 1754467462

Most apps that integrate with ollama that I've seen just have an OpenAI compatible API parameter which defaults to port 11434 which ollama uses, but can be changed easily. Is there a way to integrate ollama more deeply?

dcreater · 2025-08-06T14:27:22 1754490442

Yes, but I fear the average person will not understand that and assume you need Ollama. That false perception is sufficiently damaging im afraid

prettyblocks · 2025-08-06T03:34:13 1754451253

Local inference is becoming completely commoditized imo. These days even docker has a local models you can launch with a single click (or command).

fud101 · 2025-08-06T06:33:22 1754462002

i was trying to remove it but noticed they've hidden the uninstall away. It amounts to doing a rm - which is a joke.

jcelerier · 2025-08-06T01:13:31 1754442811

happy sglang user here :)

cchance · 2025-08-06T00:34:48 1754440488

I stopped using them when they started doing the weird model naming bullshit stuck with lmstudio since