Local LLM

Posted by Erik Wegner

The video from c't 3003 (https://youtu.be/ii8Npn8H2BQ?si=jtI9rGwYfVTaa4Zu) suggests installing LM Studio to run models on local hardware.
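Once a model is loaded, LM Studio can also serve it through an OpenAI-compatible HTTP endpoint, by default on port 1234. A minimal sketch using the openai Python package; the port, the placeholder API key, and the prompt are assumptions that depend on the local setup:

```python
# Query a model served by LM Studio's local OpenAI-compatible endpoint.
# Assumes the local server is enabled on the default port 1234 and that
# qwen/qwen3-4b-2507 has been downloaded and loaded in LM Studio.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server
    api_key="lm-studio",                  # any non-empty string; not checked locally
)

response = client.chat.completions.create(
    model="qwen/qwen3-4b-2507",
    messages=[{"role": "user", "content": "Explain GPU offloading in one sentence."}],
)
print(response.choices[0].message.content)
```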

A previous attempt to run models on my Radeon card with ollama failed because that particular hardware is not supported. But even running the model on the CPU alone still produced results.
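For comparison, the CPU-only fallback with ollama can be exercised through its local REST API on the default port 11434. A sketch under those assumptions; the model tag is hypothetical and has to match whatever was pulled locally:

```python
# Send a single non-streaming generate request to a local ollama instance.
# Assumes ollama is running on its default port 11434 and that a model
# (here hypothetically "qwen3:4b") has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:4b",   # hypothetical model tag; adjust to what is pulled
        "prompt": "Hello from the CPU!",
        "stream": False,       # return one JSON object instead of a stream
    },
    timeout=300,               # CPU-only inference can be slow
)
resp.raise_for_status()
print(resp.json()["response"])
```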

LM Studio, on the other hand, claims to support fully offloading models to the GPU. These are the first results:

qwen/qwen3-4b-2507 (2.33 GB) runs at 18.79 tok/sec, 0.71s to first token

ibm/granite-4-h-tiny (3.94 GB) runs at 30.10 tok/sec, 3.13s to first token
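Both figures (tokens per second and time to first token) can be reproduced against the local server with a small streaming benchmark. A sketch, again assuming LM Studio's default endpoint; counting one token per streamed chunk is an approximation, so the speed it reports is a rough estimate:

```python
# Measure time to first token and approximate decoding speed for a local model.
# Assumes LM Studio's OpenAI-compatible server on port 1234; each streamed
# chunk is counted as one token, which slightly underestimates true throughput.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
first_token_at = None
tokens = 0

stream = client.chat.completions.create(
    model="ibm/granite-4-h-tiny",  # model name as loaded in LM Studio
    messages=[{"role": "user", "content": "Write a short poem about GPUs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        tokens += 1

if first_token_at is not None:
    decode_time = time.perf_counter() - first_token_at
    print(f"time to first token: {first_token_at - start:.2f} s")
    print(f"~{tokens / decode_time:.2f} tok/sec")
```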