Introduction #
LLM is shorten of Large Language Model or in simple way that is brain of AI, as we see in GPT, Deepseek, or any else they working LLM in behind but the diffrent with our case we dont have giant horse-power, so! lets put on low-power SBC (max 10w), like bulb mates.
Installation #
First thing we need is our machine of course, but before we start lemme show what specification of this machine.
root@server
-----------
OS: Debian GNU/Linux 11 (bullseye) aarch64
Host: H96 Max X3
Kernel: 6.6.88-ophub
Uptime: 6 days, 19 hours, 59 mins
Packages: 218 (dpkg)
Shell: bash 5.1.4
Resolution: 720x576i
Terminal: /dev/pts/1
CPU: ARMv8 rev 0 (v8l) (4) @ 1.908GHz
Memory: 150MiB / 3654MiBThat just slice of neofetch output, and then tool what we gonna use is llama.cpp.
There is so many alternative like llama.cpp, here for example ollama, llamafile, and etc. The reason is because its lightweight,
i’ve been tested it on my Orange Pi Zero 3 with 1GB RAM and its works, yeah just works! but dont expect too much.
Installing dependencies #
Dependencies is git, cmake, make, ccache, python3, python3-pip and build-essential for development, in our case am using debian based.
apt install build-essential git cmake make ccache python3 python3-pipCloning the source code #
This command will cloning source code from github repository.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cppCompiling the source code #
After you cloned, you need to compile it into binary executable to make it works, then back to home users.
cmake -B build -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
cmake --build build --config Release -j$(nproc)
cd ~| Additional Command | Command Description |
|---|---|
| DBUILD_SHARED_LIBS=OFF | Static Build |
| DLLAMA_CURL=OFF | Disabling curl integration to prevent error |
Cloning LLM trained source #
After you done compiling, the next step you need is clone the LLM source from the repository.
This command will compiling tools for downloading LLM from Ollama repository.
git clone https://github.com/akx/ollama-dl.git
cd ollama-dl
pip install uvAfter you done installing downloader, you need to browse to Ollama repos then choose whatever LLM that you wanna try.
Here i’ll using the lastest qwen3:0.6b for reliable purpose, its tiny.
uv run ollama_dl.py qwen3:0.6b Then lets try to run it, before we run it lets back to home user dir for cleaned purpose.
cd ~
./llama.cpp/build/bin/llama-cli -m ollama-dl/library-qwen3-0.6b/model-7f4030143c1c.gguf -p "what is life?"Here is the output: #
User:
“what is life?”
Assistant:
“Okay, the user is asking, “What is life?” I need to provide a thoughtful and informative answer. Let me start by breaking down the question. The user might be interested in understanding the concept of life in different contexts, like biology, philosophy, or personal meaning.
First, I should explain life from a biological perspective, highlighting the fundamental processes like growth, reproduction, and metabolism. Then, I can mention cultural and philosophical perspectives too, such as existentialism or existentialist views on meaning. It’s important to acknowledge that life can have different interpretations and emphasize the complexity involved.
I should make sure to address the user’s possible underlying questions they might have, like whether they want to know the definition of life or the meaning behind it. It’s also good to touch on the idea that life is interconnected and that different aspects of existence contribute to the overall experience.
Wait, should I include examples or metaphors? Maybe a metaphor like a tree growing through time or a journey to discover self? That could add depth. Also, emphasizing that life isn’t just about the physical aspect but the emotional and spiritual parts can help the user better understand.
I need to keep the language clear and avoid jargon. Make sure the answer flows logically, connecting the biological definition with broader philosophical views. Finally, conclude by reinforcing the idea that life is a continuous and dynamic experience.
Life is a fundamental concept that can be interpreted in various ways, depending on one’s perspective and the context of understanding. Here’s a structured exploration of the meaning of life:
Biological Perspective: From a biological standpoint, life involves the processes of growth, reproduction, metabolism, and adaptation. It’s about the transformation of one organism into another over time, driven by genetic inheritance and environmental conditions.
Philosophical and Existential Views:
- Existentialism: Emphasizes the freedom and meaning in living, often through the lens of individual choice and the search for self-fulfillment.
- Existentialist: Suggests that life is a continuous journey, and meaning arises through personal agency and the pursuit of one’s own values.
Cultural and Personal Contexts:
- Personal Meaning: Life can be seen as a dynamic, evolving experience, shaped by relationships, personal goals, and life experiences.
- Symbolism: Many cultures view life as a path to discovery, creativity, or fulfillment, with metaphors like a tree growing through time or a journey through challenges.
Interconnectedness: Life is not a single entity but a tapestry of physical and emotional aspects. It’s about living authentically, embracing life’s complexity, and finding purpose in the present.
In summary, life is a multifaceted experience that transcends physical existence, embracing the richness of existence through personal growth, creativity, and meaningful connections. The answer depends on how one defines life and the values they seek to understand.”
~ Qwen3:0.6b
Command up there is answered mode generation, but if you wanna try like chat mode with web interface you can use this command.
./llama.cpp/build/bin/llama-server -m ollama-dl/library-qwen3-0.6b/model-7f4030143c1c.gguf --port 8080 --host 0.0.0.0- -m: LLM data
- –port: IP port
- –host: IP address
Here speed average of text generation for my low-power machine.
llama_perf_sampler_print: sampling time = 2079.36 ms / 618 runs ( 3.36 ms per token, 297.21 tokens per second)
llama_perf_context_print: load time = 5165.18 ms
llama_perf_context_print: prompt eval time = 6111.70 ms / 12 tokens ( 509.31 ms per token, 1.96 tokens per second)
llama_perf_context_print: eval time = 549838.88 ms / 605 runs ( 908.82 ms per token, 1.10 tokens per second)
llama_perf_context_print: total time = 643256.20 ms / 617 tokensIts says: 1.10/s or Word/s
The end #
After doing this all, hope you find this useful and thank you for coming.