v2.4.0
Local • CPU
Welcome to LlamaLocal! I'm ready for local Llama inference. Select a model from the sidebar or ask me anything below.
Adjust Llama inference parameters
Limits sampling to the k most likely tokens
Maximum length of generated text
Number of tokens to predict in batch
Generate vector representations
Adaptive sampling control
Path to the Llama.cpp server binary
Port for the local Llama server
Directory containing your Llama models
Number of CPU threads to use
Custom system instructions for the model
Log server activity to file
Automatically optimize model weights
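For readers unfamiliar with the top-k setting listed above, here is a minimal sketch of what top-k sampling usually does: keep the k highest-scoring tokens, renormalize their probabilities, and sample from that reduced set. The function names `top_k_filter` and `sample_top_k` are hypothetical and chosen for illustration. This is not LlamaLocal's or llama.cpp's actual implementation, and treating `k <= 0` as "no limit" is an assumption made for this sketch.

```python
import math
import random


def top_k_filter(logits, k):
    """Return {token_index: probability} restricted to the k highest logits.

    Assumption for this sketch: k <= 0 (or k >= vocabulary size) disables the limit.
    """
    indices = range(len(logits))
    if k <= 0 or k >= len(logits):
        keep = list(indices)
    else:
        # Sort token indices by score, highest first, and keep the first k.
        keep = sorted(indices, key=lambda i: logits[i], reverse=True)[:k]

    # Softmax over the kept tokens only; subtract the max for numerical stability.
    peak = max(logits[i] for i in keep)
    weights = {i: math.exp(logits[i] - peak) for i in keep}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_top_k(logits, k, rng=random):
    """Draw one token index from the top-k distribution."""
    probs = top_k_filter(logits, k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

A smaller k makes output more predictable because unlikely tokens can never be drawn; a larger k allows more variety.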
1. Install Llama.cpp server • 2. Load a model • 3. Start chatting
Use quantized models (GGUF) and adjust threads for your CPU
100% local processing • No data leaves your device