local-llm-server
A robust, production-ready API for managing and serving local language models with comprehensive performance monitoring. It provides an OpenAI-compatible API layer over local inference engines (llama.cpp, etc.), enabling secure, air-gapped AI capabilities for the enterprise.
Key Features
- OpenAI-compatible API Interface
- Real-time GPU/TPS Performance Monitoring
- Model Management & Switching UI
- Efficiency Mode (No-Log Inference)
- Support for ROCm/CUDA and CPU Inference
- RBAC Integration (Planned)
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/v1/models` | List currently loaded and available models |
| POST | `/v1/chat/completions` | OpenAI-compatible chat completion endpoint |
| POST | `/api/orchestrate/load` | Load a specific model into VRAM |
| POST | `/api/orchestrate/unload` | Unload current model to free VRAM |
| GET | `/api/performance/metrics` | Get real-time token generation and GPU stats |
Usage Example
import requests
# Example interaction
response = requests.get(
url="https://api.arcore.internal/v1/models",
headers={"Authorization": "Bearer <token>"}
)
print(response.json())Tech Stack
Authentication
- •**Header:** `Authorization: Bearer <token>`
- •**Scopes:** RBAC is enforced at the object level via `ArcoreCodex` policies.
Compliance & Security
Compliance
- ✓Network: Air-gap capable
- ✓Access: API Key auth
- ✓Data Privacy: No external data egress
Security
- ✓Access: API Key auth
Coming Soon
3 plannedContent Filtering Policies
Target: Q1 2025
Per-User Resource Quotas
Target: Q2 2025
RBAC Integration for Model Access
Target: Verification Required
Related Products
Arcore Maestro
Arcore Maestro is a hybrid, agent-based orchestration conductor for AI and data workflows. It intelligently routes tasks to efficient local LLMs or secure, sandboxed worker tools, reserving large external models for planning, creative generation, and self-healing analysis. It serves as the central nervous system for autonomous agents within the Arcore ecosystem.
Chapterize
A document processing engine that converts static PDFs into structured, web-ready HTML chapters. Chapterize uses a rules-based engine with strict regex patterns and DOM analysis to detect logical breaks, clean content, and make legacy documentation accessible, searchable, and mobile-friendly.
Arcore Career
A Career Knowledge Graph System that treats individual career data (skills, achievements, roles) as a queryable database. ArcoreCareer uses rules-based parsing to extract job details, enables dynamic resume generation, and facilitates structured career planning.