local-llm-server

Production · Data & Intelligence

A robust, production-ready API for managing and serving local language models with comprehensive performance monitoring. It provides an OpenAI-compatible API layer over local inference engines (llama.cpp, etc.), enabling secure, air-gapped AI capabilities for the enterprise.

Key Features

  • OpenAI-compatible API Interface
  • Real-time GPU/TPS Performance Monitoring
  • Model Management & Switching UI
  • Efficiency Mode (No-Log Inference)
  • RBAC Integration for Model Access Controls
  • Support for ROCm/CUDA and CPU Inference

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/v1/models` | List currently loaded and available models |
| POST | `/v1/chat/completions` | OpenAI-compatible chat completion endpoint |
| POST | `/api/orchestrate/load` | Load a specific model into VRAM |
| POST | `/api/orchestrate/unload` | Unload the current model to free VRAM |
| GET | `/api/performance/metrics` | Get real-time token generation and GPU stats |
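
For reference, a chat completion request might look like the sketch below. Because the endpoint is OpenAI-compatible, the payload follows the standard OpenAI chat schema; the model name shown is a placeholder, not a model shipped with the service.

```python
import requests

# Minimal chat completion request against the OpenAI-compatible endpoint.
# The model name is a placeholder; pick one returned by GET /v1/models.
response = requests.post(
    url="https://api.arcore.internal/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "llama-3-8b-instruct",
        "messages": [
            {"role": "user", "content": "Summarize the quarterly report in one paragraph."}
        ],
    },
)
print(response.json()["choices"][0]["message"]["content"])
```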

Usage Example

```python
import requests

# List the models currently loaded or available on the local-llm-server instance
response = requests.get(
    url="https://api.arcore.internal/v1/models",
    headers={"Authorization": "Bearer <token>"}
)
print(response.json())
```
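
The orchestration and monitoring endpoints follow the same pattern. The sketch below assumes the load request takes a JSON body with a `model` field; the exact request and response schemas are not documented above, so treat the field names as placeholders.

```python
import time
import requests

BASE = "https://api.arcore.internal"
HEADERS = {"Authorization": "Bearer <token>"}

# Ask the orchestrator to load a model into VRAM.
# Assumption: the body is {"model": "<name>"}; adjust to match your deployment.
requests.post(f"{BASE}/api/orchestrate/load",
              headers=HEADERS,
              json={"model": "llama-3-8b-instruct"})

# Poll real-time token-generation and GPU stats while the model serves traffic.
for _ in range(3):
    metrics = requests.get(f"{BASE}/api/performance/metrics", headers=HEADERS)
    print(metrics.json())  # exact fields depend on the deployment
    time.sleep(5)

# Unload the model to free VRAM when finished.
requests.post(f"{BASE}/api/orchestrate/unload", headers=HEADERS)
```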

Tech Stack

PythonFastAPISQLiteDockerROCm/CUDAllama.cpp

Authentication

  • **Header:** `Authorization: Bearer <token>`
  • **Scopes:** RBAC is enforced at the object level via `ArcoreCodex` policies.
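
A minimal sketch of how a client might handle an RBAC denial is shown below; the 403 status code is an assumption, since the error contract is not documented here.

```python
import requests

# Every request carries the bearer token; ArcoreCodex policies decide per-model access.
response = requests.get(
    "https://api.arcore.internal/v1/models",
    headers={"Authorization": "Bearer <token>"},
)

# Assumption: a request denied by an RBAC policy returns HTTP 403.
if response.status_code == 403:
    print("Access denied by RBAC policy for this token.")
else:
    response.raise_for_status()
    print(response.json())
```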

Compliance & Security

Compliance

  • Network: Air-gap capable
  • Data Privacy: No external data egress

Security

  • Access: API key authentication

Related Services