ArcoreConduit

Production · Data & Intelligence

ArcoreConduit is a scalable data ingestion and ETL (Extract, Transform, Load) platform that serves as the central "conduit" for the Arcore ecosystem. It extracts data from external sources (REST APIs, websites, RSS feeds), stores the raw data, and transforms it into normalized, structured formats for downstream consumption. Alongside traditional ETL, it uses AI/LLM processing for data transformation.

Key Features

  • Multi-Source Data Extraction (REST API, Web Scraping, JavaScript Rendering, RSS/Atom)
  • AI-Powered Data Transformation (Gemini, GPT, Claude)
  • Automated Scheduling with Celery Beat (Cron-based)
  • Health Monitoring & Change Detection
  • Pre-configured Template Library (50+ data sources)
  • Real-Time Task Queue Processing
  • Dynamic Schema Support & Parsing Rules Engine
  • Alert System (Email, Slack, Discord, Webhooks)
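The change detection feature listed above is not described in detail here. One common way to implement it is to fingerprint each extracted payload and compare it with the fingerprint from the previous run. The sketch below illustrates that idea only; the function names `content_fingerprint` and `has_changed` are hypothetical and are not part of ArcoreConduit.

```python
import hashlib
import json


def content_fingerprint(payload):
    """Return a stable SHA-256 hex digest for a JSON-serializable payload."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, payload):
    """Report whether a fresh payload differs from the last stored fingerprint."""
    return content_fingerprint(payload) != previous_fingerprint


# Hypothetical payloads standing in for two consecutive extractions.
last_run = {"bitcoin": {"usd": 64000}}
this_run = {"bitcoin": {"usd": 64250}}

stored = content_fingerprint(last_run)
print(has_changed(stored, last_run))  # same content, so no change is reported
print(has_changed(stored, this_run))  # price differs, so a change is reported
```

Hashing a canonical JSON encoding keeps the comparison cheap and avoids false positives when a source returns the same keys in a different order.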

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/datasources/` | List all data sources |
| POST | `/api/datasources/` | Create new data source |
| POST | `/api/datasources/{id}/run_now/` | Trigger immediate extraction |
| GET | `/api/datasources/active/` | List active sources only |
| GET | `/api/normalized/crypto-prices/` | Cryptocurrency price data |
| GET | `/api/normalized/stock-quotes/` | Stock market data |
| GET | `/api/normalized/llm-outputs/` | LLM processing results |
| GET | `/api/monitoring/job-logs/` | Task execution history |
| GET | `/api/monitoring/health-metrics/` | Aggregated health metrics |
| GET | `/api/discovery/templates/` | Data source templates |
| POST | `/api/discovery/test-template/{id}/` | Test template connectivity |
| GET | `/api/health/` | System health check endpoint |
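Several paths above contain an `{id}` placeholder. As a minimal sketch, the helper below fills the placeholder and joins the result with a base URL. The helper name `endpoint_url` is hypothetical, and the host is assumed to be the one used in the usage example on this page.

```python
from urllib.parse import urljoin

# Assumption: same host as in the usage example on this page.
BASE_URL = "https://api.arcore.internal"


def endpoint_url(path_template, base_url=BASE_URL, **path_params):
    """Fill placeholders such as {id} in a path and join it with the base URL."""
    path = path_template.format(**path_params)
    return urljoin(base_url, path)


print(endpoint_url("/api/datasources/{id}/run_now/", id=123))
print(endpoint_url("/api/health/"))
```

Keeping the templates identical to the paths in the table makes it easy to check a client against this page.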

Usage Example

```python
import requests

BASE_URL = "https://api.arcore.internal"
HEADERS = {"Authorization": "Bearer <token>"}  # replace <token> with a valid token

# Create a data source that polls the CoinGecko BTC price every 15 minutes.
response = requests.post(
    url=f"{BASE_URL}/api/datasources/",
    headers=HEADERS,
    json={
        "name": "CoinGecko BTC Price",
        "source_type": "API",
        "endpoint": "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd",
        "parser_function": "parsers.crypto_parser",
        "schedule_cron": "*/15 * * * *",
    },
    timeout=30,
)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
print(response.json())

# Trigger an immediate extraction for an existing data source (ID 123 here).
response = requests.post(
    url=f"{BASE_URL}/api/datasources/123/run_now/",
    headers=HEADERS,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```
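The `schedule_cron` value in the example uses the standard five-field cron layout (minute, hour, day of month, month, day of week), so `*/15 * * * *` means every 15 minutes. How ArcoreConduit turns this string into a Celery Beat schedule is not documented here. The sketch below shows one plausible mapping onto the keyword arguments accepted by Celery's `crontab` schedule; the function name `cron_to_crontab_kwargs` is hypothetical.

```python
def cron_to_crontab_kwargs(expression):
    """Split a five-field cron expression into Celery crontab keyword arguments."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expression!r}")
    # Standard cron order: minute, hour, day of month, month, day of week.
    minute, hour, day_of_month, month_of_year, day_of_week = fields
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month_of_year": month_of_year,
        "day_of_week": day_of_week,
    }


kwargs = cron_to_crontab_kwargs("*/15 * * * *")
print(kwargs)
# With Celery installed, these can be passed on as:
#   from celery.schedules import crontab
#   schedule = crontab(**kwargs)
```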

Tech Stack

  • Python (Django)
  • Django REST Framework
  • PostgreSQL (JSONB)
  • Celery
  • Redis
  • Next.js/React
  • Playwright
  • BeautifulSoup4
  • Scrapy
  • Google Gemini
  • OpenAI GPT
  • Anthropic Claude
  • Infisical
  • Docker

Authentication

  • **Header:** `Authorization: Bearer <token>`
  • **Scopes:** RBAC is enforced at the object level via `ArcoreCodex` policies.
  • **Infisical Integration:** External API keys are stored in the Infisical vault.
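As a minimal sketch of the header format above, the helper below builds an authenticated request without sending it, using only the standard library. The environment variable name `ARCORE_API_TOKEN` and the helper name `build_authorized_request` are assumptions for illustration; how tokens are issued is not covered on this page.

```python
import os
import urllib.request


def build_authorized_request(url, token=None, method="GET"):
    """Build (but do not send) a request carrying the Bearer token header."""
    # Assumption: the token lives in an environment variable with this name.
    token = token or os.environ.get("ARCORE_API_TOKEN")
    if not token:
        raise RuntimeError("no API token supplied and ARCORE_API_TOKEN is not set")
    return urllib.request.Request(
        url,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


request = build_authorized_request(
    "https://api.arcore.internal/api/datasources/active/",
    token="example-token",  # placeholder value, not a real credential
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Reading the token from the environment rather than hard-coding it keeps credentials out of source control, in line with the vault-based approach described above.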

Compliance & Security

Compliance

  • Authentication: Django session-based with CSRF protection
  • Secrets Management: Infisical vault integration for API credentials
  • Data Integrity: Raw data preserved in JSONB before transformation (no data loss)
  • Health Monitoring: Automated health checks with alert rules
  • AI Guardrails: LLM processing with retry logic and token tracking
  • Audit Logging: Complete task execution history with Celery task IDs
  • Encryption: TLS 1.3 in transit, AES-256 at rest
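The AI guardrails item mentions retry logic and token tracking without further detail. The sketch below shows one way such a wrapper could look: retry with exponential backoff, recording the token count reported by the successful call. The function `call_with_retry`, its parameters, and the stub provider are all hypothetical; no real LLM client API is used.

```python
import time


def call_with_retry(llm_call, prompt, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call llm_call(prompt) -> (text, tokens), retrying with exponential backoff."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            text, tokens = llm_call(prompt)
            return {"text": text, "tokens_used": tokens, "attempts": attempt}
        except Exception as error:  # broad on purpose: the provider is unknown here
            last_error = error
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"LLM call failed after {max_attempts} attempts") from last_error


# Stub provider that fails twice before succeeding, to exercise the retry path.
calls = {"count": 0}


def flaky_stub(prompt):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return prompt.upper(), len(prompt.split())


result = call_with_retry(flaky_stub, "normalize this record", sleep=lambda seconds: None)
print(result)
```

Injecting `sleep` as a parameter lets tests skip the real delays. A production version would likely catch only the provider's transient error types instead of every exception.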

Related Products