Nexus AI: Building the Swiss Army Knife of Modern AI Backends

A comprehensive, microservice-driven AI backend designed for flexibility, scalability, and real-world value

Tags: AI, Microservices, Backend, Architecture
🚀 Multi-modal AI processing platform

A Multi-Modal AI Processing Microservice

Are you looking to transform fragmented AI experiments—from video projects to code snippets—into a unified, production-scale platform? If so, let me walk you through how I architected, implemented, and launched Nexus AI: a comprehensive, microservice-driven AI backend designed for flexibility, scalability, and real-world value.


Why a Microservice Architecture for AI?

The challenge with AI projects today is their diversity. Solutions for embeddings, document search, intelligent agents, content analysis, and multi-modal processing are mostly built in silos. Nexus AI breaks down these walls with a clear principle: specialize every major AI task as its own service, but unify all of them behind a single, developer-friendly API.


🚀 Nexus AI Platform: Deep Dive

1. Architecture Overview

Key Services:

  • API Gateway: Your unified interface for all AI operations
  • AI Embedding Service: Single and batch vector embeddings, plus similarity scoring
  • RAG Query Service: Retrieval-Augmented Generation over document collections, with configurable chunking and document metadata
  • Function Calling Service: Dynamic tool orchestration, including API integration and procedural automation
  • Content Analysis Service: Automated extraction and insight generation from YouTube videos, documents, and web pages
  • Model Management Service: Model evaluation, benchmarking, and comparison for continuous improvement

Tech Stack:

  • TypeScript, Bun, Express: Service logic and HTTP routing
  • Ollama, ChromaDB: Local model execution and high-performance vector search
  • Redis: Caching and rate limiting
  • Docker: Full containerization for easy scaling and deployment
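
Redis is listed for rate limiting, but the post doesn't show the mechanism. Below is a minimal fixed-window limiter sketch. It assumes limits are keyed per client and counted per fixed time window. An in-memory Map stands in for Redis; with Redis, each window's counter would typically be a key incremented with INCR and given a TTL with EXPIRE. The class name, key format, and limits are hypothetical.

```typescript
// Hypothetical fixed-window rate limiter. An in-memory Map stands in for Redis;
// with Redis, each window's counter would typically be an INCR'd key with an EXPIRE.
class FixedWindowLimiter {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit; // max requests per client per window
    this.windowMs = windowMs; // window length in milliseconds
    this.now = now; // injectable clock, useful for tests
  }

  /** Count this request and return true if the client is still within its limit. */
  allow(clientId: string): boolean {
    const windowIndex = Math.floor(this.now() / this.windowMs);
    const key = `ratelimit:${clientId}:${windowIndex}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    // Old keys are never evicted in this sketch; Redis EXPIRE would handle that.
    return count <= this.limit;
  }
}
```

A fixed window lets a client send up to twice the limit in a short burst that straddles a window boundary. Sliding-window or token-bucket limiters smooth this out, at the cost of keeping more state.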

2. Key Features

  • Multi-Model Embeddings: Swap, compare, and benchmark different models and strategies, including advanced chunking and similarity algorithms.
  • Powerful RAG: Manage document collections with sentence, word, or semantic chunking for smarter retrieval.
  • Intelligent Function Calling: Seamlessly register and invoke external APIs and built-in tools with LLM-powered reasoning.
  • Deep Content Analysis: Extract core insights (ideas, habits, facts) from web pages, YouTube videos, and documents, and summarize content at scale.
  • End-to-End Model Management: Directly test, evaluate, and optimize models with benchmarking and consistency scoring APIs.
  • Unified, Extensible API Gateway: Route all requests through a transparent, production-ready endpoint for any service.

3. Project Structure

nexus-ai/
├── api-gateway/
├── services/
│   ├── embedding/
│   ├── rag/
│   ├── function-calling/
│   ├── content-analysis/
│   └── model-management/
├── shared/
├── infrastructure/
└── docs/

4. Deployment & Scalability

Out-of-the-box containerization with Docker and support for horizontal scaling:

  • GPU resource management with Ollama
  • Vector database clustering (ChromaDB)
  • Async batch processing for heavy workloads
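
The list above mentions async batch processing but gives no details. A common approach is to split the work into fixed-size batches and run the items of each batch concurrently with Promise.all. That way, at most `size` requests reach the model backend at once. The sketch below assumes this approach. `toBatches`, `processInBatches`, and the `worker` callback are illustrative names, not the project's API.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches one after another; items inside a batch run concurrently.
// `worker` is a hypothetical async function, e.g. one call to the embedding service.
async function processInBatches<T, R>(
  items: readonly T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, size)) {
    // Promise.all preserves input order, so results line up with items.
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
  }
  return results;
}
```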

How Does Nexus AI Compare to Traditional Model Control Plane Servers?

While a model control plane (MCP) server manages model deployment, versioning, and traffic routing for inference APIs, Nexus AI goes further. It is an application engine that:

  • Orchestrates data analysis, search, function calling, and tool use
  • Integrates multi-modal content (video, text, web) and multi-model pipelines
  • Delivers immediate business value for building chatbots, copilots, intelligent search engines, and workflow automation—most of which require much more than just model management

What Can You Build With Nexus AI?

Once live, Nexus AI enables you to rapidly create:

  • Internal and external search engines (with RAG and semantic embeddings)
  • Intelligent chatbots and copilots
  • Automated research agents (ingest, analyze, summarize any content)
  • SaaS platforms powered by robust, scalable AI APIs
  • Advanced analytics, recommendations, and Q&A engines
  • Automated document and multimedia processing pipelines

All with local-first privacy, rapid developer iteration, and flexible integration.


Getting Started

npm run setup
npm run dev      # for development
npm run start    # for production (or: docker-compose up -d)

API at a Glance

  • /api/v1/ai/process — Unified entrypoint
  • /api/v1/embeddings/* — Vector embeddings
  • /api/v1/rag/* — Retrieval-augmented queries
  • /api/v1/tools/* — Tool and function invocation
  • /api/v1/content/* — Content analysis endpoints
  • /api/v1/models/* — Model management and benchmarking

Business Value

Nexus AI provides:

  • One platform for all AI operations
  • Developer-friendly APIs and docs
  • Easy extensibility and production readiness
  • Local-first, privacy-respecting architecture
  • Lower running costs: models run locally through Ollama, so there are no per-request API fees or external licensing

Conclusion

Nexus AI aims to be a “Swiss Army Knife” for AI development, combining lessons from earlier AI experiments (from video projects to code snippets) with microservice best practices in a single platform. Whether for research, product development, or enterprise-grade intelligent applications, Nexus AI is your launchpad.

Ready to modernize your AI stack? Start building with Nexus AI today!


Quick Start Guide

Nexus AI is a microservice-based AI platform that combines embeddings, RAG, function calling, and content analysis capabilities.

Prerequisites

  • Docker & Docker Compose
  • Bun (JavaScript runtime) and pnpm (used to run the project scripts)
  • Ollama (for AI models)
  • NVIDIA GPU (recommended)

Setup & Run

# 1. Install dependencies and setup
pnpm run setup

# 2. Start all services (Docker)
pnpm run start

# OR run in development mode
pnpm run dev

Architecture

  • API Gateway (3000) - Unified interface
  • Embedding Service (3001) - Vector embeddings
  • RAG Service (3002) - Retrieval-Augmented Generation
  • Function Calling Service (3003) - Tool execution
  • Content Analysis Service (3004) - Content analysis
  • Model Management Service (3005) - Model testing
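
The gateway's main job is to map each public path prefix to the service that owns it. The sketch below shows that lookup. It uses the ports listed above and the prefixes from "API at a Glance". The actual routing code in Nexus AI isn't shown, and the localhost URLs assume the local development setup. The curl examples call each service with the same /api/v1/... path, so the sketch forwards the path unchanged.

```typescript
// Path prefix -> upstream base URL, taken from the documented ports (local dev assumption).
const ROUTES: ReadonlyArray<readonly [prefix: string, upstream: string]> = [
  ["/api/v1/embeddings", "http://localhost:3001"],
  ["/api/v1/rag", "http://localhost:3002"],
  ["/api/v1/tools", "http://localhost:3003"],
  ["/api/v1/content", "http://localhost:3004"],
  ["/api/v1/models", "http://localhost:3005"],
];

// Return the full upstream URL for a request path, or undefined if no service owns it
// (e.g. /health or /api/v1/ai/process, which the gateway handles itself).
function resolveUpstream(path: string): string | undefined {
  for (const [prefix, upstream] of ROUTES) {
    // Match only at a path-segment boundary so "/api/v1/ragx" does not hit the RAG service.
    if (path === prefix || path.startsWith(prefix + "/")) {
      return upstream + path;
    }
  }
  return undefined;
}
```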

Key Endpoints

  • Main API: http://localhost:3000/api/v1/ai/process
  • Health Check: http://localhost:3000/health
  • API Docs: http://localhost:3000/api/v1/docs

Development Commands

# Run individual services
pnpm run dev:embedding
pnpm run dev:rag
pnpm run dev:function
pnpm run dev:content
pnpm run dev:model

# Stop services
pnpm run stop

The setup script automatically installs dependencies, pulls required Ollama models, and creates configuration files. The platform runs entirely locally with Docker containers for Ollama, ChromaDB, and Redis.

Use Cases

Core Services

1. AI Embedding Service (Port 3001)

Single Embedding

curl -X POST http://localhost:3001/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The quick brown fox jumps over the lazy dog",
    "model": "nomic-embed-text",
    "prefix": "search_document: "
  }'

Response:

{
  "embedding": [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, ...],
  "model": "nomic-embed-text",
  "processing_time": 45.2
}
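
The `prefix` field matters for nomic-embed-text, which was trained with task prefixes. Stored passages use `search_document: ` and queries use `search_query: `, and mixing them up can hurt retrieval quality. The builder below produces the request body shown above. The `EmbeddingRequest` type mirrors the sample JSON. The sketch assumes the service prepends `prefix` to `text` itself, based only on the field's presence.

```typescript
// Shape of the request body in the curl example above (assumed from the sample JSON).
interface EmbeddingRequest {
  text: string;
  model: string;
  prefix: string;
}

type EmbeddingRole = "document" | "query";

// nomic-embed-text task prefixes: documents and queries get different prefixes.
const NOMIC_PREFIXES: Record<EmbeddingRole, string> = {
  document: "search_document: ",
  query: "search_query: ",
};

// Build the body for POST /api/v1/embeddings with the prefix chosen by role.
function buildEmbeddingRequest(text: string, role: EmbeddingRole): EmbeddingRequest {
  return { text, model: "nomic-embed-text", prefix: NOMIC_PREFIXES[role] };
}
```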

Batch Embeddings

curl -X POST http://localhost:3001/api/v1/embeddings/batch \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing helps computers understand human language"
    ],
    "model": "nomic-embed-text",
    "chunking_strategy": "sentence_based",
    "chunk_size": 100,
    "overlap": 20
  }'

Response:

{
  "embeddings": [
    [0.1, -0.2, 0.3, ...],
    [0.4, -0.5, 0.6, ...],
    [0.7, -0.8, 0.9, ...]
  ],
  "chunks": [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Natural language processing helps computers understand human language"
  ],
  "model": "nomic-embed-text",
  "total_processing_time": 125.8
}

Similarity Search

curl -X POST http://localhost:3001/api/v1/embeddings/similarity \
  -H "Content-Type: application/json" \
  -d '{
    "query_embedding": [0.1, -0.2, 0.3, 0.4, -0.5],
    "candidate_embeddings": [
      [0.1, -0.2, 0.3, 0.4, -0.5],
      [0.6, -0.7, 0.8, 0.9, -1.0],
      [0.2, -0.3, 0.4, 0.5, -0.6]
    ],
    "top_k": 2
  }'

Response:

{
  "results": [
    {"index": 0, "score": 1.0},
    {"index": 2, "score": 0.95}
  ],
  "query_embedding_length": 5,
  "candidate_count": 3
}
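
The similarity endpoint's scoring metric isn't stated. The sketch below assumes cosine similarity, the usual choice for text embeddings, and runs a top-k search over the candidates. Under that assumption, the top two for the request above are index 0 (score 1.0, identical to the query) and index 2 (about 0.995). This is the same ranking as the sample response.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate against the query and return the k best, highest first.
function topK(
  query: readonly number[],
  candidates: readonly (readonly number[])[],
  k: number,
): { index: number; score: number }[] {
  return candidates
    .map((c, index) => ({ index, score: cosineSimilarity(query, c) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan is fine for a handful of candidates. For whole collections, this is the job ChromaDB's vector index does.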

2. RAG Query Service (Port 3002)

Document Ingestion

curl -X POST http://localhost:3002/api/v1/rag/documents \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "content": "Artificial Intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of \"intelligent agents\": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.",
        "metadata": {"source": "wikipedia", "topic": "ai_basics", "author": "AI Research Team"},
        "id": "ai_intro_001"
      },
      {
        "content": "Machine Learning is a subset of AI that focuses on algorithms that can learn from data without being explicitly programmed. It includes supervised learning, unsupervised learning, and reinforcement learning approaches.",
        "metadata": {"source": "textbook", "topic": "ml_basics", "chapter": "Introduction"},
        "id": "ml_intro_001"
      }
    ],
    "collection": "ai_knowledge",
    "chunking_strategy": "sentence_based",
    "chunk_size": 200,
    "overlap": 50
  }'

Response:

{
  "collection": "ai_knowledge",
  "documents_processed": 2,
  "chunks_created": 4,
  "processing_time": 234.5,
  "chunking_strategy": "sentence_based",
  "chunk_size": 200,
  "overlap": 50
}
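
The ingestion request asks for `sentence_based` chunking with `chunk_size: 200` and `overlap: 50`, but doesn't say what the units are. The minimal sketch below assumes both are character counts. Sentences are packed greedily into chunks of at most `chunkSize` characters. Each new chunk then begins with trailing sentences from the previous chunk, as long as their combined length fits within `overlap`. The regex splitter is deliberately naive (it will split after abbreviations such as "e.g."), and the function names are illustrative.

```typescript
// Naive sentence splitter: breaks after ".", "!" or "?" followed by whitespace.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Length of the parts once joined with single spaces.
function joinedLength(parts: readonly string[]): number {
  return parts.reduce((sum, p) => sum + p.length, 0) + Math.max(parts.length - 1, 0);
}

// Greedy sentence packing with sentence-level overlap (sizes in characters, by assumption).
// A single sentence longer than chunkSize becomes its own oversized chunk rather than being cut.
function chunkBySentence(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];

  for (const sentence of splitSentences(text)) {
    if (current.length > 0 && joinedLength([...current, sentence]) > chunkSize) {
      chunks.push(current.join(" "));
      // Carry trailing sentences of the finished chunk while they fit within `overlap`.
      const carried: string[] = [];
      for (let i = current.length - 1; i >= 0; i--) {
        if (joinedLength([current[i], ...carried]) > overlap) break;
        carried.unshift(current[i]);
      }
      // Keep the overlap only if the new sentence still fits after it.
      current = joinedLength([...carried, sentence]) <= chunkSize ? carried : [];
    }
    current.push(sentence);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Under these assumptions, each of the two sample documents splits into two chunks. Every sentence is longer than 50 characters, so no overlap is carried. That is consistent with `chunks_created: 4` above.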

RAG Query

curl -X POST http://localhost:3002/api/v1/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the difference between AI and machine learning?",
    "collection": "ai_knowledge",
    "top_k": 3,
    "model": "llama3.1",
    "include_metadata": true
  }'

Response:

{
  "answer": "Artificial Intelligence (AI) is the broader field of intelligence demonstrated by machines, while Machine Learning is a specific subset of AI that focuses on algorithms that can learn from data without being explicitly programmed. AI encompasses any device that perceives its environment and takes actions to achieve goals, whereas ML specifically deals with learning algorithms including supervised, unsupervised, and reinforcement learning approaches.",
  "sources": [
    {
      "content": "Artificial Intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals.",
      "score": 0.92,
      "metadata": {"source": "wikipedia", "topic": "ai_basics", "author": "AI Research Team"}
    },
    {
      "content": "Machine Learning is a subset of AI that focuses on algorithms that can learn from data without being explicitly programmed.",
      "score": 0.89,
      "metadata": {"source": "textbook", "topic": "ml_basics", "chapter": "Introduction"}
    },
    {
      "content": "It includes supervised learning, unsupervised learning, and reinforcement learning approaches.",
      "score": 0.85,
      "metadata": {"source": "textbook", "topic": "ml_basics", "chapter": "Introduction"}
    }
  ],
  "model": "llama3.1",
  "processing_time": 156.7
}
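
The post doesn't show how retrieved chunks become the LLM prompt. A common pattern is to number the top-k passages, place them in a context block, and tell the model to answer only from that context. The sketch below follows that pattern. The prompt wording and the `RetrievedSource` type are hypothetical; the field names come from the `sources` array above.

```typescript
// Mirrors one entry of the "sources" array in the RAG response above.
interface RetrievedSource {
  content: string;
  score: number;
  metadata?: Record<string, string>;
}

// Build a grounded prompt: the top-k passages, numbered, followed by the question.
function buildRagPrompt(query: string, sources: readonly RetrievedSource[], topK: number): string {
  const context = [...sources]
    .sort((a, b) => b.score - a.score) // most relevant first
    .slice(0, topK)
    .map((s, i) => {
      const origin = s.metadata?.source;
      return `[${i + 1}]${origin ? ` (source: ${origin})` : ""} ${s.content}`;
    })
    .join("\n");

  return [
    "Answer the question using only the numbered context passages.",
    "If the context does not contain the answer, say that you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${query}`,
  ].join("\n");
}
```

Numbering the passages lets the model cite them, and lets you map those citations back to the `sources` entries returned to the client.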

3. Function Calling Service (Port 3003)

Function Execution

curl -X POST http://localhost:3003/api/v1/tools/execute \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the weather in Paris and calculate the distance from Tokyo?",
    "available_tools": ["weather", "location", "calculator"],
    "model": "llama3.1",
    "max_iterations": 3
  }'

Response:

{
  "result": {
    "weather": {
      "location": "Paris",
      "temperature": 15.2,
      "humidity": 78,
      "wind_speed": 12.5,
      "units": "celsius"
    },
    "distance": {
      "expression": "6371 * acos(cos(radians(48.8566)) * cos(radians(35.6762)) * cos(radians(2.3522) - radians(139.6503)) + sin(radians(48.8566)) * sin(radians(35.6762)))",
      "result": 9718.5
    }
  },
  "function_calls": [
    {
      "function_name": "weather",
      "parameters": {"location": "Paris"},
      "result": {
        "location": "Paris",
        "temperature": 15.2,
        "humidity": 78,
        "wind_speed": 12.5,
        "units": "celsius"
      },
      "execution_time": 234.5
    },
    {
      "function_name": "location",
      "parameters": {"city": "Tokyo"},
      "result": {
        "city": "Tokyo",
        "latitude": 35.6762,
        "longitude": 139.6503,
        "display_name": "Tokyo, Japan"
      },
      "execution_time": 156.7
    },
    {
      "function_name": "calculator",
      "parameters": {"expression": "6371 * acos(cos(radians(48.8566)) * cos(radians(35.6762)) * cos(radians(2.3522) - radians(139.6503)) + sin(radians(48.8566)) * sin(radians(35.6762)))"},
      "result": {
        "expression": "6371 * acos(cos(radians(48.8566)) * cos(radians(35.6762)) * cos(radians(2.3522) - radians(139.6503)) + sin(radians(48.8566)) * sin(radians(35.6762)))",
        "result": 9718.5
      },
      "execution_time": 12.3
    }
  ],
  "model": "llama3.1",
  "processing_time": 456.8
}
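
The calculator call evaluates the spherical law of cosines for great-circle distance: d = R × acos(cos φ1 × cos φ2 × cos(λ1 - λ2) + sin φ1 × sin φ2), with R = 6371 km (Earth's mean radius). The same computation is sketched below in TypeScript. It assumes the calculator's `radians()` converts degrees to radians, which is what the explicit `toRadians` helper does here. For Paris (48.8566, 2.3522) and Tokyo (35.6762, 139.6503), it comes out at roughly 9,712 km.

```typescript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, as in the expression above

const toRadians = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance via the spherical law of cosines (same formula as the calculator call).
function greatCircleKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const phi1 = toRadians(lat1);
  const phi2 = toRadians(lat2);
  const deltaLambda = toRadians(lon1 - lon2);
  const cosAngle =
    Math.cos(phi1) * Math.cos(phi2) * Math.cos(deltaLambda) + Math.sin(phi1) * Math.sin(phi2);
  // Clamp to [-1, 1] so rounding error cannot push acos out of its domain (which would give NaN).
  return EARTH_RADIUS_KM * Math.acos(Math.min(1, Math.max(-1, cosAngle)));
}
```

The law of cosines loses precision for very short distances, because acos is ill-conditioned near 1. The haversine formula is the usual alternative there.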

Tool Registration

curl -X POST http://localhost:3003/api/v1/tools/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "stock_price",
    "description": "Get current stock price for a given symbol",
    "parameters": [
      {
        "name": "symbol",
        "description": "Stock symbol (e.g., AAPL, GOOGL)",
        "type": "string",
        "required": true
      },
      {
        "name": "currency",
        "description": "Currency for the price",
        "type": "string",
        "required": false,
        "default": "USD"
      }
    ],
    "endpoint": "https://api.example.com/stock",
    "handler": "stockPriceHandler"
  }'

Response:

{
  "message": "Tool 'stock_price' registered successfully",
  "tool": {
    "name": "stock_price",
    "description": "Get current stock price for a given symbol",
    "parameters": [
      {
        "name": "symbol",
        "description": "Stock symbol (e.g., AAPL, GOOGL)",
        "type": "string",
        "required": true
      },
      {
        "name": "currency",
        "description": "Currency for the price",
        "type": "string",
        "required": false,
        "default": "USD"
      }
    ],
    "endpoint": "https://api.example.com/stock",
    "handler": "stockPriceHandler"
  }
}
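
When the model proposes a call to a registered tool, the service has to check the arguments against the declared `parameters`. The sketch below assumes the schema shape from the registration request above. Required parameters must be present, `default` fills in omitted optional ones, values must match the declared `type`, and undeclared arguments are rejected. Only primitive types are handled, and the function name and error messages are hypothetical.

```typescript
// Parameter declaration, as in the "parameters" array of the registration request.
interface ToolParameter {
  name: string;
  description: string;
  type: "string" | "number" | "boolean";
  required: boolean;
  default?: string | number | boolean;
}

type ToolArgs = Record<string, string | number | boolean>;

// Validate model-proposed arguments and apply defaults; throws on the first problem found.
function validateToolArgs(params: readonly ToolParameter[], args: ToolArgs): ToolArgs {
  const result: ToolArgs = {};
  for (const p of params) {
    const value = args[p.name] ?? p.default;
    if (value === undefined) {
      if (p.required) throw new Error(`Missing required parameter "${p.name}"`);
      continue;
    }
    if (typeof value !== p.type) {
      throw new Error(`Parameter "${p.name}" must be a ${p.type}, got ${typeof value}`);
    }
    result[p.name] = value;
  }
  // Reject arguments the tool does not declare, so hallucinated fields fail loudly.
  for (const key of Object.keys(args)) {
    if (!params.some((p) => p.name === key)) throw new Error(`Unknown parameter "${key}"`);
  }
  return result;
}
```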

4. Content Analysis Service (Port 3004)

Text Content Analysis

curl -X POST http://localhost:3004/api/v1/content/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "content": "In this video, we discuss the future of artificial intelligence and how it will impact various industries. The speaker mentions several key trends including automation, machine learning, and the importance of ethical AI development. They also highlight the need for continuous learning and adaptation in the tech industry.",
    "analysis_type": "comprehensive",
    "extract": ["ideas", "quotes", "habits", "facts", "references"],
    "word_limits": {
      "summary": 50,
      "insights": 15
    }
  }'

Response:

{
  "summary": "Discussion about AI future impact on industries, covering automation trends and ethical development needs.",
  "insights": [
    "AI will transform multiple industries through automation",
    "Machine learning is a key driver of AI advancement",
    "Ethical AI development requires careful consideration",
    "Continuous learning is essential in tech industry",
    "Adaptation skills become increasingly important"
  ],
  "quotes": [
    "The future of artificial intelligence will impact various industries",
    "Automation is one of the key trends in AI development",
    "Ethical AI development is of paramount importance",
    "Continuous learning and adaptation are crucial in tech"
  ],
  "habits": [
    "Stay updated with AI trends and developments",
    "Focus on ethical considerations in AI projects",
    "Continuously learn new technologies and skills",
    "Adapt to changing industry requirements"
  ],
  "facts": [
    "AI impacts multiple industries simultaneously",
    "Machine learning is a subset of AI technology",
    "Ethical considerations are important in AI development",
    "Tech industry requires continuous learning"
  ],
  "references": [
    "Artificial intelligence technologies",
    "Machine learning algorithms",
    "Ethical AI development frameworks",
    "Industry automation trends"
  ],
  "processing_time": 234.5
}

YouTube Video Analysis

curl -X POST http://localhost:3004/api/v1/content/youtube \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "dQw4w9WgXcQ",
    "analysis_type": "insights"
  }'

Response:

{
  "summary": "Music video featuring catchy melody and memorable lyrics with universal appeal.",
  "insights": [
    "Music has universal emotional impact",
    "Catchy melodies create lasting memories",
    "Simple lyrics can be highly effective",
    "Visual storytelling enhances musical experience",
    "Repetition creates memorable musical phrases"
  ],
  "quotes": [
    "Never gonna give you up",
    "Never gonna let you down",
    "Never gonna run around and desert you"
  ],
  "habits": [
    "Listen to music for emotional connection",
    "Appreciate simple yet effective melodies",
    "Value memorable musical experiences"
  ],
  "facts": [
    "Music videos combine audio and visual elements",
    "Repetition is a common musical technique",
    "Simple lyrics can achieve widespread popularity"
  ],
  "references": [
    "Rick Astley music",
    "Pop music genre",
    "Music video production"
  ],
  "processing_time": 456.7
}
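
The endpoint takes a bare `video_id`, but users usually paste a full URL. The helper below accepts either form. It handles the common `youtube.com/watch?v=`, `youtu.be/`, `/shorts/`, and `/embed/` shapes. It also checks the 11-character ID format (letters, digits, `-` and `_`) that YouTube uses. It is a convenience sketch, not part of the documented API.

```typescript
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Accept a bare video ID or a common YouTube URL and return the 11-character ID, or null.
function extractYouTubeId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0];
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(?:shorts|embed)\/([^/]+)/);
      candidate = match ? match[1] : null;
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```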

Web Content Analysis

curl -X POST http://localhost:3004/api/v1/content/web \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/article", "https://news.example.com/story"],
    "extract_content": true,
    "summarize": true
  }'

Response:

{
  "results": [
    {
      "url": "https://example.com/article",
      "content": "This article discusses the latest developments in artificial intelligence...",
      "summary": "Comprehensive overview of recent AI breakthroughs and their implications for various industries.",
      "success": true
    },
    {
      "url": "https://news.example.com/story",
      "content": "Breaking news about technological advancements...",
      "summary": "Latest technological innovations and their potential impact on society.",
      "success": true
    }
  ],
  "processing_time": 1234.5,
  "total_urls": 2,
  "successful_urls": 2
}

5. Model Management Service (Port 3005)

Model Testing

curl -X POST http://localhost:3005/api/v1/models/test \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["llama3.1", "mistral", "dolphin-mistral"],
    "prompt": "Explain quantum computing in simple terms",
    "iterations": 3,
    "format": "text"
  }'

Response:

{
  "results": [
    {
      "model": "llama3.1",
      "responses": [
        "Quantum computing uses quantum mechanical phenomena to process information in ways that classical computers cannot.",
        "Quantum computers leverage quantum bits (qubits) that can exist in multiple states simultaneously.",
        "Quantum computing harnesses quantum mechanics to perform calculations exponentially faster than classical computers."
      ],
      "average_time": 1234.5,
      "consistency_score": 0.87
    },
    {
      "model": "mistral",
      "responses": [
        "Quantum computing is a revolutionary approach that uses quantum physics principles for computation.",
        "Unlike classical bits, quantum bits can be in superposition, enabling parallel processing.",
        "Quantum computers exploit quantum phenomena like entanglement and superposition for computation."
      ],
      "average_time": 987.3,
      "consistency_score": 0.92
    },
    {
      "model": "dolphin-mistral",
      "responses": [
        "Quantum computing utilizes quantum mechanical properties to perform computations.",
        "Qubits can exist in multiple states at once, allowing for massive parallel processing.",
        "Quantum computers use quantum effects to solve problems intractable for classical computers."
      ],
      "average_time": 1456.7,
      "consistency_score": 0.89
    }
  ],
  "comparison": {
    "fastest_model": "mistral",
    "most_consistent": "mistral",
    "best_performance": "mistral",
    "metrics": {
      "llama3.1_avg_time": 1234.5,
      "llama3.1_consistency": 0.87,
      "mistral_avg_time": 987.3,
      "mistral_consistency": 0.92,
      "dolphin-mistral_avg_time": 1456.7,
      "dolphin-mistral_consistency": 0.89
    }
  }
}
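
Two fields of the `comparison` block can be derived directly from the per-model results. The fastest model has the lowest `average_time`, and the most consistent model has the highest `consistency_score`. How the service computes `consistency_score` and `best_performance` isn't documented, so this sketch only reproduces those two well-defined fields. The type and function names are hypothetical. Applied to the sample numbers, both fields pick `mistral`, as in the response.

```typescript
// One entry of the "results" array in the model test response (responses omitted).
interface ModelTestResult {
  model: string;
  average_time: number;
  consistency_score: number;
}

// Return the item with the highest score; ties keep the earlier item.
function argMax<T>(items: readonly T[], score: (item: T) => number): T {
  if (items.length === 0) throw new Error("No results to compare");
  return items.reduce((best, item) => (score(item) > score(best) ? item : best));
}

// Rebuild the two well-defined fields of the "comparison" block.
function summarizeModelTests(results: readonly ModelTestResult[]) {
  return {
    fastest_model: argMax(results, (r) => -r.average_time).model,
    most_consistent: argMax(results, (r) => r.consistency_score).model,
  };
}
```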

Model Comparison

curl -X POST http://localhost:3005/api/v1/models/compare \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["llama3.1", "mistral"],
    "test_queries": [
      "What is machine learning?",
      "How does a neural network work?",
      "Explain the concept of deep learning"
    ],
    "metric": "response_time"
  }'

Response:

{
  "models": ["llama3.1", "mistral"],
  "test_queries": [
    "What is machine learning?",
    "How does a neural network work?",
    "Explain the concept of deep learning"
  ],
  "results": {
    "llama3.1": [
      {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
        "response_time": 1.2
      },
      {
        "query": "How does a neural network work?",
        "response": "Neural networks are computing systems inspired by biological neural networks, consisting of interconnected nodes that process information.",
        "response_time": 1.1
      },
      {
        "query": "Explain the concept of deep learning",
        "response": "Deep learning is a subset of machine learning that uses neural networks with multiple layers to model and understand complex patterns in data.",
        "response_time": 1.3
      }
    ],
    "mistral": [
      {
        "query": "What is machine learning?",
        "response": "Machine learning is an AI technique that allows systems to automatically learn and improve from experience without being explicitly programmed.",
        "response_time": 0.9
      },
      {
        "query": "How does a neural network work?",
        "response": "Neural networks are computational models inspired by the human brain, using interconnected nodes to process and transmit information.",
        "response_time": 0.8
      },
      {
        "query": "Explain the concept of deep learning",
        "response": "Deep learning involves neural networks with many layers that can learn complex representations of data through hierarchical feature learning.",
        "response_time": 1.0
      }
    ]
  },
  "metrics": {
    "llama3.1": {"average_response_time": 1.2},
    "mistral": {"average_response_time": 0.9}
  },
  "processing_time": 2345.6
}

6. API Gateway - Unified Interface (Port 3000)

Unified AI Processing

curl -X POST http://localhost:3000/api/v1/ai/process \
  -H "Content-Type: application/json" \
  -d '{
    "type": "rag_query",
    "input": {
      "query": "How do I implement a neural network?",
      "collection": "programming_knowledge",
      "top_k": 5,
      "include_metadata": true
    },
    "options": {
      "model": "llama3.1"
    }
  }'

Response:

{
  "type": "rag_query",
  "result": {
    "answer": "To implement a neural network, you need to define the network architecture, initialize weights, implement forward and backward propagation, and train using gradient descent...",
    "sources": [
      {
        "content": "Neural networks consist of layers of interconnected nodes...",
        "score": 0.95,
        "metadata": {"source": "tutorial", "topic": "neural_networks"}
      }
    ],
    "model": "llama3.1",
    "processing_time": 234.5
  },
  "processing_time": 245.6,
  "metadata": {
    "service_endpoint": "http://localhost:3002",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
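
The unified endpoint dispatches on the `type` field. The response's `metadata.service_endpoint` shows `rag_query` going to the RAG service on port 3002. The sketch below models that dispatch as a TypeScript discriminated union. It assumes `rag_query` maps to `POST /api/v1/rag/query`, with `options` merged into `input`. It also assumes each `multimodal` source maps to the matching content-analysis endpoint. Only these two `type` values appear in the post, and the mapping details are assumptions.

```typescript
// Source entries accepted by the multimodal example.
type MultimodalSource =
  | { type: "youtube"; video_id: string }
  | { type: "web"; url: string }
  | { type: "document"; content: string };

// The two request shapes shown in the examples (fields abridged).
type ProcessRequest =
  | {
      type: "rag_query";
      input: { query: string; collection: string; top_k?: number; include_metadata?: boolean };
      options?: { model?: string };
    }
  | { type: "multimodal"; input: { sources: MultimodalSource[]; analysis_type?: string } };

interface UpstreamCall {
  url: string;
  body: Record<string, unknown>;
}

const RAG = "http://localhost:3002"; // from metadata.service_endpoint above
const CONTENT = "http://localhost:3004"; // content analysis port (local dev assumption)

// Hypothetical planning step: turn one unified request into downstream service calls.
function planUpstreamCalls(req: ProcessRequest): UpstreamCall[] {
  switch (req.type) {
    case "rag_query":
      // Assumption: options such as `model` are merged into the RAG query body.
      return [{ url: `${RAG}/api/v1/rag/query`, body: { ...req.input, ...req.options } }];
    case "multimodal": {
      const { sources, analysis_type } = req.input;
      return sources.map((source): UpstreamCall => {
        switch (source.type) {
          case "youtube":
            return { url: `${CONTENT}/api/v1/content/youtube`, body: { video_id: source.video_id, analysis_type } };
          case "web":
            return { url: `${CONTENT}/api/v1/content/web`, body: { urls: [source.url], extract_content: true, summarize: true } };
          case "document":
            return { url: `${CONTENT}/api/v1/content/analyze`, body: { content: source.content, analysis_type } };
          default: {
            const unreachable: never = source;
            throw new Error(`Unsupported source: ${JSON.stringify(unreachable)}`);
          }
        }
      });
    }
    default: {
      const unreachable: never = req;
      throw new Error(`Unsupported request: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` checks make the compiler flag any new `type` value that is added to the union without a matching route.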

Multi-Modal Processing

curl -X POST http://localhost:3000/api/v1/ai/process \
  -H "Content-Type: application/json" \
  -d '{
    "type": "multimodal",
    "input": {
      "sources": [
        {
          "type": "youtube",
          "video_id": "dQw4w9WgXcQ"
        },
        {
          "type": "web",
          "url": "https://example.com/article"
        },
        {
          "type": "document",
          "content": "Additional context about the topic..."
        }
      ],
      "analysis_type": "comprehensive"
    }
  }'

Response:

{
  "type": "multimodal",
  "result": {
    "sources": [
      {
        "type": "youtube",
        "data": {
          "summary": "Music video with catchy melody...",
          "insights": ["Music has universal appeal", "Simple lyrics are effective"],
          "processing_time": 456.7
        }
      },
      {
        "type": "web",
        "data": {
          "summary": "Article about technological advancements...",
          "processing_time": 234.5
        }
      },
      {
        "type": "document",
        "data": {
          "summary": "Additional context provides valuable insights...",
          "processing_time": 123.4
        }
      }
    ],
    "total_sources": 3,
    "successful_sources": 3
  },
  "processing_time": 1234.6,
  "metadata": {
    "service_endpoint": "multiple",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
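
The multimodal response reports `total_sources` and `successful_sources`, which implies that one failing source does not fail the whole request. `Promise.allSettled` gives exactly that behavior. The sketch below assumes each source is analyzed by an async function (the hypothetical `analyze` parameter) and that failures are reported per source with an error message.

```typescript
interface SourceOutcome<T> {
  index: number;
  success: boolean;
  data?: T;
  error?: string;
}

// Analyze all sources concurrently; a rejected source is recorded instead of failing the batch.
async function fanOut<S, T>(sources: readonly S[], analyze: (source: S) => Promise<T>) {
  // Wrapping in an async arrow turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(sources.map(async (s) => analyze(s)));
  const results: SourceOutcome<T>[] = settled.map((r, index) =>
    r.status === "fulfilled"
      ? { index, success: true, data: r.value }
      : { index, success: false, error: r.reason instanceof Error ? r.reason.message : String(r.reason) },
  );
  return {
    sources: results,
    total_sources: sources.length,
    successful_sources: results.filter((r) => r.success).length,
  };
}
```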

Health Check Examples

API Gateway Health

curl http://localhost:3000/health

Response:

{
  "service": "api-gateway",
  "status": "healthy",
  "uptime": 3600.5,
  "dependencies": {
    "embedding": "up",
    "rag": "up",
    "function-calling": "up",
    "content-analysis": "up",
    "model-management": "up"
  },
  "timestamp": "2024-01-15T10:30:00.000Z"
}
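
The gateway's `status` can be derived from its dependency checks. The sketch below assumes the gateway reports `healthy` only when every dependency is `up`, and that a probe which throws counts as `down`. The `degraded` value and the injected `probe` function are hypothetical. In practice, `probe` might request each service's health endpoint and check for a 2xx status.

```typescript
type DependencyStatus = "up" | "down";

interface HealthReport {
  service: string;
  status: "healthy" | "degraded";
  uptime: number;
  dependencies: Record<string, DependencyStatus>;
  timestamp: string;
}

// Probe every dependency concurrently and aggregate the results into one report.
async function gatewayHealth(
  dependencyUrls: Record<string, string>, // service name -> base URL, e.g. http://localhost:3002
  probe: (baseUrl: string) => Promise<boolean>, // hypothetical health probe
  uptimeSeconds: number,
): Promise<HealthReport> {
  const entries = await Promise.all(
    Object.entries(dependencyUrls).map(async ([name, url]): Promise<[string, DependencyStatus]> => {
      try {
        return [name, (await probe(url)) ? "up" : "down"];
      } catch {
        return [name, "down"]; // network errors and timeouts count as down
      }
    }),
  );
  const allUp = entries.every(([, status]) => status === "up");
  return {
    service: "api-gateway",
    status: allUp ? "healthy" : "degraded",
    uptime: uptimeSeconds,
    dependencies: Object.fromEntries(entries),
    timestamp: new Date().toISOString(),
  };
}
```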

These examples cover the main endpoints of each service in the Nexus AI Platform.