# AIGr.id: An Open, Plural, and Polycentric AI Network
AIGrid represents a fundamental shift from siloed, monolithic AI to an open, plural, and networked AI ecosystem.
AIGr.id is a polycentric network of independent, plural AI components that coordinate to perform tasks, exchange data, and compose into higher-level intelligence, broader and greater than the sum of its parts.
Designed as a global public infrastructure for AI, AIGrid is not owned or controlled by any single entity. It is contributed to and accessed as a digital commons: intelligence built by many, for all.
Powered by OpenOS.AI (AIOS), a distributed AI operating system for open, plural, and polycentric AI networks.
OpenOS.AI is 100% open source and community-driven.
Product Deep Dive: Discover the philosophy, design, strategy, and purpose behind this platform in our in-depth write-up.
Read the full write-up
While a variant of AIGr.id has been running in production at nearly 500k inferences per second on bare-metal infrastructure in a federated setting for close to a year, supporting real-time, sustainable Vision AI workloads, the released version includes significant upgrades to support broader goals, including LLM integration. Although unit and integration tested, this version has not yet been validated at similar scale or duration. As such, the project remains in beta and is not recommended for production use at this time.
OpenOS.AI provides full-stack AI operations, a globally distributed and optimized AI compute platform, and data management for decentralized AI networks.
OpenOS.AI enables:
- Creation and coordination of multiple cognitive architectures
- Composition of modular, networked AI systems
- Dynamic orchestration of distributed AI agents and services
- Optimized, cloud-native, and sovereign AI computing at scale
- Actor-controlled resource allocation across shared infrastructure
- Shareable AI, compute, and data as a digital commons and a gig economy for AI
- Distributed data management and flow at massive scale
- Polycentric governance and programmable autonomy
- Open, multiplayer AI production and distribution at global scale
- System-wide observability, behavior tracing, and telemetry
## Core Features Overview
| Feature | Description |
|---|---|
| Unified Network-Level, Multi-Cluster Resource Pooling | Seamlessly connect Kubernetes clusters from different locations to form a unified resource pool for running any kind of computational workload. |
| Flexible Resource Allocation & Scheduling | Schedule AI models (LLMs), general compute logic, or custom Blocks on any cluster. Includes customizable scaling, load balancing, and health checks. |
| Policy-Driven Infrastructure and Job Management | Govern infrastructure and workloads using Python-based policies for full control over network, cluster, and job behavior. |
| Distributed Graph Execution (vDAGs) | Define complex workflows as DAGs of Blocks, enabling distributed execution across nodes and clusters. |
| Model Splitting and Distributed Inference | Break large models (such as LLMs) into smaller splits and deploy them as vDAGs across the infrastructure; the approach is framework-agnostic. |
| Developer-Friendly SDKs | Use SDKs to write and deploy AI model servers or compute logic across the distributed network. |
| Third-Party Framework Integration | Bring your own stack: wrap existing frameworks and libraries as Blocks using init containers. |
| Multiple Instance Execution / GPU Sharing | Run multiple Block instances on the same node; a single GPU can be time-shared across multiple instances for maximum efficiency. |
| Customizable Parser-Based Workload Definitions | Define workloads using flexible, pluggable parsers to support different input formats and metadata structures. |
| Policy-Based Load Balancing and Health Checks | Use policy logic to drive runtime decisions for load balancing, instance health, and failover handling. |
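The policy rows above are central to how AIOS works: infrastructure behavior is programmed rather than configured. As a rough, non-authoritative illustration, the sketch below shows what a Python scheduling policy could look like. The class name, the `evaluate` hook, and the metric fields are hypothetical stand-ins, not the platform's actual policy interface.

```python
# Hypothetical sketch of a Python scheduling policy. The class name,
# method signature, and metric fields are illustrative assumptions,
# not the actual AIOS policy interface.

class GpuAwareSchedulingPolicy:
    """Choose the cluster with the most free GPU memory for a new Block."""

    def __init__(self, settings: dict):
        # Example settings: {"min_free_gpu_mem_mb": 8192}
        self.min_free = settings.get("min_free_gpu_mem_mb", 8192)

    def evaluate(self, block_spec: dict, cluster_metrics: list) -> dict:
        # cluster_metrics is assumed to look like:
        #   [{"cluster_id": "c1", "free_gpu_mem_mb": 24576, "healthy": True}, ...]
        candidates = [
            c for c in cluster_metrics
            if c["healthy"] and c["free_gpu_mem_mb"] >= self.min_free
        ]
        if not candidates:
            return {"allowed": False, "reason": "no cluster meets GPU requirements"}
        best = max(candidates, key=lambda c: c["free_gpu_mem_mb"])
        return {"allowed": True, "target_cluster": best["cluster_id"]}
```

Because a policy is plain Python, the same pattern extends to the load-balancing, health-check, and quota decisions described below.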
## Breakdown of Features
- **Global Cluster Networking**: Easily connect Kubernetes clusters across regions, forming a globally distributed, policy-governed compute mesh.
- **Node Onboarding**: Add VMs or bare-metal nodes to any cluster within the network, enabling flexible infrastructure expansion.
- **Custom Rule-Based Orchestration**: Write Python policies to control how clusters and networks are formed and how workloads are scheduled, tailored to your specific operational needs.
- **Python-Native Policy Engine**: Policies are written in Python, offering high expressiveness and support for external libraries, enabling complex logic and integration.
- **Flexible Policy Deployment Modes**: Deploy policies as standalone services, ad hoc jobs, or policy graphs, depending on the use case.
- **Decentralized Registries**: Set up and register your own asset or container registries on any cluster. These registries are globally discoverable and shareable within the network.
- **Block and vDAG Specification via SDKs**: Define compute workloads (e.g., LLM inference or object detection) using the Python SDK. Compose them into vDAGs to form cross-node or cross-cluster workflows. Blocks can be reused across multiple vDAGs (see the first sketch after this list).
- **Sidecar Extensions for Blocks**: Extend the functionality of Blocks through customizable sidecar containers.
- **Resource-Aware Scheduling**: Use policies to control resource allocation, auto-scaling, and load balancing. Blocks can scale across nodes and utilize multiple GPUs as needed.
- **GPU Sharing Across Blocks**: Schedule multiple Block instances on the same GPU for efficient resource utilization.
- **End-to-End Metrics Collection**: Collect metrics from Blocks, vDAGs, and nodes. Use them in policy logic for decision-making, or define custom metrics as needed.
- **Policy-Based Auditing and Quotas**: Apply policies for vDAG-level audit logging, access controls, and quota management.
- **Custom Health Checks**: Define health-check logic using policies for fine-grained monitoring.
- **gRPC-Based Inference APIs**: Submit tasks to Blocks or vDAGs via gRPC-based inference servers (see the second sketch after this list).
- **Multi-Gateway Inference Support**: Any user or administrator can deploy their own inference server and register it in a public directory. Each server can enforce its own policies for quotas and access control.
- **Customizable Specification Format**: Define and extend the specification format for onboarding clusters, nodes, Blocks, and vDAGs. Use policies to build custom specification parsers.
- **Reusable Specification Store**: Browse, search, and reuse predefined or customized specifications to quickly deploy Blocks and vDAGs.
- **Third-Party System Integration**: Seamlessly extend Blocks with third-party services or tools, deployed either alongside or externally, automated via init containers.
- **LLM Splitting and Reusability**: Split large LLMs into modular components and distribute them as vDAGs. Each model chunk can be reused across multiple vDAGs, enabling scalable and efficient deployments.
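To make the Block and vDAG items above concrete, here is a minimal sketch of declaring two reusable Blocks (the shards of a split LLM) and wiring them into a vDAG. The `Block` and `vDAG` shapes below are invented for illustration; the real SDK types and fields live in the SDK documentation.

```python
# Illustrative sketch only: these dataclasses approximate what a Block
# and vDAG declaration might carry; they are not the published SDK API.
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    image: str                 # container image with the model server
    gpus: int = 0

@dataclass
class vDAG:
    name: str
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (from_block, to_block) pairs

# An LLM split into two shards, each shard a reusable Block:
shard_a = Block(name="llm-shard-a", image="registry.local/llm:shard-a", gpus=1)
shard_b = Block(name="llm-shard-b", image="registry.local/llm:shard-b", gpus=1)

pipeline = vDAG(
    name="split-llm-inference",
    nodes=[shard_a, shard_b],
    edges=[("llm-shard-a", "llm-shard-b")],     # activations flow a -> b
)
```

The same `shard_a` declaration could appear in other vDAGs, which is the reuse that the LLM-splitting item above refers to.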
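For the gRPC inference item, the call below shows the generic shape of a request using the real `grpcio` package; the proto-generated module, stub, and message names are placeholders for whatever the platform's actual `.proto` files define.

```python
# Generic shape of a gRPC inference call. `grpc` is the real grpcio
# package; the proto-generated module, stub, and message names below
# are placeholders, not the platform's actual definitions.
import grpc

import inference_pb2        # placeholder: generated from the platform's protos
import inference_pb2_grpc   # placeholder: generated from the platform's protos

def submit_task(target: str, payload: bytes) -> bytes:
    """Send one inference request to a Block or vDAG gateway."""
    with grpc.insecure_channel(target) as channel:
        stub = inference_pb2_grpc.InferenceStub(channel)     # placeholder stub
        request = inference_pb2.InferRequest(data=payload)   # placeholder message
        response = stub.Infer(request, timeout=30.0)
        return response.data
```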
## Getting Started
**Essentials**

- Paper
- Concepts
- Architecture

**User Flow Guides**

- Network Creator & Admin Flow
- Cluster Contributor & Admin Flow
- Node Contributor Flow
- Block Creator Flow
- vDAG Creator Flow
- End User (Inference Task Submitter) Flow

**Installation**

- Network Creation
- Onboarding Cluster
- Onboarding Node to a Cluster
### Quickstart Tutorial
The quickstart tutorial covers:

- Simple Block deployment across multiple GPUs (reference model: Mistral7B LLM)
- Simple Block deployment on a single GPU (sample model: YOLOv5)
- Linking an externally deployed vLLM system to a Block for serving (see the sketch after this list)
- Deploying an external system alongside a Block using init containers
- Splitting LLMs and deploying them across the network as a vDAG
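For the vLLM bullet, note that vLLM exposes an OpenAI-compatible HTTP API, so a Block's logic can simply forward prompts to it. A minimal sketch, assuming a vLLM server reachable at `http://vllm.internal:8000` and loaded with Mistral-7B (the host and model name are assumptions for illustration):

```python
# Minimal sketch: forward a prompt from a Block to an externally
# deployed vLLM server via its OpenAI-compatible completions endpoint.
# The host and model name are illustrative assumptions.
import requests

VLLM_URL = "http://vllm.internal:8000/v1/completions"

def generate(prompt: str, max_tokens: int = 128) -> str:
    resp = requests.post(
        VLLM_URL,
        json={
            "model": "mistralai/Mistral-7B-v0.1",
            "prompt": prompt,
            "max_tokens": max_tokens,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```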
## Upcoming Activities
- **Comparison Document**: A detailed comparison between AIOSv1 and Ray/AnyScale. To be announced.
- **Benchmarking and Performance Analysis**: Evaluation of system services, cluster services, Block components, and end-to-end benchmarking of popular LLM and non-LLM models on the platform. (To submit or suggest a model for benchmarking, please open an issue.) To be announced.
- **Mainnet Release**: Launch of the mainnet, supporting both public and private deployments. To be announced.
- **Platform Security and IAM**: Implementation of security measures for all platform services, including user IAM using decentralized identity protocols, role-based access control (RBAC), and integration with the policy system for fine-grained security actions. To be announced.
- **Model/Asset Security**: End-to-end security for models and assets, along with enhanced security for policy execution. To be announced.
## Communications
- Email: [email protected]
- Discord: OpenCyberspace
- X (Twitter): @opencyberspace
## Call for Contributors
AIGrid is an open, collaborative project, and we're actively looking for contributors who resonate with the mission of building open, plural, networked AI infrastructure.
We welcome:

- **Systems thinkers & protocol designers**: Help refine the architecture of polycentric networks
- **Distributed systems engineers**: Build and scale the open execution layer
- **AI/ML developers**: Create interoperable cognitive modules and agent topologies
- **Researchers in ethics, governance, trust, alignment, guardrails, incentives, and economics**: Design and evolve the policy layers
- **Writers & communicators**: Help document, narrate, and amplify the vision
- **Hackers, tinkerers, visionaries**: If this speaks to you, you're already one of us
Whether you want to:
- Co-design AI primitives
- Propose a new kind of network
- Experiment with governance models
- Help run a sovereign AIGrid node, cluster, or network
We'd love to hear from you.

Join the Collective

Reach out: [email protected]

Let's co-create an open & networked AI future: plural, sovereign, and evolving.