Inference flows
Users can submit inference tasks to the available blocks and vDAGs through inference servers and vDAG controllers, respectively.
Here are the inference flows:
1. Submitting inference requests to a block:
   - Discover available inference servers
   - Submit an inference task to a block by specifying the block_id
   - Inference with block_id and files attached as FileInfo elements
   - Inference with query_parameters using similarity search
   - Inference using FrameDB as storage by specifying frame_ptr
   - Inference using Graphs
2. vDAG inference:
   - Discover vDAG controllers
   - Submit an inference task
1. Block inference:
1.1 Discovering inference servers:
To submit an inference task, first select an inference server from the registry.
The Inference Server Registry lists all inference servers for discovery purposes. Users can add their inference servers to this global registry if they want them to be publicly accessible.
Schema:
Here is the data class used to represent an inference server in the registry:
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class InferenceServer:
    inference_server_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inference_server_name: str = ''
    inference_server_metadata: Dict[str, str] = field(default_factory=dict)
    inference_server_tags: List[str] = field(default_factory=list)
    inference_server_public_url: str = ''
Sample entry:
{
    "inference_server_name": "inference-server-us-east-1",
    "inference_server_metadata": {
        "region": "us-east-1",
        "availability_zone": "us-east-1a",
        "provider": "AWS",
        "cluster_id": "cluster-123",
        "quota_management_data": {
            "requests_per_second_total": 100,
            "requests_per_second_per_session": 10,
            "requests_per_second_per_block_id": 10,
            "requests_per_session_id": 1000,
            "requests_per_block_id": 100
        }
    },
    "inference_server_tags": ["NLP", "Transformer", "BERT"],
    "inference_server_public_url": "https://us-east-1.inference.example.com"
}
For more details about the inference server registry, refer to this documentation.
Query endpoint:
Endpoint: POST /inference_servers
Description:
Retrieves a list of inference servers matching specific criteria using MongoDB-style queries.
Example Query: Find all servers in cluster-123
curl -X POST <inference-servers-registry>/inference_servers \
  -H "Content-Type: application/json" \
  -d '{
    "inference_server_metadata.cluster_id": "cluster-123"
  }'
Once you find the required inference server, use inference_server_public_url as the gRPC endpoint for submitting inference tasks.
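As an illustration, a minimal Python sketch that queries the registry and picks a server could look like the following. It assumes the endpoint returns a JSON array of entries shaped like the sample above; the registry host is a placeholder.

import requests

REGISTRY_URL = "http://<inference-servers-registry>"  # placeholder registry host

def find_inference_server(query: dict) -> str:
    """Run a MongoDB-style query against the registry and return the
    public URL of the first matching inference server."""
    resp = requests.post(f"{REGISTRY_URL}/inference_servers", json=query)
    resp.raise_for_status()
    servers = resp.json()  # assumed: a JSON array of registry entries
    if not servers:
        raise RuntimeError("no inference server matched the query")
    return servers[0]["inference_server_public_url"]

# Example: pick any server in cluster-123
grpc_endpoint = find_inference_server({"inference_server_metadata.cluster_id": "cluster-123"})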
1.2 Submit inference task to a block using an inference server by specifying the block_id
The simplest and most direct way to submit an inference task is to specify the block_id. The task is submitted directly to the block, and the output is returned.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
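For illustration only, here is a minimal gRPC sketch. The service, method, and message names (InferenceService, Submit, InferenceTask, the data field, and the generated inference_pb2 modules), the port, and the block ID are all assumptions; the linked schema documentation defines the real API.

import grpc

# Hypothetical stubs generated from the inference server's proto files;
# see the schema documentation linked above for the actual definitions.
import inference_pb2
import inference_pb2_grpc

# Host taken from inference_server_public_url (scheme stripped); port is illustrative.
channel = grpc.insecure_channel("us-east-1.inference.example.com:50051")
stub = inference_pb2_grpc.InferenceServiceStub(channel)

# Target the block directly by its ID; the output is returned in the response.
task = inference_pb2.InferenceTask(
    block_id="block-123",                       # hypothetical block ID
    data='{"input": "classify this sentence"}',
)
response = stub.Submit(task)
print(response)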
1.3 Inference with block_id and files attached as FileInfo elements
Users can also attach files along with the inference task request. These files will be accessible to the block and can be used for processing.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
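Continuing the sketch above, attaching files might look like the following; the FileInfo field names (file_name, file_data) and the files field on the task are assumptions.

# Attach files as FileInfo elements so the block can access them.
with open("image.png", "rb") as f:
    file_info = inference_pb2.FileInfo(   # hypothetical message fields
        file_name="image.png",
        file_data=f.read(),
    )

task = inference_pb2.InferenceTask(
    block_id="block-123",
    data='{"input": "describe the attached image"}',
    files=[file_info],                    # assumed repeated FileInfo field
)
response = stub.Submit(task)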
1.4 Inference with query_parameters using similarity search
If the block_id is not known, users can specify query_parameters to perform a similarity search across all available blocks. A matching block will then be selected as the target block.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
For writing similarity search queries, refer to this documentation.
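Sketched under the same assumptions as above, a task can omit block_id and carry query_parameters instead. The filter below is illustrative only; the real query syntax is covered in the similarity search documentation linked above.

import json

# No block_id: the inference server resolves a target block by running a
# similarity search over the supplied query parameters.
task = inference_pb2.InferenceTask(
    query_parameters=json.dumps({
        "tags": {"$in": ["NLP", "BERT"]}   # illustrative filter, not the documented syntax
    }),
    data='{"input": "classify this sentence"}',
)
response = stub.Submit(task)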
1.5 Inference using FrameDB as storage by specifying frame_ptr
If very large files need to be submitted with the task, it is recommended to insert them into FrameDB and submit the FrameDB pointer along with the task data.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
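A rough sketch, assuming a FrameDB client with a put-style call that returns a pointer; the actual FrameDB client interface is described in its own documentation, and the module, address, and frame_ptr field shown here are assumptions.

from framedb import FrameDBClient   # hypothetical client module

framedb = FrameDBClient("framedb.example.com:6379")  # placeholder address

# Insert the large payload into FrameDB first, then send only the pointer.
with open("large_video.mp4", "rb") as f:
    frame_ptr = framedb.put(f.read())  # assumed to return a pointer string

task = inference_pb2.InferenceTask(
    block_id="block-123",
    frame_ptr=frame_ptr,               # the block reads the payload from FrameDB
    data='{"input": "summarize the attached video"}',
)
response = stub.Submit(task)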
1.6 Inference using Graphs
The inference server allows users to define a dynamic graph across multiple known blocks to execute an inference workflow. These dynamic graphs are not vDAGs: they do not support pre/post-processing policies, and they assume that inputs and outputs are interoperable across blocks.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
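For illustration, a dynamic graph might be expressed as a structure like the one below. The shape (nodes keyed by block ID, with edges wiring outputs to inputs) and the graph field on the task are assumptions, not the documented schema.

import json

# Illustrative two-stage pipeline: the OCR block's output feeds the
# summarizer block directly, since dynamic graphs assume the blocks'
# inputs and outputs are interoperable.
graph = {
    "nodes": ["block-ocr", "block-summarizer"],
    "edges": [{"from": "block-ocr", "to": "block-summarizer"}],
}

task = inference_pb2.InferenceTask(
    graph=json.dumps(graph),           # assumed field carrying the graph spec
    data='{"input": "<document payload or pointer>"}',
)
response = stub.Submit(task)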
2. vDAG inference
2.1 Discovering vDAG controllers
The vDAG controller for the given vdag_uri must be discovered first in order to connect and submit an inference task.
Users can query the vDAG registry to obtain the corresponding vDAG controller, if available, for the given vDAG. The vDAG controller registry stores all vDAG controllers created to serve vDAG inference requests.
Here is the schema of a vDAG controller:
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class vDAGController:
    # Unique identifier for the vDAG controller instance
    vdag_controller_id: str = ''
    # Associated vDAG URI this controller is managing
    vdag_uri: str = ''
    # Publicly accessible URL for interacting with the controller
    public_url: str = ''
    # Identifier of the cluster where the controller is deployed
    cluster_id: str = ''
    # Arbitrary metadata for storing additional information
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Configuration parameters used by the controller
    config: Dict[str, Any] = field(default_factory=dict)
    # Tags used for search and discovery of the controller
    search_tags: List[str] = field(default_factory=list)
For more details about the vDAG controller registry, refer to this documentation.
Endpoint: /vdag-controllers/by-vdag-uri/:vdag_uri
Method: GET
Description:
Fetches all vDAG controller documents associated with the given vdag_uri.
Example curl Command:
curl -X GET http://<vdag-registry>/vdag-controllers/by-vdag-uri/sample-vdag:1.0-stable
The public_url field of the vDAG controller can be used as the gRPC server URL for submitting the task.
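As a sketch, assuming the endpoint returns a JSON array of controller documents matching the vDAGController schema above (the registry host is a placeholder):

import requests

REGISTRY_URL = "http://<vdag-registry>"   # placeholder registry host

def get_vdag_controller_url(vdag_uri: str) -> str:
    """Fetch the controllers registered for a vDAG URI and return the
    public URL of the first one."""
    resp = requests.get(f"{REGISTRY_URL}/vdag-controllers/by-vdag-uri/{vdag_uri}")
    resp.raise_for_status()
    controllers = resp.json()  # assumed: a JSON array of controller documents
    if not controllers:
        raise RuntimeError(f"no vDAG controller found for {vdag_uri}")
    return controllers[0]["public_url"]

controller_url = get_vdag_controller_url("sample-vdag:1.0-stable")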
2.2 Submit inference task
Inference tasks for a vDAG can be submitted using the vDAG controller as the gateway. The vDAG controller provides a gRPC API through which tasks can be submitted.
For sample Python client code, refer to this documentation.
For the schema of the inference task, refer to this documentation.
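Mirroring the block inference sketch earlier, a submission to the controller might look like this. The stub, method, and message names (vDAGControllerStub, Submit, InferenceTask, and the generated vdag_pb2 modules) and the port are assumptions; the linked schema documentation defines the real API.

import grpc

# Hypothetical stubs generated from the vDAG controller's proto files.
import vdag_pb2
import vdag_pb2_grpc

# Host taken from the controller's public_url (scheme stripped); port is illustrative.
channel = grpc.insecure_channel("vdag-controller.example.com:50051")
stub = vdag_pb2_grpc.vDAGControllerStub(channel)

task = vdag_pb2.InferenceTask(
    vdag_uri="sample-vdag:1.0-stable",
    data='{"input": "classify this sentence"}',
)
response = stub.Submit(task)
print(response)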