Cloud Functions

Functions

class nvcf.api.asset.AssetAPI(api_client)
delete(
asset_id: str,
starfleet_api_key: str | None = None,
) None

Delete a given asset, removing the ability to use it in future invocations.

Parameters:
  • asset_id -- A unique identifier for the asset.

  • starfleet_api_key -- An API key with access to manage assets.

Returns:

JSON Response of NVCF asset information.

Return type:

dict

info(
asset_id: str,
starfleet_api_key: str | None = None,
) dict

Get metadata about a given asset.

Parameters:
  • asset_id -- A unique identifier for the asset.

  • starfleet_api_key -- An API key with access to manage assets.

Returns:

JSON Response of NVCF asset information.

Return type:

dict

list(starfleet_api_key: str | None = None) dict

List assets available to the account.

Parameters:

starfleet_api_key -- An API key with access to manage assets.

Returns:

Keyed list of assets.

Return type:

dict

upload(
path: str,
description: str,
starfleet_api_key: str | None = None,
) dict

Upload an asset for use in future invocations.

Parameters:
  • path -- Path of the asset file to upload.

  • description -- Description of the asset.

  • starfleet_api_key -- An API key with access to manage assets.

Returns:

JSON Response of NVCF asset information.

Return type:

dict

class nvcf.api.function.FunctionAPI(api_client)
create(
name: str,
inference_url: str,
health_uri: str | None = None,
container_image: str | None = None,
helm_chart: str | None = None,
helm_chart_service: str | None = None,
models: list[str] | None = None,
function_id: str | None = None,
inference_port: int | None = None,
container_args: str | None = None,
api_body_format: str | None = None,
container_environment_variables: list[str] | None = None,
tags: list[str] | None = None,
resources: list[str] | None = None,
*,
function_type: str = 'DEFAULT',
health_expected_status_code: int | None = None,
health_port: int | None = None,
health_timeout: str | None = None,
health_protocol: str | None = None,
description: str | None = None,
secrets: list[str] | None = None,
json_secrets: list[tuple[str, bytes]] | None = None,
logs_telemetry_id: str | None = None,
metrics_telemetry_id: str | None = None,
traces_telemetry_id: str | None = None,
rate_limit_pattern: str | None = None,
rate_limit_exempt_nca_ids: list[str] | None = None,
rate_limit_sync_check: bool | None = None,
) DotDict

Create a function with the provided specification.

Parameters:
  • name -- Display name of the function.

  • inference_url -- Endpoint you wish to use to do invocations.

  • health_uri -- Health endpoint for inferencing.

  • container_image -- Container Image.

  • models -- NGC models. In form [override_name:]model

  • helm_chart -- Helm Chart URL.

  • helm_chart_service -- Helm chart service name; only necessary when a helm chart is specified.

  • function_id -- If provided, generate another version of the same function.

  • inference_port -- Optional port override which inference is forwarded to.

  • container_args -- Optional arguments to provide to the container.

  • api_body_format -- Optional body format to use.

  • container_environment_variables -- List of key pair values to pass as variables to container. In form ["key1:value1", "key2:value2"]

  • tags -- Optional list of tags to create the function with.

  • resources -- Optional list of resources.

Keyword Arguments:
  • function_type -- Used to indicate a streaming function, defaults to DEFAULT.

  • health_port -- Port number where the health listener is running.

  • health_protocol -- HTTP/gRPC protocol type for health endpoint. Choices: ["HTTP", "gRPC"].

  • health_timeout -- ISO 8601 duration string in PnDTnHnMn.nS format.

  • health_expected_status_code -- Expected return status code considered as successful.

  • description -- Optional function/version description.

  • secrets -- Optional secret key/value pairs. In form ["key1:value1", "key2:value2"]

  • json_secrets -- Optional secret key/value pairs. In form [("key", {"jsonkey": 1, "jsonkey2": {"nestedkey1": "nestedvalue"}})]

  • logs_telemetry_id -- UUID of telemetry log endpoint to map to function.

  • metrics_telemetry_id -- UUID of telemetry metrics endpoint to map to function.

  • traces_telemetry_id -- UUID of telemetry traces endpoint to map to function.

  • rate_limit_pattern -- Rate limit, format NUMBER-S|M|H|D, ex: 3-S.

  • rate_limit_exempt_nca_ids -- NCA IDs exempt from the rate limit.

  • rate_limit_sync_check -- Rate limit sync check.

Raises:
  • InvalidArgumentError -- Thrown if none of container image, models, or helm chart is provided.

  • ResourceNotFoundException -- If the image, model, or helm chart cannot be found.

Returns:

Function Response provided by NVCF

Return type:

DotDict
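As a sketch of the secrets and json_secrets argument shapes described above (the key names and values here are placeholders, not part of the SDK):

```python
import json

# Plain secrets are "key:value" strings (placeholder values):
secrets = ["API_TOKEN:abc123"]

# json_secrets entries are (key, bytes) pairs per the signature above,
# so a JSON document is serialized and encoded before being passed:
json_secrets = [
    ("config", json.dumps({"jsonkey": 1, "jsonkey2": {"nestedkey1": "nestedvalue"}}).encode()),
]

# clt.cloud_function.functions.create(..., secrets=secrets, json_secrets=json_secrets)
```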

delete(function_id: str, function_version_id: str)

Delete a function version id.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

Returns:

JSON Response of NVCF function information.

Return type:

DotDict

info(
function_id: str,
function_version_id: str,
) DotDict

Get information about a given function version.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

Returns:

JSON Response of NVCF function information.

Return type:

DotDict

invoke(
function_id: str,
payload: dict,
function_version_id: str | None = None,
starfleet_api_key: str | None = None,
asset_ids: list[str] | None = None,
output_zip_path: str | None = None,
polling_request_timeout: int | None = 300,
pending_request_timeout: int | None = 600,
pending_request_interval: float | None = 1.0,
) DotDict
Parameters:
  • function_id -- ID of NVCF Function being invoked.

  • payload -- JSON payload specific to the function you are invoking. The shape should adhere to your function's API spec.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

  • asset_ids -- Asset ids that are referenced in the payload.

  • output_zip_path -- If the output provides a zip file, this is the location to save it.

Raises:

NgcException -- Matching HTTP Response code if fails in any way.

Returns:

Dictionary corresponding to JSON response from function invoked.

invoke_grpc(
function_id: str,
starfleet_api_key: str,
function_request: Any,
grpc_stub_function: Callable,
function_version_id: str | None = None,
) Any
Parameters:
  • function_id -- ID of GRPC NVCF Function being invoked.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_request -- GRPC Payload specific to the function you are invoking.

  • grpc_stub_function -- GRPC Stub function to invoke.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

Raises:

NgcException -- Matching HTTP Response code if fails in any way.

Returns:

GRPC Response of function invocation.

Return type:

Any

invoke_grpc_triton(
function_id: str,
function_request,
starfleet_api_key: str | None = None,
function_version_id: str | None = None,
)
Parameters:
  • function_id -- ID of Triton based GRPC NVCF Function being invoked.

  • function_request -- GRPC Payload specific to the function you are invoking.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

Returns:

GRPC Response of function invocation

Return type:

ModelInferResponse

invoke_stream(
function_id: str,
payload: dict,
starfleet_api_key: str | None = None,
function_version_id: str | None = None,
asset_ids: list[str] | None = None,
request_timeout: int | None = 300,
) Generator[bytes, None, None]
Parameters:
  • function_id -- ID of NVCF Function being invoked.

  • payload -- JSON payload specific to the function you are invoking. The shape should adhere to your function's API spec.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

Raises:

NgcException -- Matching HTTP Response code if fails in any way.

Returns:

Streaming response of function invocation.

Return type:

Generator[bytes, None, None]

invoke_stream_grpc(
function_id: str,
starfleet_api_key: str,
function_request: Any,
grpc_stub_function: Callable,
function_version_id: str | None = None,
) Any
Parameters:
  • function_id -- ID of GRPC NVCF Function being invoked.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_request -- GRPC Payload specific to the function you are invoking.

  • grpc_stub_function -- GRPC Stub function to invoke.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

Raises:

NgcException -- Matching HTTP Response code if fails in any way.

Returns:

GRPC Response of function invocation.

Return type:

Any

invoke_stream_grpc_triton(
function_id: str,
function_request: Any,
starfleet_api_key: str | None = None,
function_version_id: str | None = None,
) list[Any]
Parameters:
  • function_id -- ID of Triton based GRPC NVCF Function being invoked.

  • function_request -- GRPC Payload specific to the function you are invoking.

  • starfleet_api_key -- Key with invocation access to the function.

  • function_version_id -- Optionally provide a version id to invoke a specific version of a function.

Returns:

Streaming response of function invocation

Return type:

ModelStreamInferResponse

list(
function_id: str | None = None,
name_pattern: str | None = None,
access_filter: list[str] | None = None,
) DotDict

List functions available to the organization currently set.

Parameters:
  • function_id -- Optional parameter to list only versions of a specific function. Defaults to None.

  • name_pattern -- Optional parameter to filter functions that contain this name. Supports wildcards.

  • access_filter -- Optional parameter to filter functions by their access: ["private", "public", "authorized"].

Returns:

Keyed List of Functions.

Return type:

dict
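The name_pattern wildcard can be illustrated with shell-style matching on the client side (the server applies the real filter and its exact pattern semantics may differ; the names below are made up):

```python
from fnmatch import fnmatch

# Hypothetical function names to illustrate the wildcard syntax:
names = ["peek-sdk-demo-sdxl", "prod-llm", "demo-echo"]
matches = [n for n in names if fnmatch(n, "*demo*")]  # names containing "demo"

# The equivalent server-side call would be:
# clt.cloud_function.functions.list(name_pattern="*demo*", access_filter=["private"])
```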

class nvcf.api.authorization.FunctionAuthorizationAPI(api_client: Client = None)
add(
function_id: str,
function_version_id: str | None = None,
nca_id: str | None = None,
) DotDict

Authorize additional NCA IDs to invoke this function/function version.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

  • nca_id -- NCA ID of party you wish to authorize.

Returns:

JSON Response of NVCF function information.

Return type:

dict

clear(
function_id: str,
function_version_id: str | None = None,
) DotDict

Delete all extra account authorizations for a given function/function version.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

Returns:

JSON Response of NVCF function information.

Return type:

dict

info(
function_id: str,
function_version_id: str | None = None,
) DotDict

Get account authorization about a given function/function version.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

Returns:

JSON Response of NVCF function information.

Return type:

dict

remove(
function_id: str,
function_version_id: str | None = None,
nca_id: str | None = None,
) DotDict

Remove authorization for clients to invoke this function/function version.

Parameters:
  • function_id -- Function's ID.

  • function_version_id -- Function's version ID.

  • nca_id -- NCA ID of the party whose authorization you wish to remove.

Returns:

JSON Response of NVCF function information.

Return type:

dict

class nvcf.api.deployment_spec.DeploymentSpecification(
backend: str,
gpu: str,
min_instances: int,
max_instances: int,
instance_type: str | None = None,
availability_zones: list[str] | None = None,
max_request_concurrency: int | None = None,
configuration: dict | None = None,
)

Represents a deployment specification for NVCF.

class nvcf.api.deploy.DeployAPI(api_client: Client = None)
create(
function_id: str,
function_version_id: str,
deployment_specifications: list[DeploymentSpecification] | None = None,
targeted_deployment_specifications: list[TargetedDeploymentSpecification] | None = None,
) DotDict

Create a deployment with a function id, version and a set of available deployment specifications.

delete(
function_id: str,
function_version_id: str,
*,
graceful: bool = False,
)

Delete a given deployment.

info(
function_id: str,
function_version_id: str,
) DotDict

Get information about a given function's deployment.

query_logs(
function_id: str,
function_version_id: str,
start_time: datetime | None = None,
end_time: datetime | None = None,
duration: timedelta | None = None,
) Iterator[dict]

Deployment logs.

Parameters:
  • function_id -- Id of function logs are pulled from.

  • duration -- Specifies the duration of time, either after begin-time or before end-time. Format: [nD][nH][nM][nS]. Default: 1 day, doesn't respect decimal measurements.

  • start_time -- Specifies the start time for querying logs. Default: None.

  • end_time -- Specifies the end time for querying logs. Default: Now.

  • function_version_id -- Optional version to specify for function id.

Returns:

Iterator used to receive logs one by one.

Return type:

Iterator
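A sketch of building the time-window arguments for query_logs; the function_id and function_version_id values are assumed to come from an existing deployment:

```python
from datetime import datetime, timedelta, timezone

# Query the last six hours of deployment logs:
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(hours=6)

# for entry in clt.cloud_function.functions.deployments.query_logs(
#     function_id=function_id,
#     function_version_id=function_version_id,
#     start_time=start_time,
#     end_time=end_time,
# ):
#     print(entry)
```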

update(
function_id: str,
function_version_id: str,
deployment_specifications: list[DeploymentSpecification] | None = None,
targeted_deployment_specifications: list[TargetedDeploymentSpecification] | None = None,
) DotDict

Update a given deployment.

class nvcf.api.task.TaskAPI(api_client: Client = None)
cancel(task_id)

Cancel a task.

Parameters:

task_id -- The task to cancel

create(
name: str,
container_image: str | None = None,
container_args: str | None = None,
container_environment_variables: list[str] | None = None,
gpu_specification: GPUSpecification | None = None,
models: list[str] | None = None,
resources: list[str] | None = None,
tags: list[str] | None = None,
description: str | None = None,
max_runtime_duration: Duration | None = None,
max_queued_duration: Duration | None = None,
termination_grace_period_duration: Duration | None = None,
result_handling_strategy: str = 'UPLOAD',
result_location: list[str] | None = None,
secrets: list[str] | None = None,
helm_chart: str | None = None,
logs_telemetry_id: str | None = None,
metrics_telemetry_id: str | None = None,
traces_telemetry_id: str | None = None,
)

Create a task with the provided specification.

Parameters:
  • name -- Display name of the task.

  • container_image -- Container image.

  • container_args -- Container args.

  • container_environment_variables -- Container environment variables.

  • gpu_specification -- GPU specifications.

  • models -- NGC models.

  • resources -- NGC resources.

  • tags -- Optional list of tags to create the task with.

  • max_runtime_duration -- Maximum runtime duration for task. Defaults to forever.

  • max_queued_duration -- Maximum queued duration for task. Defaults to 72 hours.

  • termination_grace_period_duration -- Grace period after termination. Defaults to 1 hour.

  • description -- Description of the task.

  • result_handling_strategy -- How results should be handled.

  • result_location -- Where results should be stored. Required if result_handling_strategy is UPLOAD.

  • secrets -- Optional secret key/value pairs. Form: ["key1:value1", "key2:value2"].

  • helm_chart -- Helm Chart URL.

  • logs_telemetry_id -- UUID of telemetry log endpoint to map to task.

  • metrics_telemetry_id -- UUID of telemetry metrics endpoint to map to task.

  • traces_telemetry_id -- UUID of telemetry traces endpoint to map to task.

Raises:
  • InvalidArgumentError -- If result handling strategy is set to upload and required fields aren't provided.

  • ResourceNotFound -- If the image or resource cannot be found.

Returns:

Task Response provided by NVCF

Return type:

DotDict

delete(task_id)

Delete a task.

Parameters:

task_id -- The task to delete.

events(
task_id: str,
limit: int = 100,
) Iterable[DotDict]

Get a list of the task's events.

Returns:

Iterable of DotDict of task events.

Return type:

Iterable[DotDict]

info(task_id: str) DotDict

Get information about a given task.

Parameters:

task_id -- The task to get information about.

Returns:

DotDict of task information.

Return type:

dict

list(limit: int = 100) list[DotDict]

List tasks available to the organization currently set.

Returns:

A list of task DotDicts.

logs(
task_id: str,
limit: int = 100,
start_time: datetime | None = None,
end_time: datetime | None = None,
duration: timedelta | None = None,
) Iterable[DotDict]

Task deployment logs.

Returns:

Iterable of DotDict of task logs.

Return type:

Iterable[DotDict]

results(
task_id: str,
limit: int = 100,
) Iterable[DotDict]

Get a list of the task's results.

Returns:

Iterable of DotDict of task results.

Return type:

Iterable[DotDict]

Examples

Import the NGCSDK

from ngcsdk import Client
from nvcf.api.deployment_spec import TargetedDeploymentSpecification
from nvcf.api.invocation_handler import HTTPSInvocationHandler

Create and configure your client - put in your api_key, org_name, team name

key = "nvapi-***"
clt = Client()
clt.configure(key)

List your current functions

functions = clt.cloud_function.functions.list(access_filter=["private"])["functions"]
functions[0]

Create a new function with existing NGC Resources

genslm_function = clt.cloud_function.functions.create(
    name="peek-sdk-demo-sdxl",
    inference_url="echo",
    container_image="stg.nvcr.io/cv5p43s0htqh/echo:latest",
    models=["nvidia/genslm:latest"],
    container_environment_variables=["key:value", "CUSTOM:ENV_VAR"],
    inference_port=8000,
    container_args="bash exec",
)

Delete Function

clt.cloud_function.functions.delete(function_id=function_id, function_version_id=function_version_id)

Deployment

List Available GPUs

clt.cloud_function.gpus.list().keys()
l40s_instance = clt.cloud_function.gpus.info("L40S").L40S[0]
l40s_instance.name, l40s_instance.currentInstances

Create a deployment

function_id, function_version_id = genslm_function.get("function").get("id"), genslm_function.get("function").get("versionId")
dep_specs = [TargetedDeploymentSpecification(gpu="L40S", instance_type=l40s_instance.name, min_instances=1, max_instances=1)]
genslm_deployment = clt.cloud_function.functions.deployments.create(function_id=function_id, function_version_id=function_version_id, targeted_deployment_specifications=dep_specs)

Undeploy

clt.cloud_function.functions.deployments.delete(function_id=function_id, function_version_id=function_version_id)

Function Authorizations

clt.cloud_function.functions.authorizations.add(function_id=function_id, function_version_id=function_version_id, nca_id="")
clt.cloud_function.functions.authorizations.clear(function_id=function_id, function_version_id=function_version_id)

Function Invocations

# You can always provide a new key, or you can use the one you provided in the client configuration
SAK = "nvapi-XXX"

Asset management

clt.cloud_function.assets.list(SAK)
resp = clt.cloud_function.assets.upload("/path/to/image.png", "description", SAK)
clt.cloud_function.assets.delete(resp.get("asset_id"), SAK)

HTTPS Blocking Invocation

payload = {
  "messages": [
    {
      "role": "user",
      "content": "Give me some python code to read a file named temp.txt"
    }
  ],
  "temperature": 0.2,
  "top_p": 0.7,
  "max_tokens": 512,
  "stream": False
}
clt.cloud_function.functions.invoke(function_id="eb1100de-60bf-4e9a-8617-b7d4652e0c37", payload=payload, starfleet_api_key=SAK)

Multiple Invocations

To reuse an HTTP session you can use the context manager.

with HTTPSInvocationHandler(starfleet_api_key=SAK) as invocation_handler:
    # Non-stream
    invocation_handler.make_invocation_request("0acb2d4a-47ed-45b8-935c-d93cba5b7485", payload)
    # Stream
    for line in invocation_handler.make_streaming_invocation_request(
        function_id="845ba840-8d13-4151-9682-0b4cdbc93d7b", data=payload
    ):
        print(line.decode("utf-8"))

Using assets in invocations

r = clt.cloud_function.assets.upload("/path/to/image.png", "description", SAK)
asset_id = r["asset_id"]

payload = {
    "inputs": [
        {
            "name": "image",
            "shape": [1],
            "datatype": "BYTES",
            "data": [f"{asset_id}"],
        },
    ],
    "outputs": [{"name": "image_generated", "datatype": "BYTES", "shape": [1]}],
}

resp = clt.cloud_function.functions.invoke(function_id, payload, starfleet_api_key=SAK, asset_ids=[asset_id], output_zip_path="/path/to/output.zip")

HTTPS Streaming Invocation

payload = {
  "messages": [
    {
      "role": "user",
      "content": "Give me some python code to read a file named temp.txt"
    }
  ],
  "temperature": 0.2,
  "top_p": 0.7,
  "max_tokens": 512,
  "stream": True
}
invoke_request = clt.cloud_function.functions.invoke_stream(function_id="eb1100de-60bf-4e9a-8617-b7d4652e0c37", payload=payload, starfleet_api_key=SAK)
for chunk in invoke_request:
    print(chunk.decode('utf-8'))

GRPC Invocation

MODEL_INFER_REQUEST = ModelInferRequest(
    model_name="simple_identity",
    inputs=[
        ModelInferRequest.InferInputTensor(
            name="INPUT0",
            datatype="BYTES",
            shape=[1, 1],
            contents=InferTensorContents(bytes_contents=[b"hello world"]),
        ),
    ],
    outputs=[ModelInferRequest.InferRequestedOutputTensor(name="OUTPUT0")],
)

MODEL_STREAM_REQUEST = ModelInferRequest(
    model_name="DialoGPT",
    inputs=[
        ModelInferRequest.InferInputTensor(
            name="new_inputs",
            datatype="BYTES",
            shape=[1, 1],
            contents=InferTensorContents(bytes_contents=[b"hello world"]),
        ),
    ],
    outputs=[ModelInferRequest.InferRequestedOutputTensor(name="response")],
)
clt.cloud_function.functions.invoke_grpc_triton(function_id="7dc17808-b067-41dd-bf32-78d242379b6f", function_request=MODEL_INFER_REQUEST, starfleet_api_key=SAK)

Multiple Invocations

To reuse an HTTP session you can use the context manager.

with TritonGRPCInvocationHandler(SAK) as invocation_handler:
    # Blocking
    invocation_handler.make_invocation_request(
        function_id=function_id,
        function_request=MODEL_INFER_REQUEST,
        function_version_id=function_version_id,
    )
    # Streaming
    for resp in clt.cloud_function.functions.invoke_stream_grpc_triton(function_id, MODEL_STREAM_REQUEST, SAK, function_version_id):
        print(resp)

Tasks

from isodate import parse_duration
from nvcf.api.deployment_spec import GPUSpecification
clt.cloud_function.tasks.list()
max_runtime_dur = parse_duration("PT1H")
output = clt.cloud_function.tasks.create(
    "sdk-example",
    container_image="stg.nvcr.io/ax3ysqem02xw/tasks_sample:0.0.4",
    gpu_specification=GPUSpecification(gpu="T10", instance_type="g6.full", backend="GFN"),
    max_runtime_duration=max_runtime_dur,
    result_handling_strategy="NONE",
)
clt.cloud_function.tasks.info(output.task.id)
for res in clt.cloud_function.tasks.results(output.task.id):
    print(res)
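Task execution is asynchronous, so a common pattern is polling info() until the task finishes. A minimal sketch, assuming the info() response exposes a task.status field; the terminal state names below are illustrative, not taken from the SDK docs:

```python
import time

# Illustrative terminal states; consult the status values your tasks
# actually return for the authoritative set:
TERMINAL_STATES = {"COMPLETED", "CANCELED", "ERRORED"}

def wait_for_task(tasks_api, task_id, interval=10.0):
    """Poll tasks_api.info(task_id) until the task reaches a terminal state."""
    while True:
        status = tasks_api.info(task_id).task.status
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)

# status = wait_for_task(clt.cloud_function.tasks, output.task.id)
```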