Benchmarks
After deploying a model, you can use the xCloud SDK to run performance benchmarks and measure your model's latency and throughput.
To do so, run the following code snippet:
```python
from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",  # the deployment's inference endpoint URL
    headers={
        "x-api-key": "NA"  # insert the deployment API key if applicable
    },
    json_request_body={
        "instances": ["I love xCloud"]
    },
    http_method="POST",
    concurrent_workers=32,
    send_requests_duration=180
)
```
You can configure the following parameters:
- url: the URL to which the requests are sent.
- headers: headers included in each request.
- json_request_body: the JSON body sent to the model.
- http_method: the HTTP method used for the requests.
- concurrent_workers: the number of concurrent requests, simulating multiple users calling your model at the same time.
- send_requests_duration: the benchmark's duration in seconds. Requests keep being sent until this time elapses.
- timeout_per_request: timeout per request, in seconds.
- output_file: path to a CSV file where the results are saved. If no file is specified, the results are printed to standard output. Both optional parameters are shown in the sketch below.
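For example, to run a five-minute benchmark with a per-request timeout and save the results to a CSV file instead of printing them, the call might look like this (a minimal sketch: the endpoint URL, API key, request body, and output path are placeholder values to replace with your own):

```python
from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",  # placeholder endpoint URL
    headers={"x-api-key": "NA"},                  # placeholder API key
    json_request_body={"instances": ["I love xCloud"]},
    http_method="POST",
    concurrent_workers=32,
    send_requests_duration=300,                   # send requests for 5 minutes
    timeout_per_request=10,                       # time out any request slower than 10 seconds
    output_file="benchmark_results.csv"           # write results to a CSV file instead of stdout
)
```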