Benchmarks

After deploying a model, you can use the xCloud SDK to run a performance benchmark and measure the model's latency and throughput.

To do so, run the following code snippet:

from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",  # Specify the deployment's inference endpoint URL here
    headers={
        "x-api-key": "NA"  # Insert the deployment API key if applicable
    },
    json_request_body={
        "instances": ["I love xCloud"]
    },
    http_method="POST",
    concurrent_workers=32,
    send_requests_duration=180
)

You can configure the following parameters (a fuller example follows the list):

  • url: the URL to which the requests are sent.
  • headers: HTTP headers included in each request.
  • json_request_body: the JSON body sent to the model with each request.
  • http_method: the HTTP method used for the requests.
  • concurrent_workers: the number of workers sending requests concurrently. This simulates multiple users calling your model at the same time.
  • send_requests_duration: the benchmark's duration in seconds. Requests keep being sent until this time elapses.
  • timeout_per_request: the timeout per request, in seconds.
  • output_file: path to a CSV file where the results are saved. If no file is specified, the results are printed to standard output.
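
For example, to cap the per-request timeout and save the results to a CSV file instead of printing them, you can combine the optional parameters above. This is a minimal sketch using only the parameters documented in this list; the lower worker count and shorter duration are arbitrary values chosen for a quick smoke test.

from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",
    headers={"x-api-key": "NA"},
    json_request_body={"instances": ["I love xCloud"]},
    http_method="POST",
    concurrent_workers=8,  # lighter load than the example above
    send_requests_duration=60,  # run for one minute
    timeout_per_request=10,  # give up on any request that takes longer than 10 seconds
    output_file="benchmark_results.csv"  # save results to CSV instead of standard output
)

Saving the results to a CSV file makes it easy to compare runs with different concurrent_workers settings and see how latency changes as load increases.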