Benchmarks
After deploying a model, you can use the xCloud SDK to run performance benchmarks and measure your model's latency and throughput.
To do so, run the following code snippet:
```python
from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",  # the deployment's inference endpoint URL
    headers={
        "x-api-key": "NA"  # insert the deployment API key if applicable
    },
    json_request_body={
        "instances": ["I love xCloud"]
    },
    http_method="POST",
    concurrent_workers=32,
    send_requests_duration=180
)
```
You can configure the following parameters:
- url: the URL to which the requests are sent.
- headers: headers included in each request.
- json_request_body: the JSON body sent to the model.
- http_method: the HTTP method used for the requests.
- concurrent_workers: the number of concurrent requests, simulating multiple users calling your model at the same time.
- send_requests_duration: the benchmark's duration in seconds. Requests keep being sent until this time elapses.
- timeout_per_request: timeout per request, in seconds.
- output_file: path to a CSV file where the results are saved. If no file is specified, the results are printed to standard output. Both optional parameters are shown in the sketch below.
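For example, to run a five-minute benchmark with a per-request timeout and save the results to a CSV file instead of printing them, the call might look like this (a minimal sketch: the endpoint URL, API key, request body, and output path are placeholder values to replace with your own):

```python
from xcloud import benchmark_endpoint

benchmark_endpoint(
    url="http://my-deployment-predictor-default.64d68f0219efcfdf47fb2bc5.inference.xcloud.stochastic.ai/v1/models/custom-model:predict",  # placeholder endpoint URL
    headers={"x-api-key": "NA"},                  # placeholder API key
    json_request_body={"instances": ["I love xCloud"]},
    http_method="POST",
    concurrent_workers=32,
    send_requests_duration=300,                   # send requests for 5 minutes
    timeout_per_request=10,                       # time out any request slower than 10 seconds
    output_file="benchmark_results.csv"           # write results to a CSV file instead of stdout
)
```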