Getting started with xCloud
1. Before you begin
- Install the xCloud Python SDK.
pip install xcloud --upgrade
- Configure xCloud with your API key and default workspace. You can find both under Settings (menu) > Credentials.
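Before moving on, it can help to confirm that both values are actually set. The following is a minimal sketch, not the SDK's documented configuration mechanism: it assumes the API key and default workspace are exposed as environment variables, and the names XCLOUD_API_KEY and XCLOUD_DEFAULT_WORKSPACE are hypothetical placeholders. Check the xCloud documentation for the actual configuration method.

```python
import os

# Hypothetical variable names, used here for illustration only.
REQUIRED_SETTINGS = ("XCLOUD_API_KEY", "XCLOUD_DEFAULT_WORKSPACE")


def missing_settings(env=None):
    """Return the required settings that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("xCloud configuration found.")
```

Passing a plain dictionary instead of the process environment makes the check easy to exercise without touching real credentials.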
2. Deploy your first optimized model
- Deploy the model. This example deploys the tiiuae/falcon-7b-instruct model from the Hugging Face Hub on an L4 GPU.
from xcloud import (
    DTYPES,
    Deployment,
    DeploymentContainerSpecs,
    DeploymentOptimizationSpecs,
    DeploymentsClient,
    GenerationParams,
    MachineType,
    ModelConfig,
    ModelType,
)

deployment_name = "falcon-opt"

# Container specs: L4 GPU machine type, spot instance, Falcon model in FP16.
container_specs = DeploymentContainerSpecs(
    machine_type=MachineType.GPU_L4_1,
    spot=True,
    optimization_specs=DeploymentOptimizationSpecs(
        model_type=ModelType.FALCON,
        model_config=ModelConfig(
            model_path="tiiuae/falcon-7b-instruct",
            tokenizer_path="tiiuae/falcon-7b-instruct",
            dtype=DTYPES.FP16,
            generation_params=GenerationParams(max_tokens=1024),
        ),
    ),
)

deployment = Deployment(deployment_name=deployment_name, container_specs=container_specs)
deployment = DeploymentsClient.create_deployment(deployment)
- Wait until the deployment is ready.
DeploymentsClient.wait_until_deployment_is_ready(deployment_name=deployment_name)
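Conceptually, waiting for readiness is a polling loop with a deadline. The sketch below illustrates that pattern in isolation. It is not how wait_until_deployment_is_ready is implemented, and the is_ready callable, the timeout, and the polling interval are hypothetical parameters introduced for illustration.

```python
import time


def wait_until(is_ready, timeout_s=600.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `is_ready()` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The sleep and clock arguments are injectable so the loop can be tested without real delays.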
- Run inference against the deployment's endpoint.
import requests

# Fetch the deployment to obtain its inference endpoint.
deployment = DeploymentsClient.get_deployment_by_name(deployment_name)

prompt = (
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. "
    "Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\n"
    "Daniel: Hello, Girafatron!\n"
    "Girafatron:"
)

response = requests.post(
    url=deployment.inference.infer_endpoint,
    json={"instances": [prompt]},
)
response.raise_for_status()
print(response.json())
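The request above sends a single prompt inside the instances list. A small helper can make that payload shape explicit and add a request timeout. This is a sketch under assumptions: the guide only shows one prompt per request, so sending several prompts in one instances list is an assumption about the endpoint, and because the response schema is not described here the helper returns the decoded JSON unchanged. The names build_payload and infer are illustrative and not part of the xCloud SDK.

```python
def build_payload(prompts):
    """Build the JSON body in the {"instances": [...]} shape used above."""
    if isinstance(prompts, str):
        prompts = [prompts]
    return {"instances": list(prompts)}


def infer(endpoint, prompts, timeout_s=60.0, post=None):
    """POST the prompts to `endpoint` and return the decoded JSON response."""
    if post is None:
        # Imported lazily so the helper can be exercised with a stub `post`.
        import requests
        post = requests.post
    response = post(url=endpoint, json=build_payload(prompts), timeout=timeout_s)
    response.raise_for_status()
    return response.json()
```

With the deployment from this guide, a call would look like infer(deployment.inference.infer_endpoint, prompt).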