Models API
xCloud allows you to easily fine-tune and deploy a selection of open-source models. The difference between deployments and model APIs is that with the latter you don't have to write custom inference code or worry about optimizations and related details: we manage all of that on your behalf. Your only task is to configure the fine-tuning job (for example, the number of epochs and the data you want to use) and the deployment.
The currently supported models that you can deploy or use as base models for your fine-tunings are:
- xCloud-LLaMA-V2-7B: it uses meta-llama/Llama-2-7b-hf from HuggingFace
- xCloud-LLaMA-V2-13B: it uses meta-llama/Llama-2-13b-hf from HuggingFace
- xCloud-LLaMA-V2-7B-Chat: it uses meta-llama/Llama-2-7b-chat-hf from HuggingFace
- xCloud-LLaMA-V2-13B-Chat: it uses meta-llama/Llama-2-13b-chat-hf from HuggingFace
In this tutorial we will fine-tune the xCloud-LLaMA-V2-7B
model on the Dolly dataset, an open-source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
1. Fine-tuning
1.1. Prepare your dataset
Your first task is to convert the Dolly dataset into the xCloud format by completing the following steps. You can check the format itself in the info boxes further down.
- Load the Dolly dataset using the HF datasets library and write a function to format it.
from datasets import load_dataset
dataset = load_dataset("databricks/databricks-dolly-15k")
def format_samples(sample):
    sample["input"] = "### Instruction ###\n{}\n\n{}".format(sample['instruction'], sample['context'])
    sample["output"] = "Response: {}".format(sample['response'])
    return sample
- Get a random sample and print it to confirm that the format looks as expected.
from random import randrange
example = format_samples(dataset['train'][randrange(len(dataset['train']))])
print(example)
- Finally, format your dataset and save it in your local file system in the JSONL format.
# Format dataset
format_dataset = dataset.map(
    format_samples,
    remove_columns=dataset["train"].column_names,
)
# Save it in JSONL format
format_dataset['train'].to_json("dolly_xcloud.jsonl")
For unsupervised fine-tuning (ModelsAPIFinetuningType.UNSUPERVISED_FINETUNING), the file that you provide should be a JSONL file with the field text in every JSON document. Below you can find an example:
{"text": "This is an example"}
{"text": "Unsupervised fine-tuning example"}
For instruction fine-tuning (ModelsAPIFinetuningType.INSTRUCTION_FINETUNING), the file you supply must be in JSONL format, containing input and output fields in each JSON document. The input field represents the input you'd feed to the model during inference, while the output field should reflect the output you expect the model to generate. Below you can find an example, followed by a small sanity check you can run locally:
{"input":"### Instruction ###\nWhich is a species of fish? Tope or Rope\n\n","output":"Response: Tope"}
{"input":"### Instruction ###\nWhy can camels survive for long without water?\n\n","output":"Response: Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time."}
1.2. Configure the fine-tuning job
- Upload your dataset to S3. In this guide we will use the following S3 base path to save all the artifacts: s3://xcloud/on_premise_model_apis/. A possible upload command is sketched below.
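One way to upload the file is with boto3, reusing the AWS credentials from your environment. The bucket and key below are derived from the base path above; adjust them if your layout differs.
import boto3

s3 = boto3.client("s3")  # Uses AWS credentials from your environment
s3.upload_file(
    Filename="dolly_xcloud.jsonl",
    Bucket="xcloud",
    Key="on_premise_model_apis/dolly_xcloud.jsonl",
)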
- Create the fine-tuning job. You will have to specify the path where you saved your dataset (dataset_cloud_path), where you want to save the fine-tuning code that will be automatically uploaded to your S3 path (saving_finetuning_code_cloud_path), where you want to save the fine-tuned model (saving_model_cloud_path), the credentials to access S3 (credentials) and finally your cloud link (link_name).
from xcloud import (
    OnPremiseModelsAPIFinetuning,
    ModelsAPIFinetuningType,
    ModelsAPIModelFamily,
    OnPremiseModelsAPIClient,
    Credentials,
    Cloud
)
import os

finetuning_name = "my-first-finetuning"
link_name = "my-link"

finetuning = OnPremiseModelsAPIFinetuning(
    finetuning_name=finetuning_name,
    finetuning_type=ModelsAPIFinetuningType.INSTRUCTION_FINETUNING,
    model_family=ModelsAPIModelFamily.LLAMA_V2_7B,
    num_epochs=1,
    dataset_cloud_path="s3://xcloud/on_premise_model_apis/dolly_xcloud.jsonl",
    saving_finetuning_code_cloud_path="s3://xcloud/on_premise_model_apis/code",
    saving_model_cloud_path="s3://xcloud/on_premise_model_apis/finetuned_model",
    credentials=Credentials(
        cloud=Cloud.AWS,
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_region=os.environ["AWS_DEFAULT_REGION"]
    ),
    link_name=link_name
)

finetuning = OnPremiseModelsAPIClient.create_finetuning(finetuning=finetuning)
- Check the status of the fine-tuning job.
finetuning = OnPremiseModelsAPIClient.get_finetuning_by_name(finetuning_name=finetuning_name)
print(finetuning.status)
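Fine-tuning can take a while, so you may prefer to poll the status periodically instead of checking it by hand. The sketch below relies only on get_finetuning_by_name; the terminal status values it checks for are assumptions, so adjust them to match the SDK's actual status values.
import time

while True:
    finetuning = OnPremiseModelsAPIClient.get_finetuning_by_name(finetuning_name=finetuning_name)
    print(finetuning.status)
    # Hypothetical terminal states; check the SDK's status values for the real names
    if str(finetuning.status).endswith(("SUCCEEDED", "FAILED", "CANCELLED")):
        break
    time.sleep(60)  # Poll once a minute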
Below you can find the properties of a fine-tuning object.
- finetuning_name: the name for your fine-tuning job.
- finetuning_type: you can select between ModelsAPIFinetuningType.UNSUPERVISED_FINETUNING and ModelsAPIFinetuningType.INSTRUCTION_FINETUNING.
- model_family: the base model that will be used for the fine-tuning. You can select between:
  - ModelsAPIModelFamily.LLAMA_V2_7B: it uses meta-llama/Llama-2-7b-hf from HuggingFace
  - ModelsAPIModelFamily.LLAMA_V2_13B: it uses meta-llama/Llama-2-13b-hf from HuggingFace
  - ModelsAPIModelFamily.LLAMA_V2_7B_CHAT: it uses meta-llama/Llama-2-7b-chat-hf from HuggingFace
  - ModelsAPIModelFamily.LLAMA_V2_13B_CHAT: it uses meta-llama/Llama-2-13b-chat-hf from HuggingFace
- num_epochs: the number of epochs to train your model for.
- dataset_cloud_path: the cloud path of the dataset used for the fine-tuning.
- saving_finetuning_code_cloud_path: S3 path where the fine-tuning code will be automatically uploaded.
- saving_model_cloud_path: S3 path where the fine-tuned model will be saved.
- credentials: the credentials to access S3. The credentials are not stored in our databases, which is why they will be empty when you fetch a fine-tuning job.
- link_name: compute cluster link in which the fine-tuning job will be executed.
- status: status of the fine-tuning job.
- metadata: contains information related to the creation, cancellation, and deletion of the fine-tuning job.
2. Deployments
2.1. Configure the deployment
- Create the deployment.
from xcloud import (
    OnPremiseModelsAPIDeployment,
    DeploymentSpecs,
    Scaling,
    ModelsAPIModelFamily,
    Credentials,
    Cloud
)
import os

deployment_name = "my-first-deployment"

deployment = OnPremiseModelsAPIDeployment(
    deployment_name=deployment_name,
    deployment_specs=DeploymentSpecs(
        authentication=True,
        scaling=Scaling(min_replicas=1, max_replicas=1)
    ),
    model_family=ModelsAPIModelFamily.LLAMA_V2_7B,
    model_cloud_path="s3://xcloud/on_premise_model_apis/finetuned_model",
    credentials=Credentials(
        cloud=Cloud.AWS,
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_region=os.environ["AWS_DEFAULT_REGION"]
    ),
    link_name="test-link"
)
from xcloud import OnPremiseModelsAPIClient
deployment = OnPremiseModelsAPIClient.create_deployment(deployment)
- Get your deployment.
from xcloud import OnPremiseModelsAPIClient
deployment = OnPremiseModelsAPIClient.get_deployment_by_name(deployment_name)
A deployment object has the following properties:
- workspace_id: workspace in which the deployment was created.
- deployment_name: the deployment name.
- status: status of the deployment.
- model_family: the model family (type) that will be deployed. It is needed to apply specific optimizations to that model.
- model_cloud_path: S3 path containing the model weights. You can use the weights of a fine-tuned model. If this property is None, the default weights (depending on the model family) will be used.
- credentials: the credentials to access S3. The credentials are not stored in our databases, which is why they will be empty when you fetch a deployment.
- link_name: compute cluster link where the deployment will be running.
- deployment_specs: deployment configuration options such as autoscaling, authentication and dynamic batching (a configuration sketch is shown after this property list).
  - batcher (dynamic batching): you can define the maximum batch size and waiting time (in milliseconds). The server will wait for a maximum of max_latency to create a batch of up to max_batch_size samples.
  - scaling: you can set the min_replicas that will always be running, and the max_replicas that will scale up based on the chosen metric. Two available metrics for autoscaling deployments are:
    - SCALE_METRIC.CONCURRENCY: represents the number of simultaneous requests that each replica can process. With a target_scaling_value of 1, if there are 3 concurrent requests, the deployment will scale up to 3 replicas.
    - SCALE_METRIC.RPS: sets a target for requests per second per replica.
  - authentication: when set to True, an API key will be generated for the deployment, which must be used for making inferences.
- inference: information needed to run inference.
- ready_endpoint: to check if the deployment is ready to start receiving inferences.
- infer_endpoint: the endpoint where the inferences should be sent.
- api_key: API key in case the authentication is enabled.
- metadata: contains information related to the creation, cancellation, and deletion of the deployment.
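As a reference, here is how those deployment_specs options might fit together. The min_replicas/max_replicas arguments follow the example above; the autoscaling metric, target value and dynamic-batching arguments are shown commented out because their exact class and parameter names in the SDK are assumptions.
from xcloud import DeploymentSpecs, Scaling

deployment_specs = DeploymentSpecs(
    # API-key authentication for the inference endpoint
    authentication=True,
    scaling=Scaling(
        min_replicas=1,   # Replicas that are always running
        max_replicas=3,   # Upper bound when scaling up
        # Hypothetical autoscaling arguments; check the SDK for the real names:
        # scale_metric=SCALE_METRIC.CONCURRENCY,
        # target_scaling_value=1,
    ),
    # Hypothetical dynamic-batching config (max_batch_size / max_latency in ms):
    # batcher=Batcher(max_batch_size=8, max_latency=100),
)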
- Wait until the deployment is ready.
OnPremiseModelsAPIClient.wait_until_deployment_is_ready(deployment_name=deployment_name)
- Do the inference.
import requests

response = requests.post(
    url=deployment.inference.infer_endpoint,
    headers={
        # API key header, required because authentication is enabled
        "x-api-key": deployment.inference.api_key
    },
    json={
        "instances": ["### Instruction ###\nGenerate a joke\n\nResponse:"]
    }
)
response.raise_for_status()
print(response.json())
- You can delete the deployment if you no longer want to access it.
deleted_deployment = OnPremiseModelsAPIClient.delete_deployment(deployment_name=deployment_name)