codeflare_sdk.ray.client package

Submodules

codeflare_sdk.ray.client.ray_jobs module

The ray_jobs sub-module contains methods needed to submit jobs and connect to Ray Clusters that were not created by CodeFlare. The SDK acts as a wrapper for the Ray Job Submission Client.

class codeflare_sdk.ray.client.ray_jobs.RayJobClient(address: str | None = None, create_cluster_if_needed: bool = False, cookies: Dict[str, Any] | None = None, metadata: Dict[str, Any] | None = None, headers: Dict[str, Any] | None = None, verify: str | bool | None = True)

Bases: object

A wrapper class for the Ray Job Submission Client, used to submit, stop, and delete jobs on Ray clusters and to retrieve job-related information.

Args:
address (Optional[str]):

The Ray cluster’s address, which may be either the Ray Client address, HTTP address of the dashboard server on the head node, or “auto” / “localhost:<port>” for a local cluster. This is overridden by the RAY_ADDRESS environment variable if set.

create_cluster_if_needed (bool):

If True, a new cluster will be created if one is not already running at the specified address. By default, Ray requires an existing cluster.

cookies (Optional[Dict[str, Any]]):

HTTP cookies to send with requests to the job server.

metadata (Optional[Dict[str, Any]]):

Global metadata to store with all jobs, merged with job-specific metadata during job submission.

headers (Optional[Dict[str, Any]]):

HTTP headers to send with requests to the job server. These can be used for authentication.

verify (Optional[Union[str, bool]]):

If True, verifies the server’s TLS certificate. Can also be a path to trusted certificates. Default is True.
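As an illustration of how the constructor arguments fit together, the following is a minimal sketch that assembles them into a keyword dictionary. The helper name build_client_kwargs, the example URL, and the use of a bearer token in an Authorization header are assumptions made for this example; the authentication scheme your job server expects may differ.

```python
from typing import Any, Dict, Optional, Union


def build_client_kwargs(
    dashboard_url: str,
    token: Optional[str] = None,
    ca_bundle: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for RayJobClient (hypothetical helper)."""
    # verify is either True (default TLS verification) or a path to trusted certificates.
    verify: Union[str, bool] = ca_bundle if ca_bundle else True
    kwargs: Dict[str, Any] = {"address": dashboard_url, "verify": verify}
    if token:
        # Assumption: the job server accepts a bearer token in this header.
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


kwargs = build_client_kwargs("https://ray-dashboard.example.com", token="example-token")

# With the SDK installed, the client would then be created as:
#   from codeflare_sdk.ray.client.ray_jobs import RayJobClient
#   client = RayJobClient(**kwargs)
```

Keeping the argument assembly separate from the constructor call makes it easy to inspect what will be sent before any connection is attempted.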

delete_job(job_id: str) -> Tuple[bool, str]

Deletes a job by job ID.

Args:
job_id (str):

The unique identifier of the job to delete.

Returns:
tuple(bool, str):

A tuple with deletion status and a message.

get_address() -> str

Retrieves the address of the connected Ray cluster.

Returns:
str:

The Ray cluster’s address.

get_job_info(job_id: str)

Fetches information about a job by job ID.

Args:
job_id (str):

The unique identifier of the job.

Returns:
JobInfo:

Information about the job’s status, progress, and other details.

get_job_logs(job_id: str) -> str

Retrieves the logs for a specific job by job ID.

Args:
job_id (str):

The unique identifier of the job.

Returns:
str:

Logs output from the job.

get_job_status(job_id: str) -> str

Fetches the current status of a job by job ID.

Args:
job_id (str):

The unique identifier of the job.

Returns:
str:

The job’s status.
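A common use of get_job_status is to poll until a job finishes. The sketch below assumes the status compares equal to one of Ray's terminal status names (SUCCEEDED, FAILED, STOPPED). The helper wait_for_job and its timing parameters are hypothetical and not part of the SDK.

```python
import time
from typing import Any

# Ray job statuses after which no further state change is expected.
TERMINAL_STATUSES = ("SUCCEEDED", "FAILED", "STOPPED")


def wait_for_job(
    client: Any,
    job_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll client.get_job_status until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_job_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

The client argument is anything exposing get_job_status(job_id), so the helper can be exercised with a stand-in object before pointing it at a real cluster.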

list_jobs() -> List[JobDetails]

Lists all current jobs in the Ray cluster.

Returns:
List[JobDetails]:

A list of job details for each current job in the cluster.
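The returned list can be reduced to a compact overview. The sketch below assumes each JobDetails object exposes submission_id and status attributes, as in the Ray versions I am aware of; getattr with a default is used so the helper degrades gracefully if that assumption does not hold. The helper name summarize_jobs is hypothetical.

```python
from typing import Any, List, Tuple


def summarize_jobs(client: Any) -> List[Tuple[Any, Any]]:
    """Return (submission_id, status) pairs for every job the client lists."""
    rows = []
    for job in client.list_jobs():
        # Assumption: JobDetails carries these two attributes.
        submission_id = getattr(job, "submission_id", None)
        status = getattr(job, "status", "UNKNOWN")
        rows.append((submission_id, status))
    return rows
```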

stop_job(job_id: str) -> Tuple[bool, str]

Stops a running job by job ID.

Args:
job_id (str):

The unique identifier of the job to stop.

Returns:
tuple(bool, str):

A tuple with the stop status and a message.
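Because the return value is a (status, message) tuple, it can be unpacked directly. The following is a minimal sketch; the helper name stop_and_report is hypothetical, and the client is passed in so the function works with any object that offers stop_job(job_id).

```python
from typing import Any


def stop_and_report(client: Any, job_id: str) -> bool:
    """Stop a job, print the accompanying message, and return the status flag."""
    stopped, message = client.stop_job(job_id)
    print(message)
    return stopped
```

The same unpacking pattern applies to delete_job, which is documented with the same return shape.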

submit_job(entrypoint: str, job_id: str | None = None, runtime_env: Dict[str, Any] | None = None, metadata: Dict[str, str] | None = None, submission_id: str | None = None, entrypoint_num_cpus: int | float | None = None, entrypoint_num_gpus: int | float | None = None, entrypoint_memory: int | None = None, entrypoint_resources: Dict[str, float] | None = None) -> str

Submits a job to the Ray cluster with specified resources and returns the job ID.

Args:
entrypoint (str):

The command to execute for this job.

job_id (Optional[str]):

Deprecated; use submission_id instead. A unique job identifier.

runtime_env (Optional[Dict[str, Any]]):

The runtime environment for this job.

metadata (Optional[Dict[str, str]]):

Metadata associated with the job, merged with global metadata.

submission_id (Optional[str]):

Unique ID for the job submission.

entrypoint_num_cpus (Optional[Union[int, float]]):

The quantity of CPU cores to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.

entrypoint_num_gpus (Optional[Union[int, float]]):

The quantity of GPUs to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.

entrypoint_memory (Optional[int]):

The quantity of memory to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.

entrypoint_resources (Optional[Dict[str, float]]):

The quantity of custom resources to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it.

Returns:
str:

The unique identifier for the submitted job.
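To show how the submission arguments combine, here is a minimal sketch of a helper that submits a Python script. The helper name submit_script, the metadata value, and the choice of one CPU for the entrypoint are assumptions for illustration. The working_dir and pip keys follow Ray's runtime environment format.

```python
from typing import Any, Dict, List


def submit_script(client: Any, script: str, pip_packages: List[str]) -> str:
    """Submit `python <script>` and return the job ID reported by the client."""
    # runtime_env describes the files and packages the job needs.
    runtime_env: Dict[str, Any] = {"working_dir": "./", "pip": pip_packages}
    return client.submit_job(
        entrypoint=f"python {script}",
        runtime_env=runtime_env,
        metadata={"owner": "docs-example"},  # illustrative value only
        entrypoint_num_cpus=1,  # reserve one CPU for the entrypoint itself
    )


# With a connected RayJobClient instance named client, a call might look like:
#   job_id = submit_script(client, "train.py", ["torch"])
```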

tail_job_logs(job_id: str) -> Iterator[str]

Continuously streams the logs of a job.

Args:
job_id (str):

The unique identifier of the job.

Returns:
Iterator[str]:

An iterator that yields log entries in real-time.

Module contents