codeflare_sdk.ray.client package
Submodules
codeflare_sdk.ray.client.ray_jobs module
The ray_jobs sub-module contains methods needed to submit jobs and connect to Ray Clusters that were not created by CodeFlare. The SDK acts as a wrapper for the Ray Job Submission Client.
- class codeflare_sdk.ray.client.ray_jobs.RayJobClient(address: str | None = None, create_cluster_if_needed: bool = False, cookies: Dict[str, Any] | None = None, metadata: Dict[str, Any] | None = None, headers: Dict[str, Any] | None = None, verify: str | bool | None = True)
Bases: object
A wrapper class for the Ray Job Submission Client, used for interacting with Ray clusters to manage job submissions, deletions, and other job-related information.
- Args:
- address (Optional[str]):
The Ray cluster’s address, which may be the Ray Client address, the HTTP address of the dashboard server on the head node, or “auto” / “localhost:<port>” for a local cluster. This is overridden by the RAY_ADDRESS environment variable if set.
- create_cluster_if_needed (bool):
If True, a new cluster is created if one is not already running at the specified address. By default, Ray requires an existing cluster.
- cookies (Optional[Dict[str, Any]]):
HTTP cookies to send with requests to the job server.
- metadata (Optional[Dict[str, Any]]):
Global metadata to store with all jobs, merged with job-specific metadata during job submission.
- headers (Optional[Dict[str, Any]]):
HTTP headers to send with requests to the job server. These can be used for authentication.
- verify (Optional[Union[str, bool]]):
If True, verifies the server’s TLS certificate. Can also be a path to trusted certificates. Default is True.
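As a rough illustration of how the constructor arguments above fit together, the sketch below assembles them into a keyword dictionary. The helper names `build_client_kwargs` and `make_client` are hypothetical, and the bearer-token `Authorization` header is an assumption about how the job server authenticates; adjust it to whatever your deployment expects.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(
    address: str,
    token: Optional[str] = None,
    ca_bundle: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for RayJobClient (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"address": address}
    if token is not None:
        # Assumption: the job server accepts a bearer token in this header.
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    # verify may be True/False or a path to trusted certificates.
    kwargs["verify"] = ca_bundle if ca_bundle is not None else True
    return kwargs


def make_client(**kwargs: Any):
    """Create the client; the import is deferred so the helper above
    can be used and tested without the SDK installed."""
    from codeflare_sdk.ray.client.ray_jobs import RayJobClient

    return RayJobClient(**kwargs)
```

A call such as `make_client(**build_client_kwargs("https://ray-dashboard.example.com", token="..."))` would then return a client; the URL here is a placeholder, not a real endpoint.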
- delete_job(job_id: str) -> (bool, str)
Deletes a job by job ID.
- Args:
- job_id (str):
The unique identifier of the job to delete.
- Returns:
- tuple(bool, str):
A tuple with deletion status and a message.
- get_address() -> str
Retrieves the address of the connected Ray cluster.
- Returns:
- str:
The Ray cluster’s address.
- get_job_info(job_id: str)
Fetches information about a job by job ID.
- Args:
- job_id (str):
The unique identifier of the job.
- Returns:
- JobInfo:
Information about the job’s status, progress, and other details.
- get_job_logs(job_id: str) -> str
Retrieves the logs for a specific job by job ID.
- Args:
- job_id (str):
The unique identifier of the job.
- Returns:
- str:
Logs output from the job.
- get_job_status(job_id: str) -> str
Fetches the current status of a job by job ID.
- Args:
- job_id (str):
The unique identifier of the job.
- Returns:
- str:
The job’s status.
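Because get_job_status returns a point-in-time status, callers often poll it until the job finishes. The following is a minimal sketch of such a loop, not part of the SDK: `wait_for_job` is a hypothetical helper, and the set of terminal status names is an assumption based on Ray's job states.

```python
import time
from typing import Any

# Assumption: these are the terminal status names reported by Ray.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(
    client: Any,
    job_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll client.get_job_status until a terminal status or a timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_job_status(job_id)
        # Normalise enum-like values to their plain string form.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATUSES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {name} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Any object exposing `get_job_status(job_id)` works here, which keeps the loop easy to test with a stub in place of a live cluster.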
- list_jobs() -> List[JobDetails]
Lists all current jobs in the Ray cluster.
- Returns:
- List[JobDetails]:
A list of job details for each current job in the cluster.
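One way to use the list returned by list_jobs is to summarise it by status. The sketch below assumes each JobDetails item exposes a `status` attribute; `count_jobs_by_status` is a hypothetical helper, and items without that attribute are counted as "UNKNOWN".

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_jobs_by_status(jobs: Iterable[Any]) -> Dict[str, int]:
    """Count jobs per status name (hypothetical helper)."""
    counts: Counter = Counter()
    for job in jobs:
        # Assumption: JobDetails exposes a `status` attribute.
        status = getattr(job, "status", "UNKNOWN")
        # Normalise enum-like values to their plain string form.
        counts[str(getattr(status, "value", status))] += 1
    return dict(counts)
```

With a connected client, `count_jobs_by_status(client.list_jobs())` would give a quick overview of the cluster's jobs.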
- stop_job(job_id: str) -> (bool, str)
Stops a running job by job ID.
- Args:
- job_id (str):
The unique identifier of the job to stop.
- Returns:
- tuple(bool, str):
A tuple with the stop status and a message.
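Both stop_job and delete_job return a (status, message) tuple rather than raising on failure, so callers need to check the first element themselves. The sketch below shows one possible way to handle that; `run_job_action` is a hypothetical helper, and turning a False status into an exception is a design choice rather than SDK behaviour.

```python
from typing import Callable, Tuple


def run_job_action(action: Callable[[str], Tuple[bool, str]], job_id: str) -> str:
    """Call a (status, message)-returning job method and raise if it reports failure."""
    ok, message = action(job_id)
    if not ok:
        # The message explains what went wrong, so surface it to the caller.
        raise RuntimeError(message)
    return message
```

Usage would look like `run_job_action(client.stop_job, job_id)` or `run_job_action(client.delete_job, job_id)`. If a False status is not an error in your workflow, log the message instead of raising.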
- submit_job(entrypoint: str, job_id: str | None = None, runtime_env: Dict[str, Any] | None = None, metadata: Dict[str, str] | None = None, submission_id: str | None = None, entrypoint_num_cpus: int | float | None = None, entrypoint_num_gpus: int | float | None = None, entrypoint_memory: int | None = None, entrypoint_resources: Dict[str, float] | None = None) -> str
Submits a job to the Ray cluster with specified resources and returns the job ID.
- Args:
- entrypoint (str):
The command to execute for this job.
- job_id (Optional[str]):
Deprecated, use submission_id. A unique job identifier.
- runtime_env (Optional[Dict[str, Any]]):
The runtime environment for this job.
- metadata (Optional[Dict[str, str]]):
Metadata associated with the job, merged with global metadata.
- submission_id (Optional[str]):
Unique ID for the job submission.
- entrypoint_num_cpus (Optional[Union[int, float]]):
The quantity of CPU cores to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.
- entrypoint_num_gpus (Optional[Union[int, float]]):
The quantity of GPUs to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.
- entrypoint_memory (Optional[int]):
The quantity of memory to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it. Defaults to 0.
- entrypoint_resources (Optional[Dict[str, float]]):
The quantity of custom resources to reserve for the execution of the entrypoint command, separately from any tasks or actors launched by it.
- Returns:
- str:
The unique identifier for the submitted job.
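To show how the submit_job arguments above might be combined, here is a minimal sketch that submits a Python script with a runtime environment and a GPU reservation for the entrypoint. `submit_script` is a hypothetical helper; the `working_dir` and `pip` keys are Ray runtime environment fields, and the script name in the usage note is a placeholder.

```python
from typing import Any, Dict, List, Optional


def submit_script(
    client: Any,
    script: str,
    pip_packages: Optional[List[str]] = None,
    num_gpus: float = 0,
) -> str:
    """Submit `python <script>` and return the job ID (hypothetical helper)."""
    # Upload the current directory so the script is available on the cluster.
    runtime_env: Dict[str, Any] = {"working_dir": "./"}
    if pip_packages:
        runtime_env["pip"] = list(pip_packages)
    return client.submit_job(
        entrypoint=f"python {script}",
        runtime_env=runtime_env,
        # Reserved for the entrypoint command itself, separately from
        # any tasks or actors it launches.
        entrypoint_num_gpus=num_gpus,
    )
```

For example, `submit_script(client, "train.py", pip_packages=["torch"], num_gpus=1)` would return the ID that the other methods in this class accept as job_id.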