codeflare_sdk.ray.rayjobs package

Submodules

codeflare_sdk.ray.rayjobs.config module

The config sub-module contains the definition of the ManagedClusterConfig dataclass, which is used to specify resource requirements and other details when a RayJob creates and manages its own Ray cluster.

class codeflare_sdk.ray.rayjobs.config.ManagedClusterConfig(head_cpu_requests: int | str = 2, head_cpu_limits: int | str = 2, head_memory_requests: int | str = 8, head_memory_limits: int | str = 8, head_accelerators: ~typing.Dict[str, str | int] = <factory>, head_tolerations: ~typing.List[~kubernetes.client.models.v1_toleration.V1Toleration] | None = None, worker_cpu_requests: int | str = 1, worker_cpu_limits: int | str = 1, num_workers: int = 1, worker_memory_requests: int | str = 2, worker_memory_limits: int | str = 2, worker_tolerations: ~typing.List[~kubernetes.client.models.v1_toleration.V1Toleration] | None = None, envs: ~typing.Dict[str, str] = <factory>, image: str = '', image_pull_secrets: ~typing.List[str] = <factory>, labels: ~typing.Dict[str, str] = <factory>, worker_accelerators: ~typing.Dict[str, str | int] = <factory>, accelerator_configs: ~typing.Dict[str, str] = <factory>, annotations: ~typing.Dict[str, str] = <factory>, volumes: list[~kubernetes.client.models.v1_volume.V1Volume] = <factory>, volume_mounts: list[~kubernetes.client.models.v1_volume_mount.V1VolumeMount] = <factory>)[source]

Bases: object

This dataclass is used to specify resource requirements and other details for RayJobs. The cluster name and namespace are automatically derived from the RayJob configuration.

Args:
head_accelerators:

A dictionary of extended resource requests for the head node, e.g. {"nvidia.com/gpu": 1}

head_tolerations:

List of tolerations for head nodes.

num_workers:

The number of workers to create.

worker_tolerations:

List of tolerations for worker nodes.

envs:

A dictionary of environment variables to set for the cluster.

image:

The image to use for the cluster.

image_pull_secrets:

A list of image pull secrets to use for the cluster.

labels:

A dictionary of labels to apply to the cluster.

worker_accelerators:

A dictionary of extended resource requests for each worker, e.g. {"nvidia.com/gpu": 1}

accelerator_configs:

A dictionary of custom resource mappings to map extended resource requests to RayCluster resource names. Defaults to DEFAULT_ACCELERATORS but can be overridden with custom mappings.

annotations:

A dictionary of annotations to apply to the Job.

volumes:

A list of V1Volume objects to add to the cluster.

volume_mounts:

A list of V1VolumeMount objects to add to the cluster.
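
Example: a minimal construction sketch (all fields have defaults; the values below are illustrative):

    from codeflare_sdk.ray.rayjobs.config import ManagedClusterConfig

    cluster_config = ManagedClusterConfig(
        num_workers=2,
        worker_cpu_requests=4,
        worker_cpu_limits=4,
        worker_memory_requests=8,
        worker_memory_limits=8,
        worker_accelerators={"nvidia.com/gpu": 1},
        envs={"LOG_LEVEL": "INFO"},
        labels={"team": "ml"},
    )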

accelerator_configs: Dict[str, str]
add_file_volumes(secret_name: str, mount_path: str = '/home/ray/files')[source]

Add file volume and mount references to cluster configuration.

Args:

secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers (default: /home/ray/files).
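
Example: an illustrative call, assuming a Secret named "my-job-files" already holds the job's files:

    from codeflare_sdk.ray.rayjobs.config import ManagedClusterConfig

    cfg = ManagedClusterConfig()
    # Appends volume and mount references for the Secret to the
    # configuration (see the volumes / volume_mounts fields above).
    cfg.add_file_volumes(secret_name="my-job-files", mount_path="/home/ray/files")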

annotations: Dict[str, str]
build_file_secret_spec(job_name: str, namespace: str, files: Dict[str, str]) Dict[str, Any][source]

Build Secret specification for files

Args:

job_name: Name of the RayJob (used for Secret naming).
namespace: Kubernetes namespace.
files: Dictionary of file_name -> file_content.

Returns:

Dict: Secret specification ready for Kubernetes API
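
Example: a sketch with illustrative file contents:

    cfg = ManagedClusterConfig()
    secret_spec = cfg.build_file_secret_spec(
        job_name="demo-job",
        namespace="default",
        files={"main.py": "print('hello')"},
    )
    # secret_spec is a plain dict that can be passed to the Kubernetes API.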

build_file_volume_specs(secret_name: str, mount_path: str = '/home/ray/files') Tuple[Dict[str, Any], Dict[str, Any]][source]

Build volume and mount specifications for files

Args:

secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers.

Returns:

Tuple of (volume_spec, mount_spec) as dictionaries
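
Example: a sketch assuming the Secret "demo-job-files" exists:

    cfg = ManagedClusterConfig()
    volume_spec, mount_spec = cfg.build_file_volume_specs(
        secret_name="demo-job-files",
        mount_path="/home/ray/files",
    )
    # volume_spec describes the Secret-backed volume; mount_spec describes
    # where it is mounted inside the containers.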

build_ray_cluster_spec(cluster_name: str) Dict[str, Any][source]

Build the RayCluster spec from ManagedClusterConfig for embedding in RayJob.

Args:

self: The cluster configuration object (ManagedClusterConfig).
cluster_name: The name for the cluster (derived from the RayJob name).

Returns:

Dict containing the RayCluster spec for embedding in RayJob
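
Example: a minimal sketch (the cluster name is illustrative; in practice it is derived from the RayJob name):

    cfg = ManagedClusterConfig(num_workers=2)
    ray_cluster_spec = cfg.build_ray_cluster_spec(cluster_name="demo-job-cluster")
    # The returned dict is the RayCluster spec that gets embedded in the RayJob CR.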

envs: Dict[str, str]
head_accelerators: Dict[str, str | int]
head_cpu_limits: int | str = 2
head_cpu_requests: int | str = 2
head_memory_limits: int | str = 8
head_memory_requests: int | str = 8
head_tolerations: List[V1Toleration] | None = None
image: str = ''
image_pull_secrets: List[str]
labels: Dict[str, str]
num_workers: int = 1
validate_secret_size(files: Dict[str, str]) None[source]
volume_mounts: list[V1VolumeMount]
volumes: list[V1Volume]
worker_accelerators: Dict[str, str | int]
worker_cpu_limits: int | str = 1
worker_cpu_requests: int | str = 1
worker_memory_limits: int | str = 2
worker_memory_requests: int | str = 2
worker_tolerations: List[V1Toleration] | None = None

codeflare_sdk.ray.rayjobs.pretty_print module

This sub-module exists primarily to be used internally by the RayJob object (in the rayjob sub-module) for pretty-printing job status and details.

codeflare_sdk.ray.rayjobs.pretty_print.print_job_status(job_info: RayJobInfo)[source]

Pretty print the job status in a format similar to cluster status.

codeflare_sdk.ray.rayjobs.pretty_print.print_no_job_found(job_name: str, namespace: str)[source]

Print a message when no job is found.

codeflare_sdk.ray.rayjobs.rayjob module

RayJob client for submitting and managing Ray jobs using the KubeRay Python client.

class codeflare_sdk.ray.rayjobs.rayjob.RayJob(job_name: str, entrypoint: str, cluster_name: str | None = None, cluster_config: ManagedClusterConfig | None = None, namespace: str | None = None, runtime_env: RuntimeEnv | Dict[str, Any] | None = None, ttl_seconds_after_finished: int = 0, active_deadline_seconds: int | None = None, local_queue: str | None = None)[source]

Bases: object

A client for managing Ray jobs using the KubeRay operator.

This class provides a simplified interface for submitting and managing RayJob custom resources (using the KubeRay RayJob Python client).

delete()[source]

Delete the Ray job. Returns True if deleted successfully or if already deleted.

resubmit()[source]

Resubmit the Ray job.

status(print_to_console: bool = True) Tuple[CodeflareRayJobStatus, bool][source]

Get the status of the Ray job.

Args:

print_to_console (bool): Whether to print formatted status to console (default: True)

Returns:

Tuple of (CodeflareRayJobStatus, ready: bool) where ready indicates job completion

stop()[source]

Suspend the Ray job.

submit() str[source]

Submit the Ray job.
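
Example: a minimal end-to-end sketch (the entrypoint and names are illustrative):

    from codeflare_sdk.ray.rayjobs.config import ManagedClusterConfig
    from codeflare_sdk.ray.rayjobs.rayjob import RayJob

    job = RayJob(
        job_name="demo-job",
        entrypoint="python main.py",
        cluster_config=ManagedClusterConfig(num_workers=2),
        namespace="default",
    )
    job.submit()                  # create the RayJob CR
    status, ready = job.status()  # (CodeflareRayJobStatus, bool)
    if not ready:
        job.stop()                # suspend the job
        job.resubmit()            # resubmit it
    job.delete()                  # remove the RayJob CR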

codeflare_sdk.ray.rayjobs.runtime_env module

codeflare_sdk.ray.rayjobs.runtime_env.create_file_secret(job: RayJob, files: Dict[str, str], rayjob_result: Dict[str, Any])[source]

Create Secret with owner reference for local files.

codeflare_sdk.ray.rayjobs.runtime_env.create_secret_from_spec(job: RayJob, secret_spec: Dict[str, Any], rayjob_result: Dict[str, Any] = None) str[source]

Create Secret from specification via Kubernetes API.

Args:

secret_spec: Secret specification dictionary.
rayjob_result: The result from RayJob creation, containing the UID.

Returns:

str: Name of the created Secret

codeflare_sdk.ray.rayjobs.runtime_env.extract_all_local_files(job: RayJob) Dict[str, str] | None[source]

Prepare local files for Secret upload.

  • If runtime_env has a local working_dir: zip the entire directory into a single file

  • If there is a single entrypoint file (no working_dir): extract that file

  • If working_dir is a remote URL: return None (pass through to Ray)

Returns:

Dict with either:

  • {"working_dir.zip": <base64_encoded_zip>} for zipped directories

  • {"script.py": <file_content>} for single files

  • None for a remote working_dir or when there are no files

codeflare_sdk.ray.rayjobs.runtime_env.parse_requirements_file(requirements_path: str) List[str] | None[source]

Parse a requirements.txt file and return a list of dependencies.

Args:

requirements_path: Path to requirements.txt file

Returns:

List of pip dependencies
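
Example: an illustrative call (the path is hypothetical):

    from codeflare_sdk.ray.rayjobs.runtime_env import parse_requirements_file

    deps = parse_requirements_file("./requirements.txt")
    # e.g. ["torch==2.3.0", "numpy"]; the signature also allows None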

codeflare_sdk.ray.rayjobs.runtime_env.process_pip_dependencies(job: RayJob, pip_spec) List[str] | None[source]

Process pip dependencies from runtime_env.

Args:

pip_spec: Can be a list of packages, a string path to a requirements.txt file, or a dict.

Returns:

List of pip dependencies
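
Example: the accepted pip_spec shapes, sketched (job is a RayJob instance as above; the dict shape follows Ray's runtime_env pip form and is an assumption here):

    from codeflare_sdk.ray.rayjobs.runtime_env import process_pip_dependencies

    # A list of packages
    deps = process_pip_dependencies(job, ["torch==2.3.0", "numpy"])

    # A string path to a requirements.txt file
    deps = process_pip_dependencies(job, "./requirements.txt")

    # A dict (e.g. {"packages": [...], "pip_check": False})
    deps = process_pip_dependencies(job, {"packages": ["numpy"], "pip_check": False})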

codeflare_sdk.ray.rayjobs.runtime_env.process_runtime_env(job: RayJob, files: Dict[str, str] | None = None) str | None[source]

Process runtime_env field to handle env_vars, pip dependencies, and working_dir.

Returns:

Processed runtime environment as YAML string, or None if no processing needed

codeflare_sdk.ray.rayjobs.status module

The status sub-module defines Enums containing information for Ray job deployment states and CodeFlare job states, as well as dataclasses to store information for Ray jobs.

class codeflare_sdk.ray.rayjobs.status.CodeflareRayJobStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Defines the possible reportable states of a CodeFlare Ray job.

COMPLETE = 1
FAILED = 3
RUNNING = 2
SUSPENDED = 4
UNKNOWN = 5
class codeflare_sdk.ray.rayjobs.status.RayJobDeploymentStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Defines the possible deployment states of a Ray job (from the KubeRay RayJob API).

COMPLETE = 'Complete'
FAILED = 'Failed'
RUNNING = 'Running'
SUSPENDED = 'Suspended'
UNKNOWN = 'Unknown'
class codeflare_sdk.ray.rayjobs.status.RayJobInfo(name: str, job_id: str, status: RayJobDeploymentStatus, namespace: str, cluster_name: str, start_time: str | None = None, end_time: str | None = None, failed_attempts: int = 0, succeeded_attempts: int = 0)[source]

Bases: object

For storing information about a Ray job.

cluster_name: str
end_time: str | None = None
failed_attempts: int = 0
job_id: str
name: str
namespace: str
start_time: str | None = None
status: RayJobDeploymentStatus
succeeded_attempts: int = 0
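
Example: a small sketch of consuming these types (field values are illustrative):

    from codeflare_sdk.ray.rayjobs.status import RayJobDeploymentStatus, RayJobInfo

    info = RayJobInfo(
        name="demo-job",
        job_id="demo-job-abc123",
        status=RayJobDeploymentStatus.RUNNING,
        namespace="default",
        cluster_name="demo-job-cluster",
    )
    if info.status == RayJobDeploymentStatus.COMPLETE:
        print(f"{info.name} completed in namespace {info.namespace}")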

Module contents