codeflare_sdk.ray.rayjobs package

Submodules

codeflare_sdk.ray.rayjobs.config module

The config sub-module contains the definition of the ManagedClusterConfig dataclass, which is used to specify resource requirements and other details for the Ray cluster that a RayJob creates and manages.

class codeflare_sdk.ray.rayjobs.config.ManagedClusterConfig(head_cpu_requests: int | str = 2, head_cpu_limits: int | str = 2, head_memory_requests: int | str = 8, head_memory_limits: int | str = 8, head_accelerators: Dict[str, str | int] = <factory>, head_tolerations: List[V1Toleration] | None = None, worker_cpu_requests: int | str = 1, worker_cpu_limits: int | str = 1, num_workers: int = 1, worker_memory_requests: int | str = 2, worker_memory_limits: int | str = 2, worker_tolerations: List[V1Toleration] | None = None, envs: Dict[str, str] = <factory>, image: str = '', image_pull_secrets: List[str] = <factory>, labels: Dict[str, str] = <factory>, worker_accelerators: Dict[str, str | int] = <factory>, accelerator_configs: Dict[str, str] = <factory>, annotations: Dict[str, str] = <factory>, volumes: list[V1Volume] = <factory>, volume_mounts: list[V1VolumeMount] = <factory>)[source]

Bases: object

This dataclass is used to specify resource requirements and other details for RayJobs. The cluster name and namespace are automatically derived from the RayJob configuration.

Args:

head_accelerators: A dictionary of extended resource requests for the head node. Example: {"nvidia.com/gpu": 1}
head_tolerations: List of tolerations for head nodes.
num_workers: The number of workers to create.
worker_tolerations: List of tolerations for worker nodes.
envs: A dictionary of environment variables to set for the cluster.
image: The image to use for the cluster.
image_pull_secrets: A list of image pull secrets to use for the cluster.
labels: A dictionary of labels to apply to the cluster.
worker_accelerators: A dictionary of extended resource requests for each worker. Example: {"nvidia.com/gpu": 1}
accelerator_configs: A dictionary of custom resource mappings that map extended resource requests to RayCluster resource names. Defaults to DEFAULT_ACCELERATORS but can be overridden with custom mappings.
annotations: A dictionary of annotations to apply to the Job.
volumes: A list of V1Volume objects to add to the cluster.
volume_mounts: A list of V1VolumeMount objects to add to the cluster.
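The relationship between worker_accelerators (or head_accelerators) and accelerator_configs can be illustrated with a small stand-alone sketch. It does not call the SDK: the function name to_ray_resources and the mapping entries below are hypothetical and are not copied from DEFAULT_ACCELERATORS. It only shows the idea of translating extended resource names into Ray resource names.

```python
from typing import Dict, Union

# Hypothetical mapping for illustration only; NOT the SDK's DEFAULT_ACCELERATORS.
EXAMPLE_ACCELERATOR_CONFIGS: Dict[str, str] = {
    "nvidia.com/gpu": "GPU",
    "example.com/custom-accelerator": "CUSTOM",
}


def to_ray_resources(
    accelerators: Dict[str, Union[str, int]],
    accelerator_configs: Dict[str, str],
) -> Dict[str, int]:
    """Translate extended resource requests into Ray resource counts."""
    resources: Dict[str, int] = {}
    for extended_name, count in accelerators.items():
        if extended_name not in accelerator_configs:
            raise ValueError(f"No Ray resource mapping for {extended_name!r}")
        ray_name = accelerator_configs[extended_name]
        # Counts may be given as int or str, matching Dict[str, str | int].
        resources[ray_name] = resources.get(ray_name, 0) + int(count)
    return resources


ray_resources = to_ray_resources({"nvidia.com/gpu": 1}, EXAMPLE_ACCELERATOR_CONFIGS)
```

An extended resource that has no entry in the mapping raises ValueError in this sketch; how the SDK itself treats that case is not stated in this reference.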

accelerator_configs: Dict[str, str]

add_file_volumes(secret_name: str, mount_path: str = '/home/ray/files')[source]

Add file volume and mount references to cluster configuration.

Args:

secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers (default: /home/ray/files).

annotations: Dict[str, str]

build_file_secret_spec(job_name: str, namespace: str, files: Dict[str, str]) → Dict[str, Any][source]

Build Secret specification for files

Args:

job_name: Name of the RayJob (used for Secret naming).
namespace: Kubernetes namespace.
files: Dictionary of file_name -> file_content.

Returns:

Dict: Secret specification ready for the Kubernetes API.
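To make the shape of such a specification concrete, here is a minimal sketch of a Kubernetes Secret manifest built as a plain dictionary. It is not the SDK's implementation: the function name sketch_file_secret_spec, the "-files" naming suffix, and the choice of stringData over base64-encoded data are all assumptions made for illustration.

```python
from typing import Any, Dict


def sketch_file_secret_spec(
    job_name: str, namespace: str, files: Dict[str, str]
) -> Dict[str, Any]:
    """Build a Secret manifest dict holding one key per file."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {
            # Hypothetical naming scheme derived from the job name.
            "name": f"{job_name}-files",
            "namespace": namespace,
        },
        # stringData accepts plain-text values; Kubernetes encodes them on write.
        "stringData": dict(files),
    }


spec = sketch_file_secret_spec("demo", "example-namespace", {"script.py": "print('hi')"})
```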

build_file_volume_specs(secret_name: str, mount_path: str = '/home/ray/files') → Tuple[Dict[str, Any], Dict[str, Any]][source]

Build volume and mount specifications for files

Args:

secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers.

Returns:

Tuple of (volume_spec, mount_spec) as dictionaries.
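The pair of dictionaries can be pictured with the following sketch, which uses the standard Kubernetes field names for a Secret-backed volume (secret.secretName) and a container volume mount (mountPath). The function name and the volume name "ray-job-files" are hypothetical; the SDK may choose different names.

```python
from typing import Any, Dict, Tuple


def sketch_file_volume_specs(
    secret_name: str, mount_path: str = "/home/ray/files"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (volume_spec, mount_spec) that reference the same volume name."""
    volume_name = "ray-job-files"  # hypothetical; must match in both specs
    volume_spec = {"name": volume_name, "secret": {"secretName": secret_name}}
    mount_spec = {"name": volume_name, "mountPath": mount_path}
    return volume_spec, mount_spec


volume_spec, mount_spec = sketch_file_volume_specs("demo-files")
```

The volume and the mount are linked only by their shared name, which is why both are built together.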

build_ray_cluster_spec(cluster_name: str) → Dict[str, Any][source]

Build the RayCluster spec from ManagedClusterConfig for embedding in RayJob.

Args:

self: The cluster configuration object (ManagedClusterConfig).
cluster_name: The name for the cluster (derived from the RayJob name).

Returns:

Dict containing the RayCluster spec for embedding in RayJob.

envs: Dict[str, str]

head_accelerators: Dict[str, str | int]

head_cpu_limits: int | str = 2

head_cpu_requests: int | str = 2

head_memory_limits: int | str = 8

head_memory_requests: int | str = 8

head_tolerations: List[V1Toleration] | None = None

image: str = ''

image_pull_secrets: List[str]

labels: Dict[str, str]

num_workers: int = 1

validate_secret_size(files: Dict[str, str]) → None[source]
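This reference gives no description for validate_secret_size. Judging by its name and signature, it presumably checks that the files fit in a Secret; Kubernetes limits a Secret to 1 MiB of data. The sketch below shows one way such a check could look. The function name, the exact threshold handling, and the use of ValueError are assumptions, not the SDK's documented behavior.

```python
from typing import Dict

MAX_SECRET_BYTES = 1024 * 1024  # Kubernetes limits a Secret to 1 MiB


def sketch_validate_secret_size(files: Dict[str, str]) -> None:
    """Raise ValueError if the combined file contents exceed the Secret limit."""
    total = sum(len(content.encode("utf-8")) for content in files.values())
    if total > MAX_SECRET_BYTES:
        raise ValueError(
            f"Files total {total} bytes, above the {MAX_SECRET_BYTES}-byte Secret limit"
        )


sketch_validate_secret_size({"script.py": "print('hi')"})  # small input passes
```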

volume_mounts: list[V1VolumeMount]

volumes: list[V1Volume]

worker_accelerators: Dict[str, str | int]

worker_cpu_limits: int | str = 1

worker_cpu_requests: int | str = 1

worker_memory_limits: int | str = 2

worker_memory_requests: int | str = 2

worker_tolerations: List[V1Toleration] | None = None

codeflare_sdk.ray.rayjobs.pretty_print module

This sub-module exists primarily to be used internally by the RayJob object (in the rayjob sub-module) for pretty-printing job status and details.

codeflare_sdk.ray.rayjobs.pretty_print.print_job_status(job_info: RayJobInfo)[source]: Pretty print the job status in a format similar to cluster status.

codeflare_sdk.ray.rayjobs.pretty_print.print_no_job_found(job_name: str, namespace: str)[source]: Print a message when no job is found.

codeflare_sdk.ray.rayjobs.rayjob module

RayJob client for submitting and managing Ray jobs using the kuberay python client.

class codeflare_sdk.ray.rayjobs.rayjob.RayJob(job_name: str, entrypoint: str, cluster_name: str | None = None, cluster_config: ManagedClusterConfig | None = None, namespace: str | None = None, runtime_env: RuntimeEnv | Dict[str, Any] | None = None, ttl_seconds_after_finished: int = 0, active_deadline_seconds: int | None = None, local_queue: str | None = None, priority_class: str | None = None)[source]

Bases: object

A client for managing Ray jobs using the KubeRay operator.

This class provides a simplified interface for submitting and managing RayJob CRs (using the KubeRay RayJob python client).

delete()[source]: Delete the Ray job. Returns True if deleted successfully or if already deleted.

resubmit()[source]: Resubmit the Ray job.

status(print_to_console: bool = True) → Tuple[CodeflareRayJobStatus, bool][source]

Get the status of the Ray job.

Args:

print_to_console (bool): Whether to print the formatted status to the console (default: True).

Returns:

Tuple of (CodeflareRayJobStatus, ready: bool), where ready indicates job completion.
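Because status() returns a ready flag, a caller can poll it until the job finishes. The helper below is a sketch of that pattern; wait_until_ready, its parameters, and the _FakeJob stand-in used to exercise it are hypothetical and not part of the SDK. It relies only on the documented status(print_to_console=...) signature and return tuple.

```python
import time
from typing import Any


def wait_until_ready(
    job: Any, poll_seconds: float = 5.0, timeout_seconds: float = 600.0
) -> Any:
    """Poll job.status() until ready is True, then return the final status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status, ready = job.status(print_to_console=False)
        if ready:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job not ready after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


class _FakeJob:
    """Stand-in for a RayJob: reports not-ready twice, then ready."""

    def __init__(self) -> None:
        self.calls = 0

    def status(self, print_to_console: bool = True):
        self.calls += 1
        return ("COMPLETE", True) if self.calls >= 3 else ("RUNNING", False)


fake_job = _FakeJob()
final_status = wait_until_ready(fake_job, poll_seconds=0.0, timeout_seconds=5.0)
```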

stop()[source]: Suspend the Ray job.

submit() → str[source]
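Putting the constructor and methods together, a typical flow might look like the sketch below. It uses only the parameters and methods listed in this reference; the job name, entrypoint, namespace, and accelerator values are placeholders. The SDK imports sit inside the function so that defining it does not require the SDK, and actually calling it requires access to a cluster running the KubeRay operator.

```python
def submit_example_job() -> str:
    """Submit a RayJob on a managed cluster and print its status once."""
    # Imported lazily: these need codeflare_sdk installed and cluster credentials.
    from codeflare_sdk.ray.rayjobs.config import ManagedClusterConfig
    from codeflare_sdk.ray.rayjobs.rayjob import RayJob

    cluster_config = ManagedClusterConfig(
        num_workers=2,
        worker_accelerators={"nvidia.com/gpu": 1},  # placeholder request
    )
    job = RayJob(
        job_name="example-job",            # placeholder
        entrypoint="python script.py",     # placeholder
        cluster_config=cluster_config,
        namespace="example-namespace",     # placeholder
    )
    result = job.submit()  # documented as returning a str
    job.status()           # prints formatted status by default
    return result
```

Passing cluster_name instead of cluster_config is also allowed by the signature; this reference does not describe how the two options interact.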

codeflare_sdk.ray.rayjobs.runtime_env module

codeflare_sdk.ray.rayjobs.runtime_env.create_file_secret(job: RayJob, files: Dict[str, str], rayjob_result: Dict[str, Any])[source]: Create Secret with owner reference for local files.

codeflare_sdk.ray.rayjobs.runtime_env.create_secret_from_spec(job: RayJob, secret_spec: Dict[str, Any], rayjob_result: Dict[str, Any] = None) → str[source]

Create Secret from specification via Kubernetes API.

Args:

secret_spec: Secret specification dictionary.
rayjob_result: The result from RayJob creation, containing the UID.

Returns:

str: Name of the created Secret.

codeflare_sdk.ray.rayjobs.runtime_env.extract_all_local_files(job: RayJob) → Dict[str, str] | None[source]

Prepare local files for Secret upload.

- If runtime_env has a local working_dir: zip the entire directory into a single file.
- If there is a single entrypoint file (no working_dir): extract that file.
- If working_dir is a remote URL: return None (pass through to Ray).

Returns:

One of:
- {"working_dir.zip": <base64_encoded_zip>} for zipped directories
- {"script.py": <file_content>} for single files
- None for a remote working_dir or no files
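The first case, zipping a local directory and base64-encoding the archive, can be sketched with the standard library alone. This is an illustration of the technique, not the SDK's code; the function name zip_directory_to_base64 is hypothetical.

```python
import base64
import io
import os
import zipfile


def zip_directory_to_base64(directory: str) -> str:
    """Zip every file under directory in memory and return it base64-encoded."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, filenames in os.walk(directory):
            for filename in sorted(filenames):
                full_path = os.path.join(root, filename)
                # Store paths relative to the directory root inside the archive.
                archive.write(full_path, arcname=os.path.relpath(full_path, directory))
    # The archive is finalized when the with-block exits; the buffer stays open.
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

The encoded string could then be stored under a key such as "working_dir.zip" in the files dictionary described above.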

codeflare_sdk.ray.rayjobs.runtime_env.parse_requirements_file(requirements_path: str) → List[str] | None[source]

Parse a requirements.txt file and return list of dependencies.

Args:

requirements_path: Path to the requirements.txt file.

Returns:

List of pip dependencies.
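A minimal sketch of this kind of parsing follows. It skips blank lines and comment lines, and returns None when the file cannot be read; that last behavior is an assumption suggested by the List[str] | None return type. The function name is hypothetical, and the SDK's handling of pip options or inline comments is not covered here.

```python
from typing import List, Optional


def sketch_parse_requirements(requirements_path: str) -> Optional[List[str]]:
    """Return the non-empty, non-comment lines of a requirements file."""
    try:
        with open(requirements_path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    except OSError:
        return None  # assumed behavior for a missing or unreadable file
    dependencies: List[str] = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            dependencies.append(stripped)
    return dependencies
```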

codeflare_sdk.ray.rayjobs.runtime_env.process_pip_dependencies(job: RayJob, pip_spec) → List[str] | None[source]

Process pip dependencies from runtime_env.

Args:

pip_spec: Can be a list of packages, a string path to requirements.txt, or a dict.

Returns:

List of pip dependencies.
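The three accepted shapes of pip_spec suggest a type dispatch along the following lines. This is a sketch under assumptions: the function name is hypothetical, the dict form is assumed to follow Ray's runtime_env convention of a "packages" key, and unsupported or empty inputs are assumed to yield None.

```python
from typing import Any, List, Optional


def sketch_normalize_pip_spec(pip_spec: Any) -> Optional[List[str]]:
    """Normalize a list, requirements-file path, or dict into a list of packages."""
    if isinstance(pip_spec, list):
        return [str(package) for package in pip_spec]
    if isinstance(pip_spec, dict):
        # Assumed dict shape: {"packages": [...], ...} as in Ray's runtime_env.
        packages = pip_spec.get("packages")
        return [str(package) for package in packages] if packages else None
    if isinstance(pip_spec, str):
        # Treat the string as a path to a requirements file.
        with open(pip_spec, encoding="utf-8") as handle:
            stripped_lines = (line.strip() for line in handle)
            return [s for s in stripped_lines if s and not s.startswith("#")]
    return None
```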

codeflare_sdk.ray.rayjobs.runtime_env.process_runtime_env(job: RayJob, files: Dict[str, str] | None = None) → str | None[source]

Process runtime_env field to handle env_vars, pip dependencies, and working_dir.

Returns:

Processed runtime environment as a YAML string, or None if no processing is needed.

codeflare_sdk.ray.rayjobs.status module

The status sub-module defines Enums containing information for Ray job deployment states and CodeFlare job states, as well as dataclasses to store information for Ray jobs.

class codeflare_sdk.ray.rayjobs.status.CodeflareRayJobStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Defines the possible reportable states of a CodeFlare Ray job.

COMPLETE = 1

FAILED = 3

RUNNING = 2

SUSPENDED = 4

UNKNOWN = 5

class codeflare_sdk.ray.rayjobs.status.RayJobDeploymentStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Defines the possible deployment states of a Ray job (from the KubeRay RayJob API).

COMPLETE = 'Complete'

FAILED = 'Failed'

RUNNING = 'Running'

SUSPENDED = 'Suspended'

UNKNOWN = 'Unknown'
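The two enums share member names, so a deployment state string can be translated to a reportable state by name. The sketch below redefines both enums locally with the values listed in this reference, so that it runs without the SDK; the to_codeflare_status helper and its fallback to UNKNOWN are assumptions, and the SDK's actual translation logic may differ.

```python
from enum import Enum


class RayJobDeploymentStatus(Enum):  # local copy of the documented values
    COMPLETE = "Complete"
    FAILED = "Failed"
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    UNKNOWN = "Unknown"


class CodeflareRayJobStatus(Enum):  # local copy of the documented values
    COMPLETE = 1
    RUNNING = 2
    FAILED = 3
    SUSPENDED = 4
    UNKNOWN = 5


def to_codeflare_status(raw_state: str) -> CodeflareRayJobStatus:
    """Map a KubeRay state string to the reportable state with the same name."""
    try:
        deployment_status = RayJobDeploymentStatus(raw_state)
    except ValueError:
        # Unrecognized strings fall back to UNKNOWN in this sketch.
        return CodeflareRayJobStatus.UNKNOWN
    return CodeflareRayJobStatus[deployment_status.name]
```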

class codeflare_sdk.ray.rayjobs.status.RayJobInfo(name: str, job_id: str, status: RayJobDeploymentStatus, namespace: str, cluster_name: str, start_time: str | None = None, end_time: str | None = None, failed_attempts: int = 0, succeeded_attempts: int = 0)[source]

Bases: object

For storing information about a Ray job.

cluster_name: str

end_time: str | None = None

failed_attempts: int = 0

job_id: str

name: str

namespace: str

start_time: str | None = None

status: RayJobDeploymentStatus

succeeded_attempts: int = 0
