codeflare_sdk.ray.rayjobs package
Submodules
codeflare_sdk.ray.rayjobs.config module
The config sub-module contains the definition of the ManagedClusterConfig dataclass, which is used to specify resource requirements and other details for the Ray cluster that a RayJob creates and manages.
- class codeflare_sdk.ray.rayjobs.config.ManagedClusterConfig(head_cpu_requests: int | str = 2, head_cpu_limits: int | str = 2, head_memory_requests: int | str = 8, head_memory_limits: int | str = 8, head_accelerators: Dict[str, str | int] = <factory>, head_tolerations: List[V1Toleration] | None = None, worker_cpu_requests: int | str = 1, worker_cpu_limits: int | str = 1, num_workers: int = 1, worker_memory_requests: int | str = 2, worker_memory_limits: int | str = 2, worker_tolerations: List[V1Toleration] | None = None, envs: Dict[str, str] = <factory>, image: str = '', image_pull_secrets: List[str] = <factory>, labels: Dict[str, str] = <factory>, worker_accelerators: Dict[str, str | int] = <factory>, accelerator_configs: Dict[str, str] = <factory>, annotations: Dict[str, str] = <factory>, volumes: list[V1Volume] = <factory>, volume_mounts: list[V1VolumeMount] = <factory>)
Bases: object
This dataclass is used to specify resource requirements and other details for RayJobs. The cluster name and namespace are automatically derived from the RayJob configuration.
- Args:
- head_accelerators:
A dictionary of extended resource requests for the head node. Example: {"nvidia.com/gpu": 1}
- head_tolerations:
List of tolerations for head nodes.
- num_workers:
The number of workers to create.
- worker_tolerations:
List of tolerations for worker nodes.
- envs:
A dictionary of environment variables to set for the cluster.
- image:
The image to use for the cluster.
- image_pull_secrets:
A list of image pull secrets to use for the cluster.
- labels:
A dictionary of labels to apply to the cluster.
- worker_accelerators:
A dictionary of extended resource requests for each worker. Example: {"nvidia.com/gpu": 1}
- accelerator_configs:
A dictionary that maps extended resource names to RayCluster resource names. Defaults to DEFAULT_ACCELERATORS but can be overridden with custom mappings.
- annotations:
A dictionary of annotations to apply to the Job.
- volumes:
A list of V1Volume objects to add to the cluster.
- volume_mounts:
A list of V1VolumeMount objects to add to the cluster.
- accelerator_configs: Dict[str, str]
- add_file_volumes(secret_name: str, mount_path: str = '/home/ray/files')
Add file volume and mount references to the cluster configuration.
- Args:
secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers (default: /home/ray/files).
- annotations: Dict[str, str]
- build_file_secret_spec(job_name: str, namespace: str, files: Dict[str, str]) -> Dict[str, Any]
Build the Secret specification for files.
- Args:
job_name: Name of the RayJob (used for Secret naming).
namespace: Kubernetes namespace.
files: Dictionary of file_name -> file_content.
- Returns:
Dict: Secret specification ready for Kubernetes API
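To make the shape of such a specification concrete, here is a minimal stand-alone sketch that builds a Kubernetes Secret manifest as a plain dictionary. The function name `sketch_file_secret_spec` and the `<job_name>-files` naming scheme are assumptions for illustration; the SDK's actual output may differ.

```python
from typing import Any, Dict


def sketch_file_secret_spec(
    job_name: str, namespace: str, files: Dict[str, str]
) -> Dict[str, Any]:
    """Hypothetical sketch: build a Kubernetes Secret manifest as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {
            # Assumed naming scheme; the SDK may name the Secret differently.
            "name": f"{job_name}-files",
            "namespace": namespace,
        },
        # stringData takes plain-text values; Kubernetes stores them
        # base64-encoded under the Secret's data field.
        "stringData": dict(files),
    }


spec = sketch_file_secret_spec("demo-job", "default", {"script.py": "print('hi')"})
```

Copying `files` with `dict(files)` keeps the returned specification independent of later changes to the caller's dictionary.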
- build_file_volume_specs(secret_name: str, mount_path: str = '/home/ray/files') -> Tuple[Dict[str, Any], Dict[str, Any]]
Build the volume and mount specifications for files.
- Args:
secret_name: Name of the Secret containing the files.
mount_path: Where to mount the files in containers.
- Returns:
Tuple of (volume_spec, mount_spec) as dictionaries
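As an illustration of what a (volume_spec, mount_spec) pair can look like, the sketch below pairs a Secret-backed volume with a matching mount using standard Kubernetes field names. The helper name `sketch_file_volume_specs` and the volume name `ray-job-files` are hypothetical, not taken from the SDK.

```python
from typing import Any, Dict, Tuple


def sketch_file_volume_specs(
    secret_name: str, mount_path: str = "/home/ray/files"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical sketch: a Secret-backed volume and its container mount."""
    volume_name = "ray-job-files"  # assumed name; must match in both specs
    volume_spec = {"name": volume_name, "secret": {"secretName": secret_name}}
    mount_spec = {"name": volume_name, "mountPath": mount_path}
    return volume_spec, mount_spec


volume, mount = sketch_file_volume_specs("demo-job-files")
```

The volume goes into the pod's `volumes` list and the mount into each container's `volumeMounts` list; the shared `name` is what ties the two together.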
- build_ray_cluster_spec(cluster_name: str) -> Dict[str, Any]
Build the RayCluster spec from ManagedClusterConfig for embedding in RayJob.
- Args:
self: The cluster configuration object (ManagedClusterConfig).
cluster_name: The name for the cluster (derived from the RayJob name).
- Returns:
Dict containing the RayCluster spec for embedding in RayJob
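For orientation, a RayCluster spec in the KubeRay API is organized around a `headGroupSpec` and a list of `workerGroupSpecs`. The following is a rough sketch of such a dictionary, not the SDK's implementation: the helper names, the container and group names, and the interpretation of the memory defaults as gigabytes are all assumptions.

```python
from typing import Any, Dict


def _pod_template(name: str, image: str, cpu: int, memory_gb: int) -> Dict[str, Any]:
    """Hypothetical helper: a single-container pod template."""
    resources = {"cpu": cpu, "memory": f"{memory_gb}G"}  # unit is an assumption
    return {
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"requests": resources, "limits": dict(resources)},
                }
            ]
        }
    }


def sketch_ray_cluster_spec(image: str, num_workers: int = 1) -> Dict[str, Any]:
    """Hypothetical sketch of a RayCluster spec with one worker group."""
    return {
        "headGroupSpec": {
            "rayStartParams": {},
            "template": _pod_template("ray-head", image, cpu=2, memory_gb=8),
        },
        "workerGroupSpecs": [
            {
                "groupName": "worker-group",  # assumed group name
                "replicas": num_workers,
                "minReplicas": num_workers,
                "maxReplicas": num_workers,
                "rayStartParams": {},
                "template": _pod_template("ray-worker", image, cpu=1, memory_gb=2),
            }
        ],
    }


cluster_spec = sketch_ray_cluster_spec("example.com/ray:latest", num_workers=2)
```

The CPU and memory figures mirror the documented defaults of ManagedClusterConfig (2 and 8 for the head, 1 and 2 for each worker); the image reference is a placeholder.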
- envs: Dict[str, str]
- head_accelerators: Dict[str, str | int]
- head_cpu_limits: int | str = 2
- head_cpu_requests: int | str = 2
- head_memory_limits: int | str = 8
- head_memory_requests: int | str = 8
- head_tolerations: List[V1Toleration] | None = None
- image: str = ''
- image_pull_secrets: List[str]
- labels: Dict[str, str]
- num_workers: int = 1
- volume_mounts: list[V1VolumeMount]
- volumes: list[V1Volume]
- worker_accelerators: Dict[str, str | int]
- worker_cpu_limits: int | str = 1
- worker_cpu_requests: int | str = 1
- worker_memory_limits: int | str = 2
- worker_memory_requests: int | str = 2
- worker_tolerations: List[V1Toleration] | None = None
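The `<factory>` entries in the class signature are how generated documentation renders dataclass fields declared with a default factory, which gives each instance its own fresh dictionary or list. A minimal sketch with a hypothetical `MiniConfig` stand-in that mirrors a few of the documented fields and defaults (it is not the SDK class):

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class MiniConfig:
    """Hypothetical stand-in mirroring a few ManagedClusterConfig fields."""

    head_cpu_requests: Union[int, str] = 2
    head_memory_requests: Union[int, str] = 8
    num_workers: int = 1
    # Mutable defaults need a factory so instances do not share one dict.
    worker_accelerators: Dict[str, Union[str, int]] = field(default_factory=dict)
    envs: Dict[str, str] = field(default_factory=dict)


a = MiniConfig(num_workers=2, worker_accelerators={"nvidia.com/gpu": 1})
b = MiniConfig()
b.envs["LOG_LEVEL"] = "debug"  # affects only b, not a
```

With the real class, the same keyword-argument style should apply, since every documented field has a default.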
codeflare_sdk.ray.rayjobs.pretty_print module
This sub-module exists primarily to be used internally by the RayJob object (in the rayjob sub-module) for pretty-printing job status and details.
- codeflare_sdk.ray.rayjobs.pretty_print.print_job_status(job_info: RayJobInfo)
Pretty print the job status in a format similar to cluster status.
codeflare_sdk.ray.rayjobs.rayjob module
RayJob client for submitting and managing Ray jobs using the KubeRay Python client.
- class codeflare_sdk.ray.rayjobs.rayjob.RayJob(job_name: str, entrypoint: str, cluster_name: str | None = None, cluster_config: ManagedClusterConfig | None = None, namespace: str | None = None, runtime_env: RuntimeEnv | Dict[str, Any] | None = None, ttl_seconds_after_finished: int = 0, active_deadline_seconds: int | None = None, local_queue: str | None = None)
Bases: object
A client for managing Ray jobs using the KubeRay operator.
This class provides a simplified interface for submitting and managing RayJob custom resources (CRs) through the KubeRay RayJob Python client.
- status(print_to_console: bool = True) -> Tuple[CodeflareRayJobStatus, bool]
Get the status of the Ray job.
- Args:
print_to_console (bool): Whether to print formatted status to console (default: True)
- Returns:
Tuple of (CodeflareRayJobStatus, ready: bool) where ready indicates job completion
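Because status() returns a (status, ready) tuple, a caller can poll it until ready becomes True. The sketch below shows one way to do that; `wait_until_ready` and its parameters are hypothetical, and a stub callable stands in for a real job so the example runs on its own.

```python
import time
from typing import Callable, Tuple


def wait_until_ready(
    get_status: Callable[[], Tuple[object, bool]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> object:
    """Poll get_status() until its ready flag is True, then return the status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status, ready = get_status()
        if ready:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not ready after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stub standing in for a real job: reports RUNNING twice, then COMPLETE.
_fake_results = iter([("RUNNING", False), ("RUNNING", False), ("COMPLETE", True)])
final_status = wait_until_ready(lambda: next(_fake_results), poll_seconds=0.0)
```

With a real RayJob instance named `job`, the callable could plausibly be `lambda: job.status(print_to_console=False)`, which avoids printing the formatted status on every poll.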
codeflare_sdk.ray.rayjobs.runtime_env module
- codeflare_sdk.ray.rayjobs.runtime_env.create_file_secret(job: RayJob, files: Dict[str, str], rayjob_result: Dict[str, Any])
Create Secret with owner reference for local files.
- codeflare_sdk.ray.rayjobs.runtime_env.create_secret_from_spec(job: RayJob, secret_spec: Dict[str, Any], rayjob_result: Dict[str, Any] = None) -> str
Create Secret from specification via Kubernetes API.
- Args:
secret_spec: Secret specification dictionary.
rayjob_result: The result from RayJob creation, containing the UID.
- Returns:
str: Name of the created Secret
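The UID from the RayJob creation result matters because a Kubernetes owner reference needs it: with the RayJob set as owner, the Secret is garbage-collected when the RayJob is deleted. A minimal sketch of attaching such a reference, assuming `rayjob_result` is shaped like a standard Kubernetes object with `metadata.name` and `metadata.uid`; the helper name and the fallback `ray.io/v1` API version are assumptions.

```python
import copy
from typing import Any, Dict


def with_owner_reference(
    secret_spec: Dict[str, Any], rayjob_result: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical sketch: return a copy of secret_spec owned by the RayJob."""
    owner = {
        "apiVersion": rayjob_result.get("apiVersion", "ray.io/v1"),  # assumed default
        "kind": rayjob_result.get("kind", "RayJob"),
        "name": rayjob_result["metadata"]["name"],
        "uid": rayjob_result["metadata"]["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }
    spec = copy.deepcopy(secret_spec)  # leave the caller's dict untouched
    spec.setdefault("metadata", {})["ownerReferences"] = [owner]
    return spec


original = {"kind": "Secret", "metadata": {"name": "demo-job-files"}}
result = {"metadata": {"name": "demo-job", "uid": "1234-abcd"}}
owned = with_owner_reference(original, result)
```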
- codeflare_sdk.ray.rayjobs.runtime_env.extract_all_local_files(job: RayJob) -> Dict[str, str] | None
Prepare local files for Secret upload.
If runtime_env has a local working_dir: zip the entire directory into a single file.
If there is only a single entrypoint file (no working_dir): extract that file.
If working_dir is a remote URL: return None (the URL is passed through to Ray).
- Returns:
One of the following:
- {"working_dir.zip": <base64_encoded_zip>} for zipped directories
- {"script.py": <file_content>} for single files
- None for a remote working_dir or when there are no files
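The zipped-directory case can be sketched with the standard library alone: walk the directory, write each file into an in-memory zip archive, and base64-encode the bytes so they fit in a string-valued dictionary. The helper name `zip_dir_to_base64` is hypothetical and the SDK's own logic may differ (for example, in which files it skips).

```python
import base64
import io
import os
import tempfile
import zipfile


def zip_dir_to_base64(directory: str) -> str:
    """Hypothetical sketch: zip a directory in memory and base64-encode it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, names in os.walk(directory):
            for name in sorted(names):
                path = os.path.join(root, name)
                # Store paths relative to the directory root.
                archive.write(path, arcname=os.path.relpath(path, directory))
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Demo: a temporary working directory containing one script.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "main.py"), "w") as fh:
        fh.write("print('hello')")
    files = {"working_dir.zip": zip_dir_to_base64(tmp)}

# Round trip: decode and inspect the archive contents.
decoded = zipfile.ZipFile(io.BytesIO(base64.b64decode(files["working_dir.zip"])))
names = decoded.namelist()
```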
- codeflare_sdk.ray.rayjobs.runtime_env.parse_requirements_file(requirements_path: str) -> List[str] | None
Parse a requirements.txt file and return list of dependencies.
- Args:
requirements_path: Path to requirements.txt file
- Returns:
List of pip dependencies
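A requirements parser of this kind typically drops blank lines, comments, and pip option lines, keeping only the dependency specifiers. The sketch below is one possible approach, not the SDK's code; in particular, returning None for a missing file is an assumption made to match the optional return type.

```python
import os
import re
import tempfile
from typing import List, Optional


def sketch_parse_requirements(path: str) -> Optional[List[str]]:
    """Hypothetical sketch: read dependency specifiers from a requirements file."""
    if not os.path.isfile(path):
        return None  # assumed behavior for a missing file
    dependencies = []
    with open(path) as fh:
        for raw in fh:
            # Drop full-line comments and comments preceded by whitespace.
            line = re.sub(r"(^|\s+)#.*$", "", raw).strip()
            if not line or line.startswith("-"):
                continue  # skip blanks and pip options such as "-r other.txt"
            dependencies.append(line)
    return dependencies


# Demo with a temporary requirements file.
with tempfile.TemporaryDirectory() as tmp:
    req_path = os.path.join(tmp, "requirements.txt")
    with open(req_path, "w") as fh:
        fh.write("# core deps\nnumpy>=1.24\n\nrequests==2.31.0  # http client\n-r other.txt\n")
    parsed = sketch_parse_requirements(req_path)
    missing = sketch_parse_requirements(os.path.join(tmp, "absent.txt"))
```

The comment pattern only treats `#` as a comment marker at the start of a line or after whitespace, so URL fragments such as `#egg=name` inside a specifier are left intact.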
codeflare_sdk.ray.rayjobs.status module
The status sub-module defines Enums containing information for Ray job deployment states and CodeFlare job states, as well as dataclasses to store information for Ray jobs.
- class codeflare_sdk.ray.rayjobs.status.CodeflareRayJobStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Defines the possible reportable states of a CodeFlare Ray job.
- COMPLETE = 1
- FAILED = 3
- RUNNING = 2
- SUSPENDED = 4
- UNKNOWN = 5
- class codeflare_sdk.ray.rayjobs.status.RayJobDeploymentStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Defines the possible deployment states of a Ray job (from the KubeRay RayJob API).
- COMPLETE = 'Complete'
- FAILED = 'Failed'
- RUNNING = 'Running'
- SUSPENDED = 'Suspended'
- UNKNOWN = 'Unknown'
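The two enums share member names, so a deployment state reported by KubeRay can be translated into a CodeFlare state by name. The sketch below redefines both enums locally with the documented values so it runs without the SDK. The helper functions are hypothetical, and treating only COMPLETE as "ready" is an assumption about how the flag returned by status() is derived.

```python
from enum import Enum
from typing import Tuple


class RayJobDeploymentStatus(Enum):
    """Local copy of the documented string-valued deployment states."""
    COMPLETE = "Complete"
    FAILED = "Failed"
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    UNKNOWN = "Unknown"


class CodeflareRayJobStatus(Enum):
    """Local copy of the documented integer-valued CodeFlare states."""
    COMPLETE = 1
    RUNNING = 2
    FAILED = 3
    SUSPENDED = 4
    UNKNOWN = 5


def parse_deployment_status(raw: str) -> RayJobDeploymentStatus:
    """Map a raw status string to an enum member, falling back to UNKNOWN."""
    try:
        return RayJobDeploymentStatus(raw)
    except ValueError:
        return RayJobDeploymentStatus.UNKNOWN


def to_codeflare(status: RayJobDeploymentStatus) -> Tuple[CodeflareRayJobStatus, bool]:
    """Translate by member name; 'ready' only for COMPLETE (an assumption)."""
    mapped = CodeflareRayJobStatus[status.name]
    return mapped, mapped is CodeflareRayJobStatus.COMPLETE
```

For example, `to_codeflare(parse_deployment_status("Running"))` yields the RUNNING member paired with False, and an unrecognized string falls back to UNKNOWN rather than raising.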
- class codeflare_sdk.ray.rayjobs.status.RayJobInfo(name: str, job_id: str, status: RayJobDeploymentStatus, namespace: str, cluster_name: str, start_time: str | None = None, end_time: str | None = None, failed_attempts: int = 0, succeeded_attempts: int = 0)
Bases: object
For storing information about a Ray job.
- cluster_name: str
- end_time: str | None = None
- failed_attempts: int = 0
- job_id: str
- name: str
- namespace: str
- start_time: str | None = None
- status: RayJobDeploymentStatus
- succeeded_attempts: int = 0
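The attribute list is sorted alphabetically, but the constructor takes the five required fields first (name, job_id, status, namespace, cluster_name) followed by the optional ones. A small stand-in dataclass, hypothetical and independent of the SDK, shows that ordering along with a one-line summary of the kind a status printer might build on; a plain string replaces the RayJobDeploymentStatus member to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfoSketch:
    """Hypothetical stand-in with the same fields and defaults as RayJobInfo."""

    name: str
    job_id: str
    status: str  # the SDK class holds a RayJobDeploymentStatus here
    namespace: str
    cluster_name: str
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    failed_attempts: int = 0
    succeeded_attempts: int = 0


def summarize(info: JobInfoSketch) -> str:
    """Format the key fields of a job record on one line."""
    return (
        f"{info.namespace}/{info.name} [{info.status}] on {info.cluster_name} "
        f"(failed={info.failed_attempts}, succeeded={info.succeeded_attempts})"
    )


info = JobInfoSketch(
    name="demo",
    job_id="job-1",
    status="Running",
    namespace="default",
    cluster_name="demo-cluster",
)
line = summarize(info)
```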