Introducing Suspended Jobs
Author: Adhityaa Chandrasekar (Google)
Jobs are a crucial part of Kubernetes' API. While other kinds of workloads such as Deployments, ReplicaSets, StatefulSets, and DaemonSets solve use-cases that require Pods to run forever, Jobs are useful when Pods need to run to completion. Commonly used in parallel batch processing, Jobs can be used in a variety of applications ranging from video rendering and database maintenance to sending bulk emails and scientific computing.
While the amount of parallelism and the conditions for Job completion are configurable, the Kubernetes API lacked the ability to suspend and resume Jobs. This is often desired when cluster resources are limited and a higher priority Job needs to execute in the place of another Job. Deleting the lower priority Job is a poor workaround as Pod completion history and other metrics associated with the Job will be lost.
With the recent Kubernetes 1.21 release, you will be able to suspend a Job by updating its spec. The feature is currently in alpha and requires you to enable the SuspendJob feature gate on the API server and the controller manager in order to use it.
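For example, assuming you manage the control plane flags directly, enabling the gate amounts to adding the standard --feature-gates flag to both components; how you edit these flags depends on how your cluster is deployed:

```shell
# Add the SuspendJob feature gate alongside each component's existing flags
# (for kubeadm clusters, edit the static Pod manifests under /etc/kubernetes/manifests).
kube-apiserver --feature-gates=SuspendJob=true
kube-controller-manager --feature-gates=SuspendJob=true
```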
API changes
We introduced a new boolean field suspend into the .spec of Jobs. Let's say I create the following Job:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  suspend: true
  parallelism: 2
  completions: 10
  template:
    spec:
      containers:
        - name: my-container
          image: busybox
          command: ["sleep", "5"]
      restartPolicy: Never
```
Jobs are not suspended by default, so I'm explicitly setting the suspend field to true in the .spec of the above Job manifest. In the above example, the Job controller will refrain from creating Pods until I'm ready to start the Job, which I can do by updating suspend to false.
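To make that concrete, here is a minimal sketch of the workflow with kubectl, assuming the manifest above is saved as job.yaml (the file name is just a placeholder):

```shell
# Create the Job in the suspended state; the Job controller creates no Pods yet.
kubectl apply -f job.yaml
kubectl get pods -l job-name=my-job    # expect no Pods while the Job is suspended

# Resume the Job by flipping .spec.suspend to false.
kubectl patch job my-job --type=strategic --patch '{"spec":{"suspend":false}}'
```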
As another example, consider a Job that was created with the suspend field omitted. The Job controller will happily create Pods to work towards Job completion. However, before the Job completes, if I explicitly set the field to true with a Job update, the Job controller will terminate all active Pods that are running and will wait indefinitely for the flag to be flipped back to false. Typically, Pod termination is done by sending a SIGTERM signal to all container processes in the Pod; the graceful termination period defined in the Pod spec will be honoured. Pods terminated this way will not be counted as failures by the Job controller.
It is important to understand that succeeded and failed Pods from the past will continue to exist after you suspend a Job; that is, they will count towards Job completion once you resume it. You can verify this by looking at the Job's status before and after suspension.
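For example, you could suspend a running Job with kubectl patch and then inspect its status; a minimal sketch (exact output will vary by cluster):

```shell
# Suspend the running Job; active Pods are terminated gracefully.
kubectl patch job my-job --type=strategic --patch '{"spec":{"suspend":true}}'

# The counters under .status (succeeded, failed) are preserved, and the Job
# gains a condition of type "Suspended" while it is suspended.
kubectl get job my-job -o yaml
```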
Read the documentation for a full overview of this new feature.
Where is this useful?
Let's say I'm the operator of a large cluster. I have many users submitting Jobs to the cluster, but not all Jobs are created equal: some Jobs are more important than others. Cluster resources aren't infinite either, so all users must share resources. If all Jobs are created in the suspended state and placed in a pending queue, I can achieve priority-based Job scheduling by resuming Jobs in the right order.
As another motivating use case, consider a cloud provider where compute resources are cheaper at night than in the morning. If I have a long-running Job that takes multiple days to complete, being able to suspend the Job every morning and resume it in the evening can reduce costs.
Since this field is a part of the Job spec, CronJobs automatically get this feature for free too.
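As an illustrative sketch, the hypothetical CronJob below sets suspend inside its jobTemplate so that every Job it creates starts out suspended; note that this is different from a CronJob's own .spec.suspend field, which only pauses the scheduling of new Jobs:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      suspend: true          # every Job created by this CronJob starts suspended
      template:
        spec:
          containers:
            - name: my-container
              image: busybox
              command: ["sleep", "5"]
          restartPolicy: Never
```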
References and next steps
If you're interested in a deeper dive into the rationale behind this feature and the decisions we have taken, consider reading the enhancement proposal. There's more detail on suspending and resuming jobs in the documentation for Job.
As previously mentioned, this feature is currently in alpha and is available only if you explicitly opt in through the SuspendJob feature gate.
If this is a feature you're interested in, please consider testing suspended Jobs in your cluster and providing feedback. You can discuss this enhancement on GitHub. The SIG Apps community also meets regularly and can be reached through Slack or the mailing list.
Barring any unexpected changes to the API, we intend to graduate the feature to
beta in Kubernetes 1.22, so that the feature becomes available by default.