cara-server that watches for
Pending projects and assigns each one to a suitable node. When a match is
found, the scheduler writes the node’s name to status.nodeRef and advances
the project’s phase to Scheduled. The agent on that node then picks up the
project and starts the containers.
How the scheduler works
Project enters Pending
When you create a project with
caractrl apply, the control plane accepts
the manifest and sets status.phase to Pending.
Scheduler picks up the project
The scheduler watches for
Pending projects continuously, processing each
one promptly and re-checking all pending projects every 30 seconds as a
fallback.
Scheduler selects a node
For each
Pending project the scheduler finds nodes whose state is Ready
and unschedulable is false, then picks one. It writes the node’s name
to status.nodeRef and advances the phase to Scheduled.
If no Ready nodes are available, the scheduler retries automatically.
Node eligibility criteria
The scheduler considers a node eligible if all of the following are true:
state = Ready
The control plane has verified that the node’s last heartbeat arrived
within the past 90 seconds.
unschedulable = false
The node’s
spec.unschedulable field has not been set to true by an
administrator.
Sufficient allocatable resources
The node’s
status.allocatable (capacity minus system-reserved amounts)
has enough headroom after accounting for already-running projects.
The current scheduling algorithm selects the first eligible node from the
list of Ready nodes. A resource-aware, affinity-weighted algorithm
is planned for a future release.
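The eligibility criteria and first-eligible selection described above can be sketched as follows. This is an illustration only, not the scheduler's actual code: the `Node` class, its `in_use` field, the resource key names, and the function names are all hypothetical. Only the 90-second heartbeat window, the `unschedulable` flag, and the notion of allocatable headroom come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Heartbeat window taken from the "state = Ready" criterion.
HEARTBEAT_WINDOW = timedelta(seconds=90)


@dataclass
class Node:
    """Hypothetical in-memory view of a node record."""
    name: str
    last_heartbeat: datetime
    unschedulable: bool = False
    # Mirrors status.allocatable; the resource keys are assumptions.
    allocatable: dict = field(default_factory=dict)
    # Assumed bookkeeping of resources claimed by already-running projects.
    in_use: dict = field(default_factory=dict)


def is_ready(node, now):
    """True if the last heartbeat arrived within the past 90 seconds."""
    return now - node.last_heartbeat <= HEARTBEAT_WINDOW


def has_headroom(node, requests):
    """True if allocatable minus in-use covers every requested resource."""
    return all(
        node.allocatable.get(key, 0) - node.in_use.get(key, 0) >= amount
        for key, amount in requests.items()
    )


def select_node(nodes, requests, now):
    """Return the name of the first eligible node, or None if there is none."""
    for node in nodes:
        if is_ready(node, now) and not node.unschedulable and has_headroom(node, requests):
            return node.name
    return None
```

A caller would write the returned name to status.nodeRef and advance the phase to Scheduled. A `None` result corresponds to the case where the project stays Pending and the scheduler retries.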
Phase progression
The table below shows which component is responsible for each phase transition and what it means for your workload:

| Transition | Actor | What happens |
|---|---|---|
| Created → Pending | API server | Manifest is accepted and stored. |
| Pending → Scheduled | Scheduler | status.nodeRef is written; the agent will pick up the project on its next poll. |
| Scheduled → Running | Agent | All containers started successfully. |
| Scheduled / Running → Failed | Agent | The agent could not start or maintain the containers. See status.conditions for the error. |
| Running / Failed → Terminating | API server | Deletion request received; agent is tearing down. |
| Terminating → Terminated | Agent | All Docker resources removed. The record is deleted from the store shortly after. |
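
If you write tooling that reacts to phase changes, it can help to encode the transitions listed above as a lookup. The sketch below is a hypothetical helper, not part of any cara API. It covers only the transitions in this table.

```python
# (from_phase, to_phase) -> component responsible for the transition.
# Entries are transcribed from the phase progression table.
TRANSITIONS = {
    ("Created", "Pending"): "API server",
    ("Pending", "Scheduled"): "Scheduler",
    ("Scheduled", "Running"): "Agent",
    ("Scheduled", "Failed"): "Agent",
    ("Running", "Failed"): "Agent",
    ("Running", "Terminating"): "API server",
    ("Failed", "Terminating"): "API server",
    ("Terminating", "Terminated"): "Agent",
}


def actor_for(current, target):
    """Return the responsible component, or None if the table has no such transition."""
    return TRANSITIONS.get((current, target))
```

For example, `actor_for("Pending", "Scheduled")` returns `"Scheduler"`, while a jump such as Pending to Running returns `None` because the table does not list it.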
Throughput metrics and scheduling
Each node’s status.network.throughput field reports the last measured
download and upload speeds (e.g. "120Mbps") and the time of the test. The
scheduler uses these values to estimate:
- Download speed — how long it will take the node to pull a container image or restore a backup before starting.
- Upload speed — RPO feasibility for workloads that write data back to object storage.
Throughput measurements are taken by the agent at startup and periodically
thereafter. They are advisory inputs to the scheduling algorithm, not hard
constraints. A node is never excluded from scheduling solely on the basis
of measured throughput.
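To make the download estimate concrete, the sketch below converts a throughput string such as "120Mbps" into bits per second and estimates a transfer time. The function names are hypothetical, and this page only shows the "Mbps" suffix, so support for "Kbps" and "Gbps" is an assumption. The scheduler's real estimation logic is not documented here.

```python
import re

# Multipliers to bits per second. Only "Mbps" appears in this page;
# the other suffixes are assumed for illustration.
_UNITS = {"Kbps": 1e3, "Mbps": 1e6, "Gbps": 1e9}
_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(Kbps|Mbps|Gbps)\s*")


def parse_throughput(value):
    """Parse a string like "120Mbps" into bits per second."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised throughput value: {value!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def transfer_seconds(size_bytes, throughput):
    """Estimate how long moving size_bytes takes at the measured throughput."""
    return size_bytes * 8 / parse_throughput(throughput)
```

With these definitions, a 1.5 GB image (1,500,000,000 bytes) at "120Mbps" works out to 12,000,000,000 bits / 120,000,000 bits per second = 100 seconds. Because throughput is advisory, an estimate like this would inform ranking rather than exclude a node.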
Influencing scheduling
Prevent scheduling onto a node
Set
spec.unschedulable: true in the node’s manifest and re-apply it.
The scheduler will skip the node for all future assignments. Projects
already running on the node are not affected.
Use labels for workload placement (future)
Node labels (set under
metadata.labels) are recorded in the store and
will be used by affinity rules in a future scheduling algorithm. Applying
labels now does not affect placement in the current release, but it is
good practice to label nodes by zone, hardware tier, or other dimensions
so you are ready when affinity support ships.
What to do if a project stays Pending
If your project stays in Pending for more than a few seconds, work through
the following checks:
Check node states
List all nodes and look at their state column.
If every node is
NotReady, the scheduler has no eligible targets. Check
that cara-agent is running on at least one node and that it can reach
cara-server.
Check for unschedulable nodes
A node in Ready state with spec.unschedulable: true is still excluded.
Inspect individual nodes and look for spec.unschedulable: true. If present, either clear the flag or
bring another node online.
Check allocatable resources
If nodes are Ready but the project still does not schedule, the
scheduler may have determined that no node has sufficient headroom. Check
status.allocatable on each node and compare it against the resource
requests in your project. Reduce the project’s resource requests or free capacity by removing other
projects from the cluster.
What happens when a project fails
If the agent cannot start the project or detects a terminal error after start, it patches status.phase to Failed and writes a Phase condition
with the reason and a human-readable message.
If the node running a project becomes NotReady, the control plane writes a
NotReadyAt condition to mark the start of a grace period. After the grace
period expires, the control plane can force-terminate the project on the
failed node and return it to Pending so the scheduler can place it
elsewhere.
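The grace-period check can be sketched as a simple time comparison. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the length of the grace period is not given on this page, so it is passed in as a parameter rather than hard-coded.

```python
from datetime import datetime, timedelta, timezone


def should_reschedule(not_ready_at, now, grace_period):
    """Decide whether a project on a NotReady node may be returned to Pending.

    not_ready_at: timestamp from the NotReadyAt condition, or None if the
                  condition is absent (the node is not considered failed).
    grace_period: assumed to be configurable; its real value is not documented here.
    """
    if not_ready_at is None:
        return False
    return now - not_ready_at >= grace_period
```

For instance, with a hypothetical five-minute grace period, a project whose NotReadyAt condition was written four minutes ago would be left alone, and one written five or more minutes ago would qualify for force-termination and rescheduling.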