The peering process concludes with the creation of a new virtual node in the home cluster. According to the Liqo terminology, this virtual node is called big node, as it represents (and aggregates) a subset of the resources available in the foreign cluster. Conversely, the home cluster becomes what we call a big cluster, i.e., a cluster whose resources span (transparently) across multiple physical clusters.
In Kubernetes, each physical node is managed by the kubelet, a process running in the hosting machine that interfaces with the Kubernetes API server and handles the lifecycle of the pods scheduled on it.
Since the foreign cluster is represented by a virtual node in the home cluster, offloading a pod to the remote cluster corresponds to scheduling it on that specific node. When the virtual kubelet creates a new pod in the foreign cluster, the foreign scheduler selects one remote physical node to host the received pod, and the physical kubelet on that node takes care of executing its containers.
According to this mechanism, the offloading of a pod to the foreign cluster is fully compliant with the Kubernetes control plane: the home cluster can control all the offloaded workloads by interacting with the big node that represents the remote cluster.
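To make this mapping concrete, the sketch below shows a pod explicitly pinned to the big node; the node name liqo-foreign-cluster is purely hypothetical, and the blanket toleration is only a shortcut for the virtual node taint discussed later on this page. In normal operation no pinning is needed: the scheduler can place pods on the big node transparently.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: offloaded-nginx
  namespace: demo                # assumed to be a Liqo-enabled namespace
spec:
  nodeSelector:
    kubernetes.io/hostname: liqo-foreign-cluster   # hypothetical name of the big (virtual) node
  tolerations:
  - operator: Exists             # blanket toleration; covers the virtual node taint (see below)
  containers:
  - name: nginx
    image: nginx:1.25
```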
Since the big node is a virtual node, it is not backed by a real kubelet process as normal physical nodes are.
Liqo leverages a custom version of the virtual kubelet project for the management of the virtual node.
In a nutshell, a cluster that peers with N foreign clusters has N big nodes (one per foreign cluster) and runs N instances of the virtual kubelet process.
Generally speaking, a standard kubelet is in charge of accomplishing two tasks: handling the node resource and reconciling its status with the API server, and managing the lifecycle (creation, execution, deletion) of the pods scheduled on that node.
Similarly, the virtual kubelet is in charge of creating the virtual node resource and reconciling its status, and of offloading the local pods scheduled on the virtual node to the foreign cluster.
Moreover, our implementation provides a feature called “reflection”, which is described in the dedicated section.
To make the pods of a given namespace suitable for offloading, the virtual kubelet also has to deal with the namespace replication problem. Further details can be found in the dedicated section.
The virtual node is created with a specific taint: a pod can be scheduled on the virtual node only if it tolerates that taint.
The toleration is automatically added by the Liqo MutatingWebhook, which intercepts the pods created in all the namespaces labeled with liqo.io/enabled="true".
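As a concrete sketch, the manifests below show a namespace labeled for Liqo and the kind of toleration the webhook injects into the pods created in it; the taint key used here is a placeholder, the actual value being the one Liqo sets on the big node.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    liqo.io/enabled: "true"      # enables the Liqo MutatingWebhook for the pods of this namespace
---
# Simplified view of the webhook's effect on a pod created in that namespace:
# a toleration matching the virtual node taint is injected into the pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: mutated-pod
  namespace: demo
spec:
  tolerations:
  - key: virtual-node.liqo.io/not-allowed   # placeholder key: the actual one is the taint set on the big node
    operator: Exists
  containers:
  - name: nginx
    image: nginx:1.25
```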
By default, the Kubernetes scheduler selects the eligible node with the highest score (scores are computed from several parameters, including the amount of available resources).
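Consequently, an ordinary deployment created in a Liqo-enabled namespace, with no node selectors or manual pinning, may end up either on local nodes or on the big node, depending on which one scores highest. The manifest below is a minimal sketch of that scenario; names and values are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transparent-offload
  namespace: demo                # assumed to be labeled liqo.io/enabled="true"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: transparent-offload
  template:
    metadata:
      labels:
        app: transparent-offload
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            cpu: 250m            # requests feed the scheduler scoring: nodes with more free resources,
            memory: 128Mi        # including the big node, obtain higher scores
```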