<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc2629 version 1.2.13 -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4760 SYSTEM "https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4760.xml">
<!-- ENTITY USECASES PUBLIC ''
      'https://xml2rfc.tools.ietf.org/public/rfc/bibxml3/reference.I-D.draft-geng-rtgwg-cfn-dyncast-ps-usecase-00.xml'-->
]>
<?rfc compact="yes"?>
<?rfc text-list-symbols="o*+-"?>
<?rfc subcompact="no"?>
<?rfc sortrefs="no"?>
<?rfc symrefs="yes"?>
<?rfc strict="yes"?>
<?rfc toc="yes"?>
<rfc category="info" docName="draft-du-cats-default-policy-and-metrics-00"
     ipr="trust200902" submissionType="IETF">
  <front>
    <title abbrev="Default Policy and Related Metrics">Default Policy and
    Related Metrics in CATS</title>

    <author fullname="Zongpeng Du" initials="Z." surname="Du">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street>No.32 XuanWuMen West Street</street>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>duzongpeng@foxmail.com</email>
      </address>
    </author>

    <author fullname="Kehan Yao" initials="K." surname="Yao">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street>No.32 XuanWuMen West Street</street>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>yaokehan@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Guangping Huang" initials="G." surname="Huang">
      <organization>ZTE</organization>

      <address>
        <email>huang.guangping@zte.com.cn</email>
      </address>
    </author>

    <author fullname="Zhihua Fu" initials="Z." surname="Fu">
      <organization>New H3C Technologies</organization>

      <address>
        <email>fuzhihua@h3c.com</email>
      </address>
    </author>

    <date day="" month="" year="2025"/>

    <workgroup>CATS</workgroup>

    <abstract>
      <t>This document describes the considerations and requirements for the
      computing information that needs to be advertised to the network in
      Computing-Aware Traffic Steering (CATS). In particular, it suggests
      that a default policy should be supported on the Ingress, and that one
      or more default metrics should be included in the notification.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>Computing-Aware Traffic Steering (CATS) is proposed to support
      steering the traffic among different service sites according to both the
      real-time network and computing resource status as mentioned in <xref
      target="I-D.ietf-cats-usecases-requirements"/>. It requires the network
      to be aware of computing resource information and select a service
      instance based on the joint metric of computing and networking.</t>

      <t>The basic idea of CATS is that the network can be notified about the
      computing information, and change the steering policy accordingly. The
      first question is what the network should be aware of in the system.
      There is a large amount of computing information in the computing
      domain, and we need to select one or several items as the default
      ones.</t>

      <t>This document describes the considerations and requirements of the
      computing information designation, and suggests a two-stage strategy
      for the metric definition. Firstly, we can define some default metrics
      that are necessary for enabling the basic steering mechanism in CATS.
      Afterwards, other metrics can be added as needed, which requires that
      the metric designation be extensible.</t>

      <t>The remainder of this document is organized as follows. We first
      review the procedure of traffic steering in CATS, and show where the
      computing information is needed. After that, we analyze some
      requirements and principles of computing metric designation. Then, we
      describe some computing metric considerations in CATS. Finally, we
      discuss the default policy on the decision point.</t>

    </section>

    <section anchor="definition-of-terms" title="Definition of Terms">
      <t>This document makes use of the following terms:</t>

      <t><list hangIndent="2" style="hanging">
          <t hangText="Computing-Aware Traffic Steering (CATS): ">A traffic
          engineering approach <xref target="I-D.ietf-teas-rfc3272bis"/> that
          takes into account the dynamic nature of computing resources and
          network state to optimize service-specific traffic forwarding
          towards a given service contact instance. Various relevant metrics
          may be used to enforce such computing-aware traffic steering
          policies.</t>

          <t hangText="Service:">An offering that is made available by a
          provider by orchestrating a set of resources (networking, compute,
          storage, etc.).</t>

          <t hangText="Service instance:">An instance of running resources
          according to a given service logic.</t>

          <t hangText="Service identifier:">Used to uniquely identify a
          service, at the same time identifying the whole set of service
          instances that each represents the same service behavior, no matter
          where those service instances are running.</t>

          <t hangText="Computing Capability:">The ability of nodes with
          computing resources to achieve a specific result output through
          data processing, including but not limited to computing,
          communication, memory, and storage capability.</t>
        </list></t>
    </section>

    <section title="General Procedure of Computing-Aware Traffic Steering">
      <t>It is assumed in the CATS framework that the same service can be
      provided in multiple places. The different service instances commonly
      have different kinds of computing resources, and different utilization
      rates of those resources.</t>

      <t>In the CATS framework, the decision point, which is supposed to be a
      node in the network, should be aware of the network status and the
      computing status, and accordingly choose a proper service point for the
      client.</t>

      <t>A general procedure to steer the CATS traffic is described below.
      The CATS packets have an anycast destination address, which is also
      called the service ID, and which is announced by the different service
      points.</t>

      <t>Firstly, the service points need to collect the specific computing
      information that needs to be sent into the network, following a uniform
      format so that the decision point can understand it. In this step, only
      necessary computing information needs to be considered, so as to avoid
      exposing too much information about the service points.</t>

      <t>Secondly, the service points send the computing information into the
      network by some means, and update it periodically or on demand.</t>

      <t>Thirdly, the decision point receives the computing information, and
      makes a decision for the specific service related to the service ID.
      Hence, the route for the service ID on the Ingress is established or
      updated.</t>

      <t>Fourthly, the traffic for the service ID reaching the Ingress node
      would be identified and steered according to the routing policy
      established in the previous step.</t>

      <t>Fifthly, the Egress node receives the packets of the traffic for the
      service ID from the Ingress, and the traffic is identified and
      forwarded to the corresponding service point.</t>

      <t>The scheduling of traffic in CATS is similar to the task allocation
      and scheduling in cloud computing. Normally, it is a multi-objective
      optimization. In task allocation strategies of cloud computing systems,
      the main optimization objectives considered are task execution time,
      execution cost, and load balancing. Execution time and execution cost
      are used to evaluate the user satisfaction, and load balancing is used
      to measure the reliability and availability of the system.</t>

    </section>

    <section title="Requirements of Computing Resource Modeling">

      <t>After the computing metrics are generated, they are notified to the
      decision point in the network to influence the traffic steering.
      However, the decision point in the network, for example the Ingress
      node, only cares about how to use the capability values to steer the
      traffic; it does not care about how the capability values are
      generated.</t>

      <t>From the aspect of the services, they need an evaluation system to
      generate one or more capability values. To achieve the best load
      balancing (LB) result, different services or service types may have
      different ways to evaluate the capability.</t>

      <t>From the aspect of the decision point in the network, it only needs
      to understand the way to use the values, and trigger the implementation
      of the related policy.</t>


      <t>Some requirements related to the design of the computing resource
      modeling are listed below.</t>

      <t><list style="numbers">
          <t>The optimization objective of the policy on the decision point
          may vary. For example, it may be the lowest sum of the network
          delay and the computing delay, or it may be an overall better load
          balancing result, in which the clients would prefer the service
          points that could support more clients.</t>

          <t>The update frequency of the computing metrics may vary. Some of
          the metrics may be more dynamic, and some are relatively
          static.</t>

          <t>The notification method of the computing metrics may vary.
          According to its update frequency, we may choose different ways to
          update a metric.</t>

          <t>A metric merging process should be supported when multiple
          service instances are behind the same Egress.</t>
        </list></t>
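      <t>As an illustrative sketch of the metric merging requirement above,
      an Egress with several service instances behind it might merge their
      metrics before notification. The merging rules shown (taking the
      minimum computing delay and the sum of the service capabilities), and
      all names used, are assumptions of this sketch, not definitions made by
      this document.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical sketch: merging per-instance computing metrics on an
# Egress.  The merging rules (minimum delay, summed capability) are
# illustrative assumptions, not normative.

def merge_metrics(instances):
    """Merge metrics of service instances behind the same Egress.

    Each instance is a dict with 'computing_delay' (milliseconds) and
    'service_capability' (maximum simultaneous sessions).
    """
    return {
        # Best (smallest) computing delay reachable via this Egress.
        "computing_delay": min(i["computing_delay"] for i in instances),
        # Aggregate capacity behind this Egress.
        "service_capability": sum(i["service_capability"]
                                  for i in instances),
    }

instances = [
    {"computing_delay": 10, "service_capability": 100},
    {"computing_delay": 15, "service_capability": 10000},
]
merged = merge_metrics(instances)
# merged == {"computing_delay": 10, "service_capability": 10100}
```
]]></artwork>
      </figure>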
    </section>

    <section title="Design Principles of Computing Metrics">
      <t>The work in CATS mainly concerns service point selection and traffic
      steering at Layer 3, for which we do not need all the computing
      information of the service points. Hence, we can start with simple
      cases in the work of computing resource modeling in CATS. Some design
      principles can be considered.</t>

      <t><list style="numbers">
          <t>Simplicity: The computing metrics in CATS SHOULD be few and
          simple, so as to avoid exposing too much information of the service
          points.</t>

          <t>Scalability: The computing metrics in CATS SHOULD be evolvable
          for future extensions.</t>

          <t>Interoperability: The computing metrics in CATS SHOULD be
          vendor-independent, and OS-independent.</t>

          <t>Stability: The computing metrics in CATS SHOULD NOT incur too
          much overhead in the protocol design, so that they can be
          stabilized for use.</t>

          <t>Accuracy: The computing metrics in CATS SHOULD be effective for
          path selection decision making, and the accuracy SHOULD be
          guaranteed.</t>
        </list></t>

    </section>

    <section title="Computing Metric Considerations in CATS">
      <t>Various metrics can be considered in CATS, and perhaps different
      services would need different metrics. However, we can start with simple
      cases.</t>

      <t>In CATS, a straightforward intent is to minimize the total delay
      across the network domain and the computing domain. Thus, we can have a
      starting point for the metric designation in CATS considering only the
      delay information. In this case, the decision point can collect the
      network delay and the computing delay, and make a decision about the
      optimal service point accordingly. The advantage of this method is that
      it is simple and easy to start with; meanwhile, the network metric and
      the computing metric have the same unit of measure. The network delay
      can be the latency between the Ingress node and the Egress node in the
      network. The computing delay can be generated by the server, and is the
      delay for the server in the service point to process the CATS service.
      It means &ldquo;the estimate of the duration of my processing of a
      request&rdquo;, and it is usually an average value over the service
      requests. The optimization objective of traffic steering in this
      scenario is the minimal total delay for the client.</t>
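      <t>The minimal-total-delay selection described above can be sketched as
      follows. The data layout and the function name are assumptions of this
      sketch, for illustration only.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical sketch: the decision point selects the Egress whose sum
# of network delay and computing delay is minimal.

def select_egress(candidates):
    """candidates: egress name -> (network_delay_ms, computing_delay_ms)."""
    return min(candidates, key=lambda egress: sum(candidates[egress]))

candidates = {
    "egress-A": (5, 20),   # total delay 25 ms
    "egress-B": (12, 8),   # total delay 20 ms
}
best = select_egress(candidates)
# best == "egress-B"
```
]]></artwork>
      </figure>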

      <t>Another metric that can be considered is the service capability,
      which is the capacity of the server in the service point to process the
      CATS service. For example, one server can support 100 simultaneous
      sessions and another can support 10,000 simultaneous sessions. The
      value can be generated by the server when deploying the service
      instance. The metric can work alone. In this scenario, the decision
      point can perform load balancing according to the service capability.
      For example, the decision process can be load balancing after pruning
      the service points with poor network latency metrics. Also, the metric
      can work with the computing delay metric. For example, in this
      scenario, we can prune the service points with poor total latency
      metrics before the load balancing.</t>
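      <t>The prune-then-balance decision process described above can be
      sketched as follows. The threshold value, the data layout, and the
      weighting rule (weights proportional to service capability) are
      assumptions of this sketch.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical sketch: prune service points with poor network latency,
# then load balance the remainder in proportion to their capability.

def lb_weights(points, latency_threshold_ms):
    """points: name -> {'latency': ms, 'capability': max sessions}."""
    # Keep only service points whose latency is acceptable.
    ok = {name: p for name, p in points.items()
          if p["latency"] <= latency_threshold_ms}
    total = sum(p["capability"] for p in ok.values())
    # Weight each remaining point by its share of the total capability.
    return {name: p["capability"] / total for name, p in ok.items()}

points = {
    "A": {"latency": 5, "capability": 100},
    "B": {"latency": 8, "capability": 10000},
    "C": {"latency": 50, "capability": 5000},  # pruned: latency too high
}
weights = lb_weights(points, latency_threshold_ms=20)
# "C" is pruned; "B" receives most of the new sessions.
```
]]></artwork>
      </figure>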

      <t>In the future, we can also consider other metrics, which may be more
      dynamic. Besides, for some other optimization objectives, we can
      consider further metrics, even metrics about energy consumption.
      However, in these cases, the decision point needs to consider more
      dimensions of metrics. A suggestion is that we should first make sure
      the service point is available, which means the service point can still
      accept more sessions, and then select an optimal target service point
      according to the optimization objective.</t>

    </section>

    <section title="Default Policy Discussion On Decision Point">
      <t>In this section, for convenience of description, we assume that the
      decision point is on the Ingress, and that the Egress gathers the
      computing metrics of the corresponding service points and notifies the
      Ingress. After receiving the notification about the computing metrics,
      the Ingress can generate one or more routing policies accordingly for
      forwarding the packets of CATS traffic from the user. In this document,
      we suggest that the routing policies should include at least a default
      one. The packets of CATS traffic received by the Ingress should have an
      anycast IP as the destination address, which will trigger the
      pre-configured routing policy.</t>

      <t>To enable the basic cooperation in CATS, we need one or a set of
      default computing metrics to be notified into the network. All the CATS
      Ingresses need to understand the default metrics and trigger the same
      or similar operations inside the router. That is to say, the Ingress
      should enable some default routing policy after receiving the default
      metrics. The detailed procedures inside the Ingresses are
      vendor-specific, for example, which default policy is enabled for a
      specific service on the router.</t>

      <t>By comparison, other metrics would be optional, although perhaps
      they can obtain a better or more preferred LB result than the default
      ones. If the Ingress receives the additional metrics and can understand
      them, it can use the optional metrics to update the forwarding policy
      for the routes of the anycast IP.</t>

      <t>There are two kinds of forwarding treatments on the Ingress. Although
      they are implementations inside the equipment, we give a general
      description about them here, because they are related to the default
      metric selection.</t>

      <t>The first one is that the Ingress deploys several routes for the
      anycast IP, but among them only one is active, and the others are
      inactive backups. The second one is that the Ingress can have multiple
      active routes for the anycast IP, and each route has a dedicated
      weight, so that load balancing can be done within the Ingress.</t>

      <t>The advantage of the first one is that it can select the best
      service instance for the client according to the network and computing
      status. However, its disadvantage is that the Ingress will forward all
      the traffic of new clients to a single service point until the policy
      is updated, which can cause the service point to become busy. The
      second one may achieve a better LB result.</t>

      <t>If the first one is used for a specific service, the routing policy
      on the Ingress may be the minimum total delay first. In this case, the
      Ingress needs to collect the network delay and the computing delay, and
      add them together to select the best Egress. If the second one is used
      for a specific service, the routing policy on the Ingress may be load
      balancing based on service capability.</t>

      <t>An initial proposal of the default metrics for the default policies
      is that we can always send the two metrics mentioned in the last
      paragraph, i.e., the computing delay and the service capability. At
      least one of them should be valid. If the bits of the computing delay
      or the service capability are set to all "zero", the value is
      considered invalid; other values are considered valid. Meanwhile, if
      the bits of the computing delay or the service capability are set to
      all "one", it means that the service point is temporarily busy, and the
      Ingress should not send new clients to that service point.</t>
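      <t>Assuming, purely for illustration, a fixed metric field width of 16
      bits, the all-zero/all-one semantics above could be decoded as follows.
      The field width and the names used are assumptions of this sketch; the
      actual encoding is not defined by this document.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical sketch of the proposed metric value semantics: all-zero
# means the value is invalid, all-one means the service point is
# temporarily busy, and anything else is a valid metric value.

FIELD_BITS = 16                     # assumed field width
ALL_ONES = (1 << FIELD_BITS) - 1    # 0xFFFF for a 16-bit field

def classify_metric(value):
    if value == 0:
        return "invalid"   # all-zero: metric not valid
    if value == ALL_ONES:
        return "busy"      # all-one: do not send new clients
    return "valid"
```
]]></artwork>
      </figure>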

      <t>Alternatively, we can also add another simple metric to indicate the
      busy-or-not status. However, this metric is relatively more dynamic
      than the former two. If the Ingress receives the busy indicator from an
      Egress, it can change the routing policy accordingly. For example, it
      can change the path to the related Egress into an inactive one, or
      change the weight of the path to the related Egress to zero. After the
      change, the path to the related Egress will not be used for forwarding
      the CATS traffic from any new clients, i.e., it is deleted from the
      routing policies temporarily. After a certain time, which can be
      pre-configured, or upon receiving a new busy-or-not indication, the
      Ingress would update the status or the weight of the path to the
      related Egress. Thus, the path to the related Egress will be added into
      the routing policies again.</t>
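      <t>The busy-indicator handling described above can be sketched as
      follows. The hold-down timer, the data structures, and the restored
      default weight are assumptions of this sketch.</t>

      <figure>
        <artwork><![CDATA[
```python
import time

# Hypothetical sketch: on a busy indication, the Ingress sets the weight
# of the path to the related Egress to zero; after a pre-configured time
# (or a new indication), the weight is restored.

class IngressPolicy:
    def __init__(self, hold_down_s=5.0):
        self.weights = {}      # egress -> load balancing weight
        self.saved = {}        # weights saved while an egress is busy
        self.busy_until = {}   # egress -> earliest restoration time
        self.hold_down_s = hold_down_s

    def on_busy(self, egress, now=None):
        now = time.monotonic() if now is None else now
        if self.weights.get(egress, 0) != 0:
            self.saved[egress] = self.weights[egress]
        self.weights[egress] = 0          # no new clients to this Egress
        self.busy_until[egress] = now + self.hold_down_s

    def tick(self, now=None):
        now = time.monotonic() if now is None else now
        for egress, t in list(self.busy_until.items()):
            if now >= t:                  # hold-down expired: restore
                self.weights[egress] = self.saved.pop(egress, 1)
                del self.busy_until[egress]
```
]]></artwork>
      </figure>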

    </section>

    <section anchor="security-considerations" title="Security Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="iana-considerations" title="IANA Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Adrian Farrel, Joel Halpern, Tony
      Li, Thomas Fossati, Dirk Trossen, and Linda Dunbar for their valuable
      suggestions on this document.</t>
    </section>

    <section title="Contributors">
      <t>The following people have substantially contributed to this
      document:</t>

      <t><figure>
          <artwork><![CDATA[	Yuexia Fu
	China Mobile
	fuyuexia@chinamobile.com

	Jing Wang
	China Mobile
	wangjingjc@chinamobile.com

	Peng Liu
	China Mobile
	liupengyjy@chinamobile.com

	Wenjing Li 
	Beijing University of Posts and Telecommunications
	wjli@bupt.edu.cn

	Lanlan Rui
	Beijing University of Posts and Telecommunications
	llrui@bupt.edu.cn
]]></artwork>
        </figure></t>
    </section>
  </middle>

  <back>
    <references title="Informative References">
      <?rfc include="reference.I-D.ietf-cats-usecases-requirements"?>

      <?rfc include="reference.I-D.ietf-teas-rfc3272bis"?>
    </references>
  </back>
</rfc>
