<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-xu-idr-bgp-route-broker-01"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Route Broker">BGP Route Broker for Hyperscale
    SDN</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>xuxiaohu_ietf@hotmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shraddha@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Srihari Sangli" initials="S." surname="Sangli">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>ssangli@juniper.net</email>

        <uri/>
      </address>
    </author>

    <date day="1" month="August" year="2023"/>

    <abstract>
      <t>This document describes an optimized BGP route reflector mechanism,
      referred to as a BGP route broker, that allows BGP-based IP VPN to be
      used as the overlay routing protocol for network virtualization in
      hyperscale data centers, also known as Software-Defined Networking
      (SDN) environments.</t>
    </abstract>
  </front>

  <middle>
    <section title="Problem Statement">
      <t>BGP/MPLS IP VPN has been successfully deployed in service provider
      networks worldwide for two decades and has therefore proven to scale
      well in large networks. In this document, "BGP/MPLS IP VPN" refers to
      both BGP/MPLS IPv4 VPN <xref target="RFC4364"/> and BGP/MPLS IPv6 VPN
      <xref target="RFC4659"/>. In addition, the BGP/MPLS IP VPN-based data
      center network virtualization approaches described in <xref
      target="RFC7814"/>, especially the virtual PE model described in <xref
      target="I-D.ietf-bess-virtual-pe"/>, have been widely deployed for
      network virtualization, also known as Software-Defined Networking
      (SDN), in small and medium-sized data centers. Examples include, but
      are not limited to, OpenContrail.</t>

      <t>Hyperscale cloud data centers typically house tens of thousands of
      servers, which in turn host Virtual Machines (VMs) or containers. If
      the virtual PE model mentioned above (also known as a host-based
      network virtualization model) is used, such a data center usually
      needs at least tens of thousands of virtual PEs, millions of VPNs, and
      tens of millions of VPN routes. This poses a significant challenge to
      both the BGP session capacity and the VPN routing table capacity of
      any given BGP router.</t>

      <t>Route reflection is the obvious mechanism to consider for
      addressing the BGP scaling issues described above. Assuming a typical
      one-level route reflector architecture, it is straightforward to
      partition the VPNs supported by a data center among multiple route
      reflectors, each preconfigured with a block of route targets
      associated with a subset of the VPNs. In other words, no single route
      reflector needs to maintain all the VPN routes for all the VPNs
      supported by the data center. For redundancy, more than one route
      reflector may be preconfigured with the same block of route
      targets.</t>
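
      <t>As an illustration of the partitioning described above, the
      following Python sketch assigns route target blocks to top-level
      route reflectors in round-robin order with a configurable redundancy
      factor. The block and reflector names and the round-robin policy are
      hypothetical; this document does not prescribe any particular
      assignment policy.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


def assign_rt_blocks(rt_blocks, reflectors, redundancy=2):
    """Map each RT block to `redundancy` distinct reflectors.

    Round-robin is a hypothetical policy used for illustration.
    """
    if not 1 <= redundancy <= len(reflectors):
        raise ValueError("redundancy must be in 1..len(reflectors)")
    plan = defaultdict(list)  # reflector -> assigned RT blocks
    for i, block in enumerate(rt_blocks):
        for k in range(redundancy):
            rr = reflectors[(i + k) % len(reflectors)]
            plan[rr].append(block)
    return dict(plan)


# Hypothetical example: 4 RT blocks, 3 reflectors, 2 copies each.
blocks = ["rt-block-1", "rt-block-2", "rt-block-3", "rt-block-4"]
rrs = ["rr-a", "rr-b", "rr-c"]
plan = assign_rt_blocks(blocks, rrs, redundancy=2)
for rr, assigned in sorted(plan.items()):
    print(rr, assigned)
```
]]></artwork>
      </figure>

      <t>With these inputs every block is held by two reflectors, and no
      single reflector holds all four blocks.</t>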

      <t>However, if every virtual PE is attached to at least one VPN
      served by a given route reflector, that route reflector has to
      establish BGP sessions with all virtual PEs, which places a heavy BGP
      session load on the route reflectors. Now assume that another level
      (the bottom level) of route reflectors is introduced between the
      existing level (the top level) of route reflectors and the virtual
      PEs. Each top-level route reflector then establishes BGP sessions with
      all bottom-level route reflectors rather than with all virtual PEs,
      and each bottom-level route reflector only needs to establish BGP
      sessions with a subset of the virtual PEs. This partitioning solves
      the BGP session capacity problem. However, if the VPNs attached to the
      clients (i.e., virtual PEs) of a given bottom-level route reflector
      collectively cover all the VPNs supported by the data center, that
      bottom-level route reflector has to hold all the VPNs and all their
      VPN routes, which poses a huge challenge to that route reflector.</t>
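
      <t>A back-of-the-envelope comparison of the session counts in the
      one-level and two-level designs can be sketched in Python as follows.
      The sizes are hypothetical, and the sketch assumes the worst case
      described above (every route reflector serves at least one VPN on
      every virtual PE) and that each virtual PE peers with exactly one
      bottom-level route reflector.</t>

      <figure>
        <artwork><![CDATA[
```python
import math


def one_level(num_pes, num_rrs):
    """Worst case: every route reflector peers with every vPE."""
    return {"per_rr": num_pes, "total": num_pes * num_rrs}


def two_level(num_pes, num_top, num_bottom):
    """vPEs split evenly over bottom-level RRs (one each); every
    top-level RR peers with every bottom-level RR."""
    pes_per_bottom = math.ceil(num_pes / num_bottom)
    return {
        "per_top_rr": num_bottom,
        "per_bottom_rr": pes_per_bottom + num_top,
        "total": num_top * num_bottom + num_pes,
    }


# Hypothetical sizes, chosen only to show orders of magnitude.
print(one_level(num_pes=20000, num_rrs=4))
print(two_level(num_pes=20000, num_top=4, num_bottom=40))
```
]]></artwork>
      </figure>

      <t>Under these assumptions the largest per-reflector session count
      falls from 20000 to 504. The sketch says nothing about routing table
      size, which is the remaining problem described above.</t>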

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Solution Overview">
      <t>Assume that the number of BGP sessions established on each
      bottom-level route reflector cannot be reduced further for operational
      reasons (e.g., managing more route reflectors would become
      unacceptable). The number of VPN routes maintained on each
      bottom-level route reflector then needs to be reduced by other
      means.</t>

      <t>Borrowing from message queue systems (e.g., RabbitMQ and
      RocketMQ), these bottom-level route reflectors, referred to as route
      brokers in the rest of this document, only maintain the route target
      membership information of their BGP peers and reflect VPN routes on
      demand, without maintaining VPN routes permanently.</t>
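
      <t>The message-queue analogy can be illustrated with a minimal
      publish/subscribe sketch in Python, in which route target membership
      plays the role of a subscription and a VPN route update plays the
      role of a published message. The class and method names are
      hypothetical; the point is only that the broker keeps subscriptions,
      not routes.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class ToyBroker:
    """Keeps only RT membership ("subscriptions") per BGP peer."""

    def __init__(self):
        self.members = defaultdict(set)  # route target -> peers

    def subscribe(self, peer, route_target):
        self.members[route_target].add(peer)

    def publish(self, sender, route_target, route):
        """Return (peer, route) deliveries; the route itself is
        not stored anywhere on the broker."""
        receivers = self.members.get(route_target, set()) - {sender}
        return [(peer, route) for peer in sorted(receivers)]


broker = ToyBroker()
broker.subscribe("vpe-1", "rt:100")
broker.subscribe("vpe-2", "rt:100")
broker.subscribe("vpe-3", "rt:200")
print(broker.publish("vpe-1", "rt:100", "10.0.0.0/24"))
```
]]></artwork>
      </figure>

      <t>Here the update from vpe-1 is delivered only to vpe-2, the other
      member of rt:100, and the broker's only state is its membership
      table.</t>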
    </section>

    <section title="Route Target Membership Advertisement Process">
      <t>Top-level route reflectors, referred to as route servers, advertise
      route target membership information according to their preconfigured
      blocks of route targets. In this way, route brokers learn which VPNs
      are associated with each route server. Route brokers SHOULD NOT
      reflect the route target membership information received from route
      servers to any other iBGP peers.</t>

      <t>Virtual PEs, referred to as route broker clients, advertise route
      target membership information according to their dynamically
      configured route targets. Route brokers treat the route target
      membership information received from a route broker client as an
      implicit request for all the VPN routes of the VPNs associated with
      the advertised route targets, and reflect it only towards the route
      servers associated with those VPNs.</t>
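
      <t>The handling rules above can be sketched in Python as follows,
      under the simplifying assumptions that route targets are plain
      strings, that the broker knows from configuration which peers are
      route servers, and that membership withdrawals are ignored. All names
      are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class RouteBroker:
    """RT membership handling only (hypothetical names)."""

    def __init__(self):
        self.server_rts = defaultdict(set)  # route server -> RTs
        self.client_rts = defaultdict(set)  # broker client -> RTs

    def servers_for(self, rt):
        return sorted(s for s, rts in self.server_rts.items()
                      if rt in rts)

    def on_rt_membership(self, peer, is_server, rts):
        """Record RT membership from `peer` and return the
        {route server: RTs} it must be reflected to."""
        if is_server:
            # Learned from a route server; never reflected further.
            self.server_rts[peer] |= set(rts)
            return {}
        # From a client: an implicit request for the VPN routes,
        # reflected only to the servers serving those RTs.
        self.client_rts[peer] |= set(rts)
        out = defaultdict(set)
        for rt in rts:
            for server in self.servers_for(rt):
                out[server].add(rt)
        return dict(out)


rb = RouteBroker()
rb.on_rt_membership("rs-1", True, ["rt:100", "rt:101"])
rb.on_rt_membership("rs-2", True, ["rt:200"])
print(rb.on_rt_membership("vpe-1", False, ["rt:101", "rt:200"]))
```
]]></artwork>
      </figure>

      <t>In this sketch, a route target that no route server has advertised
      is recorded for the client but not reflected anywhere.</t>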
    </section>

    <section title="Proactive Route Distribution Process">
      <t>Upon receiving a route update message from a route server that
      contains VPN routes for a given VPN, route brokers reflect the
      received routes to those of their route broker clients that are
      associated with that VPN. Upon receiving a route update message from a
      route broker client that contains VPN routes for a given VPN, route
      brokers reflect the received routes to the other iBGP peers (including
      route servers and route broker clients) that are associated with that
      VPN.</t>

      <t>Once the route reflection is finished, route brokers delete these
      routes instead of retaining them.</t>
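
      <t>The two reflection rules of the proactive process can be sketched
      in Python as a fan-out function that stores no routes. The sketch
      assumes the broker has already built a route-target-to-peer
      membership table, as described in the route target membership
      advertisement process, and knows which peers are route servers; the
      names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class RouteBroker:
    """Proactive reflection only (hypothetical names)."""

    def __init__(self, servers):
        self.servers = set(servers)          # route server peers
        self.rt_members = defaultdict(set)   # RT -> member peers

    def on_update(self, sender, rt, routes):
        """Return {peer: routes} for one VPN route update.

        From a route server: reflect to member clients only.
        From a client: reflect to all other member peers, both
        route servers and clients. Nothing is stored."""
        targets = self.rt_members.get(rt, set()) - {sender}
        if sender in self.servers:
            targets -= self.servers
        return {peer: list(routes) for peer in sorted(targets)}


rb = RouteBroker(servers=["rs-1"])
for peer in ("rs-1", "vpe-1", "vpe-2"):
    rb.rt_members["rt:100"].add(peer)
print(rb.on_update("rs-1", "rt:100", ["10.1.0.0/16"]))
print(rb.on_update("vpe-1", "rt:100", ["10.2.0.0/16"]))
```
]]></artwork>
      </figure>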
    </section>

    <section title="Passive Route Distribution Process">
      <t>Upon receiving an implicit route request for all the VPN routes of
      one or more VPNs (via a route target membership advertisement) from a
      route broker client, route brokers SHOULD reflect that request to the
      route servers associated with the VPNs pertaining to each of the
      advertised route targets.</t>

      <t>Upon receiving an implicit route request reflected by a route
      broker, route servers SHOULD respond with the corresponding VPN routes
      to that route broker, which in turn reflects them to the requesting
      route broker client. Once the route reflection is finished, the route
      broker deletes the received VPN routes.</t>
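
      <t>A minimal Python sketch of the broker side of the passive process
      follows. It assumes that a route server answers one implicit request
      per (route server, route target) pair and that the broker keeps a
      table of clients waiting for each answer; this pending-request table
      and all names are assumptions, since this document does not specify
      how a broker correlates responses with requests.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class RouteBroker:
    """Passive reflection only (hypothetical names)."""

    def __init__(self, server_rts):
        # route server -> RTs, learned from its RT membership
        self.server_rts = {s: set(r) for s, r in server_rts.items()}
        # (route server, RT) -> clients waiting for the answer
        self.pending = defaultdict(set)

    def on_client_request(self, client, rts):
        """Reflect an implicit request; return [(server, rt)]."""
        sent = []
        for rt in rts:
            for server, served in sorted(self.server_rts.items()):
                if rt in served:
                    self.pending[(server, rt)].add(client)
                    sent.append((server, rt))
        return sent

    def on_server_response(self, server, rt, routes):
        """Reflect a response to the waiting clients, then forget
        both the request and the routes."""
        clients = self.pending.pop((server, rt), set())
        return {c: list(routes) for c in sorted(clients)}


rb = RouteBroker({"rs-1": ["rt:100"], "rs-2": ["rt:200"]})
print(rb.on_client_request("vpe-1", ["rt:100"]))
print(rb.on_server_response("rs-1", "rt:100", ["10.1.0.0/16"]))
print(len(rb.pending))  # request forgotten once answered
```
]]></artwork>
      </figure>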

      <t>To reduce the route request processing load on route servers, route
      brokers MAY cache the VPN routes returned by route servers in response
      to an implicit route request for a configurable period of time. The
      cached routes can then be used directly to respond to subsequent
      requests for those routes.</t>
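
      <t>Such a cache can be sketched in Python as a dictionary keyed by
      route target with a per-entry expiry time. Keying by route target and
      using a monotonic clock are assumptions for illustration; the clock
      is injectable so that expiry can be demonstrated without waiting.</t>

      <figure>
        <artwork><![CDATA[
```python
import time


class RouteCache:
    """Cache of route server responses with a configurable TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable for testing
        self.entries = {}        # RT -> (expiry time, routes)

    def put(self, rt, routes):
        self.entries[rt] = (self.clock() + self.ttl, list(routes))

    def get(self, rt):
        """Return cached routes, or None if absent or expired."""
        entry = self.entries.get(rt)
        if entry is None:
            return None
        expiry, routes = entry
        if self.clock() >= expiry:
            del self.entries[rt]  # expired: ask route servers again
            return None
        return routes


now = [0.0]
cache = RouteCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("rt:100", ["10.1.0.0/16"])
now[0] = 10.0
print(cache.get("rt:100"))  # still within the TTL
now[0] = 31.0
print(cache.get("rt:100"))  # past the TTL
```
]]></artwork>
      </figure>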
    </section>

    <section title="BGP Session Failure Notification">
      <t>When a route broker loses its BGP session with a given route broker
      client, it SHOULD send a NOTIFICATION message to all route servers to
      indicate the failure of the BGP session with that route broker
      client.</t>

      <t>Upon receiving such a NOTIFICATION message, route servers withdraw
      all VPN routes whose BGP next-hop address is that of the failed route
      broker client.</t>

      <t>The BGP router ID of the failed route broker client could be carried
      in a TLV, which in turn is carried in a NOTIFICATION message with an
      error code of TBD.</t>
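
      <t>The route server side of this procedure can be sketched in Python
      as follows. The sketch assumes that the identifier carried in the TLV
      can be mapped to the failed client's BGP next-hop address (this
      document carries the router ID but matches routes on the next-hop
      address), and it models the VPN routing table as a plain dictionary.
      It does not encode the NOTIFICATION message or the TLV, whose code
      points are TBD.</t>

      <figure>
        <artwork><![CDATA[
```python
def withdraw_on_client_failure(rib, failed_next_hop):
    """Remove, and return, every VPN route in `rib` whose BGP
    next hop is the failed route broker client.

    `rib` maps (route distinguisher, prefix) -> next-hop address.
    """
    withdrawn = sorted(k for k, nh in rib.items()
                       if nh == failed_next_hop)
    for key in withdrawn:
        del rib[key]
    return withdrawn


rib = {
    ("65000:1", "10.1.1.0/24"): "192.0.2.1",
    ("65000:1", "10.1.2.0/24"): "192.0.2.2",
    ("65000:2", "10.9.0.0/16"): "192.0.2.1",
}
print(withdraw_on_client_failure(rib, "192.0.2.1"))
print(sorted(rib))
```
]]></artwork>
      </figure>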
    </section>

    <section title="BGP Withdrawal of All Routes of a VPN">
      <t>When all route servers configured with the same route target list
      are down, route brokers SHOULD notify their route broker clients to
      withdraw all the VPN routes for the VPNs associated with any route
      target in that route target list.</t>
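
      <t>The trigger condition can be sketched in Python as follows,
      assuming the broker groups route servers by their configured route
      target list and learns of server failures through BGP session loss.
      How clients are then told to withdraw the routes is not modeled, and
      all names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
class ServerGroupMonitor:
    """Tracks route servers that share the same RT list."""

    def __init__(self, groups):
        # groups: RT list (a frozenset) -> set of route servers
        self.groups = {rts: set(s) for rts, s in groups.items()}
        self.up = set().union(*self.groups.values())

    def server_down(self, server):
        """Return the RTs whose VPN routes clients must withdraw
        because every server configured with them is now down."""
        self.up.discard(server)
        to_withdraw = set()
        for rts, servers in self.groups.items():
            if server in servers and not (servers & self.up):
                to_withdraw |= rts
        return to_withdraw


mon = ServerGroupMonitor({
    frozenset({"rt:100", "rt:101"}): {"rs-1", "rs-2"},
    frozenset({"rt:200"}): {"rs-3"},
})
print(mon.server_down("rs-1"))  # rs-2 still up: nothing to withdraw
print(sorted(mon.server_down("rs-2")))
```
]]></artwork>
      </figure>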
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Jie Dong for the discussion and
      review of this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.4364"?>

      <?rfc include="reference.RFC.4659"?>

      <?rfc include="reference.RFC.7814"?>

      <?rfc include="reference.I-D.ietf-bess-virtual-pe"?>
    </references>
  </back>
</rfc>
