Monday, June 28, 2021

Cleaning Up RPKI Invalids In The AFRINIC Service Region

Introduction

With the growing adoption of RPKI-based ROV to make the internet more secure and resilient, it is becoming increasingly important for IP network operators or IP prefix owners to ensure that their BGP advertisements are not seen as RPKI Invalid on the internet. Recent years have seen a significant increase in IP transit providers (across tiers) and IXPs implementing ROV and filtering Invalid prefix-origin AS pairs.

This article describes an initiative taken to alert and propose corrective measures to network operators with RPKI Invalid prefix-origin pairs in the AFRINIC service region.

Background

Although the concept dates back a decade, in the past 3 years Resource Public Key Infrastructure (RPKI) based Route Origin Validation (ROV) (RFC7115) has been on the lips of every network operator that cares about the resilience of the internet. Many Tier1 and Tier2 networks have fully implemented ROV, meaning that they drop RPKI Invalid prefixes.

A prefix received via Border Gateway Protocol (BGP) can either be RPKI Valid, NotFound, or Invalid (RFC6811). RPKI Invalid prefix-origin pairs can occur due to a BGP "route leak" (RFC7908) or incorrect creation of Route Origin Authorizations (ROAs). These are in many cases due to a "fat finger", design oversight or not fully understanding the implications of certain ROA configurations.
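The three validation states can be illustrated with a minimal Python sketch of the RFC6811 decision procedure. This is a simplification for illustration only, not how any particular validator is implemented; the Roa class, the validate function and the sample prefixes and ASNs (taken from the documentation ranges) are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List


@dataclass(frozen=True)
class Roa:
    """Simplified ROA payload: one prefix, its MaxLength, one origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate(prefix: str, origin: int, roas: List[Roa]) -> str:
    """Classify a prefix-origin pair as Valid, Invalid or NotFound."""
    route = ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA "covers" the route if the route equals or sits inside its prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA "matches" if the origin agrees and the route
        # is not more specific than MaxLength allows.
        if roa.asn == origin and route.prefixlen <= roa.max_length:
            return "Valid"
    # Covered but never matched means Invalid; no covering ROA means NotFound.
    return "Invalid" if covered else "NotFound"


roas = [Roa("192.0.2.0/24", 24, 64496)]
print(validate("192.0.2.0/24", 64496, roas))     # Valid
print(validate("192.0.2.0/25", 64496, roas))     # Invalid (exceeds MaxLength)
print(validate("192.0.2.0/24", 64497, roas))     # Invalid (wrong origin)
print(validate("198.51.100.0/24", 64496, roas))  # NotFound
```

Note that a single matching ROA is enough to make a route Valid, even if other covering ROAs do not match.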

At the time of implementing this initiative, there were just above 100 RPKI Invalid prefixes (IPv4) being announced that either belong to AFRINIC Autonomous System Numbers (ASNs) and are announced by non-AFRINIC ASNs, or are announced incorrectly by the rightful AFRINIC ASN. The latter case occurs when a prefix is deemed RPKI Invalid due to the MaxLength property of a ROA: a network owner may have created a ROA with a MaxLength that is less than the length of the prefix being advertised. This is usually an internal mistake that the operator may not be aware of.

A network operator may advertise a prefix with different lengths to different upstream providers or BGP peers based on a routing policy that best suits their business. Having the more specific prefix (longer length) dropped by networks with RPKI-based ROV enabled means that the intended routing behavior becomes sub-optimal.

For example, if a network operator has multiple paths for a prefix, they could advertise the more specific prefix (e.g. A.B.C.D/24) on the preferred, shorter and cheaper path, and advertise a less specific prefix (e.g. A.B.C.D/16) on the secondary path. If the network operator has a ROA with a MaxLength of 16, for example, the A.B.C.D/24 advertisement would be RPKI Invalid due to MaxLength. Upstream network operators who receive both prefixes from different sources, but have RPKI-based ROV fully enabled, would therefore only use the secondary path. This is an oversimplified example, but I hope it demonstrates the unintended cost implications for the network operator and/or the compromised end-user experience that can arise from this oversight.
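The scenario above can be sketched in a few lines of Python. This is a toy model under stated assumptions: 10.0.0.0/16 stands in for A.B.C.D/16, AS64500 is a made-up origin, only a single ROA exists, and "best path" is reduced to longest-prefix match. The names is_invalid and best_path are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical ROA: the /16 with MaxLength 16, as in the example above.
ROA = {"prefix": ip_network("10.0.0.0/16"), "max_length": 16, "asn": 64500}

# The operator's two advertisements, one per upstream path.
ANNOUNCEMENTS = [
    {"prefix": ip_network("10.0.0.0/24"), "origin": 64500, "path": "primary"},
    {"prefix": ip_network("10.0.0.0/16"), "origin": 64500, "path": "secondary"},
]


def is_invalid(ann, roa=ROA):
    """True if the ROA covers the announcement but does not match it."""
    covered = ann["prefix"].subnet_of(roa["prefix"])
    matches = (ann["origin"] == roa["asn"]
               and ann["prefix"].prefixlen <= roa["max_length"])
    return covered and not matches


def best_path(dest, announcements):
    """Pick the path of the longest prefix that contains the destination."""
    candidates = [a for a in announcements if ip_address(dest) in a["prefix"]]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a["prefix"].prefixlen)["path"]


# A network without ROV keeps both routes; one with ROV drops the Invalid /24.
without_rov = best_path("10.0.0.1", ANNOUNCEMENTS)
with_rov = best_path("10.0.0.1", [a for a in ANNOUNCEMENTS if not is_invalid(a)])
print(without_rov)  # primary
print(with_rov)     # secondary
```

Raising the ROA's MaxLength to 24, or adding a separate ROA for the /24, would make both advertisements Valid and restore the intended path.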

There is therefore some value to be gained in alerting network operators to instances of their RPKI Invalid prefix-origin pairs, and cleaning them up is good for the general health of the internet.

Methodology

The procedure followed to implement this initiative can be broken down into 4 simple steps:

  • Collecting a list of AFRINIC RPKI Invalids
  • Gathering contact details of ASNs involved
  • Contacting the network operators with diagnosis and suggested remedies
  • Monitoring operator feedback or changes to status

Collecting a List of RPKI Invalids for AFRINIC Service Region

There are a number of platforms that one can use to analyse RPKI statistics depending on specific research requirements. NLnet Labs maintains a useful list on their RPKI readthedocs Resources page. My case required a tool that would allow me to filter a list of RPKI Invalid prefix-origin pairs by Regional Internet Registry (RIR). My closest match was the US National Institute of Standards and Technology (NIST) RPKI Monitor.

Figure1: Screenshot of parts of the NIST RPKI Monitoring Tool Menu with "Invalid Prefix-Origin Pairs"

When I initially wanted to start this project, I found that the NIST tool's feature for filtering by RIR was buggy. After a number of calls for help on various social media platforms such as Twitter and the RPKI Community on Discord, someone at NIST must have heard my cry and fixed it as part of release version 2 of the tool in May 2021.



The NIST data used in this initiative was collected on and for the 22nd of June 2021. A bonus of the NIST Monitoring tool is that its analysis features include the ability to expand an RPKI Invalid Prefix-Origin Pair to show the covering prefixes (Figure2 below). That saved me some digging time, but as a sanity check I confirmed the existence of the associated ROA through another web-based RPKI Validator, such as AFRINIC's deployment of the Routinator Validator.
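The same sanity check could in principle be scripted. The sketch below assumes a Routinator instance with its HTTP server enabled and relies on my understanding of Routinator's validity endpoint (/api/v1/validity/AS<asn>/<prefix>) and its JSON layout; both should be verified against the Routinator documentation. The hostname, the sample response and the function names are hypothetical, and no network request is made here.

```python
import json


def validity_url(base_url: str, asn: int, prefix: str) -> str:
    """Build the validity query URL for one prefix-origin pair."""
    return f"{base_url.rstrip('/')}/api/v1/validity/AS{asn}/{prefix}"


def summarize(response_text: str):
    """Return (state, reason) from a validity response body."""
    validity = json.loads(response_text)["validated_route"]["validity"]
    # "reason" is only expected for invalid routes (e.g. "as" or "length").
    return validity["state"], validity.get("reason")


# Hypothetical response body, shaped like the assumed API output.
SAMPLE_RESPONSE = """
{
  "validated_route": {
    "route": {"origin_asn": "AS64500", "prefix": "10.0.0.0/24"},
    "validity": {
      "state": "invalid",
      "reason": "length",
      "description": "At least one VRP Covers the Route Prefix, but the Route Prefix length is greater than the maximum length allowed by VRP(s) matching this route origin ASN"
    }
  }
}
"""

print(validity_url("https://validator.example.net", 64500, "10.0.0.0/24"))
print(summarize(SAMPLE_RESPONSE))  # ('invalid', 'length')
```

Fetching the URL with any HTTP client and passing the body to summarize would give a quick second opinion on whether an Invalid is caused by the origin ASN or by MaxLength.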

Figure2: A Detailed View of The Analysis Feature of Each RPKI Invalid Prefix-Origin Pair on the NIST RPKI Monitoring Tool

Further details of each involved ASN and peering relationships were extracted from bgp.he.net. It's also important to note that the above exercise was only performed for IPv4.

Collecting Network Operator Contact Details

An indirect achievement of this initiative was being able to verify whether the network operator's PeeringDB records included a working e-mail address. This speaks to Action3 of the Mutually Agreed Norms for Routing Security (MANRS) for Network Operators, which requires an operator at minimum to have up-to-date contact information on PeeringDB.

In cases where a network operator had no contact information on PeeringDB, I searched through the operator's IRR objects (RFC2650).
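Digging through IRR objects for contacts can be partly automated. The following is a minimal sketch that pulls e-mail addresses out of RPSL text, such as the output of a whois query. The list of contact attributes is my own selection, the sample objects are invented, and extract_contacts is a hypothetical helper; continuation lines are simply skipped.

```python
# RPSL attributes that commonly carry an e-mail address (assumed selection).
CONTACT_ATTRIBUTES = {"e-mail", "notify", "upd-to", "mnt-nfy", "abuse-mailbox"}


def extract_contacts(rpsl_text: str):
    """Return unique contact values from RPSL objects, in order of appearance."""
    contacts = []
    for line in rpsl_text.splitlines():
        # Skip blank lines, server comments and continuation lines.
        if not line or line[0] in "%#+" or line[0].isspace():
            continue
        attribute, separator, value = line.partition(":")
        if not separator:
            continue
        value = value.split("#", 1)[0].strip()  # drop trailing comments
        if attribute.strip().lower() in CONTACT_ATTRIBUTES and value:
            if value not in contacts:
                contacts.append(value)
    return contacts


# Invented objects for illustration only.
SAMPLE_RPSL = """\
aut-num:        AS64500
as-name:        EXAMPLE-NET
admin-c:        EX1-TEST
notify:         noc@example.net
mnt-by:         MAINT-EXAMPLE
source:         EXAMPLE

mntner:         MAINT-EXAMPLE
upd-to:         noc@example.net
mnt-nfy:        routing@example.net
"""

print(extract_contacts(SAMPLE_RPSL))
# ['noc@example.net', 'routing@example.net']
```

In practice the most useful addresses often sit in the person or role objects referenced by admin-c and tech-c, so a second lookup on those handles may be needed.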

Contacting Network Operators

With a diagnosis of the problem and some options for the operator to consider to remedy the problem, I contacted each operator via e-mail. The following is an example of an email I sent to an operator whose RPKI Invalid was due to MaxLength:

 

Figure3: A screenshot of a typical e-mail sent to relevant network operators. Specific details have been removed to protect the reputation of the network operators involved.

Monitoring

Monitoring simply involved regularly checking the RPKI status of the recorded prefixes on the NIST Monitoring tool.

Results

As mentioned earlier, there were 107 AFRINIC RPKI Invalid prefix-origin pairs when the data was initially collected. 39 of the pairs were due to the ROA MaxLength. These 39 were deemed low-hanging fruit as they required the least complicated solutions. In many cases a single ASN had several of these RPKI Invalid prefix-origin pairs and therefore only needed to be contacted once for multiple prefixes, significantly reducing the amount of work for the 39.
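Grouping the pairs by origin ASN, so that each operator gets one message covering all of its prefixes, is easy to script. A minimal sketch follows; the tuple layout, the reason labels and the sample data are made up for illustration and are not the data collected in this exercise.

```python
from collections import defaultdict


def group_by_origin(pairs):
    """Map each origin ASN to the list of its (prefix, reason) entries."""
    grouped = defaultdict(list)
    for prefix, origin, reason in pairs:
        grouped[origin].append((prefix, reason))
    return dict(grouped)


# Hypothetical Invalid pairs: (prefix, origin ASN, reason for being Invalid).
PAIRS = [
    ("198.51.100.0/25", 64500, "length"),
    ("198.51.100.128/25", 64500, "length"),
    ("203.0.113.0/24", 64501, "as"),
]

grouped = group_by_origin(PAIRS)
for origin, entries in grouped.items():
    print(f"AS{origin}: {len(entries)} Invalid prefix(es)")

# ASNs whose Invalids are all MaxLength-related are the easiest to fix.
length_only = [
    origin for origin, entries in grouped.items()
    if all(reason == "length" for _, reason in entries)
]
print(length_only)  # [64500]
```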

There were instances where a network operator that owns multiple ASNs advertises different subnets from different ASNs, but has created a single ROA covering all of those subnets. As you can guess, these are large networks where organizational complexity can also contribute to such messy situations. I wasn't holding my breath for a response on this one, but I proposed that they undo the quick and easy shortcut of defining a ROA with MaxLength 24. This MaxLength=24 method is common because it requires less planning and maintenance. However, it can come back to bite you when your network design gets sticky, as in this case, and it also leaves you vulnerable to attacks in which someone spoofs your origin ASN to announce a more specific prefix that your ROA still authorizes.
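To get a feel for how much a loose MaxLength authorizes, one can count the distinct prefixes a single ROA entry makes Valid. The sketch below assumes a hypothetical /16 holder; authorized_prefix_count is a made-up helper name.

```python
def authorized_prefix_count(prefix_length: int, max_length: int) -> int:
    """Count the prefixes a ROA entry authorizes for its origin ASN.

    For every length from the ROA prefix length up to MaxLength there are
    2 ** (length - prefix_length) distinct sub-prefixes.
    """
    if max_length < prefix_length:
        raise ValueError("MaxLength cannot be shorter than the prefix length")
    return sum(
        2 ** (length - prefix_length)
        for length in range(prefix_length, max_length + 1)
    )


print(authorized_prefix_count(16, 16))  # 1
print(authorized_prefix_count(16, 24))  # 511
```

If the operator only ever announces the /16 and a handful of /24s, the rest of those 511 prefixes are authorized but unannounced. Anyone who forges the operator's ASN as the origin could announce one of them and still be seen as RPKI Valid, which is why ROAs that authorize only what is actually announced are the safer choice.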

There were some networks without PeeringDB records. Through IRR records, I was able to gather some e-mail addresses, and I asked these operators to update their PeeringDB contact details in addition to correcting their RPKI data.

Only 10 networks were contacted. Some of the networks in the rest of the list were just not worth contacting. From studying the ASNs involved, you could detect cases of badly managed IP brokering or undesirable dealings between ASNs that could distract me from the objective of this initiative. Perhaps an investigation and post for the day I can afford bodyguards...

Out of the 10 networks contacted, only 2 had responded to my e-mail at the time of writing this article. One of the messages was in good faith and was from someone who also cares about the health of the internet. Both networks promised to address the issues raised.

Update(1 July): The 2 network operators who responded to my e-mail had fixed their ROAs. In addition to that, 20 out of 25 RPKI Invalid prefixes belonging to one ASN had been fixed even though the network operator never responded to my e-mail. I sent them a "Thank You" e-mail anyway.

Update(5 July): An additional network operator heeded my call and fixed 1 out of the 5 RPKI Invalid prefixes I had asked them to look into.

Discussion and Conclusion

The low count of RPKI Invalid prefix-origin pairs found in this exercise should not be celebrated as a sign of cleanliness in the region. RPKI adoption in the AFRINIC service region is still fairly low. For example, RPKI NotFound makes up 89% of the total unique prefix-origin pairs in the AFRINIC region compared to the global 68.9% for IPv4.


Figure4: AFRINIC Prefix-Origin Pairs; source NIST RPKI Monitor

This exercise helped to verify that network operators in the AFRINIC region who have implemented Action4 of the recommended MANRS actions have done so in a way that does not harm their business.

Beyond this initiative, I think regularly checking the status of prefix-origin pairs in your region is necessary to further promote the adoption of RPKI-based ROV. It ensures that network operators don't have self-inflicted, business-harming configurations in their networks and end up joining the RPKI dark side.

Ideally, this should be a regular exercise done by upstream network operators who have ROV enabled to ensure that their downstream customer networks don't have any RPKI Invalid Prefix-origin pairs. I'm guessing that if an operator is contacted by another operator, they'd heed the call better than when contacted by some random internet individual.