Showing posts with label ROV. Show all posts

Monday, June 28, 2021

Cleaning Up RPKI Invalids In The AFRINIC Service Region

Introduction

With the growing adoption of RPKI based ROV to make the internet more secure and resilient, it is becoming increasingly important for IP network operators or IP prefix owners to ensure that their BGP advertisements are not seen as RPKI Invalid on the internet. Recent years have seen a significant increase in the number of IP Transit providers (across tiers) and IXPs implementing ROV and filtering Invalid prefix-origin AS pairs.

This article describes an initiative taken to alert and propose corrective measures to network operators with RPKI Invalid prefix-origin pairs in the AFRINIC service region.

Background

Although the concept dates back a decade, in the past 3 years Resource Public Key Infrastructure (RPKI) based Route Origin Validation (ROV) (RFC7115) has been on the lips of every network operator that cares about the resilience of the internet. Many Tier1 and Tier2 networks have fully enabled ROV and now drop RPKI Invalid prefixes.

A prefix received via Border Gateway Protocol (BGP) can either be RPKI Valid, NotFound, or Invalid (RFC6811). RPKI Invalid prefix-origin pairs can occur due to a BGP "route leak" (RFC7908) or incorrect creation of Route Origin Authorizations (ROAs). These are in many cases due to a "fat finger", design oversight or not fully understanding the implications of certain ROA configurations.
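To make the three states concrete, here is a minimal sketch of the origin validation procedure as I read it from RFC6811. The `VRP` tuple, the function name and the example prefixes and ASNs (private and documentation ranges) are my own placeholders, not something taken from a particular validator.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload: covering prefix, MaxLength and authorised origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Return "Valid", "Invalid" or "NotFound" for one prefix-origin pair."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # A VRP only counts if it covers the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs a matching origin AS and a length within MaxLength.
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "Valid"
    # Covered but never matched means Invalid; not covered at all means NotFound.
    return "Invalid" if covered else "NotFound"


# Placeholder data: one ROA for a /16 with MaxLength 16.
EXAMPLE_VRPS = [VRP("10.10.0.0/16", 16, 64496)]

for prefix, asn in [
    ("10.10.0.0/16", 64496),  # matches the ROA
    ("10.10.1.0/24", 64496),  # right AS, longer than MaxLength
    ("10.10.0.0/16", 64497),  # wrong origin AS
    ("10.20.0.0/16", 64496),  # no covering ROA
]:
    print(prefix, asn, validate_origin(prefix, asn, EXAMPLE_VRPS))
```

Note that a single matching VRP is enough for Valid, even if other covering VRPs do not match.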

At the time of implementing this initiative, there were just above 100 RPKI Invalid prefixes (IPv4) being announced that either belong to AFRINIC Autonomous System Numbers (ASNs) and are announced by non-AFRINIC ASNs, OR are announced incorrectly by the rightful AFRINIC ASN. The latter case occurs when a prefix is deemed RPKI Invalid due to the MaxLength property of a ROA: a network owner may have created a ROA with a MaxLength that is less than the length of the prefix being advertised. This is usually an internal mistake that the operator may not be aware of.

A network operator may advertise a prefix with different lengths to different upstream providers or BGP peers based on a routing policy that best suits their business. Having the more specific prefix (longer length) dropped by networks with RPKI based ROV enabled means that the intended routing behavior becomes sub-optimal.

For example, if a network operator has multiple paths for a prefix, they could advertise the most specific prefix (e.g. A.B.C.D/24) to the preferred shorter and cheaper path, and advertise a less specific prefix to the secondary path (e.g. A.B.C.D/16). If the network operator has a ROA with a MaxLength of 16, for example, the A.B.C.D/24 advertisement would be RPKI Invalid due to MaxLength. Upstream network operators who receive both prefixes from different sources, but have RPKI based ROV fully enabled, would therefore only use the secondary path. This is an oversimplified example but I hope it demonstrates the unintended cost implications for the network operator and/or the compromised end-user experience that can arise from this oversight.
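The example above can be played through in a few lines. This is only a sketch: 10.10.0.0/16 stands in for A.B.C.D/16, AS64496 is a documentation ASN, and the path labels are my own. It models an upstream that does plain longest-prefix matching, with and without dropping Invalids.

```python
import ipaddress

# Placeholder ROA: the operator authorised only the /16 (MaxLength 16).
ROA = {"prefix": "10.10.0.0/16", "max_length": 16, "asn": 64496}

# What the upstream hears: the /24 over the preferred path, the /16 over the secondary.
ANNOUNCEMENTS = [
    {"prefix": "10.10.1.0/24", "origin": 64496, "path": "preferred"},
    {"prefix": "10.10.0.0/16", "origin": 64496, "path": "secondary"},
]


def is_invalid(announcement: dict, roa: dict) -> bool:
    """True if the ROA covers the announcement but does not authorise it."""
    route = ipaddress.ip_network(announcement["prefix"])
    if not route.subnet_of(ipaddress.ip_network(roa["prefix"])):
        return False  # not covered: that would be NotFound, not Invalid
    authorised = (announcement["origin"] == roa["asn"]
                  and route.prefixlen <= roa["max_length"])
    return not authorised


def best_path(destination: str, announcements: list):
    """Pick the path of the longest matching prefix for a destination address."""
    address = ipaddress.ip_address(destination)
    matches = [a for a in announcements
               if address in ipaddress.ip_network(a["prefix"])]
    if not matches:
        return None
    longest = max(matches, key=lambda a: ipaddress.ip_network(a["prefix"]).prefixlen)
    return longest["path"]


without_rov = best_path("10.10.1.5", ANNOUNCEMENTS)
accepted = [a for a in ANNOUNCEMENTS if not is_invalid(a, ROA)]
with_rov = best_path("10.10.1.5", accepted)
print(without_rov, with_rov)  # preferred secondary
```

Raising the ROA's MaxLength to 24, or adding a separate ROA for the /24, would bring the preferred path back for ROV-enabled upstreams.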

There is therefore some value to be gained in alerting network operators of instances of their RPKI Invalid prefix-origin pairs, and cleaning them up is good for the general health of the internet.

Methodology

The procedure followed to implement this initiative can be broken down to 4 simple steps:

  • Collecting a list of AFRINIC RPKI Invalids
  • Gathering contact details of ASNs involved
  • Contacting the network operators with diagnosis and suggested remedies
  • Monitoring operator feedback or changes to status

Collecting a List of RPKI Invalids for AFRINIC Service Region

There are a number of platforms that one can use to analyse RPKI statistics depending on specific research requirements. NLnet Labs maintains a useful list on their RPKI readthedocs Resources page. My case required a tool that would allow me to filter a list of RPKI Invalid prefix-origin pairs by Regional Internet Registry (RIR). My closest match was the US National Institute of Standards and Technology (NIST) RPKI Monitor.

Figure1: Screenshot of parts of the NIST RPKI Monitoring Tool Menu with "Invalid Prefix-Origin Pairs"

When I initially wanted to start this project, I found that the NIST tool's feature for filtering by RIR was buggy. After a number of calls for help on various social media platforms such as Twitter and the RPKI Community on Discord, someone at NIST must have heard my cry and fixed it as part of release version 2 of the tool in May 2021.



The NIST data used in this initiative was collected on and for the 22nd of June 2021. A bonus of the NIST Monitoring tool is that its analysis features include the ability to expand an RPKI Invalid Prefix-Origin Pair to show the covering prefixes (Figure2 below). That saved me some digging time, but as a sanity check I confirmed the existence of the associated ROA through another web-based RPKI Validator, such as AFRINIC's deployment of the Routinator Validator.
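If you would rather script that sanity check than click through a web page, something along these lines could work against a Routinator instance that has its HTTP service enabled. Treat it as a sketch under assumptions: the endpoint path and JSON field names are my reading of Routinator's HTTP API documentation and should be checked against the version you run, the host name is hypothetical, and the sample response is hand-written to illustrate the shape rather than real output.

```python
import json
import urllib.request


def validity_url(base_url: str, asn: int, prefix: str) -> str:
    """Build the validity query URL (path assumed from Routinator's API docs)."""
    return f"{base_url.rstrip('/')}/api/v1/validity/AS{asn}/{prefix}"


def fetch_validity(base_url: str, asn: int, prefix: str, timeout: int = 10) -> dict:
    """Query the validator over HTTP and return the decoded JSON response."""
    with urllib.request.urlopen(validity_url(base_url, asn, prefix), timeout=timeout) as resp:
        return json.load(resp)


def summarize(response: dict) -> dict:
    """Reduce a validity response to its state, reason and covering VRPs."""
    validity = response["validated_route"]["validity"]
    vrps = validity.get("VRPs", {})
    covering = (vrps.get("matched", [])
                + vrps.get("unmatched_as", [])
                + vrps.get("unmatched_length", []))
    return {
        "state": validity["state"],
        "reason": validity.get("reason"),
        "covering_vrps": covering,
    }


# Hand-written illustration of the assumed response shape, not real data.
SAMPLE_RESPONSE = {
    "validated_route": {
        "route": {"origin_asn": "AS64496", "prefix": "10.10.1.0/24"},
        "validity": {
            "state": "invalid",
            "reason": "length",
            "VRPs": {
                "matched": [],
                "unmatched_as": [],
                "unmatched_length": [
                    {"asn": "AS64496", "prefix": "10.10.0.0/16", "max_length": "16"}
                ],
            },
        },
    }
}

print(validity_url("https://validator.example.net", 64496, "10.10.1.0/24"))
print(summarize(SAMPLE_RESPONSE))
```

`fetch_validity` is not called here so that the sketch runs without network access; point it at your own validator to use it for real.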

Figure2: A Detailed View of The Analysis Feature of Each RPKI Invalid Prefix-Origin Pair on the NIST RPKI Monitoring Tool

Further details of each involved ASN and peering relationships were extracted from bgp.he.net. It's also important to note that the above exercise was only performed for IPv4.

Collecting Network Operator Contact Details

An indirect achievement of this initiative was being able to verify whether the network operator's PeeringDB records included a working e-mail address. This speaks to Action3 of the Mutually Agreed Norms for Routing Security (MANRS) for Network Operators, which requires an operator at minimum to have up-to-date contact information on PeeringDB.

In a case where a network operator had no contact information on PeeringDB, I scraped through the operator's IRR objects (RFC2650).
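Digging through IRR objects by hand gets tedious, so here is a rough sketch of how the e-mail addresses could be pulled out of RPSL text that you have already fetched (for example with a whois client). The attribute list is my own pick: `e-mail`, `notify`, `upd-to` and `mnt-nfy` are standard RPSL attributes, while `abuse-mailbox` is an extension that some registries use. The sample objects are invented.

```python
# Attributes that usually carry a contact address (the selection is an assumption).
CONTACT_ATTRIBUTES = {"e-mail", "abuse-mailbox", "notify", "upd-to", "mnt-nfy"}


def extract_contacts(rpsl_text: str) -> list:
    """Return the unique e-mail addresses found in RPSL objects, in order of appearance."""
    found = []
    for line in rpsl_text.splitlines():
        # Skip blank lines, server comments and anything that is not "attribute: value".
        if not line or line[0] in "%#" or ":" not in line:
            continue
        attribute, _, value = line.partition(":")
        if attribute.strip().lower() in CONTACT_ATTRIBUTES:
            email = value.strip()
            if email and email not in found:
                found.append(email)
    return found


# Invented example objects in RPSL style.
SAMPLE_RPSL = """\
% example whois output
aut-num:        AS64496
as-name:        EXAMPLE-NET
admin-c:        EX1-EXAMPLE
notify:         noc@example.net
mnt-by:         EXAMPLE-MNT
source:         EXAMPLE

role:           Example NOC
e-mail:         noc@example.net
abuse-mailbox:  abuse@example.net
nic-hdl:        EX1-EXAMPLE
source:         EXAMPLE
"""

print(extract_contacts(SAMPLE_RPSL))  # ['noc@example.net', 'abuse@example.net']
```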

Contacting Network Operators

With a diagnosis of the problem and some options for the operator to consider to remedy the problem, I contacted each operator via e-mail. The following is an example of an email I sent to an operator whose RPKI Invalid was due to MaxLength:

 

Figure3: A screenshot of a typical e-mail sent to relevant network operators. Specific details have been removed to protect the reputation of the network operators involved.

Monitoring

Monitoring simply involved regularly checking the RPKI status of these recorded prefixes on the NIST Monitoring tool.

Results

As mentioned earlier, there were 107 AFRINIC RPKI Invalid prefix-origin pairs when the data was initially collected. 39 of the pairs were due to the ROA MaxLength. These 39 were deemed low-hanging fruit as they required the least complicated solutions. In many of the cases an ASN would have several of these RPKI Invalid prefix-origin pairs and therefore only needed to be contacted once for multiple prefixes, significantly reducing the amount of work for the 39.
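The triage described above (working out why a pair is Invalid, then contacting each ASN once) could be sketched as follows. This is not the tooling I actually used; the tuple layout, the reason labels and the sample data are assumptions made for illustration, and the covering ROAs are taken as already known, as they are from the NIST tool's detailed view.

```python
import ipaddress
from collections import defaultdict


def diagnose(prefix: str, origin_asn: int, covering_roas: list) -> str:
    """Classify a pair given its covering ROAs as (roa_prefix, max_length, asn) tuples."""
    if not covering_roas:
        return "notfound"
    route = ipaddress.ip_network(prefix)
    same_as = [roa for roa in covering_roas if roa[2] == origin_asn]
    if any(route.prefixlen <= roa[1] for roa in same_as):
        return "valid"
    # A ROA for the right AS exists but is too strict, or no ROA names this AS at all.
    return "maxlength" if same_as else "origin-as"


def group_by_asn(invalid_pairs: list) -> dict:
    """Build one worklist per origin ASN so each operator gets a single e-mail."""
    per_asn = defaultdict(list)
    for prefix, origin_asn, roas in invalid_pairs:
        reason = diagnose(prefix, origin_asn, roas)
        if reason in ("maxlength", "origin-as"):
            per_asn[origin_asn].append((prefix, reason))
    return dict(per_asn)


# Invented sample: two MaxLength cases for one ASN, one origin mismatch for another.
INVALID_PAIRS = [
    ("10.10.1.0/24", 64496, [("10.10.0.0/16", 16, 64496)]),
    ("10.10.2.0/24", 64496, [("10.10.0.0/16", 16, 64496)]),
    ("10.20.0.0/22", 64497, [("10.20.0.0/22", 22, 64498)]),
]

worklist = group_by_asn(INVALID_PAIRS)
for asn, items in worklist.items():
    print(f"AS{asn}: {items}")
```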

There were instances where a network operator that owns multiple ASNs advertises different subnets from different ASNs but has created a single ROA covering all of those subnets. As you can guess, these are some large networks where organizational complexity can also influence such messy situations. I wasn't holding my breath for a response on this one, but I proposed that they undo the quick-and-easy shortcut of defining a ROA with MaxLength 24. This MaxLength=24 method is common because it requires less planning and maintenance. However, it can come back to bite you when your network design gets sticky, as in this case, and it also makes you vulnerable to ASN spoofing attacks, where an attacker forges your origin ASN on a more specific prefix that your ROA authorises but you do not announce.
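To show why the MaxLength=24 shortcut is risky, here is a small sketch of what I take "ASN spoofing" to mean here: a forged-origin sub-prefix hijack, where the attacker appends the victim's ASN to the end of the AS path. The prefixes, ASNs and function name are placeholders, and only a single ROA is considered.

```python
import ipaddress


def rov_state(prefix: str, as_path: list, roa: tuple) -> str:
    """Validate one announcement against one ROA given as (prefix, max_length, asn)."""
    roa_prefix, max_length, roa_asn = roa
    route = ipaddress.ip_network(prefix)
    if not route.subnet_of(ipaddress.ip_network(roa_prefix)):
        return "NotFound"
    origin = as_path[-1]  # ROV only looks at the last AS in the path
    if origin == roa_asn and route.prefixlen <= max_length:
        return "Valid"
    return "Invalid"


# The victim (AS64496) only announces the /16. The attacker (AS64511)
# announces a /24 out of it and pretends AS64496 is the origin.
HIJACK = ("10.10.5.0/24", [64511, 64496])

LOOSE_ROA = ("10.10.0.0/16", 24, 64496)   # the quick-and-easy MaxLength 24
STRICT_ROA = ("10.10.0.0/16", 16, 64496)  # MaxLength equal to what is announced

print("loose ROA: ", rov_state(*HIJACK, LOOSE_ROA))   # Valid
print("strict ROA:", rov_state(*HIJACK, STRICT_ROA))  # Invalid
```

With the loose ROA the forged /24 is both Valid and more specific than the legitimate /16, so it wins on longest-prefix match; with the strict ROA, networks that drop Invalids discard it.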

There were some networks without PeeringDB records. Through IRR records, I could gather some e-mail addresses and also asked the operators to update their PeeringDB contact details in addition to correcting their RPKI data.

Only 10 networks were contacted. Some networks in the rest of the list were just not worth contacting: from studying the ASNs involved, you could detect cases of badly managed IP brokering or undesirable dealings between ASNs that could distract me from the objective of this initiative. Perhaps an investigation and post for the day I can afford bodyguards...

Out of the 10 networks contacted, only 2 had responded to my e-mail at the time of writing this article. One of the messages was in good faith and was from someone who also cares about the health of the internet. Both networks promised to address the issues raised.

Update(1 July): The 2 network operators who responded to my e-mail had fixed their ROAs. In addition to that, 20 out of 25 RPKI Invalid prefixes belonging to one ASN had been fixed even though the network operator never responded to my e-mail. I sent them a "Thank You" e-mail anyway.

Update(5 July): An additional network operator heeded my call and fixed 1 out of the 5 RPKI Invalid prefixes I had asked them to look into.

Discussion and Conclusion

The low count of RPKI Invalid prefix-origin pairs found in this exercise should not be celebrated as a sign of cleanliness in the region. RPKI adoption in the AFRINIC service region is still fairly low. For example, RPKI NotFound makes up 89% of the total unique prefix-origin pairs in the AFRINIC region compared to the global 68.9% for IPv4.


Figure4: AFRINIC Prefix-Origin Pairs; source NIST RPKI Monitor

This exercise helped to verify that network operators in the AFRINIC region who have implemented Action4 of the recommended MANRS actions have done so in a way that does not harm their business.

Beyond this initiative, I think regularly checking the status of prefix-origin pairs in your region is necessary to further promote the adoption of RPKI based ROV. It ensures that Network Operators don't have self-inflicted, business-harming configurations in their network and end up joining the RPKI dark side.

Ideally, this should be a regular exercise done by upstream network operators who have ROV enabled to ensure that their downstream customer networks don't have any RPKI Invalid Prefix-origin pairs. I'm guessing that if an operator is contacted by another operator, they'd heed the call better than when contacted by some random internet individual.

Monday, July 6, 2020

RPKI ROV Sickle in Full Swing

The month of June saw a spike in Resource Public Key Infrastructure (RPKI) and BGP Route Origin Validation (ROV) activity. This compelled me to curate some of the events that caught my eye. By increased activity, I'm referring to both increased incidents of routing security improvement and social occasions focused on promoting RPKI and raising awareness. However, this blogpost is about the latter. It began as a personal summary of all the compressed RPKI knowledge consumed. It's now evolved into what I think could be useful for those that are new to the concept and may have missed some of these important resources that are shared freely on the internet.

TL;DR
You can skip reading the rest of this article and dig straight into the following fantastic four:

Since the outbreak of the COVID-19 global pandemic, it had been a while since the techies who help keep the beast that is the global internet breathing came together to share knowledge of this magnitude. On the 19th of June, some of these champions live-streamed a demonstration of how quick and easy it can be to set up the different components of an RPKI-fluent network. The line up represented a balanced mix of vendors and organizations that play significant roles in the make-up of the internet. This included Apstra, Arista, Cisco, Cloudflare, Juniper Networks, NLnet Labs, Nokia, Orhan Ergun LLC and RIPE NCC.




Moderated by Orhan Ergun and Jeff Tantsura, the Zoom session was also live streamed via YouTube and Facebook - a move that should be celebrated for making the live session more accessible to the global community. I personally experienced challenges joining the Zoom session so this proved handy.

With a goal of achieving the following in 2 hours, the session turned out much shorter than expected:
  • Creating Route Origin Authorizations (ROAs) via your Regional Internet Registry (RIR) member portal
  • Installing a validator in your network
  • Configuring your router to enable RPKI and perform BGP ROV

Nathalie Trenaman from RIPE NCC kicked off with creating a ROA via their member portal. She also shared some best practices, like how you should be conservative with maxLength of your ROAs, all within 5 minutes. Her introduction and background about RPKI (which I think was planned to be presented by Jeff Tantsura) actually took longer than creating a ROA. Assuming that most networks are familiar with RPKI terminology and a bit of its theory, the excuse of not having time to create ROA(s) was proved to be invalid. 🧀🧀🧀

Validator installation demos included RIPE NCC's RPKI Validator, Routinator 3000 from NLnet Labs and Cloudflare's OctoRPKI. Yes, no FORT. Another representative from RIPE NCC, Ties de Kock, demonstrated the RIPE Validator and RTR Server installations. After installing the 2 packages, making some necessary configuration adjustments and starting up the services, which took less than 5 minutes, he explained that the validator can take about 15 minutes to be ready with data downloaded from the different repositories.

Alex Band then took the reins to demonstrate a routinator installation. Unlike RIPE Validator, there are no packages to download for installation and routinator is built from source code - a conscious decision they made because of its frequent releases. To set up the environment for routinator, you need to install rsync, the C toolchain and Rust (the programming language that routinator is written in). While waiting for the routinator installation to complete, Alex spoke about their transition from rsync to RRDP, which runs over HTTPS. The final step is to start the application as an RTR server and wait about 10 minutes for data to be downloaded from the different repositories. The demonstration took roughly 7 minutes.

Louis Poinsignon from Cloudflare then demonstrated OctoRPKI. Similar to the one from RIPE NCC, the validator and RTR server are separate packages in the form of OctoRPKI and GoRTR respectively. Louis explained that this has the advantage of having GoRTR running on a different machine closer to your routers while the validator is hosted safely elsewhere. His demonstration was also within 7 minutes.

Enabling Routers to Speak RPKI

In what seemed like a friendly tournament of the router vendors, there were demonstrations from Juniper Networks by Melchior Aelmans, Nokia's Greg Hankins, Florian Hibler from Arista, and Cisco's duo Vinay Shankarkumar and Jakob Heitz. Setting up their routers to a point of dropping Invalids and showing off some troubleshooting commands took less than 10 minutes each. I've summarized how long each activity took below. Of course a beginner would take slightly longer than that because they'd have to RTFM first, but these numbers still highlight how smooth the process has become. This in no way trivializes the process.

Activity | Time taken
--- | ---
Creating ROA | 5 minutes
Installing Routinator Validator | 7 minutes
Installing RIPE Validator | 5 minutes
Installing OctoRPKI Validator | 7 minutes
Configuring JUNOS for ROV | 5 minutes
Configuring Arista EOS for ROV | 8 minutes
Configuring Cisco IOS XR for ROV * | 16 minutes
Configuring Nokia SR OS for ROV | 5 minutes

* The Cisco session had Q&A during the demonstration unlike the others who held it off until their configurations were completed

The total amount of time reflected in the table is only about an hour of a 1-hour-45-minute session. There are many gems of Q&A throughout the session including an interesting wrap up. I strongly encourage a watch of the entire video.

Mikrotik roller coaster

It was a pity that there wasn’t a Mikrotik Zero To Hero demonstration, given that earlier in the month the industry was excited about the announcement that RouterOS had a beta release of an RPKI ROV ready image, and Massimiliano Stucchi, under his personal ASN AS58280, had taken it for a ride a few days before the RIPE event. This was big news for the industry given that Mikrotik has a large footprint in small and startup networks, especially in developing countries and continents.

Unfortunately, this Mikrotik party was crushed in another RPKI studded session held a week later. Enter "InterCommunity: Securing Global Routing" featuring Melchior Aelmans (this time wearing a moderator mask), Abdul Awal of the Bangladesh National DataCentre, Mark Tinka of SEACOM, Kevin Blumberg from TORIX, Jorge Cano from NIC.mx, and Tashi Phuntsho from APNIC. It was during Tashi's slot that it was revealed that the RPKI implementation on RouterOSv7.0beta8 is broken, by pointing us to a Mikrotik Forum discussion where his team raised the issues experienced and Mikrotik also confirmed the bug.




This Internet Society (ISOC) #ICOMM2020 event involved the panelists sharing their experiences from their various contributions in making global routing more secure. Awal's presentation on his project RPKI Deployment in South Asia was a good place to start and he dropped many gems on how to mobilize RPKI adoption in your region. He also published a write-up on the same topic in June.

Mark Tinka (Head of Engineering at Seacom) took us back to when they tried to implement RPKI in 2014 only to discover bugs in Cisco IOS XE and that not many networks had deployed ROV. Dropping invalids while your competitors allow such traffic through can put you at a business disadvantage. Together with another IP Transit network provider in the same region, Workonline Communications, they went live with RPKI in April 2019. He found that IOS XR and JUNOS worked well while IOS XE was still buggy. This was supported by Kevin Blumberg (President of the Toronto Internet Exchange) who shared their history of ticking all the boxes of the MANRS actions as the largest IXP in Canada. Echoing what was shared by many in the Zero To Hero event, Kevin runs 2 RPKI validator servers in parallel and wants to see how they differ for the foreseeable future.

Representing the open source software development efforts of NIC Mexico, Jorge Cano took us through an introduction to FORT, the Zero To Hero gap-filler and their most recent contribution to the community. Again, the running of 2 or more validators in your network was stressed.

Similar to Mark Tinka, Tashi Phuntsho took us back to January 2014 where they started to implement RPKI ROV using IOS XE and things broke spectacularly to the point of upsetting the local king who happened to be their customer 😂. Tashi stressed the need to test your validators by showing us an example of the asymmetry between Validated ROA Payloads (VRPs), extracted a couple of hours before the webinar, from FORT and Routinator. Please watch the video and see how significant the difference can be. He also shares some of the brilliant outreach work they've been doing.
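If you run two validators side by side, a comparison like the one Tashi showed can be approximated with a set difference over their VRP exports. The sketch below assumes each export has already been loaded into a list of dictionaries with `asn`, `prefix` and `maxLength` keys; that shape is an assumption, so adapt `normalise` to whatever your validators actually emit. The sample data is invented.

```python
import ipaddress


def normalise(entries: list) -> set:
    """Turn exported VRP entries into a set of comparable (asn, prefix, max_length) tuples."""
    vrps = set()
    for entry in entries:
        # Accept both "AS64496" and 64496 style ASNs.
        asn = int(str(entry["asn"]).upper().removeprefix("AS"))
        prefix = str(ipaddress.ip_network(entry["prefix"]))
        vrps.add((asn, prefix, int(entry["maxLength"])))
    return vrps


def compare(entries_a: list, entries_b: list) -> tuple:
    """Return the VRPs seen only by validator A and only by validator B."""
    a, b = normalise(entries_a), normalise(entries_b)
    return a - b, b - a


# Invented exports from two validators.
VALIDATOR_A = [
    {"asn": "AS64496", "prefix": "10.10.0.0/16", "maxLength": 16},
    {"asn": "AS64497", "prefix": "10.20.0.0/22", "maxLength": 24},
]
VALIDATOR_B = [
    {"asn": 64496, "prefix": "10.10.0.0/16", "maxLength": "16"},
]

only_a, only_b = compare(VALIDATOR_A, VALIDATOR_B)
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

Some difference is expected whenever the two validators fetched their data at different times, so it is the size and persistence of the gap that deserves attention.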

It's worth mentioning that Tashi was also involved in an online APNIC webinar (Securing Internet Routing Tutorial) on the same topic earlier in the month. I personally didn't attend it but I'm sure it was useful given the topics listed in the agenda. If you have a link to the actual content and have permission to distribute it please leave it in the comments below and I'll have it added here.

Short Break (not an advertisement/sponsored content):
There was also a major update to Krill. Check out Krill Gains Powerful ROA Management Based on BGP Routing by Alex Band.
I also need to make mention of this Excuse Me, Your BGP Is Leaking episode by the #theInternetReport which has some interesting global routing security incidents and RPKI news for June.

Measuring Route Origin Validation

Last and certainly not least, there's a brilliant stock-take of how far we've come with ROV by the internet veteran and APNIC Chief Scientist Geoff Huston. This new way of measuring ROV is where I should respectfully end this post. Geoff Huston and Joao Damas challenge you to think deeper about the numbers we often see being reported and the implications of dropping Invalids from an end-user perspective. It's definitely worth your time. Also look out for the surprising findings about Africa in his article.
Bonus: You can couple Geoff's blog post with him being interviewed by Mehmet Akcin in the same month, where he drops some fascinating insights.


More awareness could have been raised in my continent/region about the Zero to Hero RIPE event. There was an alert shared about it on the ZANOG mailing list but I don't think there was anything beyond that. The session covered all the basics and, with an AFRINIC ROA creation example perhaps, it would be a great addition to the workshops that are being run in promotion of RPKI by various organizations in the region. If you're from my region, you had to have been on the RIPE mailing lists or eagerly following feeds about RPKI on social media to know about it.

The #ICOMM2020 event was well marketed in my region. I'm guessing this is because the organizing team consisted of ISOC representatives from Africa. It would be great to see Mikrotik release a working version of their code (one that also doesn't break IPv6) very soon. Overall, I wish that the momentum gained in the past month is not just an annual climax and hope that it will be kept up until that all-invalids-dropped day.

May your Valids live long and prosper!