@eightyknots
Last active March 20, 2024 20:05
AvReg: Aviation regex match toolkit

AvReg: The Aviation RegEx Match Toolkit

General Tips

  • The PCRE flavour of RegEx is used here.
  • Append the i modifier to the end of the regex to make any pattern case-insensitive.
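For readers applying these patterns outside PHP, here is a minimal sketch of the same case-insensitive matching using Python's `re` module. It assumes Python as the host language (the gist itself is language-neutral). PCRE puts the `i` flag after the closing delimiter, whereas Python takes the bare pattern plus a separate flag. The inline `(?i)` form works in both.

```python
import re

# PCRE: /^[A-Z]{3}$/i  ->  Python: bare pattern + re.IGNORECASE
IATA_AIRPORT = re.compile(r"^[A-Z]{3}$", re.IGNORECASE)
# The inline (?i) modifier at the start of the pattern works in PCRE and Python alike.
IATA_AIRPORT_INLINE = re.compile(r"(?i)^[A-Z]{3}$")

for code in ["LHR", "lhr", "EGLL"]:
    # Both forms accept LHR and lhr; EGLL is rejected because it has four letters.
    print(code, bool(IATA_AIRPORT.match(code)), bool(IATA_AIRPORT_INLINE.match(code)))
```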

Aircraft

Purpose Description RegEx Example
Registration National registration /^([A-Z]-[A-Z]{4}|[A-Z]{2}-[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2})$/ N390HA
IATA aircraft type 3-character type code /^[A-Z0-9]{3}$/ 32N
ICAO aircraft type (Typically) 4-character type code /^[A-Z]{1}[A-Z0-9]{1,3}$/ A32N
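Patterns that combine alternatives with `|` need a group around them. In PCRE, as in Python's `re`, `|` has the lowest precedence, so in `^A|B|C$` the `^` anchors only the first alternative and the `$` only the last. The sketch below, which uses Python's `re` as a stand-in for PCRE and example registrations chosen for illustration, compares the ungrouped and grouped registration patterns using an unanchored search, which is also how PHP's `preg_match()` behaves.

```python
import re

# Ungrouped: ^ applies only to the first alternative, $ only to the last.
LOOSE = re.compile(r"^[A-Z]-[A-Z]{4}|[A-Z]{2}-[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2}$")
# Grouped: both anchors apply to every alternative.
STRICT = re.compile(r"^(?:[A-Z]-[A-Z]{4}|[A-Z]{2}-[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2})$")

for reg in ["N390HA", "G-EUPT", "HB-JNA", "XXB-JNAYY", "G-EUPTZZZ"]:
    # search() scans the whole string for a match, like preg_match()
    print(f"{reg:10} loose={bool(LOOSE.search(reg))} strict={bool(STRICT.search(reg))}")
```

The first three strings pass both patterns. The last two pass only the loose one, because an unanchored middle alternative matches `XB-JNA` inside `XXB-JNAYY`, and the first alternative matches the `G-EUPT` prefix of `G-EUPTZZZ`.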

Airline Codes

Purpose Description RegEx Example
IATA code Commercial service mark /^([A-Z][\d]|[\d][A-Z]|[A-Z]{2})$/ CX
ICAO code Operational service mark /^[A-Z]{3}$/ CPA
Ticketing prefix eTicket operator code /^[0-9]{3}$/ 160

Airport Codes

Purpose Description RegEx Example
IATA code Commercial service mark /^[A-Z]{3}$/ LHR
ICAO code Operational service mark /^[A-Z]{4}$/ EGLL
FAA code US FAA-specific locator /^[A-Z0-9]{3,4}$/ L67
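The airport-code formats overlap: a three-letter code fits both the IATA and FAA patterns, and a four-letter code fits both ICAO and FAA. A regex can therefore only say that a code is *shaped like* one of these kinds; deciding which one it is needs context or a lookup table. A minimal sketch in Python's `re` (standing in for PCRE), using the patterns from the table; `possible_kinds` is a hypothetical helper name.

```python
import re

# Patterns from the Airport Codes table.
AIRPORT_PATTERNS = {
    "IATA": re.compile(r"^[A-Z]{3}$"),
    "ICAO": re.compile(r"^[A-Z]{4}$"),
    "FAA": re.compile(r"^[A-Z0-9]{3,4}$"),
}

def possible_kinds(code: str) -> list[str]:
    """Every airport-code format whose pattern the code matches (may be several)."""
    return [kind for kind, pattern in AIRPORT_PATTERNS.items() if pattern.match(code)]

print(possible_kinds("LHR"))   # ['IATA', 'FAA']
print(possible_kinds("EGLL"))  # ['ICAO', 'FAA']
print(possible_kinds("L67"))   # ['FAA']
```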

Air Navigation & Communication

Notes:

  • Privately owned Canadian NDBs may use a letter and number combination.
  • VFR squawk codes are generally 1200 in North America and 7000 in Europe.
Purpose Description RegEx Example
NDB Non-directional beacon identifier /^[A-Z]{1,3}$/ TD
VOR VHF omnidirectional range ident /^[A-Z]{3}$/ APU
INT Airway intersection waypoint /^[A-Z]{5}$/ PRAWN
Squawk Code Four-digit octal transponder code /^[0-7]{4}$/ 0318
Distress Emergency codes: 7500 (unlawful interference), 7600 (radio failure), 7700 (general emergency) /^7[567]00$/ 7700
VFR Squawk XPDR code for aircraft under VFR /^(1200|7000)$/ 1200
Runways Standard runway identifiers 01-36 /^(0?[1-9]|[1-2]\d|3[0-6])[LCR]?$/ 36L

Ticketing & Business Operations

Note that for PNR identifiers, some GDS providers and operators use 5-character locators, but most use 6-character ones. Additionally, for readability, some airlines and systems skip the characters 0, 1, I, L, and O.

Purpose Description RegEx Example
PNR identifier Passenger record locator /^[A-Z0-9]{5,6}$/ J5XTP2
E-ticket number Ticketing and itinerary identifier /^[0-9]{3}(-)?[0-9]{10}$/ 160-4837291830
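Two refinements follow from the note above, sketched in Python's `re` (standing in for PCRE). The stricter PNR pattern is an assumption that only suits systems known to skip 0, 1, I, L and O. The e-ticket pattern is the table's pattern with capture groups added so the 3-digit airline prefix and the 10-digit serial can be read out; `split_eticket` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Lenient: the table's pattern, any 5-6 uppercase letters or digits.
PNR_LENIENT = re.compile(r"^[A-Z0-9]{5,6}$")
# Strict variant (assumption): excludes 0, 1, I, L and O.
PNR_NO_AMBIGUOUS = re.compile(r"^[A-HJKMNP-Z2-9]{5,6}$")
# E-ticket: 3-digit airline prefix, optional dash, 10-digit serial.
ETICKET = re.compile(r"^([0-9]{3})-?([0-9]{10})$")

def split_eticket(number: str) -> Optional[Tuple[str, str]]:
    """Return (airline_prefix, serial), or None if the number is malformed."""
    m = ETICKET.match(number)
    return (m.group(1), m.group(2)) if m else None

print(bool(PNR_NO_AMBIGUOUS.match("J5XTP2")))  # True
print(split_eticket("160-4837291830"))         # ('160', '4837291830')
```

The dashed and undashed forms of a ticket number split to the same pair, which makes them easy to compare.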

Flight Operations

Purpose Description RegEx Example
Flight number IATA (marketing) flight number /^([A-Z][\d]|[\d][A-Z]|[A-Z]{2})(\d{1,})$/ BA026
Callsign ICAO (operational) flight number /^[A-Z]{3}[A-Z0-9]{1,}$/ BAW319K
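The flight-number pattern has two capture groups, one for the airline designator and one for the numeric part. A minimal sketch in Python's `re` (standing in for PCRE) that uses the groups. Converting the number to `int` so that BA026 and BA26 compare equal is a design choice assumed here, not something the pattern requires. `parse_flight` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Same as the table's pattern, with [\d] written as \d and \d{1,} as \d+.
FLIGHT = re.compile(r"^([A-Z]\d|\d[A-Z]|[A-Z]{2})(\d+)$")

def parse_flight(flight: str) -> Optional[Tuple[str, int]]:
    """Split an IATA flight number into (airline designator, numeric part)."""
    m = FLIGHT.match(flight)
    if m is None:
        return None
    airline, number = m.groups()
    return airline, int(number)  # "026" -> 26

print(parse_flight("BA026"))  # ('BA', 26)
print(parse_flight("9F123"))  # ('9F', 123)
print(parse_flight("123"))    # None: no letter among the first two characters
```

The pattern does not allow a trailing letter, so suffixed forms such as LH123G are rejected.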
@DonDebonair

According to my colleague, IATA airline codes can also contain 3 characters, and the first character should always be A-Z. Do you know if this is official?

@martin-ro

https://en.wikipedia.org/wiki/Airline_codes
https://en.wikipedia.org/wiki/List_of_airline_codes
The first character can be numeric, for example 4O or 7A.
The shortest one I know would be BA1, but can also be something like LH123G.

@eightyknots
Author

@dandydev I don't think so. Do you have any samples?

@eightyknots
Author

Added Runway Identifiers regex. For runway numbers below 10, the leading 0 is optional by default. To make the pattern stricter, if your use case requires a zero-padded number, remove the ? after the 0.
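A minimal sketch of the lenient and strict runway patterns side by side, in Python's `re` (standing in for PCRE). The suffix is captured in a second group here so that a hypothetical `normalise_runway` helper can zero-pad lenient input into the strict form; this matches the same strings as the table's pattern.

```python
import re
from typing import Optional

# The table's pattern, with the L/C/R suffix captured in group 2.
LENIENT = re.compile(r"^(0?[1-9]|[1-2]\d|3[0-6])([LCR]?)$")
# Stricter variant with the '?' after the 0 removed: single digits need a leading 0.
STRICT = re.compile(r"^(0[1-9]|[1-2]\d|3[0-6])([LCR]?)$")

def normalise_runway(rwy: str) -> Optional[str]:
    """Zero-pad a valid runway designator ('9L' -> '09L'); None if invalid."""
    m = LENIENT.match(rwy)
    if m is None:
        return None
    number, suffix = m.groups()
    return f"{int(number):02d}{suffix}"

print(bool(LENIENT.match("9L")), bool(STRICT.match("9L")))  # True False
print(normalise_runway("9L"))                              # 09L
```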

@Criscrosbf

Hi!

This is a great resource, congrats.
The only thing missing is the note about Easyjet PNR format difference:
They are typically 6 characters in length, though Easyjet currently uses record locators which are either 6 or 7 characters.
https://en.wikipedia.org/wiki/Record_locator

Best,

@eightyknots
Author

eightyknots commented Apr 5, 2020

@Criscrosbf I'm a little wary about changing that especially if it only seems to be for one carrier. While it does look like EZY/U2 PNRs are 7 characters long, it seems like they just prepend an "E" in front of a regular PNR to form a booking reference.

If you do want to validate that though, you can use this regex specifically for Easyjet: /^E[A-Z0-9]{6}$/
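If one field has to accept both regular PNRs and easyJet references, the two patterns can be combined under one pair of anchors. The sketch below uses Python's `re` (standing in for PCRE). Its hypothetical `core_pnr` helper assumes, as the comment above only suggests, that a 7-character easyJet reference is "E" followed by a regular 6-character PNR. That is an observation, not a published format. The sample references are made up.

```python
import re
from typing import Optional

# General PNR pattern OR the easyJet-specific one, both covered by ^ and $.
PNR_OR_EASYJET = re.compile(r"^(?:[A-Z0-9]{5,6}|E[A-Z0-9]{6})$")

def core_pnr(ref: str) -> Optional[str]:
    """Return the locator, dropping the leading 'E' from a 7-character reference.

    Assumption: 7-character easyJet references are 'E' + a regular PNR."""
    if not PNR_OR_EASYJET.match(ref):
        return None
    return ref[1:] if len(ref) == 7 else ref

print(core_pnr("EJ5XTP2"))  # J5XTP2
print(core_pnr("J5XTP2"))   # J5XTP2
```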

@asdf913

asdf913 commented Jul 28, 2020

Thank you for your contribution.
The regular expressions are very useful for my work.

@Spookyguy

Spookyguy commented Aug 16, 2020

Thanks for your contribution. It was helpful to me as well, but I am missing a regex for an aircraft registration number. For now I am using

/^[A-Z]-[A-Z]{4}|[A-Z]{2}-[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2}$/

But I am sure, this one could be refined further.

@eightyknots
Author

Thanks for the contrib!

This is fine and works. I'm wondering if the dash should be optional depending on some people's internal use cases. I agree for displaying per ICAO standards it should have the dash, but for sorting reasons, maybe not? Will likely add this as a few use cases, including one just for US N-numbers.
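For the sorting use case mentioned here, one option is to accept the dash as optional and sort on a dash-free key. This is a minimal sketch in Python's `re` (standing in for PCRE). `REG_OPTIONAL_DASH` and `sort_key` are hypothetical names, and the sample registrations are only examples.

```python
import re

# Registration pattern with each dash made optional (hypothetical variant).
REG_OPTIONAL_DASH = re.compile(
    r"^(?:[A-Z]-?[A-Z]{4}|[A-Z]{2}-?[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2})$"
)

def sort_key(reg: str) -> str:
    """Dash-free form, so 'G-EUPT' and 'GEUPT' sort and compare the same."""
    return reg.replace("-", "")

regs = ["HB-JNA", "GEUPT", "G-EUPT", "N390HA"]
valid = [r for r in regs if REG_OPTIONAL_DASH.match(r)]
print(sorted(valid, key=sort_key))  # ['GEUPT', 'G-EUPT', 'HB-JNA', 'N390HA']
```

One caveat is that, once the dash is dropped, a string like GEUPT fits both the 1+4 and the 2+3 shapes, so the regex alone cannot tell where the dash belongs. Keeping the dashed form for display avoids that problem.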

@kasidev

kasidev commented Oct 17, 2020

Hi
I am thinking about adding a regex for SITA-type load messages (LDM) and movement messages (MVT). Is this the right repo for this kind of thing?

Examples:
Movement message

MVT
AZ1074/08.HBXXX.ZRH
AD1454/1458 EA1555 FRA
DL46/02/0005/0004
PX023
DLA/02B//

Load message

LDM
AZ464/08.HBXXX.C10Y92.2/3
-LCY.28/14/0/0.0.T262.1/262.PAX/1/41.JMP/0.CRW/0.PAD/0/0
SI LCY BAG 262 POS 0 FRE 0
PAX WEIGHTS USED M88 F70 C35 I0
SERVICE WEIGHT ADJUSTMENT WEIGHT/INDEX
ADD
NIL
DEDUCTIONS
NIL
LCY C 0 M 0 B 18/ 262 O 0 T 0
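As a starting point for the movement message, the identification line (`AZ1074/08.HBXXX.ZRH`) can be split with named groups. This sketch assumes that line is laid out as `<flight>/<day of month>.<registration without dash>.<station>`, which is how the sample reads; check it against the IATA MVT specification before relying on it. It uses Python's `re` as a stand-in for PCRE (PCRE also accepts `(?P<name>...)`) and reuses the gist's airline-designator alternatives.

```python
import re

# Hypothetical pattern for the MVT identification line, under the layout assumed above.
MVT_ID_LINE = re.compile(
    r"^(?P<airline>[A-Z]\d|\d[A-Z]|[A-Z]{2})(?P<number>\d+)"
    r"/(?P<day>\d{2})"
    r"\.(?P<registration>[A-Z0-9]{2,7})"
    r"\.(?P<station>[A-Z]{3})$"
)

m = MVT_ID_LINE.match("AZ1074/08.HBXXX.ZRH")
print(m.groupdict())
# {'airline': 'AZ', 'number': '1074', 'day': '08', 'registration': 'HBXXX', 'station': 'ZRH'}
```

The remaining MVT lines and the LDM body use different field layouts, so they would each need their own pattern.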

@clarkewing

Concerning the ICAO Aircraft type, I believe we should be using the following instead:

/^[A-Z]{1}[A-Z0-9]{2,3}$/

When checking the ICAO's own website, you can find type codes consisting of only three characters (for example: SW3)

@eightyknots
Author

eightyknots commented Mar 26, 2021

Thanks for the contrib @clarkewing! Looks like you are right - ICAO also allows A9 so I've updated the regex accordingly.

@danielbellhv

NDB should be /^[A-Z]{1,3}$/

@laurensnl

Thanks for sharing this! Very helpful!

@0x80

0x80 commented Mar 25, 2022

The regex for flight code /^[A-Z0-9]{3,}$/ seems too loose. It will happily accept 123 or 1234.

I am not familiar with the official specs for airline IATA codes, but I suspect the regex could require at least one of the first two characters to be a letter.

This is the best I could come up with /^([A-Z][\d]|[\d][A-Z]|[A-Z]{2})(\d{1,})$/. It contains two capture groups: one for the IATA airline code and one for the flight number.

It will match on things like KL1234, 9F123, K91 but reject anything that is just numbers. See https://regex101.com/r/nsec7z/1

@eightyknots
Author

This replacement is acceptable. I've updated the gist with this regex for IATA flight number as well as IATA airline code. Thanks for contributing, @0x80!

@danielbellhv, thanks for catching that! Updated.

@obotor

obotor commented Jan 20, 2023

Hi!
For aircraft registration, the dash is optional: it is only written for readability.
However, the regex misses some "exotic" registration schemes, such as S7 for Seychelles or 9H for Malta. You may also encounter unusual codes such as 2- (Guernsey), RDPL- (Laos) or A40 (Oman)...
I suggest the following regex: /^[A-Z]-[A-Z]{4}|([A-Z]{2}|[A-Z1-9][A-Z]|[A-Z][A-Z1-9])-[A-Z]{3}|N[0-9]{1,5}[A-Z]{0,2}$/ but this will not cater for the weirdest codes... Your call to include these!

@Tricky-D

Tricky-D commented Feb 13, 2023

According to the rules of the FAA, US Registration (Tail Codes) must start with an "N", then 1-5 numbers, or 1-4 numbers followed by one letter, or 1-3 numbers followed by two letters. The first number cannot be zero and the letters "I" and "O" are not used.

So, taking all that into account here is the Regular Expression I came up with for US Aircraft Registrations...
/^N[1-9](\d{0,4}|\d{0,3}[A-HJ-NP-Z]|\d{0,2}[A-HJ-NP-Z]{2})$/

...
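The FAA rules described in this comment can be checked with an anchored pattern. Spaces inside a pattern are matched literally unless extended mode (`x`) is on, so the alternatives must be written without padding. A minimal sketch in Python's `re` (standing in for PCRE); `US_N_NUMBER` is a hypothetical name, and the test strings are examples only.

```python
import re

# N, a non-zero digit, then one of:
#   up to 4 more digits             (1-5 digits in total)
#   up to 3 more digits + 1 letter  (1-4 digits + letter)
#   up to 2 more digits + 2 letters (1-3 digits + 2 letters)
# [A-HJ-NP-Z] is A-Z without I and O.
US_N_NUMBER = re.compile(r"^N[1-9](?:\d{0,4}|\d{0,3}[A-HJ-NP-Z]|\d{0,2}[A-HJ-NP-Z]{2})$")

for tail in ["N390HA", "N12345", "N1234Z", "N123456", "N0123", "N12IO", "N12345AB"]:
    print(tail, bool(US_N_NUMBER.match(tail)))
```

Only the first three are accepted. N12345AB is worth noting: the generic `N[0-9]{1,5}[A-Z]{0,2}` branch of the registration pattern accepts it, even though it has more characters than the FAA allows after the N.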

@mtowers

mtowers commented Feb 21, 2023

The FAA airport code regex is incomplete. FAA identifiers can be 3-5 alphanumeric characters (with some exceptions). However, I can't find any examples of a five-character location.

I think something like this is a little closer: /(\b[A-Z0-9]{3,4}\b)+/

As per Wikipedia on FAA identifiers:

The Federal Aviation Administration location identifier (FAA LID) is a three- to five-character alphanumeric code identifying aviation related facilities inside the United States, though some codes are reserved for, and are managed by other entities.[1]: §1–2-1 

For nearly all major airports, the assigned identifiers are alphabetic three-letter codes, such as ORD for Chicago O’Hare International Airport. Minor airfields are typically assigned a mix of alphanumeric characters, such as 8N2 for Skydive Chicago Airport and 0B5 for Turners Falls Airport. Private airfields are assigned a four-character identifier, such as 1CA9 for Los Angeles County Fire Department Heliport. The location identifiers are coordinated with the Transport Canada Identifiers described below.

In general, the FAA has authority to assign all three-letter identifiers (except those beginning with the letters K, N, W, and Y), all three and four character alphanumeric identifiers, and five-letter identifiers for the United States and its jurisdictions. The Department of the Navy assigns three-letter identifiers beginning with the letter N for the exclusive use of that Department. Transport Canada assigns three character identifiers beginning with Y. The block beginning with letter Q is under international telecommunications jurisdiction, but is used internally by FAA Technical Operations to identify National Airspace equipment not covered by any other identifying code system. The block beginning with Z identifies United States Air Route Traffic Control Centers.[1]: §1–2-2
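The assignment rules in this excerpt can be turned into a rough, shape-only classifier. The sketch below uses Python's `re` (standing in for PCRE). The labels paraphrase the excerpt, and the "unclassified" case for five-character codes containing digits reflects only the fact that the excerpt does not cover them. A real system would look the code up rather than infer ownership from its first letter. `lid_authority` is a hypothetical helper.

```python
import re

LID = re.compile(r"^[A-Z0-9]{3,5}$")  # 3-5 alphanumerics, per the excerpt

# First-letter blocks for three-letter identifiers, paraphrased from the excerpt.
THREE_LETTER_BLOCKS = {
    "N": "US Navy",
    "Y": "Transport Canada",
    "Q": "telecommunications block (used internally by FAA Technical Operations)",
    "Z": "US Air Route Traffic Control Center",
}

def lid_authority(code: str) -> str:
    """Rough guess at who assigns an identifier, based only on its shape."""
    if not LID.match(code):
        return "not an FAA LID shape"
    if len(code) == 3 and code.isalpha():
        if code[0] in THREE_LETTER_BLOCKS:
            return THREE_LETTER_BLOCKS[code[0]]
        if code[0] in "KW":
            return "outside FAA three-letter assignment"
        return "FAA"
    if len(code) == 5 and not code.isalpha():
        return "unclassified"  # excerpt only mentions five-letter codes
    return "FAA"

print(lid_authority("ORD"))   # FAA
print(lid_authority("YYZ"))   # Transport Canada
print(lid_authority("1CA9"))  # FAA
```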

@eightyknots
Author

@mtowers - Thank you, I've updated the FAA identifiers per your suggestion, except that I used the ^ and $ anchors instead of word boundaries to keep the pattern consistent with the others.
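The difference between the two styles matters in practice. Word boundaries (`\b`) *find* identifier-shaped tokens inside free text, while `^...$` anchors *validate* that an entire field is one identifier. A minimal sketch in Python's `re` (standing in for PCRE); the remark string is invented for illustration.

```python
import re

# Word boundaries: find identifier-shaped tokens anywhere in a string.
LID_IN_TEXT = re.compile(r"\b[A-Z0-9]{3,4}\b")
# Anchors: require the whole string to be a single identifier.
LID_FIELD = re.compile(r"^[A-Z0-9]{3,4}$")

remark = "DEP 8N2 ARR 0B5 VIA ORD"
print(LID_IN_TEXT.findall(remark))    # ['DEP', '8N2', 'ARR', '0B5', 'VIA', 'ORD']
print(bool(LID_FIELD.match(remark)))  # False
print(bool(LID_FIELD.match("1CA9")))  # True
```

The search form also picks up ordinary words such as DEP and VIA, which is why the anchored form suits validating a single field.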

@Tricky-D & @obotor - Thanks for your inputs as well. I will review and update the registration numbers shortly. It may come down to splitting out the US, since we can be more specific with FAA registrations.
