@rob-murray
Last active February 4, 2024 02:12
A regex to check UK company numbers, i.e. the reference numbers assigned to companies in the UK by Companies House
/^((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU))))$/
@Kiwidave68

@mrbrianevans I think it might be because it's doing a lazy check - matches on the first 8 so doesn't bother looking any further?

@tankbob

tankbob commented Mar 13, 2023

You need an extra set of brackets around the whole alternation. Otherwise the terminating $ applies only to the last alternative of the or condition, so the first alternative can match without reaching the end of the string.

/^(((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))))$/
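To illustrate why the outer brackets matter, here is a minimal sketch using two simplified stand-in patterns. The SC and OC alternatives with six digits are assumptions chosen for brevity, not the full regex. Without the outer group, ^ applies only to the first alternative and $ only to the last.

```javascript
// Simplified stand-ins for the real alternatives (illustrative only).
// Parsed as: (^SC\d{6}) OR (OC\d{6}$)
const unanchored = /^(SC\d{6})|(OC\d{6})$/;
// The outer group makes ^ and $ apply to every alternative.
const anchored = /^((SC\d{6})|(OC\d{6}))$/;

// Trailing junk slips through the first alternative, which has no $.
console.log(unanchored.test("SC123456EXTRA")); // true
console.log(anchored.test("SC123456EXTRA"));   // false

// Leading junk slips through the last alternative, which has no ^.
console.log(unanchored.test("xxOC123456")); // true
console.log(anchored.test("xxOC123456"));   // false

// A well-formed value matches both patterns.
console.log(anchored.test("SC123456")); // true
```

The same precedence rule applies to the full pattern: alternation binds more loosely than the anchors, so the anchors need an enclosing group.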

@grayfox99

Thanks for this, totally solved the problem!

@Prometheus3375

Prometheus3375 commented Aug 7, 2023

I think some companies are still not matched, for example:

PC000001
PC000002
PC000003
PC000004
PC000005
PC000006
OE000001
OE000002
OE000003
OE000004
OE000005
OE000006
OE000007
OE000008
OE000009
OE000010
OE000011
OE000012

Search here.
In fact, many companies have the OE prefix.

Edit: there are more unmatched prefixes.
'SP', 'NO', 'PC', 'RC', 'CE', 'SR', 'IP', 'NP', 'IC', 'CS', 'OE', 'SI'
I would like to attach a list of the unmatched IDs (around 75k), but GitHub says that this file type is not supported (it is a .txt file, about 700 KB).
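A check like this can be sketched in a few lines: run every number through the regex and collect the two-character prefix of each one that fails. The helper name `unmatchedPrefixes` and the sample array are hypothetical. `SC123456` and `01234567` are made-up numbers added only to show values that pass, and the pattern is the bracketed version posted above.

```javascript
// Bracketed regex from the earlier comment in this thread.
const pattern = /^(((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))))$/;

// Hypothetical helper: returns the sorted, de-duplicated prefixes
// (first two characters) of every number the regex rejects.
function unmatchedPrefixes(numbers, regex) {
  const prefixes = new Set();
  for (const number of numbers) {
    if (!regex.test(number)) {
      prefixes.add(number.slice(0, 2));
    }
  }
  return [...prefixes].sort();
}

// Small sample; a real run would read the full list of company numbers.
const sample = ["PC000001", "OE000001", "OE000002", "SC123456", "01234567"];
console.log(unmatchedPrefixes(sample, pattern)); // [ 'OE', 'PC' ]
```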

@Prometheus3375

I've written this page with a list of the meanings of the various prefixes: https://chguide.co.uk/general/company-number.html . I have also found a quite comprehensive list in some old docs:

This page has a description of the PC prefix, but the provided regex does not include it.
In addition, the official documentation lists the SI prefix in the section of prefixes for companies with no available data (only the name is available).


@mrbrianevans

Had another go at it:

/^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;

The rules for registered societies (numbers starting with the IP, SP or RS prefix) could probably be tightened up. This regex is more permissive, but it matches all the company numbers in the bulk CSV file.
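A short usage sketch of this regex follows. The wrapper name `isCompanyNumber` is hypothetical, and `XX123456`, `IP12345R` and `RSZZZZZZ` are made-up inputs rather than real registrations. The last one shows how permissive the registered-society branch is.

```javascript
// Regex from this comment, unchanged.
const companyNumber = /^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;

// Hypothetical wrapper; expects an upper-case, 8-character string.
function isCompanyNumber(value) {
  return companyNumber.test(value);
}

console.log(isCompanyNumber("PC000001")); // true, previously unmatched prefix
console.log(isCompanyNumber("OE000012")); // true, previously unmatched prefix
console.log(isCompanyNumber("IP12345R")); // true, registered society branch
console.log(isCompanyNumber("XX123456")); // false, unknown prefix
console.log(isCompanyNumber("SC1234567")); // false, nine characters
console.log(isCompanyNumber("sc123456")); // false, lower case is not accepted

// Permissive: any six letters or digits after IP, SP or RS are accepted.
console.log(isCompanyNumber("RSZZZZZZ")); // true
```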
