@rob-murray
Last active February 4, 2024 02:12
A regex to check UK company numbers, i.e. the reference assigned to companies in the UK by Companies House
/^((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU))))$/
@rob-murray
Author

I can't find a specific spec for this and have pieced this together from various sources - it might not be 100% correct

@mrbrianevans

I've tested the below regex on a database of 13.6 million real company numbers, and they all match. It was sourced from a combination of looking at various documentation and also trial-and-error testing on the database of company numbers.

/^((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU))))$/

This is made up of "normal" companies, which end in 6 numeric digits:

(AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R)

Registered societies, which are prefixed with RS and end in 0, 1, 2 or 3 letters:

(RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS)

And Scottish limited partnerships and Northern Ireland companies, which are prefixed with SL or NI and can optionally end in an A:

(NI|SL)\d{5}[\dA]

And England or Wales LLPs, which are prefixed with OC and can end in various letters:

OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))
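
The four parts above can be kept as separate named pieces and joined when the pattern is built. This is only a minimal sketch: the names `parts` and `companyNumberPattern` are made up for illustration, and wrapping the alternatives in one non-capturing group (so that `^` and `$` apply to every alternative) is an assumption of the sketch, not part of the regex as posted.

```javascript
// Hypothetical names; the sub-patterns are copied from the comment above.
const parts = {
  normal: String.raw`(AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R)`,
  registeredSociety: String.raw`(RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS)`,
  scottishLpOrNi: String.raw`(NI|SL)\d{5}[\dA]`,
  englandWalesLlp: String.raw`OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))`,
};

// Assumption: one outer group so the anchors cover all four alternatives.
const companyNumberPattern = new RegExp(
  `^(?:${Object.values(parts).join("|")})$`
);

console.log(companyNumberPattern.test("09226141")); // a "normal" number quoted in this thread
console.log(companyNumberPattern.test("RS00592C")); // a registered society number quoted in this thread
```

Keeping the parts separate makes it easier to tighten or drop one alternative without re-reading the whole expression.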

@rob-murray
Author

amazing @mrbrianevans thanks 👍

@Kiwidave68

@mrbrianevans This is very useful! I have a couple of questions about it though.

  1. For "normal" companies, is it valid to have
  • 7 digits followed by C or R; e.g. 1234567C
  • 2 letters, 5 digits and C or R; e.g. AC12345C
  2. For English/Welsh LLPs, is this intentional: [\dP]? So OCPPPPOC would be valid?

@mrbrianevans

Hi there @Kiwidave68 ,

  1. Normal companies ending in R or C:
  • I only found one company that is 7 digits followed by R. I can't find any information about it though, so I'm not sure of the reason: 0013694R WOODFORD AND BRAMHALL ROYAL BRITISH LEGION CLUB LIMITED. I put it in the regex so that this valid company would validate, but because it's only one instance, it is possible that it was an error, or perhaps it was an old notation for registered societies? I'm not too sure.
  • Yes, it is possible to have 2 letters, 5 digits and then a C or R. See IP13694R.
  • I believe credit union company numbers end with C. These are usually 7 digits followed by C, but could have a 2 letter prefix as well; see RS00592C. These company numbers are extremely rare, and I think nowadays they are issued with an IP prefix, but they do exist for a few historical companies.
  2. I don't think that P is correct. There was just a single company in my database that had a P in the middle of its company number, but the company is now dissolved and no others have that strange letter in their company number. It can probably be safely removed from the regex. (OC0P223R BANN AREA TRAINING SERVICES LIMITED)

I've written this page with a list of the meanings of the various prefixes: https://chguide.co.uk/general/company-number.html . I have also found a quite comprehensive list in some old docs:
[Image: company number prefixes]

Do let me know if you have any more feedback on the regex; I am very keen to make it as accurate as possible. Thanks for taking the time to look at it.

@rob-murray
Author

@mrbrianevans @Kiwidave68 Thanks for improving this. I'm not working at the org that used this regex at the moment, so I haven't been able to check all this, but I'll update the regex in the gist with this for new people coming here. Or feel free to fork and I'll point them there.

@Kiwidave68

@mrbrianevans Thanks for the speedy response :) We'll be using this for user input validation, so it's good to get it as right as possible, although I suspect 99% of our input will be 'normal' companies.

@Kiwidave68

@rob-murray @mrbrianevans Another tweak would be the addition of (?!.{9}) to the beginning to limit the value to 8 characters:
/^(?!.{9})((AC...

@mrbrianevans

@Kiwidave68 I'm not sure it's necessary to add that to the beginning, because it's already checking the full length by having the ^ at the start and $ at the end and specifying the allowed length of each segment. Do you have an example of a 9 character string that matches my regex? You can test it out here: https://regexr.com/734i1
I suppose it might depend on how you use the regex in your application, but if the logic is that the entire string must match the regex, then it should be fine without the (?!.{9}).

@Kiwidave68

@mrbrianevans OK, I was testing it on https://regex101.com/ and with some C# unit tests, and 123456789 isn't rejected. It is rejected on your link however. I guess we can just do a length check in code :) I'm not a regex guru, so happy to go with your original one :)

@mrbrianevans

mrbrianevans commented Nov 24, 2022

Okay @Kiwidave68. Indeed you are right: I tried it in JavaScript and it does validate the 9 digit number. I'm not sure why that is; I can't immediately see the flaw.

> /^\d{8}$/.test('123456789')
false

In theory it should work like the simplified example above, which rejects the 9 digit number.

Checking length separately might be the best option though, because many people omit leading zeros in company numbers, so they need to be normalised to 8 characters before validation. E.g. 09226141 can be written 9226141.
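
A rough sketch of that normalisation step, under a few assumptions: the helper names `normaliseCompanyNumber` and `hasValidLength` are hypothetical, and only purely numeric input is padded, since nothing in this thread says how prefixed numbers with missing zeros should be handled.

```javascript
// Hypothetical helper: trim, upper-case, and left-pad all-digit numbers to 8 characters.
function normaliseCompanyNumber(input) {
  const cleaned = String(input).trim().toUpperCase();
  // Assumption: only purely numeric values are padded; prefixed numbers are left alone.
  if (/^\d{1,8}$/.test(cleaned)) {
    return cleaned.padStart(8, "0");
  }
  return cleaned;
}

// Hypothetical helper: the separate length check, applied after normalisation.
function hasValidLength(input) {
  return normaliseCompanyNumber(input).length === 8;
}

console.log(normaliseCompanyNumber("9226141")); // "09226141"
console.log(hasValidLength("123456789"));       // false
```

The normalised value can then be passed to whichever regex is in use.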

@Kiwidave68

@mrbrianevans I think it might be because it's doing a lazy check - matches on the first 8 so doesn't bother looking any further?

@tankbob

tankbob commented Mar 13, 2023

You need an extra set of brackets in the regex as otherwise the terminating $ can be ignored by the first match as it's left as part of the or condition

/^(((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))))$/
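
The effect is easier to see with a much smaller pattern. In the sketch below the two alternatives are simplified stand-ins, not the real company number rules; the point is only that `|` has lower precedence than the anchors, so without an outer group `^` belongs to the first alternative and `$` to the last.

```javascript
// Simplified stand-ins for the real alternatives, to show operator precedence only.
const ungrouped = /^\d{8}|[A-Z]{2}\d{6}$/; // parsed as (^\d{8}) | ([A-Z]{2}\d{6}$)
const grouped = /^(\d{8}|[A-Z]{2}\d{6})$/; // both anchors apply to each alternative

console.log(ungrouped.test("123456789")); // true: the first alternative has no $ anchor
console.log(grouped.test("123456789"));   // false
console.log(grouped.test("12345678"));    // true
```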

@grayfox99

You need an extra set of brackets in the regex as otherwise the terminating $ can be ignored by the first match as it's left as part of the or condition

/^(((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))))$/

Thanks for this, totally solved the problem!

@Prometheus3375

Prometheus3375 commented Aug 7, 2023

I think some companies may still be missing.

PC000001
PC000002
PC000003
PC000004
PC000005
PC000006
OE000001
OE000002
OE000003
OE000004
OE000005
OE000006
OE000007
OE000008
OE000009
OE000010
OE000011
OE000012

Search here.
Actually, many companies have the OE prefix.

Edit: there are more unmatched prefixes.
'SP', 'NO', 'PC', 'RC', 'CE', 'SR', 'IP', 'NP', 'IC', 'CS', 'OE', 'SI'
I would like to attach a list of unmatched IDs (around 75k), but GitHub says this file type is not supported (it is a .txt file, ~700 KB in size).
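
For anyone who wants to reproduce a prefix list like this, a minimal sketch follows. The function name `unmatchedPrefixCounts`, the sample array and the stand-in pattern are all hypothetical; in practice the real regex and a full list of company numbers would be passed in. The pattern should not use the `g` flag, because `test` would then keep state between calls.

```javascript
// Hypothetical helper: count the numbers a pattern rejects, keyed by their first two characters.
function unmatchedPrefixCounts(companyNumbers, pattern) {
  const counts = new Map();
  for (const number of companyNumbers) {
    if (pattern.test(number)) continue; // matched numbers are not of interest here
    const prefix = number.slice(0, 2);
    counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  return counts;
}

// Stand-in data and pattern, for illustration only.
const sampleNumbers = ["PC000001", "PC000002", "OE000001", "09226141"];
const digitsOnlyPattern = /^\d{8}$/;
console.log(unmatchedPrefixCounts(sampleNumbers, digitsOnlyPattern)); // PC counted twice, OE once
```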

@Prometheus3375

I've written this page with a list of the meanings of the various prefixes: https://chguide.co.uk/general/company-number.html . I have also found a quite comprehensive list in some old docs:

This page has a description for the PC prefix, but the provided regex does not include it.
In addition, in the official documentation the SI prefix is listed in the section with prefixes for companies with no available data (only the name is available).


@mrbrianevans

Had another go at it:

/^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;

The rules around registered societies (starting with an IP, SP or RS prefix) could probably be tightened up. This is a more permissive regex that matches all the company numbers in the bulk CSV file.
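
A quick way to spot-check this version is to run it against numbers quoted earlier in the thread. The sketch below is only a sanity check on a handful of values, not a replacement for testing against the bulk file; the variable names are made up.

```javascript
// The regex as posted above, without the trailing semicolon.
const latestPattern = /^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;

// Values taken from earlier comments, plus two that should be rejected on length.
const spotChecks = ["PC000001", "OE000012", "09226141", "IP13694R", "RS00592C", "123456789", "9226141"];

for (const value of spotChecks) {
  console.log(value, latestPattern.test(value));
}
```

The first five values are accepted; the 9 digit and 7 digit strings are rejected, so a number with its leading zero omitted still needs padding before it is tested.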
