Skip to content

Instantly share code, notes, and snippets.

@lirsacc
Last active October 22, 2017 17:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save lirsacc/d1e561dbea601de576f9ffd4b1b3dc4d to your computer and use it in GitHub Desktop.
Save lirsacc/d1e561dbea601de576f9ffd4b1b3dc4d to your computer and use it in GitHub Desktop.
Interview brain teaser: subset of London tube stations

Interview brain teaser: subset of London tube stations

Question was asked during the interview and this is implementation with notes left to be done separately.

Question

Find the minimal subset of London tube stations whose characters cover all the letters of the alphabet. (Or more generally find the minimal subset of terms in a corpus that covers a given alphabet).

Result of running index.js

- Solution 1: found 6 terms => Cutty Sark for Maritime Greenwich, Belsize Park, Ladbroke Grove, Oxford Circus, Russell Square, St. John's Wood
- Solution 2: found 5 terms => Belsize Park, Willesden Junction, Vauxhall, Heron Quays, Cutty Sark for Maritime Greenwich
Brute force testing for subset length 2
Working ...
Found no solution for subset length 2
Brute force testing for subset length 3
Working ...
Found no solution for subset length 3
Brute force testing for subset length 4
Working ...
Found no solution for subset length 4
#!/usr/bin/env node
/**
* Interview brain teaser
* ======================
*
* Run: `./index.js`. Make sure node is installed, should work on versions >=6.
*
* Find the minimal subset of tube stations whose characters cover all the
* letters of the alphabet. (Or more generally find the minimal subset of terms
* in a corpus that covers a given alphabet).
*
* We consider 'minimal' as being the smallest subset, which could likely
* have multiple solutions depending on the corpus. As both solutions are
* based on scoring the terms in the corpus, we should be able to to adapt to
* an extended definition of minimal by adapting the scoring function to
* differentiate between equal scores.
*
*/
const TUBE_STATIONS = require('./tube_stations');
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
const extractCharacterSet = term => new Set(term.toLowerCase().replace(/[^a-z]/g, ''));
/**
* Implementation of the first approach:
* - Extract characters into a set and sort terms in the corpus by the number
* of unique characters
* - Extract the first one and resort the remaining corpus ignoring the
* characters that have already been found.
* - Repeat until we have seen all the characters of the alphabet.
*
* This can be summed up as extracting terms based on a scoring / weight
* system where the score is adapted after each term has been extracted.
*
* [WARNING] This first solution does not yield an optimal result, as it is
* biased towards longer words and will result in a state where we have
* eliminated the most common characters and the remaining terms only contain
* 1 unseen character (usually rarer ones, for instance x, q, z, which are
* unlikely to be present in the same string).
*
* @param {string[]} corpus
* @returns string[]
*/
function findMinimalSubset(corpus) {
const seenCharacters = new Set();
const subset = [];
const transformed = corpus.map(term => [term, extractCharacterSet(term)]);
transformed.sort(([, s1], [, s2]) => s1.size - s2.size);
while (seenCharacters.size < ALPHABET.length && transformed.length > 0) {
const [term, set] = transformed.pop();
for (const char of set) seenCharacters.add(char);
subset.push(term);
transformed.forEach(([, set_]) => {
for (const char of seenCharacters) set_.delete(char);
});
transformed.sort(([, s1], [, s2]) => s1.size - s2.size);
}
if (seenCharacters.size != ALPHABET.length) {
throw new Error('Cannot cover the alphabet...');
}
return subset;
}
const res1 = findMinimalSubset(TUBE_STATIONS);
console.log(`- Solution 1: found ${res1.length} terms => ${res1.join(', ')}`);
/**
* Alternate solution: instead of using the count of unique characters,
* we weigh the terms according to the frequency of letters where a rare
* letter like Z counts for less than a common one like A.
* (Mentioned at by interviewer)
*
* This solution is more optimal than the previous one by shifting the bias
* towards rarer words.
*
* @param {string[]} corpus
* @returns string[]
*/
function findMinimalSubsetFrequencyStrategy(corpus) {
const seenCharacters = new Set();
const subset = [];
const occurences = {};
const transformed = corpus.map(term => [term, extractCharacterSet(term)]);
// Pre-populate characters counts
transformed.forEach(([, set]) => {
for (const char of set) {
occurences[char] = (occurences[char] || 0) + 1;
}
});
const weight = set => Array.from(set).reduce((sum, char) => {
const shouldBeCounted = !seenCharacters.has(char) && occurences[char];
return sum + (shouldBeCounted ? 1 / occurences[char] : 0);
}, 0);
transformed.sort(([, s1], [, s2]) => weight(s1) - weight(s2));
while (seenCharacters.size < ALPHABET.length && transformed.length > 0) {
const [term, set] = transformed.pop();
for (const char of set) seenCharacters.add(char);
subset.push(term);
transformed.sort(([, s1], [, s2]) => weight(s1) - weight(s2));
}
if (seenCharacters.size != ALPHABET.length) {
throw new Error('Cannot cover the alphabet...');
}
return subset;
}
const res2 = findMinimalSubsetFrequencyStrategy(TUBE_STATIONS);
console.log(`- Solution 2: found ${res2.length} terms => ${res2.join(', ')}`);
/**
* Check the validity of the result by brute forcing the problem: look at
* all combinations of a given length and check if any of them contains the
* full alphabet.
*
* [WARNING] This has bad performance and is just here as a brute force
* verifier which has not been optimised. May even run out of memory depending
* on the machine as there are 4728720 3-combinations of 306 and 358200540
* 4-combinations. (Hence this would unlikely be a valid solution for
* production). In fact the 4-combinations one took so long that we sped
* it up by extracting item 21 of TUBE_STATIONS which is 'Belsize Park' and
* should intuitively be in any solution of the problem as it is the only one
* containing the letter 'z'.
*
* @param {string[]} corpus
* @param {number} subsetLength
*/
function bruteForceConfirmation(subsetLength) {
console.log('Brute force testing for subset length', subsetLength);
console.log('Working ...');
const extracted = [...TUBE_STATIONS.slice(0, 21), ...TUBE_STATIONS.slice(0, 22)]
combinations = kCombinations(TUBE_STATIONS.map((x, i) => i), subsetLength - 1);
for (let index = 0; index < combinations.length; index++) {
const set = new Set([21, ...combinations[index]].map(i => TUBE_STATIONS[i]).reduce((chars, term) => {
return chars + term.toLowerCase().replace(/[^a-z]/g, '');
}, ''));
if (set.size === ALPHABET.length) {
console.log('Found one solution for subset length', subsetLength, combinations[index]);
return true;
}
}
console.log('Found no solution for subset length', subsetLength);
return false;
}
bruteForceConfirmation(2);
bruteForceConfirmation(3);
bruteForceConfirmation(4);
/**
* Simple k-combination generator, no performance guarantees.
* - Assumes there are no duplicates in the source list
* - Normally would use a library, but let's keep this simple
*
* @param {string[]} source
* @param {number} k
* @returns {string[][]}
*/
function kCombinations(source, k) {
// Easy escape hatches.
if (k > source.length || k <= 0) return [];
if (k === 1) return source.map(term => [term]);
if (k === source.length) return [source];
const combinations = [];
for (let index = 0; index < source.length - k + 1; index++) {
kCombinations(source.slice(index + 1), k - 1).forEach(c => combinations.push([source[index], ...c]));
}
return combinations;
}
// List of tube stations taken from Wikipedia
module.exports = [
'Acton Town', 'Aldgate', 'Aldgate East', 'All Saints', 'Alperton', 'Amersham', 'Angel',
'Archway', 'Arnos Grove', 'Arsenal', 'Baker Street', 'Balham', 'Bank', 'Barbican', 'Barking',
'Barkingside', 'Barons Court', 'Bayswater', 'Beckton', 'Beckton Park', 'Becontree',
'Belsize Park', 'Bermondsey', 'Bethnal Green', 'Blackfriars', 'Blackhorse Road', 'Blackwall',
'Bond Street', 'Borough', 'Boston Manor', 'Bounds Green', 'Bow Church', 'Bow Road', 'Brent Cross',
'Brixton', 'Bromley-by-Bow', 'Buckhurst Hill', 'Burnt Oak', 'Caledonian Road', 'Camden Town',
'Canada Water', 'Canary Wharf', 'Canary Wharf', 'Canning Town', 'Cannon Street', 'Canons Park',
'Chalfont & Latimer', 'Chalk Farm', 'Chancery Lane', 'Charing Cross', 'Chesham', 'Chigwell',
'Chiswick Park', 'Chorleywood', 'Clapham Common', 'Clapham North', 'Clapham South', 'Cockfosters',
'Colindale', 'Colliers Wood', 'Covent Garden', 'Crossharbour', 'Croxley', 'Custom House',
'Cutty Sark for Maritime Greenwich', 'Cyprus', 'Dagenham East', 'Dagenham Heathway', 'Debden',
'Deptford Bridge', 'Devons Road', 'Dollis Hill', 'Ealing Broadway', 'Ealing Common',
'Earl\'s Court', 'East Acton', 'East Finchley', 'East Ham', 'East India', 'East Putney',
'Eastcote', 'Edgware', 'Edgware Road', 'Edgware Road', 'Elephant & Castle', 'Elm Park',
'Elverson Road', 'Embankment', 'Epping', 'Euston', 'Euston Square', 'Fairlop', 'Farringdon',
'Finchley Central', 'Finchley Road', 'Finsbury Park', 'Fulham Broadway', 'Gallions Reach',
'Gants Hill', 'Gloucester Road', 'Golders Green', 'Goldhawk Road', 'Goodge Street', 'Grange Hill',
'Great Portland Street', 'Greenford', 'Green Park', 'Greenwich', 'Gunnersbury', 'Hainault',
'Hammersmith', 'Hammersmith', 'Hampstead', 'Hanger Lane', 'Harlesden', 'Harrow & Wealdstone',
'Harrow-on-the-Hill', 'Hatton Cross', 'Heathrow Terminals 1, 2, 3', 'Heathrow Terminal 4',
'Heathrow Terminal 5', 'Hendon Central', 'Heron Quays', 'High Barnet', 'Highbury & Islington',
'Highgate', 'High Street Kensington', 'Hillingdon', 'Holborn', 'Holland Park', 'Holloway Road',
'Hornchurch', 'Hounslow Central', 'Hounslow East', 'Hounslow West', 'Hyde Park Corner',
'Ickenham', 'Island Gardens', 'Kennington', 'Kensal Green', 'Kensington (Olympia)',
'Kentish Town', 'Kenton', 'Kew Gardens', 'Kilburn', 'Kilburn Park', 'King George V',
'Kingsbury', 'King\'s Cross St. Pancras', 'Knightsbridge', 'Ladbroke Grove', 'Lambeth North',
'Lancaster Gate', 'Langdon Park', 'Latimer Road', 'Leicester Square', 'Lewisham', 'Leyton',
'Leytonstone', 'Limehouse', 'Liverpool Street', 'London Bridge', 'London City Airport',
'Loughton', 'Maida Vale', 'Manor House', 'Mansion House', 'Marble Arch', 'Marylebone', 'Mile End',
'Mill Hill East', 'Monument', 'Moorgate', 'Moor Park', 'Morden', 'Mornington Crescent',
'Mudchute', 'Neasden', 'Newbury Park', 'North Acton', 'North Ealing', 'North Greenwich',
'North Harrow', 'North Wembley', 'Northfields', 'Northolt', 'Northwick Park', 'Northwood',
'Northwood Hills', 'Notting Hill Gate', 'Oakwood', 'Old Street', 'Osterley', 'Oval',
'Oxford Circus', 'Paddington', 'Park Royal', 'Parsons Green', 'Perivale', 'Piccadilly Circus',
'Pimlico', 'Pinner', 'Plaistow', 'Pontoon Dock', 'Poplar', 'Preston Road', 'Prince Regent',
'Pudding Mill Lane', 'Putney Bridge', 'Queen\'s Park', 'Queensbury', 'Queensway',
'Ravenscourt Park', 'Rayners Lane', 'Redbridge', 'Regent\'s Park', 'Richmond', 'Rickmansworth',
'Roding Valley', 'Royal Albert', 'Royal Oak', 'Royal Victoria', 'Ruislip', 'Ruislip Gardens',
'Ruislip Manor', 'Russell Square', 'St. James\'s Park', 'St. John\'s Wood', 'St. Paul\'s',
'Seven Sisters', 'Shadwell', 'Shepherd\'s Bush', 'Shepherd\'s Bush Market', 'Sloane Square',
'Snaresbrook', 'South Ealing', 'South Harrow', 'South Kensington', 'South Kenton', 'South Quay',
'South Ruislip', 'South Wimbledon', 'South Woodford', 'Southfields', 'Southgate', 'Southwark',
'Stamford Brook', 'Stanmore', 'Stepney Green', 'Stockwell', 'Stonebridge Park', 'Stratford',
'Sudbury Hill', 'Sudbury Town', 'Swiss Cottage', 'Temple', 'Theydon Bois', 'Tooting Bec',
'Tooting Broadway', 'Tottenham Court Road', 'Tottenham Hale', 'Totteridge & Whetstone',
'Tower Gateway', 'Tower Hill', 'Tufnell Park', 'Turnham Green', 'Turnpike Lane', 'Upminster',
'Upminster Bridge', 'Upney', 'Upton Park', 'Uxbridge', 'Vauxhall', 'Victoria',
'Walthamstow Central', 'Wanstead', 'Warren Street', 'Warwick Avenue', 'Waterloo', 'Watford',
'Wembley Central', 'Wembley Park', 'West Acton', 'West Brompton', 'West Finchley', 'West Ham',
'West Hampstead', 'West Harrow', 'West India Quay', 'West Kensington', 'West Ruislip',
'West Silvertown', 'Westbourne Park', 'Westferry', 'Westminster', 'White City', 'Whitechapel',
'Willesden Green', 'Willesden Junction', 'Wimbledon', 'Wimbledon Park', 'Wood Green', 'Wood Lane',
'Woodford', 'Woodside Park', 'Woolwich Arsenal'
];
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment