valyala/word_chains_solution.cpp

## word_chains_solution.cpp
// This is a solution for the Word Chains problem described
// at http://socialcam.com/jobs/problems .
//
// The problem.
//
// Two words are connected by a word chain if it is possible to change one into
// the other by making a series of single-character changes, such that every
// intermediate form is also a word. For example, CAT and DOG are connected with
// a word chain because CAT, COT, COG and DOG are all words. DEMONIC and
// UMBRELLA are not.
//
// Write a program that takes a list of words (for example /usr/share/dict/words
// on a unix system) and then reads pairs of words on stdin and prints 'YES' if
// the words are connected by a chain, and 'NO' if they are not. The program
// should take the path to the word list as a command-line argument, and should
// then loop, reading pairs of whitespace-delimited words on stdin and printing
// 'YES' or 'NO.'
//
// Only one operation is allowed between words in the chain. That operation may
// consist of changing any single character, deleting any single character
// or inserting any single character at any position. All comparisons should be
// case insensitive.
//
//
// The solution.
//
// The core concept of the solution is subword - a tuple of a word's substring
// and a character position. Word's substring is created from a word by deleting
// a character at the given position. It is possible creating L distinct
// subwords from a word with length L. Word's substring is always one character
// shorter than the corresponding word. Distinct subwords can have equal
// substrings if they differ by character position.
// For example, ('oo', 0), ('fo', 1), ('fo', 2) are all possible subwords
// for the word 'foo'.
//
// If two words have a common subword, then one word can be converted into
// another word by character substitution at the given position.
// If a word matches a subword's substring, then this word can be converted
// into subword's word by character deletion at the given position. Character
// deletion is equivalent to character insertion for the matching word.
// Obviously subwords can be used for building all possible one-character
// transformations for dictionary words irregardless of alphabet used in words.
//
// The solution is split into two phases:
//
// 1. Dictionary words preprocessing.
// 2. Input words lookup.
//
// The first phase has the following steps:
//
// 1. Read all dictionary words into memory.
//    This step has O(W) computational complexity and O(S) space complexity,
//    where:
//    - W is the total number of dictionary words;
//    - S is the total number of chars in all dictionary words.
//
// 2. For each word from the dictionary create all possible subwords. If word's
//    subword matches already existing subword, then connect the corresponding
//    words.
//    This step has O(S) computational complexity and O(S*E1) space coplexity,
//    where:
//    - E1 is the average number of connections found per each subword.
//
// 3. For each word from the dictionary find all subwords with substrings
//    matching the given word. Connect the word to found subwords' words.
//    This step has O(W) computational complexity and O(W*E2) space complexity,
//    where:
//    - E2 is the average number of connections found per each word.
//
// 4. Step 3 creates disconnected set of connected graphs from words.
//    Each connected graph in this set represents all the words, which can be
//    transformed from each other. If words belong to distinct connected graphs,
//    then the transformation is impossible. Let's assign distinct colors
//    for each connected graph and brush each word in each graph with
//    the corresponding color.
//    This step has O(W) computational complexity and O(W) space complexity.
//
// 5. Create a lookup table, which maps from word to color.
//    This step has O(W) computational complexity and O(S) space complexity.
//    Subwords can be deleted after this step.
//
// The second phase consists of one step:
//
// 1. For each word pair verify whether two given words belong to the same
//    graph by comparing their colors obtained from the lookup table.
//    This step has O(P) computational complexity and O(1) space complexity,
//    where:
//    - P is the total number of word pairs.
//
// For words from natural languages E1 and E2 can be considered constants.
// In this case computational complexity of the first phase is O(S).
// Storage required per each word from a natural language can be considered
// constant, so the solution requires O(S) space.
//
// It is worth noting algorithm's computational complexity doesn't depend
// on the size of alphabet used in dictionary words. I.e. words containing
// arbitrary unicode characters will be processed with the same speed as words
// containing only Latin chars. (Though uncode support is missing
// in the current implementation).
//
// This program reads 99K words from /usr/share/dict/words in 0.24 seconds
// on my laptop with the speed of ~413K words/s. Steps 1-4 require 16Mb of RAM
// (162 bytes per word) on 64-bit build. 32-bit build consumes 15Mb of RAM
// (152 bytes per word).
// Step 5 requires only a few Mb of RAM for storing a mapping between dictionary
// words and colors.
// After the dictionary load complete, it processes 50K word pairs from the same
// /usr/share/dict/words file in 0.09 seconds (i.e. ~556K word pairs/s).
//
// The program uses the following optimization tricks, which help increasing
// performance and decreasing memory usage:
//
// - Custom optimized containers are used instead of generic STL containers.
// - Memory for already known quantity of structures is allocated in bulk.
// - Small integer types are used in structs if possible instead of generic wide
//   types.
// - Pointers are substituted by offsets from base address. Offsets have smaller
//   sizes comparing to pointers (especially on x64 builds).
// - Access patterns to memory in loops is close to linear.
// - Unions are used for packing mutually exclusive objects into the same
//   location.
// - Objects are stored by value in data containers. This saves memory
//   by eliminating a pointer to value from data container and eliminates
//   a memory dereference during access to such values.
// - Unneeded memory copies are avoided where possible.
// - Pragma pack() directive is used for tight structures' packing.
//
// Author: Aliaksandr Valialkin <valyala@gmail.com>.


#include <cassert>    // for assert()
#include <cctype>     // for tolower(), isspace()
#include <cstddef>    // for size_t
#include <cstdlib>    // for exit(), EXIT_SUCCESS/EXIT_FAILURE
#include <fstream>    // for ifstream
#include <iostream>   // for cin/cout/cerr
#include <istream>
#include <ostream>
#include <stdint.h>   // for uint*_t


// Denies copy constructor and assignment operator for the given class.
// These operations are implicitly implemented by the compiler unless
// explicilty declared. Sometimes implicit implementation of these operations
// won't match your needs. For example, if a class contains pointers to objects
// it owns, implicit copying will lead to use after free errors.
//
// This macro MUST be placed only at the end of class definition!
#define DENY_COPYING(classname)           \
  private:                                \
    classname(const classname &);         \
    void operator = (const classname &);


// It's unclear how to obtain SIZE_MAX in C++. So define it here.
#ifndef SIZE_MAX
#  define SIZE_MAX ((size_t)-1)
#endif


using namespace std;


namespace
{

// Debug stream.
//
// This stream is completely eliminated if NDEBUG macro is present, so feel free
// scattering debug messages in the code wherever you want. This won't hurt
// performance in optimized builds.
class debug_ostream
{
private:
  ostream &_output;

public:
  explicit debug_ostream(ostream &output) : _output(output) {}

  debug_ostream &operator << (ostream &(*const manipulator)(ostream &))
  {
    (void)manipulator;
#ifndef NDEBUG
    manipulator(_output);
#endif
    return *this;
  }

  template <class T>
  debug_ostream &operator << (const T &v)
  {
    (void)v;
#ifndef NDEBUG
    _output << v;
#endif
    return *this;
  }
};

// Debug stream initialization.
debug_ostream cdbg(cerr);


// Calculates the number of bytes required for storing the value N.
template <size_t N> struct bytes
{
  static const size_t n = 1 + bytes<(N/256)>::n;
};

template<> struct bytes<0>
{
  static const size_t n = 0;
};

// Determines a type suitable for holding the given number of bytes.
template <size_t Bytes> struct size_type;
template<> struct size_type<1> { typedef uint8_t t; };
template<> struct size_type<2> { typedef uint16_t t; };


// Moves object's contents from *src to *dst.
// It differs from assignment operator in two aspects:
// - dst contents is undefined before the move, so there is no need in freeing
//   up dst resources during the move.
// - src contents can be modified during the move, so it can be left in
//   the state, which consumes minimal CPU at destruction time.
//
// These aspects allow writing fast move() specializations.
template <class ValueType>
void move(ValueType *const dst, ValueType *const src)
{
  // This move() implementation is suitable only for POD types.
  // Non-POD types must provide own move() specialization.
  *dst = *src;
}


// Stack implementation optimized for performance and memory usage.
// It packs multiple items in chunks up to ChunkSize items per chunk.
// This improves memory locality and shaves off memory required for 'next'
// pointers comparing to canonical list implementations.
//
// See http://en.wikipedia.org/wiki/Unrolled_linked_list for details.
//
// The stack owns items pushed into it. It uses move() template function
// for moving items from/to the stack.
#pragma pack(push, 1)
template <class ValueType, size_t ChunkSize>
class fast_stack
{
private:
  typedef typename size_type<bytes<ChunkSize>::n>::t chunk_size_t;

  static const chunk_size_t _MAX_CHUNK_SIZE = ((chunk_size_t)-1);

  // Pointer to the next stack chunk.
  fast_stack *_next;

  // Space for items in the current chunk.
  char _v[sizeof(ValueType) * ChunkSize];

  // The number of items in the current chunk.
  chunk_size_t _chunk_size;

  ValueType *_get(const size_t n) const
  {
    assert(_chunk_size <= ChunkSize);
    assert(n < _chunk_size);

    return ((ValueType *)_v) + n;
  }

  fast_stack(fast_stack *const next, ValueType *const v) :
      _next(next), _chunk_size(1)
  {
    assert(ChunkSize > 0);
    assert(ChunkSize <= _MAX_CHUNK_SIZE);

    cdbg << "Allocated stack chunk for " << ChunkSize << " items" << endl;
    move(_get(0), v);
  }

  ~fast_stack()
  {
    for (size_t i = 0; i < _chunk_size; ++i) {
      _get(i)->~ValueType();
    }

    cdbg << "Released stack chunk for " << ChunkSize << " items" << endl;
  }

public:
  // Moves the given item onto the stack using move() template function.
  // May update a pointer to the stack top.
  static void push(fast_stack **const top_p, ValueType *const v)
  {
    assert(top_p != 0);
    fast_stack *top = *top_p;

    if (top == 0 || top->_chunk_size == ChunkSize) {
      *top_p = new fast_stack(*top_p, v);
    }
    else {
      assert(top->_chunk_size < ChunkSize);
      ++(top->_chunk_size);
      move(top->_get(top->_chunk_size - 1), v);
    }
  }

  // Pops the item from the top of the stack to *to_v using move() template
  // function. May update a pointer to the stack top.
  static void pop(fast_stack **const top_p, ValueType *const to_v)
  {
    assert(top_p != 0);
    fast_stack *top = *top_p;
    assert(top != 0);
    assert(top->_chunk_size > 0);

    move(to_v, top->_get(top->_chunk_size - 1));
    --(top->_chunk_size);
    if (top->_chunk_size == 0) {
      *top_p = top->_next;
      delete top;
    }
  }

  // Moves items from the array to the stack using move() template function.
  // Returns the top of the created stack.
  static fast_stack *move_from_array(ValueType *begin, const size_t n)
  {
    assert(begin != 0);

    const ValueType *const end = begin + n;
    fast_stack *top = 0;
    while (begin != end) {
      push(&top, begin);
      ++begin;
    }
    return top;
  }

  // Destroys the given stack.
  static void destroy(fast_stack *top)
  {
    while (top != 0) {
      fast_stack *next = top->_next;
      delete top;
      top = next;
    }
  }

  // Calls visitor for each item on the stack.
  // The order of visited items is unspecified.
  template <class Visitor>
  static void for_each(const fast_stack *top, Visitor *const visitor)
  {
    while (top != 0) {
      assert(top->_chunk_size > 0);
      for (size_t i = 0; i < top->_chunk_size; ++i) {
        ValueType *const v = top->_get(i);
        (*visitor)(v);
      }
      top = top->_next;
    }
  }

  template <class Predicate>
  static ValueType *find_if(const fast_stack *top, Predicate *const predicate)
  {
    while (top != 0) {
      assert(top->_chunk_size > 0);
      for (size_t i = 0; i < top->_chunk_size; ++i) {
        ValueType *const v = top->_get(i);
        if ((*predicate)(const_cast<const ValueType *const>(v))) {
          return v;
        }
      }
      top = top->_next;
    };
    return 0;
  }

  DENY_COPYING(fast_stack)
};
#pragma pack(pop)


// A list optimized for storage space.
// It stores up to InlineChunkSize values in the object memory itself,
// then falls back to dynamically allocated storage.
//
// The list owns added items. It uses move() template function for moving
// items into the list.
#pragma pack(push, 1)
template <class ValueType, size_t InlineChunkSize, size_t ExternalChunkSize>
class compact_list
{
private:

  // This type must be able handling up to InlineChunkSize + 1 distinct integer
  // values. The (InlineChunkSize + 1) value is used as a flag indicating
  // external storage is used instead of inline storage.
  typedef typename size_type<bytes<InlineChunkSize+1>::n>::t
      inline_storage_size_t;

  typedef fast_stack<ValueType, ExternalChunkSize> external_storage_t;

  static const size_t _MAX_INLINE_STORAGE_SIZE = ((inline_storage_size_t)-1);

  union {
    char inline_storage[sizeof(ValueType) * InlineChunkSize];
    external_storage_t *external_storage;
  } _u;

  inline_storage_size_t _inline_storage_size;

  ValueType *_get_from_inline_storage(const size_t n) const
  {
    assert(n < _inline_storage_size);
    return ((ValueType *)_u.inline_storage) + n;
  }

public:
  compact_list() : _inline_storage_size(0)
  {
    assert(InlineChunkSize > 0);
    assert(InlineChunkSize < _MAX_INLINE_STORAGE_SIZE);
  }

  ~compact_list()
  {
    if (_inline_storage_size <= InlineChunkSize) {
      for (size_t i = 0; i < _inline_storage_size; i++) {
        _get_from_inline_storage(i)->~ValueType();
      }
    }
    else {
      assert(_u.external_storage != 0);
      external_storage_t::destroy(_u.external_storage);
    }
  }

  // Moves the given item into the list using move() template function.
  void add(ValueType *const v)
  {
    if (_inline_storage_size < InlineChunkSize) {
      assert(_inline_storage_size < _MAX_INLINE_STORAGE_SIZE);
      ++_inline_storage_size;
      move(_get_from_inline_storage(_inline_storage_size - 1), v);
    }
    else {
      if (_inline_storage_size == InlineChunkSize) {
        cdbg << "Switching from inline to external storage, because list " <<
            "size is greater than " << InlineChunkSize << endl;
        assert(_inline_storage_size < _MAX_INLINE_STORAGE_SIZE);
        ++_inline_storage_size;
        _u.external_storage = external_storage_t::move_from_array(
            _get_from_inline_storage(0), InlineChunkSize);
      }

      assert(_u.external_storage != 0);
      external_storage_t::push(&_u.external_storage, v);
    }
  }

  // Calls visitor for each item on the list.
  // The order of visited items is unspecified.
  template <class Visitor>
  void for_each(Visitor *const visitor) const
  {
    if (_inline_storage_size <= InlineChunkSize) {
      for (size_t i = 0; i < _inline_storage_size; ++i) {
        ValueType *const v = _get_from_inline_storage(i);
        (*visitor)(v);
      }
    }
    else {
      assert(_u.external_storage != 0);
      external_storage_t::for_each(_u.external_storage, visitor);
    }
  }

  template <class Predicate>
  ValueType *find_if(Predicate *const predicate) const
  {
    if (_inline_storage_size <= InlineChunkSize) {
      for (size_t i = 0; i < _inline_storage_size; ++i) {
        ValueType *const v = _get_from_inline_storage(i);
        if ((*predicate)(const_cast<const ValueType *const>(v))) {
          return v;
        }
      }
      return 0;
    }

    assert(_u.external_storage != 0);
    return external_storage_t::find_if(_u.external_storage, predicate);
  }

  DENY_COPYING(compact_list)
};
#pragma pack(pop)


// Simple hash table, which uses less memory comparing to sets and maps
// provided by STL.
//
// ValueType must implement equal_to() and hash() methods.
// Hash table owns objects put into it. It uses move() template function
// for object's movement. Objects put into hash table shouldn't
// modify hash() result.
template <class ValueType, size_t OptimalBucketSize>
class simple_hashtable
{
public:
  typedef compact_list<ValueType, OptimalBucketSize, OptimalBucketSize * 2>
      bucket_t;

private:
  const size_t _buckets_count;
  bucket_t *const _buckets;

  template <class Visitor>
  class _value_visitor
  {
  private:
    Visitor &_visitor;

  public:
    explicit _value_visitor(Visitor &visitor) : _visitor(visitor) {}

    void operator () (ValueType *const v) const
    {
      _visitor(v);
    }
  };

  class _value_finder
  {
  private:
    const ValueType *const _v;
    const simple_hashtable *const _htb;

  public:
    _value_finder(const ValueType *const v,
        const simple_hashtable *const htb) : _v(v), _htb(htb) {}

    bool operator () (const ValueType *const v) const
    {
      assert(_htb->get_bucket(_v) == _htb->get_bucket(v));
      return _v->equal_to(v);
    }
  };

  ValueType *_find(const bucket_t *const bucket, const ValueType *const v) const
  {
    assert(bucket == get_bucket(v));

    const _value_finder vf(v, this);
    return bucket->find_if(&vf);
  }

public:
  explicit simple_hashtable(const size_t buckets_count) :
      _buckets_count(buckets_count), _buckets(new bucket_t[buckets_count])
  {
    assert(_buckets_count > 0);

    cdbg << "*** Allocated " << (_buckets_count * sizeof(_buckets[0])) <<
        " bytes for " << _buckets_count << " hashtable buckets" << endl;
  }

  ~simple_hashtable()
  {
    assert(_buckets != 0);

    delete[] _buckets;

    cdbg << "*** Deleted " << (_buckets_count * sizeof(_buckets[0])) <<
        " bytes allocated for " << _buckets_count << " hashtable buckets" <<
        endl;
  }

  bucket_t *get_bucket(const ValueType *const v) const
  {
    assert(_buckets_count > 0);
    assert(_buckets != 0);

    // TODO: use something faster than modulo operation?
    size_t n = v->hash() % _buckets_count;
    return _buckets + n;
  }

  // Moves the given item into hashtable using move() template function.
  // Returns 0 on success or a pointer to already existing equivalent
  // item on failure.
  ValueType *insert(ValueType *const v)
  {
    bucket_t *const bucket = get_bucket(v);
    ValueType *const tmp = _find(bucket, v);
    if (tmp == 0) {
      bucket->add(v);
    }
    else {
      assert(v->equal_to(tmp));
    }
    return tmp;
  }

  // Returns a pointer to an item with value equal to the given value.
  // Returns 0 if there is no item with the given value.
  ValueType *find(const ValueType *const v) const
  {
    return _find(get_bucket(v), v);
  }

  // Call visitor for each item in the hashtable.
  // The order of visited items is unspecified.
  template <class Visitor>
  void for_each(Visitor *const visitor) const
  {
    assert(_buckets != 0);

    const _value_visitor<Visitor> vv(visitor);
    for (size_t i = 0; i < _buckets_count; ++i) {
      _buckets[i].for_each(vv);
    }
  }

  DENY_COPYING(simple_hashtable)
};


// Jenkin's hash parts.
//
// See http://en.wikipedia.org/wiki/Jenkins_hash_function .
// C++ doesn't provide decent hash implementations, so we have to implement
// it here.
struct jenkins_hash
{
  static uint32_t mix(uint32_t h, const char c)
  {
    h += c;
    h += (h << 10);
    h ^= (h >> 6);
    return h;
  }

  static uint32_t final(uint32_t h)
  {
    h += (h << 3);
    h ^= (h >> 11);
    h += (h << 15);
    return h;
  }
};


// Tiny and fast string implementation.
// It builds strings on top of the provided buffer.
// It doesn't allocate additional memory.
#pragma pack(push, 1)
class tiny_string
{
public:

  // Maximum string size value must fit this type.
  // This type must be unsigned.
  typedef uint8_t ts_size_t;

private:

  // Buffer size value must fit this type.
  // This type must be unsigned.
  typedef uint32_t offset_t;

  static const offset_t _MAX_STRING_SIZE = ((ts_size_t)-1);
  static const offset_t _MAX_INLINE_STRING_SIZE = sizeof(offset_t);
  static const offset_t _MAX_BUF_SIZE = ((offset_t)-2);
  static const offset_t _TMP_STRING_OFFSET = ((offset_t)-1);

  union {
    // Offset of the string in the buffer.
    offset_t offset;

    // Inline string allows eliminating additional memory dereference.
    char inline_string[_MAX_INLINE_STRING_SIZE];
  } _u;

  // The size of the string.
  // Strings shorter than (_MAX_INLINE_STRING_SIZE + 1) are stored
  // in _u.inline_string.
  ts_size_t _size;

  static char *_buf;
  static offset_t _buf_size;
  static offset_t _next_offset;

  // Temporary buffer is used for holding a string read from input stream.
  static char _tmp_buf[_MAX_STRING_SIZE];

  void _verify() const
  {
    assert(_buf != 0);
    assert(_next_offset <= _buf_size);
    if (_size > _MAX_INLINE_STRING_SIZE) {
      if (_u.offset != _TMP_STRING_OFFSET) {
        assert(_u.offset <= _next_offset - _size);
      }
    }
  }

  static void _set_string(tiny_string *const ts, const offset_t offset,
      const ts_size_t size)
  {
    if (size <= _MAX_INLINE_STRING_SIZE) {
      const char *const src = (offset != _TMP_STRING_OFFSET) ? (_buf + offset) : _tmp_buf;
      char *const dst = ts->_u.inline_string;

      // This loop is faster than memcpy().
      for (size_t i = 0; i < size; ++i) {
        dst[i] = src[i];
      }
    }
    else {
      ts->_u.offset = offset;
    }
    ts->_size = size;
  }

public:

  // Sets the buffer, which will be used for strings building.
  // The buffer must contain whitespace-delimited strings.
  static void set_buffer(char *const buf, const size_t buf_size)
  {
    assert(_buf == 0);
    assert(_MAX_BUF_SIZE <= SIZE_MAX);

    if (buf_size > _MAX_BUF_SIZE) {
      cerr << "Cannot reserve " << buf_size << " chars for tiny strings." <<
          "Widen tiny_string::offset_t for handling more chars" << endl;
      ::exit(EXIT_FAILURE);
    }

    _buf = buf;
    assert((offset_t) buf_size == buf_size);
    _buf_size = buf_size;
    _next_offset = 0;

    cdbg << "*** Set a buffer for holding " << _buf_size <<
        " tiny string chars" << endl;
  }

  // Builds the next string from the underlying buffer.
  // Builds the string with zero length if the end of the buffer reached.
  static void get_next_string(tiny_string *const ts)
  {
    assert(_buf != 0);
    assert(_next_offset <= _buf_size);

    // skip leading whitespace
    char c = 0;
    while (_next_offset < _buf_size) {
      c = _buf[_next_offset++];
      if (!::isspace(c)) {
        break;
      }
    }

    if (_next_offset == _buf_size) {
      // end of buffer reached
      _set_string(ts, _TMP_STRING_OFFSET, 0);
      return;
    }

    // lowercase string contents.
    const offset_t s_offset = _next_offset - 1;
    assert(!::isspace(c));
    do {
      _buf[_next_offset - 1] = ::tolower(c);
      c = _buf[_next_offset++];
    } while (_next_offset < _buf_size && !::isspace(c));

    // put trailing character into string.
    if (!::isspace(c)) {
      _buf[_next_offset - 1] = ::tolower(c);
    }

    // calculate string size
    assert(s_offset + 1 < _next_offset);
    const offset_t s_size = _next_offset - s_offset - 1;

    if (s_size > _MAX_STRING_SIZE) {
      cerr << "Input string [";
      cerr.write(_buf + s_offset, s_size);
      cerr << "] containing " << s_size << " chars is longer than " <<
          "the allowed maximum " << _MAX_STRING_SIZE << " chars. Widen " <<
          "tiny_string::ts_size_t for handling longer strings" << endl;
      ::exit(EXIT_FAILURE);
    }

    assert((ts_size_t) s_size == s_size);
    _set_string(ts, s_offset, s_size);
  }

  // Reads the string from input stream.
  // The string is read into a static temporary buffer, so the next call
  // to read() overwrites the previous string in the temporary buffer.
  static void read(istream &input, tiny_string *const ts)
  {
    // skip leading whitespace
    int c;
    do {
      c = input.get();
    } while (::isspace(c));

    if (!input.good()) {
      // end of input stream
      _set_string(ts, _TMP_STRING_OFFSET, 0);
      return;
    }

    // read and lowercase string contents.
    offset_t s_size = 0;
    assert(!::isspace(c));
    do {
      _tmp_buf[s_size++] = ::tolower(c);
      c = input.get();
    } while (s_size < _MAX_STRING_SIZE && !::isspace(c) && input.good());

    if (!::isspace(c) && input.good()) {
      assert(s_size == _MAX_STRING_SIZE);
      cerr << "Tiny string [";
      cerr.write(_tmp_buf, _MAX_STRING_SIZE);
      cerr << "]... cannot be longer than " << _MAX_STRING_SIZE << " chars" <<
          endl;
      ::exit(EXIT_FAILURE);
    }

    _set_string(ts, _TMP_STRING_OFFSET, s_size);
  }

  size_t size() const
  {
    _verify();
    return _size;
  }

  const char *data() const
  {
    _verify();
    if (_size <= _MAX_INLINE_STRING_SIZE) {
      return _u.inline_string;
    }
    return (_u.offset != _TMP_STRING_OFFSET) ? (_buf + _u.offset) : _tmp_buf;
  }

  bool equal_to(const tiny_string *const ts) const
  {
    if (_size != ts->_size) {
      return false;
    }

    const char *const s1 = data();
    const char *const s2 = ts->data();

    for (ts_size_t i = 0; i < _size; ++i) {
      if (s1[i] != s2[i]) {
        return false;
      }
    }

    return true;
  }

  uint32_t hash() const
  {
    uint32_t h = 0;
    const char *const s = data();

    for (ts_size_t i = 0; i < _size; ++i) {
      h = jenkins_hash::mix(h, s[i]);
    }

    return jenkins_hash::final(h);
  }

#ifndef NDEBUG
  void debug_print(ostream &output) const
  {
    output << "[";
    output.write(data(), size());
    output << "]";
  }
#endif
};
#pragma pack(pop)

char *tiny_string::_buf;
tiny_string::offset_t tiny_string::_buf_size;
tiny_string::offset_t tiny_string::_next_offset;

char tiny_string::_tmp_buf[tiny_string::_MAX_STRING_SIZE];

#ifndef NDEBUG
ostream &operator << (ostream &output, const tiny_string &ts)
{
  ts.debug_print(output);
  return output;
}
#endif


// This class holds information related to a dictionary word:
// - A list of words, which can be transformed into the word via a single
//   character substitution, character insertion or character deletion.
// - A color of the word. Initially all the words have no color (zero value).
//   Words are colorified via graph traversing after all dictionary words
//   are populated and all word connections are built.
#pragma pack(push, 1)
class word
{
public:
  // The type of word index.
  // Widen it in order to handle more words.
  // Must be unsigned.
  typedef uint32_t idx_t;

  // The type of word color.
  // Widen it in order to handle more colors.
  // Must be unsigned.
  typedef uint16_t color_t;

  // Null index is used solely for performance optimization purposes.
  static const idx_t NULL_IDX;

  // A stack of words. It is used for traversing graphs of connected words
  // during words brushing.
  // Use large stack chunks for performance reasons.
  typedef fast_stack<idx_t, 4000> stack_t;

private:
  static word *_words;
  static idx_t _words_count;
  static idx_t _next_idx;

  // ((idx_t)-1) is already occupied by NULL_IDX.
  static const idx_t _MAX_WORDS_COUNT = ((idx_t)-2);

  // It is observed that words from /usr/share/dict/words have ~3 connections
  // per word, so store up to 4 connections inline for performance reasons :)
  typedef compact_list<idx_t, 4, 8> connected_words_t;

  // List of connected words.
  connected_words_t _connected_words;

  // String associated with the word.
  const tiny_string _ts;

  // Word's color.
  color_t _color;

  static idx_t _get_idx(const word *const w)
  {
    assert(w >= _words);
    assert(w - _words < _next_idx);

    return w - _words;
  }

  class _words_brusher
  {
  private:
    stack_t **const _ws_p;
    const color_t _color;

  public:
    _words_brusher(stack_t **const ws_p, const color_t color) :
        _ws_p(ws_p), _color(color)
    {
      assert(_ws_p != 0);
      assert(_color != 0);
    }

    void operator () (const idx_t *const idx) const
    {
      word *const w = word::get_by_idx(*idx);
      if (w->get_color() == _color) {
        return;
      }
      w->set_color(_color);

      idx_t tmp = *idx;
      stack_t::push(_ws_p, &tmp);
    }
  };

  class _idx_finder
  {
  private:
    const word::idx_t _idx;

  public:
    _idx_finder(const word::idx_t idx) : _idx(idx) {}

    bool operator () (const word::idx_t *const idx) const
    {
      return (_idx == *idx);
    }
  };

  explicit word(const tiny_string *const ts) :
      _connected_words(), _ts(*ts), _color(0) {}

public:
  static word *get_by_idx(const idx_t idx)
  {
    assert(idx < _next_idx);

    return _words + idx;
  }

  // Reserves space for the given number of words.
  static void reserve_words(const size_t words_count)
  {
    assert(_words == 0);

    if (words_count > _MAX_WORDS_COUNT) {
      cerr << "Cannot allocate space for " << words_count << " words. Maximum " <<
          "number of words is " << _MAX_WORDS_COUNT <<
          "Widen word::idx_t for handling more words" << endl;
      ::exit(EXIT_FAILURE);
    }

    if (words_count > SIZE_MAX / sizeof(_words[0])) {
      cerr << "Cannot allocate space for " << words_count << " words" << endl;
      ::exit(EXIT_FAILURE);
    }

    const size_t buf_size = words_count * sizeof(_words[0]);
    _words = (word *) operator new (buf_size);
    assert((idx_t) words_count == words_count);
    _words_count = words_count;
    _next_idx = 0;

    cdbg << "*** Allocated " << buf_size << " bytes for " << words_count <<
        " words" << endl;
  }

  // Releases space allocated by reserve_words().
  static void release_words()
  {
    assert(_words != 0);
    assert(_words_count <= _MAX_WORDS_COUNT);
    assert(_next_idx <= _words_count);

    for (idx_t i = 0; i < _next_idx; ++i) {
      get_by_idx(i)->~word();
    }

    operator delete (_words);
    _words = 0;

    cdbg << "*** Freed " << (_words_count * sizeof(_words[0])) <<
        " bytes allocated for words" << endl;
  }

  // Adds new word into words storage.
  static void add_word(const tiny_string *const ts)
  {
    assert(_words != 0);
    assert(_words_count <= _MAX_WORDS_COUNT);
    assert(_next_idx < _words_count);

    ++_next_idx;
    word *const w = get_by_idx(_next_idx - 1);
    new (w) word(ts);
    cdbg << "Read " << w << endl;
  }

  bool is_connected_to(const word *const w) const
  {
    const idx_t idx = _get_idx(w);
    const _idx_finder idf(idx);
    const idx_t *const tmp = _connected_words.find_if(&idf);
    if (tmp != 0) {
      assert(*tmp == idx);
      return true;
    }
    return false;
  }

  void connect_to(const word *const w)
  {
    assert(w->_ts.size() == _ts.size() ||
        w->_ts.size() + 1 == _ts.size() ||
        w->_ts.size() == _ts.size() + 1);
    assert(!is_connected_to(w));

    idx_t idx = _get_idx(w);
    _connected_words.add(&idx);
  }

  void brush_connected_words(stack_t **const ws_p, const color_t color)
  {
    assert(color == _color);
    cdbg << "Brushing words connected to " << this << " into color=" <<
        color << endl;
    const _words_brusher wb(ws_p, color);
    _connected_words.for_each(&wb);
  }

  color_t get_color() const
  {
    return _color;
  }

  void set_color(const color_t color)
  {
    cdbg << "Brushing " << this << " with color=" << color << endl;
    assert(_color == 0);
    _color = color;
  }

  const tiny_string *get_string() const
  {
    return &_ts;
  }

  // Calls word_visitor for each word in the dictionary.
  // The order of visited words is unspecified.
  template <class WordVisitor>
  static void for_each(WordVisitor *const word_visitor)
  {
    assert(_words != 0);
    assert(_next_idx <= _words_count);

    for (idx_t i = 0; i < _next_idx; ++i) {
      const idx_t idx = i;
      (*word_visitor)(_words + idx, idx);
    }
  }

#ifndef NDEBUG
  void debug_print(ostream &output) const
  {
    output << "word(" << _ts << ")";
  }
#endif

  DENY_COPYING(word)
};
#pragma pack(pop)

const word::idx_t word::NULL_IDX = ((word::idx_t)-1);

word *word::_words;
word::idx_t word::_words_count;
word::idx_t word::_next_idx;

#ifndef NDEBUG
ostream &operator << (ostream &output, const word *const w)
{
  w->debug_print(output);
  return output;
}
#endif


// This class holds information related to a subword.
// Subword is a tuple (word, position), where subword's string can
// be created by removing a character from the word at the given position.
#pragma pack(push, 1)
class subword
{
private:

  // Word's index.
  word::idx_t _idx;

  // Position of the character to delete in the word.
  tiny_string::ts_size_t _pos;

  word *_word() const
  {
    return word::get_by_idx(_idx);
  }

  const tiny_string *_get_string() const
  {
    return _word()->get_string();
  }

  size_t _substring_size() const
  {
    const tiny_string *const ts = _get_string();

    assert(ts->size() > 0);
    assert(_pos <= ts->size());

    const size_t ts_size = ts->size();
    return _pos == ts_size ? ts_size : ts_size - 1;
  }

  static bool _substring_equal_to(const subword *const sw1,
      const subword *const sw2)
  {
    assert(sw1->_substring_size() == sw2->_substring_size());

    const tiny_string *const ts1 = sw1->_get_string();
    const tiny_string *const ts2 = sw2->_get_string();
    const tiny_string::ts_size_t pos1 = sw1->_pos;
    const tiny_string::ts_size_t pos2 = sw2->_pos;

    assert(pos1 <= pos2);
    assert(pos1 <= ts1->size());
    assert(pos2 <= ts2->size());

    const char *const s1 = ts1->data();
    const char *const s2 = ts2->data();

    for (size_t i = 0; i < pos1; ++i) {
      if (s1[i] != s2[i]) {
        return false;
      }
    }

    for (size_t i = pos1 + 1; i <= pos2; ++i) {
      assert(i < ts1->size());
      assert(i <= ts2->size());
      if (s1[i] != s2[i - 1]) {
        return false;
      }
    }

    for (size_t i = pos2 + 1; i < ts2->size(); ++i) {
      assert(i < ts1->size());
      if (s1[i] != s2[i]) {
        return false;
      }
    }

    return true;
  }

public:
  subword(const word::idx_t idx, const tiny_string::ts_size_t pos) :
      _idx(idx), _pos(pos)
  {
    assert(_pos <= _get_string()->size());
  }

  void connect_to(const subword *const sw)
  {
    word *const w1 = _word();
    word *const w2 = sw->_word();

    if (w1->is_connected_to(w2)) {
      cdbg << w1 << " is already connected to " << w2 << endl;
      return;
    }

    cdbg << "Connecting " << w1 << " to " << w2 << " via char ";
    if (_get_string()->size() == sw->_get_string()->size()) {
      cdbg << "substitution";
    }
    else {
      assert(_get_string()->size() + 1 == sw->_get_string()->size());
      cdbg << "insertion";
    }
    cdbg << " at position " << (size_t)sw->_pos << endl;

    w1->connect_to(w2);
    w2->connect_to(w1);
  }

  // Compares only subwords' substrings, but not character deletion positions.
  bool substring_equal_to(const subword *const sw) const
  {
    if (_substring_size() != sw->_substring_size()) {
      return false;
    }

    return (_pos <= sw->_pos) ? _substring_equal_to(this, sw) :
        _substring_equal_to(sw, this);
  }

  bool equal_to(const subword *const sw) const
  {
    return (_pos == sw->_pos && substring_equal_to(sw));
  }

  uint32_t hash() const
  {
    // Subwords with the same substring must have identical hash value,
    // so they will trap into the same hash table bucket.
    // So don't mix _pos into hash value.
    // This property is exploited when building links between words.

    const tiny_string *const ts = _get_string();
    const char *const s = ts->data();
    uint32_t h = 0;

    for (size_t i = 0; i < _pos; ++i) {
      h = jenkins_hash::mix(h, s[i]);
    }

    for (size_t i = _pos + 1; i < ts->size(); ++i) {
      h = jenkins_hash::mix(h, s[i]);
    }

    return jenkins_hash::final(h);
  }
};
#pragma pack(pop)


// Mapping between a word string and a color.
#pragma pack(push, 1)
struct word_to_color
{
  tiny_string ts;
  word::color_t color;

  word_to_color(const tiny_string *const ts_, const word::color_t color_) :
      ts(*ts_), color(color_)
  {
    assert(ts.size() > 0);
  }

  bool equal_to(const word_to_color *const wc) const
  {
    return ts.equal_to(&wc->ts);
  }

  uint32_t hash() const
  {
    return ts.hash();
  }
};
#pragma pack(pop)


// Subwords hashtable is used for building connections between words and
// for words' brushing.
// The hashtable is deleted after words are brushed.
typedef simple_hashtable<subword, 2> subwords_t;
subwords_t *subwords;


// Lookup table is used for mapping between dictionary words and
// the corresponding colors.
typedef simple_hashtable<word_to_color, 2> lookup_table_t;
lookup_table_t *lookup_table;


void get_stats_for_dict_words(const char *const string_buffer,
    const size_t string_buffer_size, size_t *const words_count_p,
    size_t *const total_words_size_p)
{
  cdbg << "*** Populating dict word stats" << endl;
  size_t words_count = 0;
  size_t total_words_size = 0;
  size_t i = 0;
  while (i < string_buffer_size) {
    char c = string_buffer[i++];
    if (::isspace(c)) {
      continue;
    }
    ++words_count;

    do {
      ++total_words_size;
      if (i == string_buffer_size) {
        break;
      }
      c = string_buffer[i++];
    } while (!::isspace(c));
  }
  cdbg << "*** Total words count: " << words_count << endl;
  cdbg << "*** Total chars in all words: " << total_words_size << endl;
  *words_count_p = words_count;
  *total_words_size_p = total_words_size;
}

void read_dict_words(char *const string_buffer, const size_t string_buffer_size)
{
  cdbg << "*** Reading dict words from the input" << endl;
  tiny_string::set_buffer(string_buffer, string_buffer_size);
  tiny_string ts;
  while (true) {
    tiny_string::get_next_string(&ts);
    size_t s_size = ts.size();
    if (s_size == 0) {
      cdbg << "*** End of input stream reached" << endl;
      break;
    }

    word::add_word(&ts);
  }
}

void build_subwords_for_word(const word *const w, const word::idx_t idx)
{
  cdbg << "Building subwords for " << w << endl;
  const tiny_string *const ts = w->get_string();
  for (size_t pos = 0; pos < ts->size(); ++pos) {
    subword sw(idx, pos);
    subword *const tmp = subwords->insert(&sw);
    if (tmp != 0) {
      assert(sw.equal_to(tmp));
      sw.connect_to(tmp);
    }
  }
}

void build_subwords_for_dict_words()
{
  cdbg << "*** Building subwords for dict words" << endl;
  word::for_each(&build_subwords_for_word);
}

class word_connections_builder
{
private:
  subword *const _sw;

public:
  word_connections_builder(subword *const sw) : _sw(sw) {}

  void operator () (subword *const sw) const
  {
    assert(subwords->get_bucket(_sw) == subwords->get_bucket(sw));

    if (sw->substring_equal_to(_sw)) {
      _sw->connect_to(sw);
    }
  }
};

void build_connections_for_word(word *const w, const word::idx_t idx)
{
  cdbg << "Building connections for " << w << endl;
  const tiny_string *const ts = w->get_string();
  subword sw(idx, ts->size());
  const word_connections_builder wcb(&sw);
  subwords->get_bucket(&sw)->for_each(&wcb);
}

void build_connections_for_dict_words()
{
  cdbg << "*** Building connections for dict words" << endl;
  word::for_each(&build_connections_for_word);
}

void brush_connected_graph(word::stack_t **const ws_p, word::idx_t idx,
    const word::color_t color)
{
  word *w = word::get_by_idx(idx);
  assert(w->get_color() == 0);
  assert(color > 0);

  word::idx_t null_idx = word::NULL_IDX;
  word::stack_t::push(ws_p, &null_idx);
  w->set_color(color);
  while (idx != word::NULL_IDX) {
    w = word::get_by_idx(idx);
    w->brush_connected_words(ws_p, color);
    word::stack_t::pop(ws_p, &idx);
  }
}

class dict_word_colorifier
{
private:
  word::stack_t *_ws;
  word::color_t _next_color;

public:
  dict_word_colorifier() : _ws(0), _next_color(0)
  {
    // Pre-allocate words' stack here in order to decrease the number of silly
    // new()/delete() calls for each brush_connected_graph() call.
    word::idx_t null_idx = word::NULL_IDX;
    word::stack_t::push(&_ws, &null_idx);
    assert(_ws != 0);
  }

  ~dict_word_colorifier()
  {
    assert(_ws != 0);

    word::idx_t idx;
    word::stack_t::pop(&_ws, &idx);

    assert(idx == word::NULL_IDX);
    assert(_ws == 0);
  }

  void operator () (word *const w, const word::idx_t idx)
  {
    if (w->get_color() != 0) {
      assert(w->get_color() <= _next_color);
      return;
    }

    ++_next_color;
    if (_next_color <= 0) {
      cerr << "Color space exhausted. Widen word::color_t for handling " <<
          "more colors" << endl;
      ::exit(EXIT_FAILURE);
    }

    brush_connected_graph(&_ws, idx, _next_color);
  }

  word::color_t get_next_color() const
  {
    return _next_color;
  }

  DENY_COPYING(dict_word_colorifier)
};

void colorify_dict_words()
{
  cdbg << "*** Colorifying dict words" << endl;
  dict_word_colorifier dwc;
  word::for_each(&dwc);
  cdbg << "*** Total colors=" << dwc.get_next_color() << endl;
}

void add_word_to_lookup_table(const word *const w, const word::idx_t idx)
{
  (void)idx;
  word_to_color wtc(w->get_string(), w->get_color());
  word_to_color *const tmp = lookup_table->insert(&wtc);
  if (tmp != 0) {
    assert(tmp->equal_to(&wtc));
    assert(tmp->color == wtc.color);
    cdbg << "Skipping duplicate " << w << endl;
  }
}

void build_lookup_table()
{
  cdbg << "*** Building a lookup table for dict words" << endl;
  word::for_each(&add_word_to_lookup_table);
}

bool get_word_color(istream &input, word::color_t *const color)
{
  tiny_string ts;
  tiny_string::read(input, &ts);
  if (ts.size() == 0) {
    cdbg << "End of input stream reached" << endl;
    return false;
  }

  assert(ts.size() > 0);
  cdbg << "Obtaining the color for word " << ts << endl;
  const word_to_color dummy_wtc(&ts, 0);
  const word_to_color *wtc = lookup_table->find(&dummy_wtc);
  if (wtc == 0) {
    cdbg << "Cannot find word " << ts << " in the dict" << endl;
    *color = 0;
  }
  else {
    assert(wtc->equal_to(&dummy_wtc));
    *color = wtc->color;
    assert(*color > 0);
    cdbg << "The word " << ts << " has color=" << *color << endl;
  }
  return true;
}

void read_word_pairs(istream &input)
{
  cdbg << "*** Started reading word pairs" << endl;

  while (input.good()) {
    word::color_t color1, color2;
    if (!get_word_color(input, &color1) || !get_word_color(input, &color2)) {
      break;
    }
    if (color1 == color2 && color1 != 0) {
      cout << "YES" << endl;
    }
    else {
      cout << "NO" << endl;
    }
  }

  cdbg << "*** Finished reading word pairs" << endl;
}

void show_usage(const char *const program_name)
{
  assert(program_name != 0);
  cerr << "Usage: " << program_name << " path/to/dict/file" << endl;
  cerr << "\tFeed word pairs to the input and the program will answer" << endl;
  cerr << "\tYES or NO for each pair depending on special rules" << endl;
  cerr << "\tdescribed at http://socialcam.com/jobs/problems " << endl;
  cerr << "\t(Word Chains problem)" << endl;
}

void read_dict_to_string_buffer(const char *const dict_filename,
    char **const string_buffer, size_t *const string_buffer_size)
{
  ifstream *const dict_file = new ifstream(dict_filename);
  if (!(*dict_file)) {
    cerr << "Cannot open the file [" << dict_filename << "]" << endl;
    ::exit(EXIT_FAILURE);
  }

  // Try reading the whole file into memory in one read() syscall.
  // mmap() could be faster, but it isn't portable.
  filebuf *const fbuf = dict_file->rdbuf();
  const size_t file_size = fbuf->pubseekoff(0, dict_file->end, dict_file->in);

  *string_buffer = new char[file_size];
  cdbg << "*** Allocated " << file_size << " bytes for string buffer" << endl;

  fbuf->pubseekpos(0, dict_file->in);
  const size_t bytes_read = fbuf->sgetn(*string_buffer, file_size);
  if (bytes_read != file_size) {
    cerr << "Expected to read " << file_size << " bytes from dict file, " <<
        "but read only " << bytes_read << " bytes" << endl;
    ::exit(EXIT_FAILURE);
  }

  dict_file->close();
  delete dict_file;

  *string_buffer_size = file_size;
}

void destroy_string_buffer(char *const string_buffer,
    const size_t string_buffer_size)
{
  delete[] string_buffer;
  cdbg << "*** Released " << string_buffer_size << " bytes allocated " <<
      "for string buffer" << endl;
  (void)string_buffer_size;
}

}  // end of anonymous namespace

int main(const int argc, const char *const argv[]) {
  if (argc != 2) {
    assert(argc >= 1);
    const char *const program_name = argv[0];
    assert(program_name != 0);
    show_usage(program_name);
    ::exit(EXIT_FAILURE);
  }

  const char *const dict_filename = argv[1];
  assert(dict_filename != 0);

  char *string_buffer;
  size_t string_buffer_size;
  read_dict_to_string_buffer(dict_filename, &string_buffer,
      &string_buffer_size);

  size_t words_count, total_words_size;
  get_stats_for_dict_words(string_buffer, string_buffer_size, &words_count,
      &total_words_size);
  assert(total_words_size >= words_count);

  if (words_count == 0) {
    cerr << "File " << dict_filename << " contains zero dict words" << endl;
    ::exit(EXIT_FAILURE);
  }

  word::reserve_words(words_count);

  read_dict_words(string_buffer, string_buffer_size);

  assert(subwords == 0);
  subwords = new subwords_t(total_words_size);

  build_subwords_for_dict_words();
  build_connections_for_dict_words();

  colorify_dict_words();

  assert(lookup_table == 0);
  lookup_table = new lookup_table_t(words_count);
  build_lookup_table();

  delete subwords;
  subwords = 0;
  word::release_words();

  read_word_pairs(cin);

  delete lookup_table;
  lookup_table = 0;
  destroy_string_buffer(string_buffer, string_buffer_size);

  cdbg << "*** Bye!" << endl;
  return EXIT_SUCCESS;
}