package strings // import "strings"
// Compare returns an integer comparing two strings lexicographically.
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
//
// Compare is included only for symmetry with package bytes.
// It is usually clearer and always faster to use the built-in
// string comparison operators ==, <, >, and so on.
func Compare(a, b string) int {
// NOTE(rsc): This function does NOT call the runtime cmpstring function,
// because we do not want to provide any performance justification for
// using strings.Compare. Basically no one should use strings.Compare.
// As the comment above says, it is here only for symmetry with package bytes.
// If performance is important, the compiler should be changed to recognize
// the pattern so that all code doing three-way comparisons, not just code
// using strings.Compare, can benefit.
if a == b {
return 0
}
if a < b {
return -1
}
return +1
}
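// A short usage sketch (illustrative only, not part of the package; it would
// live in a separate file that imports "strings"):
func compareUsageSketch() {
    _ = strings.Compare("a", "b") // -1
    _ = strings.Compare("a", "a") // 0
    _ = strings.Compare("b", "a") // +1
    // As the doc comment above advises, the built-in operators are preferred:
    _ = "a" < "b" // true
}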
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// A Reader implements the io.Reader, io.ReaderAt, io.Seeker, io.WriterTo,
// io.ByteScanner, and io.RuneScanner interfaces by reading
// from a string.
// The zero value for Reader operates like a Reader of an empty string.
// current reading index
// index of previous rune; or < 0
// Len returns the number of bytes of the unread portion of the
// string.
// Size returns the original length of the underlying string.
// Size is the number of bytes available for reading via ReadAt.
// The returned value is always the same and is not affected by calls
// to any other method.
// cannot modify state - see io.ReaderAt
// Seek implements the io.Seeker interface.
// WriteTo implements the io.WriterTo interface.
// Reset resets the Reader to be reading from s.
// NewReader returns a new Reader reading from s.
// It is similar to bytes.NewBufferString but more efficient and read-only.
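// A short usage sketch for Reader (illustrative only, not part of the package;
// it would live in a separate file that imports "io" and "strings"):
func readerUsageSketch() {
    r := strings.NewReader("gopher")
    _ = r.Size() // 6: length of the underlying string, unchanged by reads
    var buf [3]byte
    _, _ = r.Read(buf[:]) // reads "gop"
    _ = r.Len()           // 3: bytes remaining in the unread portion
    _, _ = r.Seek(0, io.SeekStart) // rewind; Seek implements io.Seeker
}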
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Replacer replaces a list of strings with replacements.
// It is safe for concurrent use by multiple goroutines.
// guards buildOnce method
// replacer is the interface that a replacement algorithm needs to implement.
// NewReplacer returns a new Replacer from a list of old, new string
// pairs. Replacements are performed in the order they appear in the
// target string, without overlapping matches. The old string
// comparisons are done in argument order.
//
// NewReplacer panics if given an odd number of arguments.
// The first occurrence of old->new map takes precedence
// over the others with the same old string.
// To avoid counting repetitions multiple times.
// We need to use string([]byte{o}) instead of string(o),
// to avoid UTF-8 encoding of o.
// E.g. byte(150) produces a string of length 2.
// Replace returns a copy of s with all replacements performed.
// WriteString writes s to w with all replacements performed.
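// A short usage sketch for Replacer (illustrative only, not part of the
// package; it would live in a separate file that imports "os" and "strings"):
func replacerUsageSketch() {
    // Arguments are old, new pairs; NewReplacer panics on an odd count.
    r := strings.NewReplacer("<", "&lt;", ">", "&gt;")
    _ = r.Replace("<b>bold</b>") // "&lt;b&gt;bold&lt;/b&gt;"
    // WriteString performs the same replacements while writing to an io.Writer.
    _, _ = r.WriteString(os.Stdout, "<i>italics</i>")
}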
// trieNode is a node in a lookup trie for prioritized key/value pairs. Keys
// and values may be empty. For example, the trie containing keys "ax", "ay",
// "bcbc", "x" and "xy" could have eight nodes:
//
// n0 -
// n1 a-
// n2 .x+
// n3 .y+
// n4 b-
// n5 .cbc+
// n6 x+
// n7 .y+
//
// n0 is the root node, and its children are n1, n4 and n6; n1's children are
// n2 and n3; n4's child is n5; n6's child is n7. Nodes n0, n1 and n4 (marked
// with a trailing "-") are partial keys, and nodes n2, n3, n5, n6 and n7
// (marked with a trailing "+") are complete keys.
// value is the value of the trie node's key/value pair. It is empty if
// this node is not a complete key.
// priority is the priority (higher is more important) of the trie node's
// key/value pair; keys are not necessarily matched shortest- or longest-
// first. Priority is positive if this node is a complete key, and zero
// otherwise. In the example above, positive/zero priorities are marked
// with a trailing "+" or "-".
// A trie node may have zero, one or more child nodes:
// * if the remaining fields are zero, there are no children.
// * if prefix and next are non-zero, there is one child in next.
// * if table is non-zero, it defines all the children.
//
// Prefixes are preferred over tables when there is one child, but the
// root node always uses a table for lookup efficiency.
// prefix is the difference in keys between this trie node and the next.
// In the example above, node n4 has prefix "cbc" and n4's next node is n5.
// Node n5 has no children and so has zero prefix, next and table fields.
// table is a lookup table indexed by the next byte in the key, after
// remapping that byte through genericReplacer.mapping to create a dense
// index. In the example above, the keys only use 'a', 'b', 'c', 'x' and
// 'y', which remap to 0, 1, 2, 3 and 4. All other bytes remap to 5, and
// genericReplacer.tableSize will be 5. Node n0's table will be
// []*trieNode{ 0:n1, 1:n4, 3:n6 }, where the 0, 1 and 3 are the remapped
// 'a', 'b' and 'x'.
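// A minimal sketch of the node shape described above (field names are
// illustrative; the real node also participates in the add logic commented
// on below). A node has no children, one child reached through prefix/next,
// or a set of children in table:
type trieNodeSketch struct {
    value    string // replacement value; empty unless this node is a complete key
    priority int    // positive for complete keys, zero otherwise

    prefix string            // shared key bytes leading to the single child in next
    next   *trieNodeSketch   // the one child reached by consuming prefix
    table  []*trieNodeSketch // children indexed by the remapped next key byte
}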
// Need to split the prefix among multiple nodes.
// length of the longest common prefix
// First byte differs, start a new lookup table here. Looking up
// what is currently t.prefix[0] will lead to prefixNode, and
// looking up key[0] will lead to keyNode.
// Insert new node after the common section of the prefix.
// Insert into existing table.
// Iterate down the trie to the end, and grab the value and keylen with
// the highest priority.
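// A sketch of that lookup, assuming mapping and tableSize were built as in
// the dense-index sketch further down: walk as far as the input allows and
// remember the highest-priority complete key seen along the way.
func lookupSketch(root *trieNodeSketch, s string, mapping [256]byte, tableSize int) (val string, keylen int, found bool) {
    best := 0
    node := root
    n := 0
    for node != nil {
        if node.priority > best {
            best, val, keylen, found = node.priority, node.value, n, true
        }
        if s == "" {
            break
        }
        switch {
        case node.table != nil:
            index := int(mapping[s[0]])
            if index == tableSize {
                return // next byte occurs in no key
            }
            node, s, n = node.table[index], s[1:], n+1
        case node.prefix != "" && len(s) >= len(node.prefix) && s[:len(node.prefix)] == node.prefix:
            n += len(node.prefix)
            s = s[len(node.prefix):]
            node = node.next
        default:
            return
        }
    }
    return
}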
// genericReplacer is the fully generic algorithm.
// It's used as a fallback when nothing faster can be used.
// tableSize is the size of a trie node's lookup table. It is the number
// of unique key bytes.
// mapping maps from key bytes to a dense index for trieNode.table.
// Find each byte used, then assign them each an index.
// Ensure root node uses a lookup table (for performance).
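// A sketch of the dense-index construction described above. oldnew is the flat
// old/new pair list, so the keys are its even-indexed entries; every byte that
// appears in some key gets a small unique index, and all other bytes map to
// tableSize, which lookups treat as "matches no key".
func buildMappingSketch(oldnew []string) (mapping [256]byte, tableSize int) {
    // Find each byte used, then assign them each an index.
    var used [256]bool
    for i := 0; i < len(oldnew); i += 2 {
        key := oldnew[i]
        for j := 0; j < len(key); j++ {
            used[key[j]] = true
        }
    }
    for _, u := range used {
        if u {
            tableSize++
        }
    }
    var index byte
    for b, u := range used {
        if u {
            mapping[b] = index
            index++
        } else {
            mapping[b] = byte(tableSize) // sentinel for bytes not in any key
        }
    }
    return mapping, tableSize
}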
// Write writes to the buffer to satisfy io.Writer.
// WriteString writes to the buffer without string->[]byte->string allocations.
// Fast path: s[i] is not a prefix of any pattern.
// Ignore the empty match iff the previous loop found the empty match.
// singleStringReplacer is the implementation that's used when there is only
// one string to replace (and that string has more than one byte).
// value is the new string that replaces that pattern when it's found.
// byteReplacer is the implementation that's used when all the "old"
// and "new" values are single ASCII bytes.
// The array contains replacement bytes indexed by old byte.
// lazily allocated
// TODO(bradfitz): use io.WriteString with slices of s, avoiding allocation.
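// A sketch of the byte-for-byte replacement described above: table maps every
// old byte to its replacement (bytes that are not replaced map to themselves),
// and the output buffer is only allocated once a byte actually changes.
func replaceBytesSketch(s string, table *[256]byte) string {
    var buf []byte // lazily allocated
    for i := 0; i < len(s); i++ {
        if b := s[i]; table[b] != b {
            if buf == nil {
                buf = []byte(s)
            }
            buf[i] = table[b]
        }
    }
    if buf == nil {
        return s
    }
    return string(buf)
}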
// byteStringReplacer is the implementation that's used when all the
// "old" values are single ASCII bytes but the "new" values vary in size.
// replacements contains replacement byte slices indexed by old byte.
// A nil []byte means that the old byte should not be replaced.
// toReplace keeps a list of bytes to replace. Depending on the length of
// toReplace and the length of the target string, it may be faster to use
// Count or a plain loop.
// We store each single byte as a string, because Count takes a string.
// countCutOff controls the ratio of string length to number of replacements
// at which (*byteStringReplacer).Replace switches algorithms.
// For strings with a higher ratio of length to replacements than that value,
// we call Count for each replacement from toReplace.
// For strings with a lower ratio we use a simple loop, because of Count's overhead.
// countCutOff is an empirically determined overhead multiplier.
// TODO(tocarip) revisit once we have register-based abi/mid-stack inlining.
// Is it faster to use Count?
// The -1 is because we are replacing 1 byte with len(replacements[b]) bytes.
// See above for explanation of -1
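// A sketch of the Count-based size estimate described above, assuming the
// exported Count (so it would be written outside the package and import
// "strings"): each byte being replaced is stored as a one-byte string in
// toReplace so it can be passed to Count, and the output grows by
// len(replacement)-1 per occurrence because each replacement also consumes
// the one old byte.
func newLenSketch(s string, toReplace []string, replacements *[256][]byte) int {
    newSize := len(s)
    for _, x := range toReplace {
        if c := strings.Count(s, x); c != 0 {
            // The -1 accounts for the single old byte each replacement removes.
            newSize += c * (len(replacements[x[0]]) - 1)
        }
    }
    return newSize
}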
// Copyright 2012 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// stringFinder efficiently finds strings in a source text. It's implemented
// using the Boyer-Moore string search algorithm:
// https://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm
// https://www.cs.utexas.edu/~moore/publications/fstrpos.pdf (note: this aged
// document uses 1-based indexing)
// pattern is the string that we are searching for in the text.
// badCharSkip[b] contains the distance between the last byte of pattern
// and the rightmost occurrence of b in pattern. If b is not in pattern,
// badCharSkip[b] is len(pattern).
//
// Whenever a mismatch is found with byte b in the text, we can safely
// shift the matching frame at least badCharSkip[b] until the next time
// the matching char could be in alignment.
// goodSuffixSkip[i] defines how far we can shift the matching frame given
// that the suffix pattern[i+1:] matches, but the byte pattern[i] does
// not. There are two cases to consider:
//
// 1. The matched suffix occurs elsewhere in pattern (with a different
// byte preceding it that we might possibly match). In this case, we can
// shift the matching frame to align with the next suffix chunk. For
// example, the pattern "mississi" has the suffix "issi" next occurring
// (in right-to-left order) at index 1, so goodSuffixSkip[3] ==
// shift+len(suffix) == 3+4 == 7.
//
// 2. If the matched suffix does not occur elsewhere in pattern, then the
// matching frame may share part of its prefix with the end of the
// matching suffix. In this case, goodSuffixSkip[i] will contain how far
// to shift the frame to align this portion of the prefix to the
// suffix. For example, in the pattern "abcxxxabc", when the first
// mismatch from the back is found to be in position 3, the matching
// suffix "xxabc" is not found elsewhere in the pattern. However, its
// rightmost "abc" (at position 6) is a prefix of the whole pattern, so
// goodSuffixSkip[3] == shift+len(suffix) == 6+5 == 11.
// last is the index of the last character in the pattern.
// Build bad character table.
// Bytes not in the pattern can skip one pattern's length.
// The loop condition is < instead of <= so that the last byte does not
// have a zero distance to itself. Finding this byte out of place implies
// that it is not in the last position.
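// A sketch of the bad-character table build described above:
func badCharSkipSketch(pattern string) (skip [256]int) {
    last := len(pattern) - 1
    // Bytes not in the pattern can skip one pattern's length.
    for b := range skip {
        skip[b] = len(pattern)
    }
    // Stop before the last byte so that byte never gets a zero skip.
    for i := 0; i < last; i++ {
        skip[pattern[i]] = last - i
    }
    return skip
}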
// Build good suffix table.
// First pass: set each value to the next index which starts a prefix of
// pattern.
// lastPrefix is the shift, and (last-i) is len(suffix).
// Second pass: find repeats of pattern's suffix starting from the front.
// (last-i) is the shift, and lenSuffix is len(suffix).
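// A sketch of those two passes (longestCommonSuffix reports how many trailing
// bytes its two arguments share):
func goodSuffixSkipSketch(pattern string) []int {
    last := len(pattern) - 1
    skip := make([]int, len(pattern))
    // First pass: set each value to the next index which starts a prefix of
    // pattern.
    lastPrefix := last
    for i := last; i >= 0; i-- {
        if pattern[:last-i] == pattern[i+1:] { // pattern[i+1:] is a prefix of pattern
            lastPrefix = i + 1
        }
        // lastPrefix is the shift, and (last-i) is len(suffix).
        skip[i] = lastPrefix + last - i
    }
    // Second pass: find repeats of pattern's suffix starting from the front.
    for i := 0; i < last; i++ {
        lenSuffix := longestCommonSuffix(pattern, pattern[1:i+1])
        if pattern[i-lenSuffix] != pattern[last-lenSuffix] {
            // (last-i) is the shift, and lenSuffix is len(suffix).
            skip[last-lenSuffix] = lenSuffix + last - i
        }
    }
    return skip
}

func longestCommonSuffix(a, b string) (i int) {
    for ; i < len(a) && i < len(b); i++ {
        if a[len(a)-1-i] != b[len(b)-1-i] {
            break
        }
    }
    return i
}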
// next returns the index in text of the first occurrence of the pattern. If
// the pattern is not found, it returns -1.
// Compare backwards from the end until the first unmatching character.
// match
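// A sketch of that search loop, using the two tables from the sketches above
// (the real stringFinder precomputes them once per pattern rather than on
// every call):
func nextSketch(text, pattern string) int {
    badChar := badCharSkipSketch(pattern)
    goodSuffix := goodSuffixSkipSketch(pattern)
    i := len(pattern) - 1
    for i < len(text) {
        // Compare backwards from the end until the first unmatching character.
        j := len(pattern) - 1
        for j >= 0 && text[i] == pattern[j] {
            i--
            j--
        }
        if j < 0 {
            return i + 1 // match
        }
        // Shift by whichever rule allows the larger skip.
        skip := badChar[text[i]]
        if goodSuffix[j] > skip {
            skip = goodSuffix[j]
        }
        i += skip
    }
    return -1
}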