Created
May 11, 2020 01:55
-
-
Save adg/2eebefdad5c85327a981605a8500d5fd to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
package strings // import "strings" | |
// Compare returns an integer comparing two strings lexicographically. | |
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b. | |
// | |
// Compare is included only for symmetry with package bytes. | |
// It is usually clearer and always faster to use the built-in | |
// string comparison operators ==, <, >, and so on. | |
func Compare(a, b string) int { | |
// NOTE(rsc): This function does NOT call the runtime cmpstring function, | |
// because we do not want to provide any performance justification for | |
// using strings.Compare. Basically no one should use strings.Compare. | |
// As the comment above says, it is here only for symmetry with package bytes. | |
// If performance is important, the compiler should be changed to recognize | |
// the pattern so that all code doing three-way comparisons, not just code | |
// using strings.Compare, can benefit. | |
// Copyright 2009 The Go Authors. All rights reserved. | |
// Use of this source code is governed by a BSD-style | |
// license that can be found in the LICENSE file. | |
// A Reader implements the io.Reader, io.ReaderAt, io.Seeker, io.WriterTo, | |
// io.ByteScanner, and io.RuneScanner interfaces by reading | |
// from a string. | |
// The zero value for Reader operates like a Reader of an empty string. | |
// current reading index | |
// index of previous rune; or < 0 | |
// Len returns the number of bytes of the unread portion of the | |
// string. | |
// Size returns the original length of the underlying string. | |
// Size is the number of bytes available for reading via ReadAt. | |
// The returned value is always the same and is not affected by calls | |
// to any other method. | |
if a == b { | |
return 0 | |
} | |
if a < b { | |
return -1 | |
} | |
return +1 | |
} | |
// cannot modify state - see io.ReaderAt | |
// Seek implements the io.Seeker interface. | |
// WriteTo implements the io.WriterTo interface. | |
// Reset resets the Reader to be reading from s. | |
// NewReader returns a new Reader reading from s. | |
// It is similar to bytes.NewBufferString but more efficient and read-only. | |
// Copyright 2011 The Go Authors. All rights reserved. | |
// Use of this source code is governed by a BSD-style | |
// license that can be found in the LICENSE file. | |
// Replacer replaces a list of strings with replacements. | |
// It is safe for concurrent use by multiple goroutines. | |
// guards buildOnce method | |
// replacer is the interface that a replacement algorithm needs to implement. | |
// NewReplacer returns a new Replacer from a list of old, new string | |
// pairs. Replacements are performed in the order they appear in the | |
// target string, without overlapping matches. The old string | |
// comparisons are done in argument order. | |
// | |
// NewReplacer panics if given an odd number of arguments. | |
// The first occurrence of old->new map takes precedence | |
// over the others with the same old string. | |
// The first occurrence of old->new map takes precedence | |
// over the others with the same old string. | |
// To avoid counting repetitions multiple times. | |
// We need to use string([]byte{o}) instead of string(o), | |
// to avoid utf8 encoding of o. | |
// E. g. byte(150) produces string of length 2. | |
// Replace returns a copy of s with all replacements performed. | |
// WriteString writes s to w with all replacements performed. | |
// trieNode is a node in a lookup trie for prioritized key/value pairs. Keys | |
// and values may be empty. For example, the trie containing keys "ax", "ay", | |
// "bcbc", "x" and "xy" could have eight nodes: | |
// | |
// n0 - | |
// n1 a- | |
// n2 .x+ | |
// n3 .y+ | |
// n4 b- | |
// n5 .cbc+ | |
// n6 x+ | |
// n7 .y+ | |
// | |
// n0 is the root node, and its children are n1, n4 and n6; n1's children are | |
// n2 and n3; n4's child is n5; n6's child is n7. Nodes n0, n1 and n4 (marked | |
// with a trailing "-") are partial keys, and nodes n2, n3, n5, n6 and n7 | |
// (marked with a trailing "+") are complete keys. | |
// value is the value of the trie node's key/value pair. It is empty if | |
// this node is not a complete key. | |
// priority is the priority (higher is more important) of the trie node's | |
// key/value pair; keys are not necessarily matched shortest- or longest- | |
// first. Priority is positive if this node is a complete key, and zero | |
// otherwise. In the example above, positive/zero priorities are marked | |
// with a trailing "+" or "-". | |
// A trie node may have zero, one or more child nodes: | |
// * if the remaining fields are zero, there are no children. | |
// * if prefix and next are non-zero, there is one child in next. | |
// * if table is non-zero, it defines all the children. | |
// | |
// Prefixes are preferred over tables when there is one child, but the | |
// root node always uses a table for lookup efficiency. | |
// prefix is the difference in keys between this trie node and the next. | |
// In the example above, node n4 has prefix "cbc" and n4's next node is n5. | |
// Node n5 has no children and so has zero prefix, next and table fields. | |
// table is a lookup table indexed by the next byte in the key, after | |
// remapping that byte through genericReplacer.mapping to create a dense | |
// index. In the example above, the keys only use 'a', 'b', 'c', 'x' and | |
// 'y', which remap to 0, 1, 2, 3 and 4. All other bytes remap to 5, and | |
// genericReplacer.tableSize will be 5. Node n0's table will be | |
// []*trieNode{ 0:n1, 1:n4, 3:n6 }, where the 0, 1 and 3 are the remapped | |
// 'a', 'b' and 'x'. | |
// Need to split the prefix among multiple nodes. | |
// length of the longest common prefix | |
// First byte differs, start a new lookup table here. Looking up | |
// what is currently t.prefix[0] will lead to prefixNode, and | |
// looking up key[0] will lead to keyNode. | |
// Insert new node after the common section of the prefix. | |
// Insert into existing table. | |
// Iterate down the trie to the end, and grab the value and keylen with | |
// the highest priority. | |
// genericReplacer is the fully generic algorithm. | |
// It's used as a fallback when nothing faster can be used. | |
// tableSize is the size of a trie node's lookup table. It is the number | |
// of unique key bytes. | |
// mapping maps from key bytes to a dense index for trieNode.table. | |
// Find each byte used, then assign them each an index. | |
// Ensure root node uses a lookup table (for performance). | |
// Write writes to the buffer to satisfy io.Writer. | |
// WriteString writes to the buffer without string->[]byte->string allocations. | |
// Fast path: s[i] is not a prefix of any pattern. | |
// Ignore the empty match iff the previous loop found the empty match. | |
// singleStringReplacer is the implementation that's used when there is only | |
// one string to replace (and that string has more than one byte). | |
// value is the new string that replaces that pattern when it's found. | |
// byteReplacer is the implementation that's used when all the "old" | |
// and "new" values are single ASCII bytes. | |
// The array contains replacement bytes indexed by old byte. | |
// lazily allocated | |
// TODO(bradfitz): use io.WriteString with slices of s, avoiding allocation. | |
// byteStringReplacer is the implementation that's used when all the | |
// "old" values are single ASCII bytes but the "new" values vary in size. | |
// replacements contains replacement byte slices indexed by old byte. | |
// A nil []byte means that the old byte should not be replaced. | |
// toReplace keeps a list of bytes to replace. Depending on length of toReplace | |
// and length of target string it may be faster to use Count, or a plain loop. | |
// We store single byte as a string, because Count takes a string. | |
// countCutOff controls the ratio of a string length to a number of replacements | |
// at which (*byteStringReplacer).Replace switches algorithms. | |
// For strings with higher ration of length to replacements than that value, | |
// we call Count, for each replacement from toReplace. | |
// For strings, with a lower ratio we use simple loop, because of Count overhead. | |
// countCutOff is an empirically determined overhead multiplier. | |
// TODO(tocarip) revisit once we have register-based abi/mid-stack inlining. | |
// Is it faster to use Count? | |
// The -1 is because we are replacing 1 byte with len(replacements[b]) bytes. | |
// See above for explanation of -1 | |
// Copyright 2012 The Go Authors. All rights reserved. | |
// Use of this source code is governed by a BSD-style | |
// license that can be found in the LICENSE file. | |
// stringFinder efficiently finds strings in a source text. It's implemented | |
// using the Boyer-Moore string search algorithm: | |
// https://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm | |
// https://www.cs.utexas.edu/~moore/publications/fstrpos.pdf (note: this aged | |
// document uses 1-based indexing) | |
// pattern is the string that we are searching for in the text. | |
// badCharSkip[b] contains the distance between the last byte of pattern | |
// and the rightmost occurrence of b in pattern. If b is not in pattern, | |
// badCharSkip[b] is len(pattern). | |
// | |
// Whenever a mismatch is found with byte b in the text, we can safely | |
// shift the matching frame at least badCharSkip[b] until the next time | |
// the matching char could be in alignment. | |
// goodSuffixSkip[i] defines how far we can shift the matching frame given | |
// that the suffix pattern[i+1:] matches, but the byte pattern[i] does | |
// not. There are two cases to consider: | |
// | |
// 1. The matched suffix occurs elsewhere in pattern (with a different | |
// byte preceding it that we might possibly match). In this case, we can | |
// shift the matching frame to align with the next suffix chunk. For | |
// example, the pattern "mississi" has the suffix "issi" next occurring | |
// (in right-to-left order) at index 1, so goodSuffixSkip[3] == | |
// shift+len(suffix) == 3+4 == 7. | |
// | |
// 2. If the matched suffix does not occur elsewhere in pattern, then the | |
// matching frame may share part of its prefix with the end of the | |
// matching suffix. In this case, goodSuffixSkip[i] will contain how far | |
// to shift the frame to align this portion of the prefix to the | |
// suffix. For example, in the pattern "abcxxxabc", when the first | |
// mismatch from the back is found to be in position 3, the matching | |
// suffix "xxabc" is not found elsewhere in the pattern. However, its | |
// rightmost "abc" (at position 6) is a prefix of the whole pattern, so | |
// goodSuffixSkip[3] == shift+len(suffix) == 6+5 == 11. | |
// last is the index of the last character in the pattern. | |
// Build bad character table. | |
// Bytes not in the pattern can skip one pattern's length. | |
// The loop condition is < instead of <= so that the last byte does not | |
// have a zero distance to itself. Finding this byte out of place implies | |
// that it is not in the last position. | |
// Build good suffix table. | |
// First pass: set each value to the next index which starts a prefix of | |
// pattern. | |
// lastPrefix is the shift, and (last-i) is len(suffix). | |
// Second pass: find repeats of pattern's suffix starting from the front. | |
// (last-i) is the shift, and lenSuffix is len(suffix). | |
// next returns the index in text of the first occurrence of the pattern. If | |
// the pattern is not found, it returns -1. | |
// Compare backwards from the end until the first unmatching character. | |
// match |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment