
@M41KL-N41TT
Last active February 20, 2024 03:30

Optimizing Map Operations in Go: A Comprehensive Guide πŸš€

In this guide, we dive into the nuances of optimizing map operations in Go, focusing on the performance of checking whether an element is present in a map and adding it if it is not. We will explore how choosing the right data structures and understanding the underlying mechanisms can significantly improve your Go applications.

Step 1: Understanding the Performance Characteristics

The performance of checking whether a map contains an element with matching attributes, and adding it if it does not, depends on several factors:

Memory Usage πŸ“¦

Storage of Elements

  • Each element in the map requires memory for both the key and the value (the ProxyHost struct, in this context). Usage depends on the element's contents, particularly strings, which are dynamically sized in Go, plus internal struct padding and alignment.

Overhead

  • Go maps carry additional memory overhead for metadata (bucket pointers, map size, and so on), so the total footprint exceeds the collective size of all ProxyHost structs.

Time Complexity ⏱️

Checking for Existence

  • Typically O(1): constant time regardless of map size. In pathological cases, hash collisions can degrade this toward O(n), but Go's map implementation largely mitigates them through dynamic resizing and rehashing.

Adding an Element

  • Addition also averages O(1) in time complexity, subject to the same caveats as existence checks.

Practical Considerations πŸ› οΈ

  • Hashing of Keys: Performance depends heavily on the choice of key and how efficiently it hashes. Searching by value rather than by key forces a full iteration over the map (O(n)).
  • Adding Only When Absent: A key-based existence check averages O(1), but a value-based search costs O(n); the subsequent addition is O(1) on average.
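To make the key-versus-value distinction concrete, here is a minimal, self-contained sketch (the example map and values are illustrative, not from the guide):

```go
package main

import "fmt"

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}

	// Lookup by key: the runtime hashes "b" and jumps to its bucket, O(1) on average.
	if v, ok := m["b"]; ok {
		fmt.Println("found by key:", v) // prints "found by key: 2"
	}

	// Search by value: maps have no value index, so every entry must be scanned, O(n).
	target := 3
	for k, v := range m {
		if v == target {
			fmt.Println("value found under key:", k)
			break
		}
	}
}
```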

Optimization Tip: For frequent checks or large maps, consider auxiliary structures (e.g., a set of hashes of the ProxyHost values) or reevaluate the data structure strategy to circumvent bottlenecks.

Step 2: Streamlining the Operation πŸ›€οΈ

To check for and add a ProxyHost struct in a map efficiently in Go, hash the struct's fields into a key. This turns a value search that would otherwise require a full scan (O(n)) into an average-case O(1) lookup and insertion.

Implementing the Optimization

Consider the following Go code snippet demonstrating an efficient struct hashing and addition methodology:

import (
	"fmt"
	"hash/fnv"
)

type ProxyHost struct {
	phishSubdomain string
	origSubdomain  string
	domain         string
	handleSession  bool
	isLanding      bool
	autoFilter     bool
}

// Use a map with string keys (hashes of the struct fields) for efficient lookups.
var proxyHosts = make(map[string]ProxyHost)

// hashProxyHost creates a hash of the ProxyHost struct's fields. Adjust the hashing
// logic to include/exclude fields as your use case requires.
func hashProxyHost(ph ProxyHost) string {
	h := fnv.New32a()
	// A zero-byte separator keeps field boundaries unambiguous, so that e.g.
	// ("ab", "c") and ("a", "bc") do not produce the same hash input.
	h.Write([]byte(ph.phishSubdomain))
	h.Write([]byte{0})
	h.Write([]byte(ph.origSubdomain))
	h.Write([]byte{0})
	h.Write([]byte(ph.domain))
	h.Write([]byte{0})
	h.Write([]byte(fmt.Sprintf("%t%t%t", ph.handleSession, ph.isLanding, ph.autoFilter)))
	return fmt.Sprintf("%x", h.Sum32())
}

// addProxyHost checks if the ProxyHost already exists in the map, and adds it if it doesn't.
// This function returns true if the ProxyHost was added, false if it already existed.
func addProxyHost(ph ProxyHost) bool {
	hash := hashProxyHost(ph)
	if _, exists := proxyHosts[hash]; !exists {
		proxyHosts[hash] = ph
		return true
	}
	return false
}

Elements of This Strategy:

  1. hashProxyHost function: Produces a deterministic hash of a ProxyHost struct's fields to serve as the map key. Note that FNV-32a collisions are rare but possible; if exact deduplication is required, use a longer hash or the struct itself as the key.
  2. addProxyHost function: Uses the hash to check for an existing struct with identical fields, ensuring no duplicate entry is added.
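Worth noting as an alternative: since ProxyHost contains only comparable field types (strings and bools), Go can use the struct itself as a map key, which avoids manual hashing and collision concerns entirely. A minimal sketch; the helper name addIfAbsent is my own illustrative choice, not from the code above:

```go
package main

import "fmt"

type ProxyHost struct {
	phishSubdomain string
	origSubdomain  string
	domain         string
	handleSession  bool
	isLanding      bool
	autoFilter     bool
}

// seen uses the struct itself as the key; the Go runtime hashes
// comparable structs natively, with no collision ambiguity.
var seen = make(map[ProxyHost]struct{})

// addIfAbsent reports whether ph was newly added.
func addIfAbsent(ph ProxyHost) bool {
	if _, ok := seen[ph]; ok {
		return false
	}
	seen[ph] = struct{}{}
	return true
}

func main() {
	ph := ProxyHost{domain: "example.com", isLanding: true}
	fmt.Println(addIfAbsent(ph)) // true: first insertion
	fmt.Println(addIfAbsent(ph)) // false: already present
}
```

The empty struct{} value occupies zero bytes, so this doubles as a memory-lean set when the struct itself is the only payload.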

Conclusion 🎯

By embracing these optimization techniques, you can convert a check-and-add operation that previously cost O(n) time into one that averages O(1).

What's the difference between O(n) and O(1), anyway?

Given these two scenarios (O(1) vs. O(n)) and record counts (10k, 20k, 50k), the time gain comparison is:

  • O(1) Operations: Time taken is constant regardless of the number of records. So, the time taken for 10k, 20k, or 50k records does not differ and thus shows the maximum time efficiency. Ideal for real-time systems where performance must remain consistent as data grows.

  • O(n) Operations: Time taken increases linearly with the number of records. The time it takes to perform an operation with 20k records is double the time for 10k, and with 50k records, it's five times the time for 10k.

Quantitative Analysis: To simplify, let's say an operation in O(1) takes 1 unit of time, and for O(n):
  • At 10k records, it takes 10k units.
  • At 20k records, it takes 20k units.
  • At 50k records, it takes 50k units.
So, the time gain of O(1) over O(n) substantially increases as the number of records grows:
  • For 10k records, O(1) is 10k times faster.
  • For 20k records, O(1) is 20k times faster.
  • For 50k records, O(1) is 50k times faster.
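The gap can be observed directly, if crudely, with wall-clock timing. The sketch below (sizes and names are illustrative) compares one O(1) map lookup against one O(n) slice scan over 50k records; single-shot timings are noisy, so prefer Go's testing.B benchmarks for real measurements:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 50000
	m := make(map[int]struct{}, n)
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		m[i] = struct{}{}
		s = append(s, i)
	}
	target := n - 1 // worst case for the linear scan

	start := time.Now()
	_, ok := m[target] // O(1): one hash, one bucket probe
	mapTime := time.Since(start)

	start = time.Now()
	found := false
	for _, v := range s { // O(n): walks up to all 50k elements
		if v == target {
			found = true
			break
		}
	}
	scanTime := time.Since(start)

	fmt.Println("map lookup:", ok, mapTime)
	fmt.Println("linear scan:", found, scanTime)
}
```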

When considering operations on the proxyHosts map in Go, the analysis splits into two computational complexities, O(1) and O(n), across varying record counts (10k, 20k, 50k).

O(1) Complexity

Operations such as access, insertion, or deletion ideally exhibit constant time complexity (O(1)), facilitated by efficient hashing. This ensures that performance remains unaffected by the scale of data.

O(n) Complexity

In degenerate scenarios, such as heavy hash collisions or searching by value rather than by key, operations scale with the data size, i.e. linear time complexity (O(n)).

Comparative Analysis

| Records | Time Complexity | Unit Time | Comparative Efficiency |
|---------|-----------------|-----------|------------------------|
| 10k     | O(1)            | 1         | β˜…β˜…β˜…β˜…β˜…                  |
| 10k     | O(n)            | 10k       | β˜…                      |
| 20k     | O(1)            | 1         | β˜…β˜…β˜…β˜…β˜…                  |
| 20k     | O(n)            | 20k       | β˜…                      |
| 50k     | O(1)            | 1         | β˜…β˜…β˜…β˜…β˜…                  |
| 50k     | O(n)            | 50k       | β˜…                      |

Key Insights:

  • O(1) ensures maximum time efficiency, vital for real-time systems requiring stable performance irrespective of data volume.
  • In contrast, O(n) efficiency diminishes linearly with data growth, highlighting the significance of algorithmic and structural choices in scalable systems.

Conclusion

Moving to the O(1) approach should relieve the performance issues of the existing, far more limited O(n) solution.
