Skip to content

Instantly share code, notes, and snippets.

@renctan
Created May 2, 2013 15:02
Show Gist options
  • Save renctan/5502835 to your computer and use it in GitHub Desktop.
Save renctan/5502835 to your computer and use it in GitHub Desktop.
Proof of concept for generalized chunk range squashing. Note: T compiles for both ChunkType (needs to define specialized rangeIsNewer) and TagsType.
namespace mongo {
template <typename T>
bool rangeIsNewer(const T* lhs, const T* rhs);
/**
* Given a list of ranges (note: must be sorted by min), outputs the operations needed
* to squash the the overlapping ranges such that the list of ranges will not overlap
* one another.
*
* The caller owns the pointer to both the input and output vectors. The method guarantees
* that none of the vectors will share the same pointer.
*/
template <typename T>
void squash(const std::vector<T*>& ranges,
std::vector<T*>* toDelete,
std::vector<T*>* toInsert) {
const T* rangeBefore = *ranges.begin();
for (typename std::vector<T*>::const_iterator it = ranges.begin() + 1;
it != ranges.end(); ++it) {
const T* rangeNow = *it;
if (rangeBefore->getMin().woCompare(rangeNow->getMin()) == 0) {
if (rangeIsNewer(rangeBefore, rangeNow)) {
T* newRange = new T;
rangeNow->cloneTo(newRange);
toDelete->push_back(newRange);
}
else {
T* newRange = new T;
rangeBefore->cloneTo(newRange);
toDelete->push_back(newRange);
}
}
else if (rangeBefore->getMax().woCompare(rangeNow->getMin()) > 0) {
do {
T* newRange = new T;
rangeNow->cloneTo(newRange);
toDelete->push_back(newRange);
++it;
} while (it != ranges.end() &&
rangeBefore->getMax().woCompare(rangeNow->getMin()) > 0);
T* newRange = new T;
rangeNow->cloneTo(newRange);
newRange->setMin(rangeBefore->getMax());
toInsert->push_back(newRange);
}
rangeBefore = *it;
}
}
}
namespace mongo {
// Note: getID is a new method that is going to be return an OID.
template <>
bool rangeIsNewer<TagsType>(const TagsType* lhs, const TagsType* rhs) {
return lhs->getID().compare(rhs->getID()) > 0;
}
}
@gregstuder
Copy link

Seems like a cool piece of code :-).

A few comments -

Do we need to use a template at all? This function really only makes sense for something that has "Ranges+Tags", where Tag is some arbitrary value. We could just use TagsType, but if we want to enforce a separation here we could just create a simple struct TaggedRange { Range range; string tag; }; and have a helper that explicitly converts back and forth. Or the struct could be templatized if we want to attach arbitrary other context to it. Speed isn't really an issue here.
This would also avoid ownership issues since the TaggedRange type could be copyable - currently it'd be a pain to go through and manually clean up all the memory we get back.

If we actually wanted to do things outside of applyOps (maybe not yet, but eventually), we'd also need "toModify" output? Then we could apply the operations in order of "modify, delete, insert" and never have a "hole" in the range.

Should this function also output "normalized" ranges as a result? If so, we could use it both for interpreting messy ranges and fixing them.

@renctan
Copy link
Author

renctan commented May 2, 2013

That could work as well. The motivation for using the TagsType/ChunkType instead of an intermediate TaggedRange class is it needs to have the superset of the information both the class have in order not to lose information when converting ranges. I do agree that it's a pain to manage the memory and would like to avoid to manually do it as much as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment