Gist eguiraud/77a0ca3566e66bc6b8cd0f9e156c983b:
import ROOT

ROOT.gInterpreter.Declare("""
#include <unordered_set>
#include "TVirtualRWMutex.h" // ROOT::gCoreMutex, R__READ_LOCKGUARD, R__WRITE_LOCKGUARD

// A thread-safe stateful filter that lets only one event pass for each value of
// "category" (a char; in this example, the entry number modulo 10).
// It uses ROOT::gCoreMutex, a read-write lock, so that the common case (category
// already seen) only needs a shared read lock, which reduces contention between threads.
class FilterOnePerKind {
   std::unordered_set<char> _seenCategories;
public:
   bool operator()(char category) {
      {
         R__READ_LOCKGUARD(ROOT::gCoreMutex); // many threads can hold the read lock concurrently
         if (_seenCategories.count(category) == 1)
            return false;
      }
      // Another thread may have inserted `category` after we released the read lock,
      // so let insert() tell us whether this call is the one that actually added it.
      R__WRITE_LOCKGUARD(ROOT::gCoreMutex); // only one thread at a time can hold the write lock
      return _seenCategories.insert(category).second;
   }
};
""")

ROOT.EnableImplicitMT()
df = ROOT.RDataFrame(100).Define("category", "char(rdfentry_ % 10)")
cols = ROOT.std.vector['string'](["category"])
df_with_unique_categories = df.Filter(ROOT.FilterOnePerKind(), cols)
print(df_with_unique_categories.Count().GetValue())
I tried this
#include <unordered_map>

class ElementTracker {
public:
    // Returns true if (id, value) has the highest value seen *so far* for this id
    // (a value equal to the current maximum also passes)
    bool operator()(unsigned long long id, double value) {
        auto it = highestValues.find(id);
        // New id, or a new running maximum for a known id: record it and pass
        if (it == highestValues.end() || value > it->second) {
            highestValues[id] = value;
            return true;
        }
        return value == it->second;
    }
private:
    // Map from each id to the highest value seen so far for that id
    std::unordered_map<unsigned long long, double> highestValues;
};
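However, I only get the element with the highest value for a given ID if it is also the first element seen for that ID (which is to be expected, since the filter has to decide on each element before it has seen the later ones). Any idea how to solve this?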
Hi @dlanci,
Are you running with ROOT::EnableImplicitMT()? This code is not thread-safe (there is no mutex protecting the accesses to highestValues).
You can add print-outs to debug what's going on.
Also I'm not sure you need a Filter for this, you could implement this as a Reduce or similar.
Your best bet is to open a new topic on the forum where root devs can help out properly.
Cheers,
Enrico
Hi @eguiraud,
Thanks for your answer! I have opened a thread here: https://root-forum.cern.ch/t/select-unique-candidates-based-on-their-id-and-the-value-of-a/59668/3
I'm not currently running with ROOT::EnableImplicitMT(). For now I just want to get the expected result; then I will extend it to MT execution.
Best,
Davide
Hi @eguiraud,
Thanks a lot for the nice snippet. Do you think it would be possible, with a similar strategy, to write a thread-safe stateful filter that retains only the entry in each category that satisfies a condition? For example, deduplicating the RDataFrame by keeping, for each category, only the entry with the highest BDT output. I'm not sure whether there is already a function that implements this.
Cheers,
Davide