Skip to content

Instantly share code, notes, and snippets.

@dineshdharme
Last active May 11, 2024 09:11
Show Gist options
  • Save dineshdharme/962f2732849f7860ad3a9af84a3d0124 to your computer and use it in GitHub Desktop.
Save dineshdharme/962f2732849f7860ad3a9af84a3d0124 to your computer and use it in GitHub Desktop.
Find inline manager list all the way to CEO i.e. top using graph-tool library
I'm working with hierarchical data in PySpark where each employee has a manager, and I need to find all the inline managers for each employee. An inline manager is defined as the manager of the manager, and so on, until we reach the top-level manager (CEO) who does not have a manager.
Is it necessary that you have to use Pyspark in Databricks ?
If yes, this answer could help you. It does exactly that.
https://stackoverflow.com/a/77627393/3238085
Here is another solution which can help you. Unfortunately the guy who asked question has deleted it.
https://gist.github.com/dineshdharme/7c13dcde72e42fdd3ec47d1ad40f6177
Since this is a tree, graph problem statement, a processing library like NetworkX or Graphframes or graph-tool can be more beneficial.
I would suggest to look into them. I believe NetworkX or graph-tool is best.
https://graph-tool.skewed.de/
**What is graph-tool?**
*Graph-tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks). Contrary to most other Python modules with similar functionality, the core data structures and algorithms are implemented in C++, making extensive use of template metaprogramming, based heavily on the Boost Graph Library. This confers it a level of performance that is comparable (both in memory usage and computation time) to that of a pure C/C++ library.*
***Graph-tool can be orders of magnitude faster than Python-only alternatives, and therefore it is specially suited for large-scale network analysis.***
https://graph-tool.skewed.de/static/doc/index.html#installing-graph-tool
conda create --name gt -c conda-forge graph-tool
conda activate gt
Following is a simple example of how to do this :
from graph_tool.all import Graph, graph_draw, BFSVisitor, bfs_search
from graph_tool.draw import radial_tree_layout
data = [
("CEO", "M3"),
("M3", "M1"),
("M3", "M2"),
("M1", "E1"),
("M1", "E2"),
("M2", "E3")
]
g = Graph(directed=True)
vertex_dict = {}
def get_vertex(name):
if name not in vertex_dict:
v = g.add_vertex()
vertex_dict[name] = v
v_name[v] = name
return vertex_dict[name]
v_name = g.new_vertex_property("string")
for manager, employee in data:
v_manager = get_vertex(manager)
v_employee = get_vertex(employee)
g.add_edge(v_manager, v_employee)
class PathCollector(BFSVisitor):
def __init__(self, pred_map):
self.pred_map = pred_map
def tree_edge(self, e):
self.pred_map[e.target()] = e.source()
def find_paths(root):
pred_map = g.new_vertex_property("int64_t", -1)
bfs_search(g, root, PathCollector(pred_map))
def get_path_to(v):
path = []
while v != root and g.vertex(v) != None:
path.append(v)
v = pred_map[v]
path.append(root)
path.reverse()
return path
paths = {v_name[v]: [v_name[x] for x in get_path_to(v)] for v in vertex_dict.values() if v != root}
return paths
root_vertex = vertex_dict["CEO"]
paths = find_paths(root_vertex)
for emp, path in paths.items():
path_reversed = reversed(path)
print(f"Path from {emp} to CEO: {' -> '.join(path_reversed)}")
pos = radial_tree_layout(g, g.vertex(vertex_dict["CEO"]))
graph_draw(g, pos, vertex_text=v_name, vertex_font_size=18,
output_size=(1000, 1000), output="manager-employee.png")
Output :
Path from M3 to CEO: M3 -> CEO
Path from M1 to CEO: M1 -> M3 -> CEO
Path from M2 to CEO: M2 -> M3 -> CEO
Path from E1 to CEO: E1 -> M1 -> M3 -> CEO
Path from E2 to CEO: E2 -> M1 -> M3 -> CEO
Path from E3 to CEO: E3 -> M2 -> M3 -> CEO
Image generated by the library :
[![enter image description here][1]][1]
[1]: https://i.sstatic.net/ZLJ0EtIm.png
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment