@freemandealer
Created March 21, 2023 09:41
Get the memory consumption of the data in an Apache ORC format file
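import sys

import pyorc

# Open the ORC file to read, in binary mode
with open("./output.orc", "rb") as data:
    # Create a reader object; by default each row comes back as a tuple
    reader = pyorc.Reader(data)
    # Initialize a variable to hold the total size
    total_size = 0
    # Iterate over every row record in the file
    for row in reader:
        # Add the in-memory size of the current row to the total.
        # Note: sys.getsizeof is shallow; it counts the row tuple itself,
        # not the field values the tuple refers to.
        total_size += sys.getsizeof(row)
    # Print the total size
    print(f"The total size of the ORC file content in memory is {total_size} bytes.")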
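The snippet above sums `sys.getsizeof(row)`. According to the Python documentation, that call only accounts for memory directly attributed to the object. For a row tuple, this means the tuple header and its element pointers, not the strings, numbers, lists, or dicts stored in the columns. The following is a minimal sketch of a "deep" size measurement, offered as one possible alternative rather than the author's method. The helper names `deep_getsizeof`, `total_memory_size`, and `orc_memory_size` are hypothetical. The sketch assumes pyorc's default row representation: tuples for rows and structs, lists for ORC lists, and dicts for ORC maps. The result is still an approximation, since allocator overhead is not visible to `sys.getsizeof`.

```python
import sys
from collections.abc import Iterable, Mapping


def deep_getsizeof(obj, seen=None):
    """Approximate size in bytes of obj plus every object it contains.

    An object reachable more than once within the same call (e.g. the same
    string stored in two columns) is counted only once, tracked by id().
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, Mapping):
        # ORC map values (assumed to be dicts): count keys and values
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        # Rows, structs and ORC lists: count each contained element
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


def total_memory_size(rows: Iterable) -> int:
    """Sum the deep size of each row, measuring every row independently.

    A fresh `seen` set is used per row because rows are streamed: once a row
    is freed, its id() values may be reused by later objects.
    """
    return sum(deep_getsizeof(row) for row in rows)


def orc_memory_size(path: str) -> int:
    """Stream an ORC file with pyorc and total the deep size of its rows."""
    import pyorc  # third-party dependency, imported only when this is called

    with open(path, "rb") as data:
        return total_memory_size(pyorc.Reader(data))
```

With pyorc installed, `orc_memory_size("./output.orc")` would replace the loop in the original snippet. Expect a noticeably larger number than the shallow sum whenever the columns hold strings or nested types.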