The idea is to analyze the cost of the metafields that Hudi stores at the record level.
Choose narrow to wide tables with varying column counts: 10, 30, 100, 1000 columns, etc. Use the auto key generator with the bulk_insert operation for Hudi. Generate vanilla parquet data via Spark and a non-partitioned Hudi COW table for comparison. For the Spark parquet write, we can reduce the number of partitions to 1 to make it comparable to the non-partitioned Hudi table. We will assume the input JSON data size is roughly the same for all three tables; here we take an input JSON file of roughly 350MB.
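For reference, here is a minimal PySpark sketch of the two writes. The paths, table name, and Hudi bundle version are placeholders, not the exact benchmark setup.

```python
from pyspark.sql import SparkSession

# Launch with the Hudi Spark bundle on the classpath, e.g.
#   spark-submit --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 ...
spark = (
    SparkSession.builder
    .appName("hudi-metafields-benchmark")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Input JSON generated from the schema (path is a placeholder).
df = spark.read.json("/tmp/hudi-metafields-benchmark/input/ten_columns.json")

# 1) Vanilla parquet, coalesced to a single partition so it is comparable
#    to the non-partitioned Hudi table.
df.coalesce(1).write.mode("overwrite") \
    .parquet("/tmp/hudi-metafields-benchmark/plain_parquet/ten_columns")

# 2) Non-partitioned Hudi COW table written with bulk_insert. No record key
#    or partition path field is configured, so Hudi (0.14+) auto-generates
#    record keys and creates a non-partitioned table.
hudi_options = {
    "hoodie.table.name": "ten_columns_cow",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "bulk_insert",
}
df.write.format("hudi").options(**hudi_options).mode("overwrite") \
    .save("/tmp/hudi-metafields-benchmark/hudi_cow/ten_columns")
```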
I used ChatGPT to generate a random JSON schema with the desired number of columns, using primitive data types and built-in formats. One such schema for a 10-column table is shown below. Built-in formats in JSON Schema allow for more realistic data, though the column names are boring.
/tmp/hudi-metafields-benchmark/ten-columns-schema.json