@gnumoreno
Last active April 24, 2024 17:09
Running TPC-H with StarRocks on MacOS

Stream Load fails without the Expect: 100-continue header

curl --location-trusted -u root: -T ~/tpch-poc-0.1.2/benchmark/data_10/region.tbl -H "column_separator:|" http://localhost:8030/api/tpch/region/_stream_load

{
  "Status": "FAILED",
  "Message": "class com.starrocks.common.DdlException: There is no 100-continue header"
}

Solution

curl --location-trusted -u root: -T ~/tpch-poc-0.1.2/benchmark/data_10/region.tbl -H "column_separator:|" -H 'Expect: 100-continue' http://localhost:8030/api/tpch/region/_stream_load

{
    "TxnId": 82076,
    "Label": "d38e03c0-f339-4f14-bcdf-1d5f4f30e394",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 5,
    "NumberLoadedRows": 5,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 384,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 1,
    "StreamLoadPlanTimeMs": 4,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 23,
    "CommitAndPublishTimeMs": 27
}
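
The failed request shows that Stream Load rejects uploads that arrive without the Expect: 100-continue header, and curl apparently does not add it on its own for a file as small as region.tbl (384 bytes). A minimal sketch of a wrapper that always sets it: stream_load_cmd is a hypothetical helper, not part of tpch-poc, and the host, port, user and database are copied from the command above. It only prints the command, so it can be reviewed before running, and it assumes paths without spaces.

```shell
# Hypothetical helper (not part of tpch-poc): print the Stream Load command
# for one table, always including the Expect: 100-continue header.
stream_load_cmd() {
  table="$1"   # target table in the tpch database
  file="$2"    # path to the .tbl file produced by dbgen
  printf "curl --location-trusted -u root: -T %s -H 'column_separator:|' -H 'Expect: 100-continue' http://localhost:8030/api/tpch/%s/_stream_load\n" "$file" "$table"
}

# Dry run for the two smallest tables; pipe the output to sh to load them.
for t in region nation; do
  stream_load_cmd "$t" "data_10/$t.tbl"
done
```

Printing instead of executing keeps the sketch safe to try on a machine where StarRocks is not running yet.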

Replication factor 3 won't work on Docker with a single BE node

cd ~/tpch-poc-0.1.2/benchmark
./bin/create_db_table.sh ddl_100
/Users/morenogarcia/tpch-poc-0.1.2/benchmark/src/db_table_operation.py:93: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn("create table error. sql: %s, msg: %s", sql_file_path, res["msg"])
[WARNING] 2024-04-23 16:50:28 db_table_operation.py[93] create table error. sql: /Users/morenogarcia/tpch-poc-0.1.2/benchmark/sql/tpch/ddl_100/tpch_create.sql, msg: (1064, 'Unexpected exception: Table replication num should be less than of equal to the number of available BE nodes. You can change this default by setting the replication_num table properties. Current alive backend is [10004]. table=nation, properties.replication_num=3')

Solution

sed -i '' 's/"3",/"1",/g' sql/tpch/ddl_100/tpch_create.sql

./bin/create_db_table.sh ddl_100

[INFO] 2024-04-23 17:30:50 db_table_operation.py[95] create table success. sql: /Users/morenogarcia/tpch-poc-0.1.2/benchmark/sql/tpch/ddl_100/tpch_create.sql
232M customer.tbl
3.6G lineitem.tbl.1
3.6G lineitem.tbl.2
2.1K nation.tbl
1.6G orders.tbl
230M part.tbl
1.1G partsupp.tbl
384B region.tbl
13M supplier.tbl
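
The sed substitution used in this fix rewrites every "3", in the DDL file, which is fine as long as only replication_num carries that value. A narrower variant anchors the match on the property name. This is a sketch on a made-up line, since the exact layout of tpch_create.sql is not shown here; narrow_replication and "other_setting" are hypothetical names.

```shell
# Hypothetical filter: change replication_num from "3" to "1" and leave
# any other "3" values alone. Uses basic regex, so it should behave the
# same with BSD sed (macOS) and GNU sed.
narrow_replication() {
  sed 's/\("replication_num" *= *\)"3"/\1"1"/'
}

# Made-up DDL fragment for illustration only.
sample='"replication_num" = "3", "other_setting" = "3",'
printf '%s\n' "$sample" | narrow_replication
```

To apply the same expression to the real file, pass it to sed -i '' as in the solution above.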

dbgen does not compile on MacOS

cd tpch-poc-0.1.2/benchmark/thirdparty/tpch-dbgen

make

gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o build.o build.c
build.c:109:48: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                sprintf(szFormat, C_NAME_FMT, 9, HUGE_FORMAT + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^~~~~~~~~~~
build.c:109:48: note: use array indexing to silence this warning
                sprintf(szFormat, C_NAME_FMT, 9, HUGE_FORMAT + 1);
                                                             ^
                                                 &           [
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^
build.c:169:48: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                sprintf(szFormat, O_CLRK_FMT, 9, HUGE_FORMAT + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^~~~~~~~~~~
build.c:169:48: note: use array indexing to silence this warning
                sprintf(szFormat, O_CLRK_FMT, 9, HUGE_FORMAT + 1);
                                                             ^
                                                 &           [
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^
build.c:283:47: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                sprintf(szFormat, P_MFG_FMT, 1, HUGE_FORMAT + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^~~~~~~~~~~
build.c:283:47: note: use array indexing to silence this warning
                sprintf(szFormat, P_MFG_FMT, 1, HUGE_FORMAT + 1);
                                                            ^
                                                &           [
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^
build.c:284:53: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                sprintf(szBrandFormat, P_BRND_FMT, 2, HUGE_FORMAT + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^~~~~~~~~~~
build.c:284:53: note: use array indexing to silence this warning
                sprintf(szBrandFormat, P_BRND_FMT, 2, HUGE_FORMAT + 1);
                                                                  ^
                                                      &           [
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^
build.c:322:48: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                sprintf(szFormat, S_NAME_FMT, 9, HUGE_FORMAT + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^~~~~~~~~~~
build.c:322:48: note: use array indexing to silence this warning
                sprintf(szFormat, S_NAME_FMT, 9, HUGE_FORMAT + 1);
                                                             ^
                                                 &           [
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
  __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                                                       ^
5 warnings generated.
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o driver.o driver.c
driver.c:325:4: warning: add explicit braces to avoid dangling else [-Wdangling-else]
                        else
                        ^
driver.c:335:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&o, upd_num);
                                                  ^
driver.c:340:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&supp, upd_num);
                                                  ^
driver.c:345:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&cust, upd_num);
                                                  ^
driver.c:352:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&part, upd_num);
                                                  ^
driver.c:357:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&code, 0);
                                                  ^
driver.c:362:23: warning: passing arguments to a function without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
                                tdefs[tnum].loader(&code, 0);
                                                  ^
driver.c:368:68: warning: format specifies type 'long' but the argument has type 'long long' [-Wformat]
                        printf("\nSeeds for %s at rowcount %ld\n", tdefs[tnum].comment, i);
                                                           ~~~                          ^
                                                           %lld
8 warnings generated.
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64    -c -o bm_utils.o bm_utils.c
bm_utils.c:71:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
         ^~~~~~~~~~
1 error generated.
make: *** [bm_utils.o] Error 1

Solution

cd
git clone https://github.com/stevenchen3/tpch-osx.git
cd tpch-osx/dbgen
make
cd ~/tpch-poc-0.1.2/benchmark/thirdparty/tpch-dbgen
mv dbgen dbgen.bkp
ln -s ~/tpch-osx/dbgen/dbgen dbgen
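
The build stops because macOS ships no <malloc.h>; malloc() is declared in <stdlib.h> there. An alternative to linking in the tpch-osx binary, untested against this tree, would be to patch the include. The sketch below only demonstrates the substitution on a single line. portable_malloc_include is a hypothetical name, and other source files or later macOS-specific errors may need further changes.

```shell
# Hypothetical filter: replace the Linux-only header with the standard
# header that declares malloc().
portable_malloc_include() {
  sed 's|#include <malloc.h>|#include <stdlib.h>|'
}

# Demonstrate on the line that the compiler complained about.
printf '%s\n' '#include <malloc.h>' | portable_malloc_include
```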

sed in clean-tpch.sh does not work on MacOS

./bin/gen_data/gen-tpch.sh 10 data_10

[INFO] gen 10GB data under /Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10
[INFO] generate data...
[INFO] gen data of table: customer
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: lineitem
[INFO] gen <1>th part data of table: lineitem
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen <2>th part data of table: lineitem
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: nation
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: orders
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: parts
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: partsupp
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: region
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: suppliers
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] refine the data in /Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/customer.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/lineitem.tbl.1
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/lineitem.tbl.2
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/nation.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/orders.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/part.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/partsupp.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/region.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
[INFO] sed file:/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/supplier.tbl
sed: 1: "/Users/morenogarcia/tpc ...": invalid command code m
235M	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/customer.tbl
3.6G	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/lineitem.tbl.1
3.6G	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/lineitem.tbl.2
4.0K	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/nation.tbl
1.6G	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/orders.tbl
241M	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/part.tbl
1.1G	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/partsupp.tbl
4.0K	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/region.tbl
 14M	/Users/morenogarcia/tpch-poc-0.1.2/benchmark/data_10/supplier.tbl
[INFO] Data generation completed.

Solution

Funnily enough, you can use sed to fix the script:

cd ~/tpch-poc-0.1.2/benchmark
sed -i '' 's/-i/-i \x27\x27/g' bin/gen_data/clean-tpch.sh
./bin/gen_data/clean-tpch.sh data_10
[INFO] sed file:data_10/customer.tbl
[INFO] sed file:data_10/lineitem.tbl.1
[INFO] sed file:data_10/lineitem.tbl.2
[INFO] sed file:data_10/nation.tbl
[INFO] sed file:data_10/orders.tbl
[INFO] sed file:data_10/part.tbl
[INFO] sed file:data_10/partsupp.tbl
[INFO] sed file:data_10/region.tbl
[INFO] sed file:data_10/supplier.tbl
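
The "invalid command code m" error comes from BSD sed, which requires an argument after -i (the backup suffix). The script's sed expression is consumed as that suffix, and the file path is then parsed as the sed program: /Users/ is read as an address and m as the command. GNU sed expects the suffix attached to -i instead, so the patched script would likely stop working on Linux. A minimal sketch of a wrapper that handles both: sed_inplace is a hypothetical name, and the detection assumes that GNU sed accepts --version while BSD sed rejects it.

```shell
# Hypothetical wrapper: run an in-place sed edit with the flag form the
# local sed understands.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"       # GNU sed: no separate suffix argument
  else
    sed -i '' "$@"    # BSD sed (macOS): empty suffix means no backup file
  fi
}

# Try it on a throwaway file; the substitution is illustrative only.
tmp=$(mktemp)
printf 'foo\n' > "$tmp"
sed_inplace 's/foo/bar/' "$tmp"
cat "$tmp"
rm -f "$tmp"
```

Calling such a wrapper from clean-tpch.sh instead of sed -i would keep one script usable on both platforms.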