*
   EQU TRUE TO 1
   EQU FALSE TO 0
*
* ENV is a fixed-size hash table: one bucket of readings per station name
   EQU ENV.SIZE TO 10000
   DIM ENV(ENV.SIZE)
*
   MAT ENV = ''
*
   MAP.CACHE = ''
*
* The later assignment wins; comment one out to switch input files
   FILE.NAME = 'measurements-small.txt'
   FILE.NAME = 'measurements-med.txt'
*
   PATH = '/home/transfer/' : FILE.NAME
*
   GOSUB SETUP
*
   GOSUB LOAD.DATA
*
*TEMP   GOSUB CALCULATE
*
   STOP
*
*********************  S U B R O U T I N E  *********************
*
SETUP:NULL
*
* Recreate the output file and its TEMPERATURE dictionary item
   EXECUTE 'DELETE-FILE STATION-FILE'
   EXECUTE 'CREATE-FILE STATION-FILE 11,1 415,53,16 MINIMUM.MODULUS 415'
*
   OPEN '','STATION-FILE' TO STATION.FILE ELSE
      PRINT 'Unable to open file: STATION-FILE'
      STOP
   END
*
   OPEN 'DICT','STATION-FILE' TO DICT.STATION.FILE ELSE
      PRINT 'Unable to open file: DICT STATION-FILE'
      STOP
   END
*
   D = ''
   D<1> = 'A'
   D<2> = 1
   D<3> = 'TEMPERATURE'
   D<9> = 'R'
   D<10> = 25
*
   WRITE D ON DICT.STATION.FILE,'TEMPERATURE'
*
   RETURN
*
*********************  S U B R O U T I N E  *********************
*
LOAD.DATA:NULL
*
   T1 = TIME()
*
   OPENSEQ PATH TO FLAT.FILE ELSE
      PRINT 'Unable to open: ' : PATH :
      INPUT ANYTHING :
      STOP
   END
*
   DONE = FALSE
   CTR = 0
*
   STATION.NAMES = ''
*
* Read the whole file in one block, then release and close it
   READBLK RAW.DATA FROM FLAT.FILE, 2000000000 THEN
      RELEASE FLAT.FILE
      CLOSESEQ FLAT.FILE
   END ELSE
      PRINT 'Unable to read: ' : PATH
      STOP
   END
*
* One attribute per line, with station name and temperature as values 1 and 2
   CONVERT CHAR(10):';' TO @AM:@VM IN RAW.DATA
*
   LOOP
      CTR = CTR + 1
      RAW.LINE = RAW.DATA<CTR>
   UNTIL RAW.LINE = '' DO
*
      ITEM.ID = RAW.LINE<1,1>
      TEMP = RAW.LINE<1,2>
*
      IF MOD(CTR,100000) = 0 THEN
         PRINT CTR : ': ' : TIME() - T1 : 's'
      END
*
      MAP.KEY = ITEM.ID
*
* Hash: sum of (character code * position). The + 1 keeps MAP.POS in 1..ENV.SIZE.
* NOTE: collisions are not handled, so keys that hash alike share one bucket.
      MAP.KEY.LEN = LEN(MAP.KEY)
      MAP.HASH = 0
      FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
         MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
      NEXT MAP.KEY.CTR
      MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
      MAP.VALUE = ENV(MAP.POS)
*
      IF MAP.VALUE = '' THEN
         STATION.NAMES<-1> = ITEM.ID
      END
*
      ENV(MAP.POS)<-1> = TEMP
   REPEAT
*
   NUMBER.OF.STATIONS = DCOUNT(STATION.NAMES,@AM)
*
* Write one record per station, with its readings lowered to values
   FOR I = 1 TO NUMBER.OF.STATIONS
      MAP.KEY = STATION.NAMES<I>
*
      MAP.KEY.LEN = LEN(MAP.KEY)
      MAP.HASH = 0
      FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
         MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
      NEXT MAP.KEY.CTR
      MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
      MAP.VALUE = ENV(MAP.POS)
*
      WRITE LOWER(MAP.VALUE) ON STATION.FILE,MAP.KEY
   NEXT I
*
   T2 = TIME()
*
   PRINT 'Loading: ' : T2 - T1 : 's'
*
   RETURN
*
*********************  S U B R O U T I N E  *********************
*
CALCULATE:NULL
*
   T1 = TIME()
*
   CMD = 'LIST STATION-FILE MIN TEMPERATURE AVERAGE TEMPERATURE MAX TEMPERATURE BREAK-ON @ID BY @ID (D'
   EXECUTE CMD CAPTURING RESULTS
*
   T2 = TIME()
*
   PRINT 'Calculate: ' : T2 - T1 : 's'
*
   RETURN
*
* END OF PROGRAM
*
   END
*
Load Considerations:
1.) Why locking with a READU - I'm not sure of the overhead of locking but if there is any multiply it by 1mil and it may be significant
2.) ANYTHING<1,-1> is an insert - consider using just WRITE ANYTHING : @vm:TEMP ON STATION.FILE which is a concatenation which "may" be less overhead and eliminate the insert step
Reporting Considerations:
1.) Is the file type & modulo optimal since it plays a big role in access times?
2.) Of course listing the file BY @id causes a sort which extends runtime
This is in regard to Revision 1 of the 1BRC.
I'll give it a shot skipping the lock and using the concat. :)
I did set the file type and modulo based on the hash help of the final file, and it does seem to help, but not enough. There might be a way to do the listing faster if the mins, maxes and averages are calculated during a build.index type step.
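One way to try the "calculate during the build" idea is to keep running aggregates per station while loading, instead of storing every reading and asking LIST to do the maths afterwards. The following is only a sketch under assumptions: the sample data is made up, the four-attribute bucket layout (min, max, sum, count) and the HASH.KEY subroutine name are hypothetical, and it reuses the same unguarded hash as the program, so colliding station names would still be merged. It has not been timed against the real data.

```
* Sketch only: running min/max/sum/count per station in a hashed array.
* Bucket layout (hypothetical): <1> = min, <2> = max, <3> = sum, <4> = count
   EQU ENV.SIZE TO 10000
   DIM ENV(ENV.SIZE)
   MAT ENV = ''
   STATION.NAMES = ''
*
* Made-up sample input in the same name;temperature format
   RAW.DATA = 'Hamburg;12.0' : CHAR(10) : 'Oslo;-3.5' : CHAR(10) : 'Hamburg;8.4' : CHAR(10)
   CONVERT CHAR(10):';' TO @AM:@VM IN RAW.DATA
*
   CTR = 0
   LOOP
      CTR = CTR + 1
      RAW.LINE = RAW.DATA<CTR>
   UNTIL RAW.LINE = '' DO
      MAP.KEY = RAW.LINE<1,1>
      TEMP = RAW.LINE<1,2>
      GOSUB HASH.KEY
*
      STATS = ENV(MAP.POS)
      IF STATS = '' THEN
* First reading for this bucket: min = max = sum = TEMP, count = 1
         STATION.NAMES<-1> = MAP.KEY
         STATS = TEMP : @AM : TEMP : @AM : TEMP : @AM : 1
      END ELSE
         MIN.TEMP = STATS<1>
         MAX.TEMP = STATS<2>
         IF TEMP LT MIN.TEMP THEN STATS<1> = TEMP
         IF TEMP GT MAX.TEMP THEN STATS<2> = TEMP
         STATS<3> = STATS<3> + TEMP
         STATS<4> = STATS<4> + 1
      END
      ENV(MAP.POS) = STATS
   REPEAT
*
* Report: the average is derived from sum and count at the end
   NUMBER.OF.STATIONS = DCOUNT(STATION.NAMES,@AM)
   FOR I = 1 TO NUMBER.OF.STATIONS
      MAP.KEY = STATION.NAMES<I>
      GOSUB HASH.KEY
      STATS = ENV(MAP.POS)
      TOTAL.TEMP = STATS<3>
      READINGS = STATS<4>
      AVG.TEMP = TOTAL.TEMP / READINGS
      PRINT MAP.KEY : ' min=' : STATS<1> : ' avg=' : AVG.TEMP : ' max=' : STATS<2>
   NEXT I
   STOP
*
HASH.KEY:NULL
* Same hash as the program; + 1 is an assumption to keep MAP.POS in 1..ENV.SIZE
   MAP.HASH = 0
   FOR K = 1 TO LEN(MAP.KEY)
      MAP.HASH += SEQ(MAP.KEY[K,1]) * K
   NEXT K
   MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
   RETURN
*
   END
```

With this layout each bucket stays at four attributes no matter how many rows are read, so the final WRITE and any listing would only touch one small record per station.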
Skipping the lock did help a little bit.
Concatenating vs appending via <1,-1> seems to be basically the same. It might be that, under the hood, <1,-1> and concatenation are equivalent. That would be interesting to test.
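A rough way to test that is to time the two forms in memory, away from any file I/O. This is a sketch, not a measured result: LOOPS is an arbitrary, hypothetical iteration count, and if TIME() only has whole-second resolution on your platform it will need to be raised until the difference (if any) is visible.

```
* Sketch only: time <1,-1> appends against plain concatenation in memory.
* LOOPS is a hypothetical iteration count; adjust it for your machine.
   LOOPS = 100000
*
   REC = ''
   T1 = TIME()
   FOR I = 1 TO LOOPS
      REC<1,-1> = I
   NEXT I
   T2 = TIME()
   PRINT 'Append <1,-1>: ' : T2 - T1 : 's'
*
   REC = ''
   T1 = TIME()
   FOR I = 1 TO LOOPS
      IF I = 1 THEN REC = I ELSE REC = REC : @VM : I
   NEXT I
   T2 = TIME()
   PRINT 'Concatenate: ' : T2 - T1 : 's'
*
   END
```

Both loops build the same value-marked string, so any timing gap should come from the append mechanism itself rather than from the data.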