*
    EQU TRUE TO 1
    EQU FALSE TO 0
*
    EQU ENV.SIZE TO 10000
    DIM ENV(ENV.SIZE)
*
    MAT ENV = ''
*
    MAP.CACHE = ''
*
    FILE.NAME = 'measurements-small.txt'
    FILE.NAME = 'measurements-med.txt'
*
    PATH = '/home/transfer/' : FILE.NAME
*
    GOSUB SETUP
*
    GOSUB LOAD.DATA
*
*TEMP    GOSUB CALCULATE
*
    STOP
*
********************* S U B R O U T I N E *********************
*
SETUP:NULL
*
    EXECUTE 'DELETE-FILE STATION-FILE'
    EXECUTE 'CREATE-FILE STATION-FILE 11,1 415,53,16 MINIMUM.MODULUS 415'
*
    OPEN '','STATION-FILE' TO STATION.FILE ELSE
        PRINT 'Unable to open file: STATION-FILE'
        STOP
    END
*
    OPEN 'DICT','STATION-FILE' TO DICT.STATION.FILE ELSE
        PRINT 'Unable to open file: DICT STATION-FILE'
        STOP
    END
*
    D = ''
    D<1> = 'A'
    D<2> = 1
    D<3> = 'TEMPERATURE'
    D<9> = 'R'
    D<10> = 25
*
    WRITE D ON DICT.STATION.FILE,'TEMPERATURE'
*
    RETURN
*
********************* S U B R O U T I N E *********************
*
LOAD.DATA:NULL
*
    T1 = TIME()
*
    OPENSEQ PATH TO FLAT.FILE ELSE
        PRINT 'Unable to open: ' : PATH :
        INPUT ANYTHING :
        STOP
    END
*
    DONE = FALSE
    CTR = 0
*
    STATION.NAMES = ''
*
    READBLK RAW.DATA FROM FLAT.FILE, 2000000000 THEN
        RELEASE FLAT.FILE
        CLOSESEQ FLAT.FILE
    END
*
* One attribute per line, one value per field (station;temperature)
    CONVERT CHAR(10):';' TO @AM:@VM IN RAW.DATA
*
    LOOP
        CTR = CTR + 1
        RAW.LINE = RAW.DATA<CTR>
    UNTIL RAW.LINE = '' DO
*
        ITEM.ID = RAW.LINE<1,1>
        TEMP = RAW.LINE<1,2>
*
        IF MOD(CTR,100000) = 0 THEN
            PRINT CTR : ': ' : TIME() - T1 : 's'
        END
*
        MAP.KEY = ITEM.ID
*
* Hash: position-weighted sum of character codes.
* There is no collision handling, so stations that share a slot
* are stored together under the first name seen.
        MAP.KEY.LEN = LEN(MAP.KEY)
        MAP.HASH = 0
        FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
            MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
        NEXT MAP.KEY.CTR
* MOD can return 0, so add 1 to keep the subscript in 1..ENV.SIZE
        MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
        MAP.VALUE = ENV(MAP.POS)
*
        IF MAP.VALUE = '' THEN
            STATION.NAMES<-1> = ITEM.ID
        END
*
        ENV(MAP.POS)<-1> = TEMP
    REPEAT
*
    NUMBER.OF.STATIONS = DCOUNT(STATION.NAMES,@AM)
*
    FOR I = 1 TO NUMBER.OF.STATIONS
        MAP.KEY = STATION.NAMES<I>
*
        MAP.KEY.LEN = LEN(MAP.KEY)
        MAP.HASH = 0
        FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
            MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
        NEXT MAP.KEY.CTR
* Same 1-based slot as in the load loop above
        MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
        MAP.VALUE = ENV(MAP.POS)
*
* LOWER turns the attribute marks into value marks: one multivalued attribute
        WRITE LOWER(MAP.VALUE) ON STATION.FILE,MAP.KEY
    NEXT I
*
    T2 = TIME()
*
    PRINT 'Loading: ' : T2 - T1 : 's'
*
    RETURN
*
********************* S U B R O U T I N E *********************
*
CALCULATE:NULL
*
    T1 = TIME()
*
    CMD = 'LIST STATION-FILE MIN TEMPERATURE AVERAGE TEMPERATURE MAX TEMPERATURE BREAK-ON @ID BY @ID (D'
    EXECUTE CMD CAPTURING RESULTS
*
    T2 = TIME()
*
    PRINT 'Calculate: ' : T2 - T1 : 's'
*
    RETURN
*
* END OF PROGRAM
*
    END
*
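The hash in LOAD.DATA sums each character code times its position and reduces the total with MOD. A quick way to see how that behaves is to model it outside the database. The sketch below is a Python model, not part of the program: `map_pos` and `find_collisions` are hypothetical names, and the `+ 1` is an assumption added so that slots run from 1 to the table size, since a bare remainder can be 0. It shows that two different keys can land in the same slot, in which case their temperatures would be stored together.

```python
ENV_SIZE = 10000  # mirrors EQU ENV.SIZE TO 10000 in the listing


def map_pos(key, size=ENV_SIZE):
    """Position-weighted sum of character codes, reduced to a 1-based slot."""
    total = sum(ord(ch) * pos for pos, ch in enumerate(key, start=1))
    return total % size + 1  # assumption: +1 keeps the slot in 1..size


def find_collisions(names, size=ENV_SIZE):
    """Group names by slot and return only the slots shared by 2+ names."""
    slots = {}
    for name in names:
        slots.setdefault(map_pos(name, size), []).append(name)
    return {pos: group for pos, group in slots.items() if len(group) > 1}


if __name__ == "__main__":
    # "ab" and "ca" are made-up keys picked because they hash alike:
    # 97*1 + 98*2 == 99*1 + 97*2 == 293
    print(find_collisions(["ab", "ca", "Hamburg"]))
```

Running `find_collisions` over the real station names would show whether any of them share a slot at the current table size.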
Load Considerations:
1.) Why lock with a READU? I'm not sure of the overhead of locking, but if there is any, multiply it by 1 million and it may be significant.
2.) ANYTHING<1,-1> is an insert. Consider using just WRITE ANYTHING : @VM:TEMP ON STATION.FILE, which is a concatenation that "may" have less overhead and eliminates the insert step.
Reporting Considerations:
1.) Is the file type & modulo optimal, since it plays a big role in access times?
2.) Of course, listing the file BY @ID causes a sort, which extends runtime.
This is in regards to Revision 1 of the 1BRC.
I'll give it a shot skipping the lock and using the concat. :)
I did set the file type and modulo based on what HASH.HELP suggested for the final file, and it does seem to help, but not enough. There might be a way to do the listing faster if the mins, maxes and averages are calculated during a BUILD.INDEX-type step.
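One way to picture calculating the mins, maxes and averages up front is to keep a running min, max, sum and count per station while the lines are read, so the reporting step has no per-reading work left. This is only a sketch of that idea in Python, not what the program does: `aggregate` and `report` are hypothetical names, the input is assumed to be `station;temperature` lines as in the measurements file, and the `name=min/mean/max` output layout is an assumption.

```python
def aggregate(lines):
    """Fold 'station;temperature' lines into {station: [min, max, sum, count]}."""
    stats = {}
    for line in lines:
        name, _, raw = line.partition(";")
        temp = float(raw)
        entry = stats.get(name)
        if entry is None:
            # First reading for this station seeds all four fields
            stats[name] = [temp, temp, temp, 1]
        else:
            if temp < entry[0]:
                entry[0] = temp
            if temp > entry[1]:
                entry[1] = temp
            entry[2] += temp
            entry[3] += 1
    return stats


def report(stats):
    """Return one 'name=min/mean/max' string per station, sorted by name."""
    return [
        f"{name}={lo:.1f}/{total / count:.1f}/{hi:.1f}"
        for name, (lo, hi, total, count) in sorted(stats.items())
    ]


if __name__ == "__main__":
    sample = ["Hamburg;12.0", "Bulawayo;8.9", "Hamburg;-2.0"]  # made-up readings
    for row in report(aggregate(sample)):
        print(row)
```

The trade-off is that each station keeps four numbers instead of every reading, so only one sort over the station names is left at report time.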
Skipping the lock did help a little bit.
Concatenating vs appending via <1,-1> seems to be basically the same. It might be that, under the hood, -1 and concatenation are equivalent. That would be interesting to test.