@Krowemoh
Last active January 17, 2024 03:16
*
EQU TRUE TO 1
EQU FALSE TO 0
*
EQU ENV.SIZE TO 10000
DIM ENV(ENV.SIZE)
*
MAT ENV = ''
*
MAP.CACHE = ''
*
* FILE.NAME = 'measurements-small.txt'
FILE.NAME = 'measurements-med.txt'
*
PATH = '/home/transfer/' : FILE.NAME
*
GOSUB SETUP
*
GOSUB LOAD.DATA
*
*TEMP GOSUB CALCULATE
*
STOP
*
********************* S U B R O U T I N E *********************
*
SETUP:NULL
*
EXECUTE 'DELETE-FILE STATION-FILE'
EXECUTE 'CREATE-FILE STATION-FILE 11,1 415,53,16 MINIMUM.MODULUS 415'
*
OPEN '','STATION-FILE' TO STATION.FILE ELSE
PRINT 'Unable to open file: STATION-FILE'
STOP
END
*
OPEN 'DICT','STATION-FILE' TO DICT.STATION.FILE ELSE
PRINT 'Unable to open file: DICT STATION-FILE'
STOP
END
*
D = ''
D<1> = 'A'
D<2> = 1
D<3> = 'TEMPERATURE'
D<9> = 'R'
D<10> = 25
*
WRITE D ON DICT.STATION.FILE,'TEMPERATURE'
*
RETURN
*
********************* S U B R O U T I N E *********************
*
LOAD.DATA:NULL
*
T1 = TIME()
*
OPENSEQ PATH TO FLAT.FILE ELSE
PRINT 'Unable to open: ' : PATH :
INPUT ANYTHING :
STOP
END
*
DONE = FALSE
CTR = 0
*
STATION.NAMES = ''
*
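* Read the whole file in one block (up to 2,000,000,000 bytes), then release and close it
READBLK RAW.DATA FROM FLAT.FILE, 2000000000 THEN
RELEASE FLAT.FILE
CLOSESEQ FLAT.FILE
END ELSE
PRINT 'Unable to read: ' : PATH
STOP
END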
*
* Each line becomes an attribute; station and temperature become values 1 and 2
CONVERT CHAR(10):';' TO @AM:@VM IN RAW.DATA
*
LOOP
CTR = CTR + 1
RAW.LINE = RAW.DATA<CTR>
UNTIL RAW.LINE = '' DO
*
ITEM.ID = RAW.LINE<1,1>
TEMP = RAW.LINE<1,2>
*
IF MOD(CTR,100000) = 0 THEN
PRINT CTR : ': ' : TIME() - T1 : 's'
END
*
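* Hash the station name: sum of (character code * position), reduced modulo ENV.SIZE.
* MOD can return 0, so this relies on ENV(0) being a usable element.
MAP.KEY = ITEM.ID
*
MAP.KEY.LEN = LEN(MAP.KEY)
MAP.HASH = 0
FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
NEXT MAP.KEY.CTR
MAP.POS = MOD(MAP.HASH,ENV.SIZE)
MAP.VALUE = ENV(MAP.POS)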
*
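* An empty slot means a station not seen before, so remember its name.
* NOTE: there is no collision handling; two names that hash to the same slot share it.
IF MAP.VALUE = '' THEN
STATION.NAMES<-1> = ITEM.ID
END
*
* Append this reading to the slot as a new attribute
ENV(MAP.POS)<-1> = TEMP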
REPEAT
*
NUMBER.OF.STATIONS = DCOUNT(STATION.NAMES,@AM)
*
* Write one record per station: recompute its slot and store the readings
FOR I = 1 TO NUMBER.OF.STATIONS
MAP.KEY = STATION.NAMES<I>
*
MAP.KEY.LEN = LEN(MAP.KEY)
MAP.HASH = 0
FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
MAP.HASH += SEQ(MAP.KEY[MAP.KEY.CTR,1]) * MAP.KEY.CTR
NEXT MAP.KEY.CTR
MAP.POS = MOD(MAP.HASH,ENV.SIZE)
MAP.VALUE = ENV(MAP.POS)
*
* LOWER turns the attribute marks into value marks, so all readings land in attribute 1
WRITE LOWER(MAP.VALUE) ON STATION.FILE,MAP.KEY
NEXT I
*
T2 = TIME()
*
PRINT 'Loading: ' : T2 - T1 : 's'
*
RETURN
*
********************* S U B R O U T I N E *********************
*
CALCULATE:NULL
*
T1 = TIME()
*
CMD = 'LIST STATION-FILE MIN TEMPERATURE AVERAGE TEMPERATURE MAX TEMPERATURE BREAK-ON @ID BY @ID (D'
EXECUTE CMD CAPTURING RESULTS
*
T2 = TIME()
*
PRINT 'Calculate: ' : T2 - T1 : 's'
*
RETURN
*
* END OF PROGRAM
*
END
*
@mgpetryk

Load Considerations:
1.) Why lock with a READU? I'm not sure what the overhead of locking is, but if there is any, multiply it by 1 million and it may be significant.
2.) ANYTHING<1,-1> is an insert. Consider just using WRITE ANYTHING : @VM : TEMP ON STATION.FILE, which is a concatenation; it "may" have less overhead and it eliminates the insert step.
Reporting Considerations:
1.) Are the file type and modulo optimal? They play a big role in access times.
2.) Of course, listing the file BY @ID causes a sort, which extends the runtime.

@Krowemoh
Author

This is in regards to Revision 1 of the 1BRC.

I'll give it a shot skipping the lock and using the concat. :)

I did set the file type and modulo based on the HASH.HELP output for the final file. It does seem to help, but not enough. There might be a way to make the listing faster if the mins, maxes and averages are calculated during a BUILD.INDEX-type step.
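
One way to read the "calculate during a build step" idea is to keep running aggregates while loading, instead of storing every reading and listing it afterwards. The following is only a minimal sketch under stated assumptions, not the gist's method: the STAT.* arrays and the sample data are hypothetical, the hash mirrors the one in LOAD.DATA (with 1 added so the slot is never 0), and collisions are still not handled.

```basic
EQU ENV.SIZE TO 10000
* Hypothetical per-slot aggregates, replacing the per-slot list of readings
DIM STAT.NAME(ENV.SIZE)
DIM STAT.MIN(ENV.SIZE)
DIM STAT.MAX(ENV.SIZE)
DIM STAT.SUM(ENV.SIZE)
DIM STAT.COUNT(ENV.SIZE)
MAT STAT.NAME = ''
MAT STAT.MIN = ''
MAT STAT.MAX = ''
MAT STAT.SUM = 0
MAT STAT.COUNT = 0
*
* Made-up sample input in the same station;temperature layout
RAW.DATA = 'Hamburg;12.0' : CHAR(10) : 'Oslo;-3.5' : CHAR(10) : 'Hamburg;8.4'
CONVERT CHAR(10):';' TO @AM:@VM IN RAW.DATA
*
NUMBER.OF.LINES = DCOUNT(RAW.DATA,@AM)
FOR CTR = 1 TO NUMBER.OF.LINES
    ITEM.ID = RAW.DATA<CTR,1>
    TEMP = RAW.DATA<CTR,2>
    *
    * Same hash as LOAD.DATA, plus 1 so the slot is always 1..ENV.SIZE
    MAP.KEY.LEN = LEN(ITEM.ID)
    MAP.HASH = 0
    FOR MAP.KEY.CTR = 1 TO MAP.KEY.LEN
        MAP.HASH += SEQ(ITEM.ID[MAP.KEY.CTR,1]) * MAP.KEY.CTR
    NEXT MAP.KEY.CTR
    MAP.POS = MOD(MAP.HASH,ENV.SIZE) + 1
    *
    IF STAT.COUNT(MAP.POS) = 0 THEN
        * First reading seen for this slot
        STAT.NAME(MAP.POS) = ITEM.ID
        STAT.MIN(MAP.POS) = TEMP
        STAT.MAX(MAP.POS) = TEMP
    END ELSE
        IF TEMP < STAT.MIN(MAP.POS) THEN STAT.MIN(MAP.POS) = TEMP
        IF TEMP > STAT.MAX(MAP.POS) THEN STAT.MAX(MAP.POS) = TEMP
    END
    STAT.SUM(MAP.POS) += TEMP
    STAT.COUNT(MAP.POS) += 1
NEXT CTR
*
* Report every slot that received at least one reading
FOR I = 1 TO ENV.SIZE
    IF STAT.COUNT(I) > 0 THEN
        MEAN.TEMP = STAT.SUM(I) / STAT.COUNT(I)
        PRINT STAT.NAME(I) : ' min=' : STAT.MIN(I) : ' avg=' : MEAN.TEMP : ' max=' : STAT.MAX(I)
    END
NEXT I
END
```

With this shape the report no longer depends on a LIST with a sort, at the cost of losing the individual readings. Whether it is actually faster would have to be measured on the real input.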

@Krowemoh
Author

Skipping the lock did help a little bit.

Concatenating vs. appending via <1,-1> seems to be basically the same. It might be that, under the hood, <1,-1> and concatenation are equivalent. That would be interesting to test.
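
A rough way to run that test could look like the sketch below. It is an assumption-laden harness, not a measurement: N, REC.A and REC.B are hypothetical names, and TIME() may only have one-second resolution on some systems, so N may need to be raised until the difference (if any) is visible.

```basic
* Hypothetical iteration count; raise it if both timings print as 0
N = 100000
*
* Variant A: append each value with the <1,-1> syntax
T1 = TIME()
REC.A = ''
FOR I = 1 TO N
    REC.A<1,-1> = I
NEXT I
T2 = TIME()
PRINT 'Append <1,-1>: ' : T2 - T1 : 's'
*
* Variant B: concatenate a value mark and the value
T1 = TIME()
REC.B = ''
FOR I = 1 TO N
    REC.B = REC.B : @VM : I
NEXT I
* Concatenating onto an empty record leaves a leading value mark; drop it
REC.B = REC.B[2,LEN(REC.B)]
T2 = TIME()
PRINT 'Concatenate:   ' : T2 - T1 : 's'
*
* Both variants should end up with the same multivalued string
IF REC.A = REC.B THEN PRINT 'Results match' ELSE PRINT 'Results differ'
END
```

Note the one behavioral difference the sketch has to account for: `<1,-1>` on an empty record stores just the value, while plain concatenation also stores the leading @VM.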