@sshleifer
Created June 7, 2018 17:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sshleifer/ec3a1875db44392ed0b15a196d4dc106 to your computer and use it in GitHub Desktop.
Save sshleifer/ec3a1875db44392ed0b15a196d4dc106 to your computer and use it in GitHub Desktop.
Imagerive Notes

WHERE IS THE DATA? SSH into {FIXME} while connected to the ImageRive VPN (this must be done from a Windows machine). All data is in /merantix_core/data/hospitals/imagerive/export:
- anonymized reports in reports/
- anonymized_dicoms/
- export/cases_new.json
- export/patients_new.json

Normal Windows VPN connection. IP : 212.243.133.154 Protocol P2TP with IPSec and optional encryption (can also be called L2TP with key) L2TP key : DE70CABBDE7AC31F Login VPN : VPN.TEMP Pwd : ID2018/

SSH 192.168.5.126 merantix merantix

From Windows.

Unfixed: (re) De’cembre, (re) fe’vrier; newlines in the JSON (they make it less readable).
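A minimal sketch of one way to catch written-out French dates whose month names carry accents or appear in the garbled "De’cembre"/"fe’vrier" form. It is not from the codebase: `redact_french_dates`, `DATE_RE` and the `<DATE>` placeholder are hypothetical names, and the pattern only assumes the "<day> <month> [<year>]" shape.

```python
import re
import unicodedata

# Accent-tolerant stand-ins for letters that carry accents in French month
# names. "[\u2019']?" also accepts the garbled forms seen in the reports,
# where a quote follows the letter instead ("De’cembre", "fe’vrier").
_LETTER = {"e": "[eéèê][\u2019']?", "u": "[uû]"}

MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin", "juillet",
          "aout", "septembre", "octobre", "novembre", "decembre"]


def _month_pattern(month: str) -> str:
    """Turn an unaccented month name into an accent-tolerant regex."""
    return "".join(_LETTER.get(ch, re.escape(ch)) for ch in month)


# "<day> <month> [<year>]", e.g. "12 décembre 2017" or "3 fe’vrier".
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:" + "|".join(_month_pattern(m) for m in MONTHS) + r")(?:\s+\d{4})?\b",
    re.IGNORECASE,
)


def redact_french_dates(text: str) -> str:
    """Replace written-out French dates with a <DATE> placeholder."""
    # NFC merges "e" + combining accent into a single "é" so the classes above match.
    text = unicodedata.normalize("NFC", text)
    return DATE_RE.sub("<DATE>", text)
```

For example, `redact_french_dates("Vu le 12 De’cembre 2017.")` returns `"Vu le <DATE>."`. Allowing the stray quote after every accentable letter keeps one pattern per month instead of listing each garbled spelling by hand.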

Unanswered: What is hospitals/die-radiologie in the codebase? How do you get the left Control key working on Jonas Probst's Windows machine?

Imagerive-only notes:

For mess-inheriting developers:
First of all, do not expect my code to work on the first run. I am sorry. You will need to fix bugs and hand-check the results for at least 45 minutes. This took me about 50 hours; in hindsight it doesn't seem that hard, but lots of things can go wrong at every turn.

End goal:
- a directory called anonymized_dicoms/ filled with DICOMs that have been anonymized by the code in anonymization.py (this part was fairly easy);
- a directory called anonymized_reports/ containing .txt files (in my case, communication between doctors) that need to be anonymized by adding regexes to Report.process_text.

Steps and hacks:
1. Run dicom_receiver.py (it just sits there waiting for DICOMs) and make sure the port is correct. This only worked outside of Docker for me. Somebody will throw a bunch of DICOMs at it; it stores them on the file system and writes a file called dicoms.json for you. If it runs out of space or breaks, it will tell you. Run it with tee inside tmux and save the logs.
2. Change things in directories.py if you want. I made it so that there can be duplicate study_ids (following how Flo stored things).
3. Try running python projects/edison/hospitals/imagerive/export_process.py in the Docker image and fix the errors. Expect:
   - some MatchingExceptions. I don't know how to fix these, but I am satisfied with 818 matched reports. Feel free to look into them.
   - some empty-report warnings. I don't know how to fix those either.
   - some warnings about duplicate study IDs.
4. Make sure patients.json is >= 4.6 MB and cases.json is > 3 MB, and look at their contents.
5. Run cat /merantix_core/data/hospitals/telepaxx/export/reports/* > all_txt.txt, then open that file in vim. Names, dates, places, and ages need to be anonymized, as well as references to Imagerive. Search for strings like: madame, Madame, ['04/04', '21.12', '21/12', 'HS15', '06', '05', '01', '08', 'RUE DES MOULINS', 'Gen'].

Hack (IR only): Get_export_case_sort_key  # HACK, visitation-pattern maybe misordered, but relative dates should be fine
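A minimal sketch of what "adding regexes to Report.process_text" could look like, plus a scripted version of the vim/grep search for leftovers. Everything here is hypothetical: `RULES`, the standalone `process_text`, `find_leftovers` and the `<NAME>`/`<DATE>`/`<ADDRESS>`/`<INSTITUTION>` placeholders are illustrative and are not the real Report class. The search list reuses the strings from the notes but leaves out '06', '05', '01', '08' and 'Gen', which would match almost every report.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical rule list in the spirit of "adding regexes to
# Report.process_text"; the real method may be organised differently.
RULES = [
    # Salutation + capitalised surname, e.g. "Madame Dupont". Only the
    # salutation is case-insensitive, so "madame est" is left untouched.
    (re.compile(r"\b(?i:madame|monsieur|mme)\s+[A-ZÀ-Ý][\w'-]*"), "<NAME>"),
    # Two-digit day/month dates like "21.12", "21/12" or "04/04/2018";
    # requiring two digits keeps measurements such as "2.3 cm" intact.
    (re.compile(r"\b(?:0[1-9]|[12]\d|3[01])[./](?:0[1-9]|1[0-2])(?:[./]\d{2,4})?\b"), "<DATE>"),
    # Upper-case street names such as "RUE DES MOULINS".
    (re.compile(r"\b(?:RUE|AVENUE|CHEMIN)(?:\s+[A-ZÀ-Ý'-]+)+"), "<ADDRESS>"),
    # References to the hospital itself.
    (re.compile(r"(?i)\bimage\s?rive\b"), "<INSTITUTION>"),
]

# Search strings from the notes, minus the ones that match nearly everything.
SUSPICIOUS = ["madame", "Madame", "04/04", "21.12", "21/12", "HS15", "RUE DES MOULINS"]


def process_text(text: str) -> str:
    """Apply every rule in order and return the anonymized text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def find_leftovers(report_dir: str) -> dict:
    """Map each .txt file name to the suspicious strings still present in it."""
    hits = {}
    for path in sorted(Path(report_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = [s for s in SUSPICIOUS if s in text]
        if found:
            hits[path.name] = found
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = "Madame Dupont, vue le 21.12.2017 chez Imagerive."
        Path(tmp, "raw.txt").write_text(raw, encoding="utf-8")
        Path(tmp, "clean.txt").write_text(process_text(raw), encoding="utf-8")
        print(find_leftovers(tmp))  # only raw.txt should be listed
```

Running `find_leftovers` on the output directory after every change to the rules gives a quick regression check; it does not replace reading the reports by hand.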

Tips:
- If you don't have sudo access, give up right now.
- Check how much space is on the box (df -h). We ran out of space because 4000 DICOMs * 50 MB/DICOM was more than the 500 GB we had. You will need to copy each DICOM, so you need at least 2x as much space as the DICOMs themselves take up.
- France/Switzerland only: the \xe characters are French accents. They are very annoying, and I still don't know how to type them.
- Install zsh and tmux on the box. Your connection will die a lot, and you will need to be comfortable in the shell. You may b
- Don't modify anything in place!
- grep -Ril "madame" reports/* should reveal
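A small pre-flight check in the spirit of the df -h tip, assuming the "copy each DICOM" rule means every file is stored `copies` times. `required_bytes` and `enough_space` are hypothetical helpers; `shutil.disk_usage` is the standard-library call that reports free space for a path, and 1 MB is treated as 1 MiB here.

```python
import shutil

MIB = 1024 ** 2
GIB = 1024 ** 3


def required_bytes(n_dicoms: int, mb_per_dicom: float, copies: int = 2) -> int:
    """Space needed to hold every DICOM `copies` times (1 MB taken as 1 MiB)."""
    return int(n_dicoms * mb_per_dicom * MIB * copies)


def enough_space(path: str, n_dicoms: int, mb_per_dicom: float, copies: int = 2) -> bool:
    """Rough Python counterpart of eyeballing `df -h` before a transfer."""
    needed = required_bytes(n_dicoms, mb_per_dicom, copies)
    free = shutil.disk_usage(path).free
    print(f"need ~{needed / GIB:.1f} GiB, {free / GIB:.1f} GiB free on {path}")
    return free >= needed


if __name__ == "__main__":
    # Hypothetical numbers; replace them with what the hospital says it will send.
    print(enough_space(".", n_dicoms=100, mb_per_dicom=50, copies=3))
```

Since the notes say "2x+", passing `copies=3` leaves some headroom for logs, dicoms.json and intermediate exports.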
