@louismullie
Created November 29, 2022 02:17
import pandas as pd
from scipy.stats import rankdata

novo_test = pd.read_csv("test.csv")
novo_test
df_cavity = pd.read_csv('/content/gdrive/MyDrive/Kaggle/cavity_pred_CUSTOM_A.csv', low_memory=False)
df_cavity = df_cavity.rename(columns={'variant': 'mutation_key'})
df_cavity
novo_test2 = novo_test.copy().rename({'protein_sequence': 'mutant_seq', 'seq_id': 'source_df_id'}, axis = 1)
novo_test2['sequence'] = 'VPVNPEPDATSVENVALKTGSGDSQSDPIKADLEVKGQSALPFDVDCWAILCKGAPNVLQRVNEKTKNSNRDRSGANKGPFKDPQKWGIKALPPKNPSWSAQDFKSPEEYAFASSLQGGTNAILAPVNLASQNSQGGVLNGFYSANKVAQFDPSKPQQTKGTWFQITKFTGAAGPYCKALGSNDKSVCDKNKNIAGDWGFDPAKWAYQYDEKNNKFNYVGK'
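# find_mut is not defined in this snippet; it must add a 'mutation_key' column, which the join below relies on
novo_test2 = novo_test2.apply(find_mut, axis=1)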
novo_test2 = novo_test2.join(df_cavity.set_index('mutation_key'), on='mutation_key')
novo_test2['scores'] = -novo_test2['score_ml_fermi']
# Fill rows with no cavity prediction with the 25th percentile of the available scores
novo_test2.loc[novo_test2['scores'].isna(), 'scores'] = novo_test2['scores'].quantile(q=0.25)
novo_test2['scores_rank'] = rankdata(novo_test2['scores'])
submission_rosetta_scores2 = novo_test2[['source_df_id','scores_rank', 'scores']]
submission_rosetta_scores2 = submission_rosetta_scores2.rename({'source_df_id': 'seq_id', 'scores_rank': 'tm'}, axis = 1)
submission_rosetta_scores2.to_csv('submission_rosetta_scores', index=False)
submission_rosetta_scores2
preds2 = submission_rosetta_scores2.tm.values
# Index starts at 31390 (the first test seq_id); its length must match preds2
pY2 = pd.DataFrame(preds2, index=range(31390, len(preds2) + 31390), columns=['tm'])
pY2
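
The script above calls `find_mut`, which is not defined in this gist. What follows is a minimal, hypothetical sketch of what such a helper might look like, not the author's implementation. It assumes the mutation key has the form `<wild-type residue><1-based position><mutant residue>` (for example `E4F`), which may not match the `variant` column in the cavity CSV. It also assumes that anything other than a single substitution (deletions, unchanged sequences) gets no key, so those rows end up with a missing score after the join.

```python
def find_mut(row):
    """Hypothetical stand-in for the undefined find_mut helper.

    Compares row['sequence'] (wild type) with row['mutant_seq'] and stores
    a key such as 'E4F' in row['mutation_key']. The key format is an
    assumption and must match the keys used in the cavity predictions.
    """
    wt, mut = row['sequence'], row['mutant_seq']
    key = None  # default: not a single substitution (e.g. a deletion)
    if len(wt) == len(mut):
        # Positions where the two sequences differ
        diffs = [i for i, (a, b) in enumerate(zip(wt, mut)) if a != b]
        if len(diffs) == 1:
            i = diffs[0]
            key = f"{wt[i]}{i + 1}{mut[i]}"  # 1-based position
    row['mutation_key'] = key
    return row


# Small demonstration on plain dicts (toy sequences, not real data)
examples = [
    {'sequence': 'ACDE', 'mutant_seq': 'ACDF'},  # substitution at position 4
    {'sequence': 'ACDE', 'mutant_seq': 'ACE'},   # deletion: no key
]
keys = [find_mut(dict(r))['mutation_key'] for r in examples]
print(keys)  # ['E4F', None]
```

Because the helper only reads and writes by label, it should also work on the row Series passed by `novo_test2.apply(find_mut, axis=1)`. Rows left without a key would then be filled with the 25th-percentile score by the script.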