@greglandrum
Created May 19, 2021 07:05
SMILES atom regex.ipynb
@adalke

adalke commented Oct 15, 2021

This regex doesn't handle `*` terms, and it interprets 'Cc' as a single atom term rather than as the two atom terms 'C' and 'c'.

Here's an alternative version which handles both these cases, written using re's "verbose" notation:

import re

atom_finder = re.compile(r"""
(
 Cl? |             # Cl and Br are part of the organic subset
 Br? |
 [NOSPFIbcnosp*] | # as are these single-letter elements
 \[[^]]+\]         # everything else must be in []s
)
""", re.X)

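A quick way to check the two cases mentioned above is to run the pattern through `findall` on a few SMILES strings. This is only a minimal sketch: the pattern is copied unchanged from the comment, while the helper name `atom_terms` and the example inputs are illustrative assumptions, not part of the original notebook.

```python
import re

# Pattern copied from the comment above (verbose notation).
atom_finder = re.compile(r"""
(
 Cl? |             # Cl and Br are part of the organic subset
 Br? |
 [NOSPFIbcnosp*] | # as are these single-letter elements
 \[[^]]+\]         # everything else must be in []s
)
""", re.X)


def atom_terms(smiles):
    """Return the atom terms found in a SMILES string (hypothetical helper).

    Bonds, ring-closure digits, branches and dots are skipped because they
    match none of the alternatives in the pattern.
    """
    return atom_finder.findall(smiles)


print(atom_terms("Cc"))            # ['C', 'c']
print(atom_terms("*CCl"))          # ['*', 'C', 'Cl']
print(atom_terms("c1ccccc1Br"))    # ['c', 'c', 'c', 'c', 'c', 'c', 'Br']
print(atom_terms("[NH4+].[Cl-]"))  # ['[NH4+]', '[Cl-]']
```

Note that this only extracts atom terms; it does not check that the input is a valid SMILES string.
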
@greglandrum
Author

Thanks Andrew!
