

@vfscalfani
Created September 5, 2022 12:31

PubChem Compound by Create date field

VF Scalfani

Used EDirect v16.5 on September 4, 2022.

create a file with dates

cat dates.txt
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
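
The same file can also be generated instead of typed. This is a minimal sketch that assumes the seq utility is available (it ships with GNU coreutils and with macOS/BSD) and uses the same 2004 to 2022 range:

```bash
# Write the years 2004 through 2022, one per line, to dates.txt.
seq 2004 2022 > dates.txt
cat dates.txt
```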

get cumulative sum of compounds by year

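The awk command keeps a running total of the yearly counts (column 2) and prints it in column 3. The 2022 count covers only part of the year, because the searches were run on September 4, 2022.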

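# Query PubChem Compound once per year and print "year count" per line.
while read -r date
do
  # Redirect stdin from /dev/null so esearch does not read the rest of dates.txt.
  esearch -db pccompound -query "${date}[CDAT]" < /dev/null |
  xtract -pattern ENTREZ_DIRECT -lbl "$date" -element Count
  # Pause between requests to stay within NCBI E-utilities rate limits.
  sleep 1
done < dates.txt | \
awk '{cumulativesum += $2; $3 = cumulativesum; print $0}'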

2004 4455 4455
2005 5275113 5279568
2006 5084975 10364543
2007 7381646 17746189
2008 1340505 19086694
2009 4769490 23856184
2010 1942029 25798213
2011 2477744 28275957
2012 15259611 43535568
2013 1518105 45053673
2014 6835168 51888841
2015 10386973 62275814
2016 19709326 81985140
2017 5353509 87338649
2018 3049769 90388418
2019 9402978 99791396
2020 3320327 103111723
2021 6263657 109375380
2022 2515482 111890862
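
The running-sum step can be checked offline. This sketch feeds three sample rows copied from the output above into the same awk program. The rows stand in for the esearch/xtract loop, so no network access is needed, and the function name cumulative_sum is a hypothetical wrapper:

```bash
# cumulative_sum: hypothetical wrapper around the awk one-liner used above.
# Adds column 2 to a running total and stores the total in column 3.
cumulative_sum() {
  awk '{cumulativesum += $2; $3 = cumulativesum; print $0}'
}

# Sample "year<TAB>count" rows, as xtract would print them.
printf '2004\t4455\n2005\t5275113\n2006\t5084975\n' | cumulative_sum
```

This prints the same first three rows shown above. Assigning to $3 makes awk rebuild the line with its default output separator (a single space), which is why the tab-separated xtract output comes back space-separated.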

plot cumulative compounds by year

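Of course, we have to add some gnuplot terminal plotting! Here, "set term dumb" draws the plot as ASCII text in the terminal, "plot '-'" reads the data from standard input, and "using 1:3" plots the cumulative count (column 3) against the year (column 1).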

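# Same query loop as above, with the running totals piped into gnuplot.
while read -r date
do
  # Redirect stdin from /dev/null so esearch does not read the rest of dates.txt.
  esearch -db pccompound -query "${date}[CDAT]" < /dev/null |
  xtract -pattern ENTREZ_DIRECT -lbl "$date" -element Count
  # Pause between requests to stay within NCBI E-utilities rate limits.
  sleep 1
done < dates.txt | \
awk '{cumulativesum += $2; $3 = cumulativesum; print $0}' | \
gnuplot -e "set term dumb; plot '-' using 1:3 with boxes notitle"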


  1.2e+08 +----------------------------------------------------------------+   
          |      +      +       +      +      +      +       +      +    **|   
          |                                                           **** |   
    1e+08 |-+                                                 *********  *-|   
          |                                                   *   *   *  * |   
          |                                            ********   *   *  * |   
          |                                         ****   *  *   *   *  * |   
    8e+07 |-+                                       *  *   *  *   *   *  *-|   
          |                                         *  *   *  *   *   *  * |   
          |                                         *  *   *  *   *   *  * |   
    6e+07 |-+                                   *****  *   *  *   *   *  *-|   
          |                                 *****   *  *   *  *   *   *  * |   
          |                              ****   *   *  *   *  *   *   *  * |   
    4e+07 |-+                        *****  *   *   *  *   *  *   *   *  *-|   
          |                          *   *  *   *   *  *   *  *   *   *  * |   
          |                      *****   *  *   *   *  *   *  *   *   *  * |   
          |               ********   *   *  *   *   *  *   *  *   *   *  * |   
    2e+07 |-+      ********   *  *   *   *  *   *   *  *   *  *   *   *  *-|   
          |    *****   *  *   *  *   *   *  *   *   *  *   *  *   *   *  * |   
          | **** + *   *+ *   * +*   * + *  * + *   *+ *   * +*   * + *  * |   
        0 +----------------------------------------------------------------+   
         2004   2006   2008    2010   2012   2014   2016    2018   2020   2022 
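
Both loops give esearch its input from /dev/null. Presumably this is because EDirect commands look for piped input on stdin: without the redirect, esearch would consume the remaining lines of dates.txt and the loop would stop after the first year. The sketch below shows the effect with cat as a hypothetical stand-in for esearch, so it needs neither EDirect nor network access. The file demo_dates.txt and both function names are illustrative only:

```bash
#!/usr/bin/env bash
# cat stands in (hypothetically) for esearch: both read standard input.
printf '2004\n2005\n2006\n' > demo_dates.txt

# The inner command inherits the loop's stdin and swallows the remaining years.
loop_without_devnull() {
  while read -r date
  do
    echo "$date"
    cat > /dev/null
  done < demo_dates.txt
}

# The inner command reads /dev/null instead, so every year is processed.
loop_with_devnull() {
  while read -r date
  do
    echo "$date"
    cat < /dev/null > /dev/null
  done < demo_dates.txt
}

loop_without_devnull   # prints only 2004
loop_with_devnull      # prints 2004, 2005, 2006
```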
                                                                               

Notes:

This example is part of the EDirectChemInfo repository and is MIT licensed. A copy of the license can be found here: https://github.com/ualibweb/EDirectChemInfo/blob/master/LICENSE

Data output is credited to NCBI and NLM. Please see the NCBI Website and Data Usage Policies and Disclaimers: https://www.ncbi.nlm.nih.gov/home/about/policies/
