Last active September 16, 2023 15:39
  • Save JayGwod/7feaa84b5364d8f51f35c25bcb0569e1 to your computer and use it in GitHub Desktop.
[Research tools] Some useful apps and websites, including literature management using #zotero, #keras, #jupyter.




Installing and using Zotero

Installing and using Zotero plugins





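  1. Before starting a new project (writing a review, beginning a new research topic, or completing a large assignment), create a new folder in Zotero for that project.
  2. At the start of a project the folder is empty, so first build up a local base collection of papers. Search Google Scholar by keyword for the papers you need; Zotero (together with the ZotFile plugin) automatically downloads each paper and extracts its metadata, renames the file by author, publication year, and title, and files it in the designated folder.
  3. Sync every paper in this folder with a cloud drive. From then on, open papers only from this folder. This keeps all papers and notes in one place and avoids fragmented information.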
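The author, year, and title naming scheme from step 2 can be illustrated with a short sketch. This is not ZotFile's implementation: `literature_filename` is a hypothetical helper, and the metadata in the example is made up.

```python
import re


def literature_filename(authors, year, title, max_title_words=8):
    """Build an 'Author_Year_Title.pdf' style name (hypothetical helper)."""
    # Use the first author's family name; fall back when metadata is missing.
    first_author = authors[0].split(",")[0].strip() if authors else "Unknown"
    # Keep only word characters so the name is safe on common file systems.
    words = re.findall(r"\w+", title)[:max_title_words]
    return f"{first_author}_{year}_{'_'.join(words)}.pdf"


# Made-up metadata, for illustration only.
name = literature_filename(["Doe, Jane", "Roe, Richard"], 2020,
                           "A Survey: Example Methods")
print(name)  # Doe_2020_A_Survey_Example_Methods.pdf
```

A predictable file name makes the synced folder searchable even outside Zotero.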



Source: How do you summarize and organize academic literature? - nerfing's answer - Zhihu


In this video, Prof. Pete Carr (faculty member at the University of Minnesota, Department of Chemistry) shares an algorithm to read a scientific paper more efficiently.

Structure of a Journal Article

  1. Title
  2. Keywords
  3. Abstract
  4. Introduction
  5. Experimental
  6. Results and Discussion
    1. tables
    2. figures
  7. Summary/Conclusion
  8. References

One might read the paper in the order in which it is written (title, abstract, introduction, and so on). However, there is a more efficient way to extract the most information from the article in the least amount of time.

Phase 1: Survey the Article

Feel free to stop reading the article at any point.

  1. Read the title and keywords (these are probably what got you to look at the paper)
  2. Read the abstract.
  3. Read the conclusions.

Phase 2: Read the Article

  • Look at the tables and figures (including captions).
    • This is really what was done in the work. This does not take much time so it is worth looking at before really getting into the details which will slow down the reading.
  • Read the introduction.
    • This is the background needed and why the study was done.
  • Read the results and discussion.
    • This is the heart of the paper.
  • Read the experimental.
    • This is how they did the work. You get to this point if you are really interested and need to understand exactly what was done to better understand the meaning of the data and its interpretation.


Write some notes so you don't have to read the whole paper again.

Citations in the Jupyter Notebook

python3 -m pip install cite2c
python3 -m cite2c.install
# Start/Restart the Notebook server





  • Could be someone else’s code... as long as you can read it
  • Even better if this code already modularizes what you want to change
    • On the other hand: Re-implementing a SOTA baseline is incredibly helpful for understanding what’s going on, and where some decisions might have been made better
  • Just go fast and find something that works, then go back and refactor (if you made something useful)



  • Meaningful names
  • Shape comments on tensors
  • Comments describing non-obvious logic
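As a small illustration of these three habits, here is a sketch in plain Python, with nested lists standing in for tensors. The function `pairwise_scores` and its inputs are assumptions made for the example and do not come from the original notes.

```python
def pairwise_scores(queries, keys):
    """Dot-product score between every query vector and every key vector."""
    # queries: (num_queries, dim)
    # keys:    (num_keys, dim)
    scores = [
        [sum(q_i * k_i for q_i, k_i in zip(query, key)) for key in keys]
        for query in queries
    ]  # scores: (num_queries, num_keys)
    return scores


queries = [[1.0, 0.0], [0.0, 2.0]]            # (2, 2)
keys = [[1.0, 1.0], [0.0, 1.0], [3.0, 0.0]]   # (3, 2)
scores = pairwise_scores(queries, keys)       # (2, 3)
print(scores)  # [[1.0, 0.0, 3.0], [2.0, 2.0, 0.0]]
```

The shape comments make it possible to check each step against the expected dimensions without running the code.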


  • A test that checks experimental behavior is a waste of time
  • But, some parts of your code aren’t experimental
    • Make sure data processing works consistently, that tensor operations run, and that gradients are non-zero
    • Ensure models can train, save and load
    • Run on small test fixtures, so debugging cycle is seconds, not minutes
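A minimal sketch of such tests, using only the standard library so that it runs in well under a second. The functions `normalize` and `numeric_gradient` are hypothetical stand-ins for a real preprocessing step and a real gradient computation; a real project would test its own code instead, probably with a test runner such as pytest.

```python
import json
import os
import tempfile


def normalize(values):
    """Shift values to zero mean; a stand-in for a real preprocessing step."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def numeric_gradient(loss_fn, weight, eps=1e-6):
    """Central-difference estimate of d(loss)/d(weight)."""
    return (loss_fn(weight + eps) - loss_fn(weight - eps)) / (2 * eps)


def test_preprocessing_is_consistent():
    fixture = [1.0, 2.0, 3.0]  # tiny fixture keeps the debug cycle short
    assert normalize(fixture) == normalize(fixture)
    assert abs(sum(normalize(fixture))) < 1e-9


def test_gradient_is_nonzero():
    def loss(w):
        return (2.0 * w - 4.0) ** 2  # minimum at w = 2

    assert abs(numeric_gradient(loss, weight=0.0)) > 1e-3


def test_save_and_load_round_trip():
    weights = {"w": 0.5, "b": -1.0}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        with open(path) as f:
            assert json.load(f) == weights


for test in (test_preprocessing_is_consistent,
             test_gradient_is_nonzero,
             test_save_and_load_round_trip):
    test()
```

None of these tests checks an experimental result; they only guard the plumbing that every experiment depends on.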




Core goal: ensure that experiments are correct and reproducible.


├── Makefile           <- Makefile with commands like `make data` or `make train`
├──          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── docs               <- A default Sphinx project; see for details
├── models             <- Trained and serialized models, model predictions, or model summaries
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
├──           <- Make this project pip installable with `pip install -e`
├── src                <- Source code for use in this project.
│   ├──    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └──
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └──
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├──
│   │   └──
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └──
└── tox.ini            <- tox file with settings for running tox; see

Source: Cookiecutter Data Science


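  1. Never edit your raw data, especially not by hand and not in Excel. Do not overwrite the raw data, and do not keep multiple versions of it. Treat the data (and its format) as immutable. Anyone should be able to reproduce the final results using only the code in src and the data in data/raw.
  2. Because the data is immutable, it does not need to be version controlled. If the data set is small, it can live in the code repository. To store or sync large data, use the sync tool for AWS S3.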
  3. Currently by default, we ask for an S3 bucket and use AWS CLI to sync data in the data folder with the server.
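One way to check that raw data really stays immutable is to record a checksum for each file and compare it later. The sketch below is an assumption about how that could be done, not part of Cookiecutter Data Science; `checksum_manifest` and `changed_files` are hypothetical helpers, and a temporary directory stands in for data/raw.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_manifest(raw_dir):
    """Map each file under raw_dir to its SHA-256 digest."""
    root = Path(raw_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Return names whose digest differs, or that were added or removed."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))


# Demo on a throwaway directory standing in for data/raw.
with tempfile.TemporaryDirectory() as tmp:
    raw_file = Path(tmp) / "measurements.csv"
    raw_file.write_text("id,value\n1,0.5\n")
    recorded = checksum_manifest(tmp)           # store this next to the code
    raw_file.write_text("id,value\n1,0.6\n")    # simulate an accidental edit
    modified = changed_files(recorded, checksum_manifest(tmp))

print(modified)  # ['measurements.csv']
```

The manifest is small enough to commit to the repository even when the data itself is not.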

Use notebooks for exploration and communication

When we use notebooks in our work, we often subdivide the notebooks folder. For example, notebooks/exploratory contains initial explorations, whereas notebooks/reports is more polished work that can be exported as html to the reports directory.

There are two steps we recommend for using notebooks effectively:

  1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format <step>-<ghuser>-<description>.ipynb (e.g., 0.3-bull-visualize-distributions.ipynb).
  2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at src/data/ and load data from data/interim. If it's useful utility code, refactor it to src.
# OPTIONAL: Load the "autoreload" extension so that code can change
%load_ext autoreload

# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
%autoreload 2

# assumes the src/data package from the project layout
from src.data import make_dataset
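The naming convention <step>-<ghuser>-<description>.ipynb can be checked mechanically. The pattern below is a sketch under one assumption that the source does not state: the user name contains no hyphen, since otherwise it could not be separated from the description. `parse_notebook_name` is a hypothetical helper.

```python
import re

# <step>-<ghuser>-<description>.ipynb, e.g. 0.3-bull-visualize-distributions.ipynb
NOTEBOOK_NAME = re.compile(
    r"^(?P<step>\d+(?:\.\d+)*)"
    r"-(?P<ghuser>[A-Za-z0-9]+)"  # assumption: no hyphen in the user name
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"\.ipynb$"
)


def parse_notebook_name(filename):
    """Return the parts of a conforming notebook name, or None if it does not match."""
    match = NOTEBOOK_NAME.match(filename)
    return match.groupdict() if match else None


parts = parse_notebook_name("0.3-bull-visualize-distributions.ipynb")
print(parts)
print(parse_notebook_name("Untitled.ipynb"))  # None
```

Running such a check over the notebooks folder highlights files that still carry default names.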

Keep secrets and configuration out of version control

Create a .env file in the project root folder. Thanks to the .gitignore, this file should never get committed into the version control repository. Here's an example:

# example .env file

Use a package to load these variables automatically. Here's an example snippet adapted from the python-dotenv documentation:

# src/data/
import os
from dotenv import load_dotenv, find_dotenv

# find .env automagically by walking up directories until it's found
dotenv_path = find_dotenv()

# load up the entries as environment variables
load_dotenv(dotenv_path)

database_url = os.environ.get("DATABASE_URL")
other_variable = os.environ.get("OTHER_VARIABLE")

Keep track of what you ran

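  1. Above all, record the hash of every experiment run so that it can be reproduced.
  2. Run controlled experiments: change only one thing at a time.
  3. Use configuration files to track each change.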
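The three points above can be sketched together. The notes do not say which hash is meant; it may well be the version-control commit hash. The sketch below instead hashes the configuration itself, which is one possible reading. `config_hash` and `diff_configs` are hypothetical helpers, and the configuration values are made up.

```python
import hashlib
import json


def config_hash(config):
    """Short, order-independent fingerprint of an experiment configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in sorted(keys) if old.get(k) != new.get(k)}


baseline = {"learning_rate": 0.001, "batch_size": 32, "seed": 0}
variant = dict(baseline, learning_rate=0.01)  # change exactly one thing

changes = diff_configs(baseline, variant)
assert len(changes) == 1, "controlled experiment: change one thing at a time"

# Log the fingerprint of each run next to its results.
print(config_hash(baseline), config_hash(variant), changes)
```

Sorting the keys before hashing means two configurations with the same settings always get the same fingerprint, regardless of the order in which they were written.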



Sharing Your Research



In this video, Prof. Carr (faculty member at the University of Minnesota, Department of Chemistry) explains an algorithm for writing a paper in a weekend.


  • Review and Renew Your Literature Search.
  • Determine Who Your Audience Is.
    • What kind of paper is it: research, review, or tutorial?
    • What journal is it intended for?
    • Undergraduates, researchers, but always reviewers.

The Big Picture

Writing the initial draft is the creative part of the job. Resist the temptation to correct and edit as you go. Your job now is to produce a complete first draft.

The "Algorithm"

  • Just get started; don't procrastinate.
  • Create an outline by making a list of all your figures and tables. Put them in order of presentation as they may appear in the results and discussion. Always work from an outline. If you have to stop you can easily pick up the writing later. You have the data so this part is easy.
  • Do not write the introduction now. It is the hardest part of the paper to write, and writing it at this stage could be a waste of time.
  • Begin with the experimental section. It is the easiest part to write and getting it done will give you a feeling of progress.
  • Now write the results and discussion following the outline.
  • Then comes the really hard part - critical editing where you make sure that the English is concise and coherent, and the science is correct.
  • Write the conclusions. I like a numbered format.
    • 1...
    • 2...
  • And you write the "Abstract" and "Acknowledgements" after the "Conclusions"!
  • Now we have to do the introduction, and there are two very important things that need to be covered in the introduction:
    • Why was the study done? What is its purpose?
    • You've got to collect the relevant essential background information and put that together in the introduction.
  • The very last step is producing the references for the paper. It's a good idea to write notes as you go through the first draft and manuscript indicating what references might be needed and what they would be about, but do not stop to collect the references at that time.

A few final words

Reading maketh a full man, conference a ready man and writing an exact man. - Sir Francis Bacon

  1. Writing is the most exacting part of what we do as scientists.
  2. Always review the manuscript requirements for the journal of interest.
  3. A wonderful short paper by Professor Royce Murray in mock style tells you the worst things you can do in writing a manuscript:
    1. Never explain the objectives of the paper in a single sentence or paragraph and in particular never at the beginning of the paper.
    2. Similarly, never describe the experiment(s) in a single sentence or paragraph and never at the beginning. Instead, to enhance the reader’s pleasure of discovery, treat your experiment as a mystery, in which you divulge one essential detail on this page and a hint of one on the next and complete the last details only after a few results have been presented. It’s also really fun to divulge the reason that the experiment should successfully provide the information sought only at the very end of the paper, as any good mystery writer would do.
    3. Diagrams are worth a thousand words, so in the interest of writing a concise paper, omit all words that explain the diagram, including labels. Let the reader use his/her fertile imagination.
    4. Great writers invent abbreviations for complex topics, which also saves a lot of words. Really short abbreviations should be used for very complex topics, and more complicated ones for simple ideas.
    5. In referring to the previous literature, be careful to cite only the papers that make claims that would support your own, especially those that contain little evidence for the claim, so that your paper shines in comparison.
    6. It should be anathema to use any original phrasing or humor in your language, so as to adhere to the principle that scientific writing must be stiff and formal and without personality.
    7. Your readers are intelligent folks, so don’t bother to explain your reasoning in the interpretation of the results. Especially don’t bother to point out their impact on or consistency with other authors’ results and interpretation, so that your paper can be an island of original thinking.

Recommended References and Reading

  • W. Strunk and E. B. White, "The Elements of Style".
  • ACS Author's Guide.
  • R. W. Murray, "Skillful Writing of an Awful Research Paper", Anal. Chem. 2011, 83, 633. Seven rules to follow.
  • R. Schoenfeld, "The Chemist's English".
  • A. Eisenberg, "Effective Technical Communication".
  • P. T. O'Conner, "Words Fail Me".
  • G. M. Whitesides, "Whitesides' Group: Writing a Paper", Adv. Mater. 2004, 16, 1375.




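Source: Without an advisor's guidance, how can graduate students read the literature, develop original ideas, and write papers? - answer by 王鸿伟 (Wang Hongwei) - Zhihu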


  1. Create a notebook with some content!
  2. Optionally create a .bib file and external images.
  3. Adjust the notebook and cell metadata.
  4. Install ipypublish and run nbpublish on either a specific notebook or a folder containing multiple notebooks.
  5. A converted folder will be created, into which the final .tex, .pdf, and .html files will be output, named after the input notebook or folder.

Source: A workflow for creating and editing publication ready scientific reports and presentations, from one or more Jupyter Notebooks, without leaving the browser!



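Source: A study from Kent State University in the US: "There are only two scientifically efficient learning methods", with a link to the English paper - article by 王俊 (Wang Jun) - Zhihu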






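Another basic piece of training: whether you are writing ten thousand, thirty thousand, or fifty thousand characters, make it a habit to follow academic conventions until doing so becomes second nature. The footnotes and formatting of your papers should become part of you from the moment you enter graduate school. If you have not formed this habit, readers will find the paper careless, and later revisions will take a great deal of time: a thesis is large, perhaps several hundred pages, and if you get the conventions wrong at the start, fixing them from beginning to end will be slow and laborious. So form the habit at the very beginning. We are writing papers, not essays, and there are fixed rules about where each comma goes, where each title mark goes, where quotation marks are needed, and which punctuation mark to use. Writing in Chinese is manageable; English has a whole pile of abbreviations. In the 1960s, when access to knowledge in Taiwan was still very limited, someone came back from the United States and said: "Something remarkable is going on in America, because there is one truly remarkable person." Asked what made this person so remarkable, he said: "Because this person's works are cited everywhere." The person's name was ibid. In fact, ibid. simply means "the same as the preceding citation". The word comes from Latin, which has given us a whole host of abbreviations; one of them, for example, indicates that a work was jointly edited by two people. The Chicago Manual of Style is an English-language book devoted to explaining these writing conventions. Learn the Chinese and English writing conventions as early as possible and practice them gradually, until you can write freely and still produce text that conforms to them.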



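Source: Without an advisor's guidance, how can graduate students read the literature, develop original ideas, and write papers? - answer by Social Sciences Academic Press (社会科学文献出版社) - Zhihu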
