@holtzermann17
Last active April 6, 2021 14:55
Public summary of NNexus Revolutions proposal

NNexus Revolutions: Neural Named Entity Recognition and Linking for Technical Topics

#1. Review the intention: what do we expect to learn or make together?

The mathematical sciences have progressed to the point where mathematics communication itself can be fruitfully treated as “big data”.

The scale and diversity of the field pose many difficulties for researchers who choose to enter a new area of research.

For instance, a physicist moving into mathematical biology will need to learn new concepts, and disentangle techniques from their original intuitions.

Students learning technical topics for the first time can face an even bigger challenge.

The obstacles compound into a well-documented ‘skills gap’.

Technical jobs remain unfilled, while employers are reluctant to hire candidates who have yet to give concrete evidence that they can do the job.

In this six-month project, we will use contemporary artificial intelligence (AI) methods to help make technical topics more accessible, both to professionals working across the various branches of the mathematical sciences and to students.

#2. Establish what is happening: what and how are we learning?

The NNexus project centres on building computationally salient models of mathematical documents with an approach based on named entity recognition (NER).

We will develop a method for reliably identifying the key concepts (‘named entities’) in mathematics text.

By using the structure of documents written in everyday mathematical language, we can then surface the way in which these concepts relate to each other.

This will provide the foundation for the creation of recommender systems and other software tools that can support learning and research.

#3. What are some different perspectives on what’s happening?

Our approach to NER exploits contemporary neural methods, which we will adapt to our use case and train on a large corpus of technical texts.

Specifically, we will build on the ELECTRA language model from Google Research and the GENRE approach to entity retrieval developed by Facebook, both published in 2020.

Unlike classical approaches based on simple term spotting, neural NER allows us to capture context.

This means that the tools will be able to distinguish different senses of a word, such as “group” in “Let G be a group” and in “group the numbers in rows”.

Context awareness will also help our tools unwind the content of complex symbolic mathematical expressions, and link symbols to their definitions.
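
To make the contrast with term spotting concrete, here is a minimal sketch of a classical term spotter applied to the two “group” sentences. The function name `spot_terms` and the one-word vocabulary are illustrative assumptions, not part of the NNexus codebase. The spotter flags both occurrences as the same concept, which is exactly the ambiguity a context-aware model is expected to resolve.

```python
import re

def spot_terms(text, vocabulary):
    """Classical term spotting: report every vocabulary word found in the text.

    Returns a sorted list of (start, end, term) tuples. No context is used,
    so the algebraic noun and the everyday verb are treated identically.
    """
    hits = []
    for term in sorted(vocabulary):
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), term))
    return sorted(hits)

# Hypothetical one-word concept vocabulary, for illustration only.
vocabulary = {"group"}

math_hits = spot_terms("Let G be a group", vocabulary)
verb_hits = spot_terms("group the numbers in rows", vocabulary)

# Both sentences yield a "group" hit, although only the first
# refers to the mathematical concept.
print(math_hits)
print(verb_hits)
```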

#4. What did we learn or change?

The project’s main objectives contribute to the core goal of using contemporary AI technologies to support learning and research in mathematics.

Surfacing named entities and the connections between them will provide researchers and students with a map, and a way to document their progress.

We will evaluate the system with regard to precision and recall, and assess its usefulness in a study with authors of mathematical preprints.
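
As a rough illustration of the precision and recall evaluation, the sketch below compares a set of predicted entity spans against a gold-standard set. The span representation (start offset, end offset, concept label), the exact-match criterion, and the example data are assumptions made for illustration; the project's actual evaluation protocol is not specified here.

```python
from typing import Set, Tuple

# A span is (start offset, end offset, concept label); this layout is an assumption.
Span = Tuple[int, int, str]

def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted spans that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold spans that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Made-up annotations for a single document.
gold = {(9, 14, "group"), (20, 28, "subgroup"), (40, 52, "homomorphism")}
predicted = {(9, 14, "group"), (20, 28, "subgroup"), (60, 65, "group")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact span matching is the strictest choice; a partial-overlap criterion would score more leniently.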

Lastly, we will co-design a roadmap for further research together with key stakeholders in the domain.

#5. What else should we change going forward?

As science grows and develops further, publishers, universities, and EdTech providers will need to rapidly adapt to a changing landscape.

NNexus will be able to turn documents written in familiar language into graph structures, which will open up a range of new approaches to data analysis and service provision.
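
One simple way to picture the step from text to graph is a co-occurrence graph: concepts become nodes, and two concepts are linked when they appear in the same sentence. The sketch below is only an assumption-laden illustration. It uses plain word matching as a stand-in for the neural entity recognizer, and the function name `build_concept_graph`, the concept list, and the sentences are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

def build_concept_graph(sentences, concepts):
    """Link concepts that co-occur in a sentence (undirected adjacency sets)."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # Stand-in for NER: a concept is "recognized" if it appears as a word.
        found = sorted(c for c in concepts if c in words)
        for a, b in combinations(found, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

# Hypothetical input data.
sentences = [
    "Every subgroup of a cyclic group is cyclic.",
    "A homomorphism maps a group to a group.",
]
concepts = {"group", "subgroup", "homomorphism", "cyclic"}

graph = build_concept_graph(sentences, concepts)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A graph in this form can be handed to standard graph analysis or recommendation tooling, for example by ranking concepts by the number of neighbours they have.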

By applying cutting-edge mathematical, computational, and data analysis techniques to the language of the mathematical sciences, the project will also unlock new approaches to doing science.

Specifically, we expect that the knowledge graphs we extract will provide a basis for training more advanced AI systems in the mathematics domain.
