holtzermann17/nnexus-revolutions.org

## nnexus-revolutions.org

      
NNexus Revolutions: Neural Named Entity Recognition and Linking for Technical Topics

#1. Review the intention: what do we expect to learn or make together?

The mathematical sciences have progressed to the point where mathematics communication itself can be fruitfully treated as “big data”.

The scale and diversity of the field presently pose many difficulties for researchers who choose to enter a new area of research.

For instance, a physicist moving into mathematical biology will need to learn new concepts, and disentangle techniques from their original intuitions.

Students learning technical topics for the first time can face an even bigger challenge.

The obstacles compound into a well-documented ‘skills gap’.

Technical jobs remain unfilled, while employers are reluctant to hire candidates who have yet to give concrete evidence that they can do the job.

In this six-month project, we use contemporary artificial intelligence (AI) methods to help make technical topics more accessible,

… both to professionals working across the various branches of the mathematical sciences and to students.

#2. Establish what is happening: what and how are we learning?

The NNexus project centres on building computationally salient models of mathematical documents with an approach based on named entity recognition (NER).

We will develop a method for reliably identifying the key concepts (‘named entities’) in mathematics text.

By using the structure of documents written in everyday mathematical language, we can then surface the way in which these concepts relate to each other.

This will provide the foundation for the creation of recommender systems and other software tools that can support learning and research.

#3. What are some different perspectives on what’s happening?

Our approach to NER exploits contemporary neural methods, which we will adapt to our use case and train on a large corpus of technical texts.

Specifically, we will build on the ELECTRA language model from Google Research and the GENRE approach to entity retrieval developed by Facebook, both published in 2020.

Unlike classical approaches based on simple term spotting, neural NER allows us to capture context.

This means that the tools will be able to distinguish different senses of a word, such as “group” in “Let G be a group” versus “group the numbers in rows”.

Context awareness will also help our tools unwind the content of complex symbolic mathematical expressions, and link symbols to their definitions.
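To illustrate the limitation of simple term spotting mentioned above, here is a minimal sketch in Python. The glossary and the `spot_terms` function are hypothetical and are not part of the NNexus code; they only show that a context-free lookup flags both uses of “group” as the same concept.

```python
import re

# Hypothetical mini-glossary of concept names (an assumption for illustration).
GLOSSARY = {"group", "ring", "field"}


def spot_terms(sentence, glossary=GLOSSARY):
    """Return (term, start_offset) pairs for every glossary word in the sentence.

    This is plain dictionary lookup: it never looks at the surrounding words,
    so it cannot tell the noun "group" from the verb "group".
    """
    hits = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group().lower()
        if word in glossary:
            hits.append((word, match.start()))
    return hits


noun_use = spot_terms("Let G be a group")
verb_use = spot_terms("group the numbers in rows")

# Both sentences yield a hit for "group", although only the first
# refers to the algebraic concept.
print(noun_use)
print(verb_use)
```

A context-aware model would be expected to label only the first occurrence as a mathematical entity; how that is achieved depends on the neural architecture and training data, which this sketch does not attempt to reproduce.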

#4. What did we learn or change?

The project’s main objectives contribute to the core goal of using contemporary AI technologies to support learning and research in mathematics.

Surfacing named entities and the connections between them will provide researchers and students with a map, and a way to document their progress.

We will evaluate the system with regard to precision and recall, and assess its usefulness in a study with authors of mathematical preprints.
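As a rough sketch of the precision and recall evaluation mentioned above, the following Python snippet compares predicted entity spans against a gold annotation. It assumes exact matching of `(start, end, label)` triples, which is only one possible matching criterion; the spans shown are invented for illustration and are not project data.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for sets of (start, end, label) spans.

    Assumption: a prediction counts as correct only if it matches a gold
    span exactly in offsets and label.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example spans (character offsets and concept labels).
gold_spans = {
    (11, 16, "group"),
    (30, 37, "abelian"),
    (45, 49, "ring"),
    (60, 65, "field"),
}
predicted_spans = {
    (11, 16, "group"),   # correct
    (45, 49, "ring"),    # correct
    (70, 75, "graph"),   # spurious
}

precision, recall = precision_recall(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predictions are correct and two of the four gold spans are found, so precision and recall differ; reporting both exposes the trade-off between spurious and missed entities.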

Lastly, we will co-design a roadmap for further research together with key stakeholders in the domain.

#5. What else should we change going forward?

As science grows and develops further, publishers, universities, and EdTech providers will need to rapidly adapt to a changing landscape.

NNexus will be able to turn documents written in familiar language into graph structures, which will open up a range of new approaches to data analysis and service provision.

By applying cutting-edge mathematical, computational, and data analysis techniques to the language of the mathematical sciences, the project will also unlock new approaches to doing science.

Specifically, we expect that the knowledge graphs we extract will provide a basis for training more advanced AI systems in the mathematics domain.
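One way to picture the graph structures described above is as a directed graph whose nodes are concepts and whose edges record that one concept is mentioned in the text that introduces another. The sketch below assumes that recognised entities are already available per entry; the `entries` data and the `build_concept_graph` function are hypothetical and do not describe the project's actual data model.

```python
def build_concept_graph(entries):
    """Build an adjacency mapping: concept -> set of concepts it mentions.

    `entries` maps each defined concept to the list of entities recognised
    in its defining text (an assumed input format).
    """
    graph = {}
    for concept, mentions in entries.items():
        # Every defined concept becomes a node, even with no outgoing edges.
        neighbours = graph.setdefault(concept, set())
        for mention in mentions:
            if mention != concept:  # skip self-references
                neighbours.add(mention)
    return graph


# Invented example entries, for illustration only.
entries = {
    "abelian group": ["group", "commutative"],
    "group": ["set", "binary operation"],
    "ring": ["abelian group", "ring"],
}

graph = build_concept_graph(entries)
for concept in sorted(graph):
    print(concept, "->", sorted(graph[concept]))
```

A structure like this can be handed to standard graph tooling for analysis or used as input to downstream systems, which is the kind of use the text anticipates.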


## nnexus-revolutions.txt
The mathematical sciences have progressed to the point where
mathematics communication itself can be fruitfully treated as "big
data". The scale and diversity of the field presently pose many
difficulties for researchers who choose to enter a new area of
research. For instance, a physicist moving into mathematical biology
will need to learn new concepts, and disentangle techniques from their
original intuitions. Students learning technical topics for the first
time can face an even bigger challenge. The obstacles compound into a
well-documented ‘skills gap’. Technical jobs remain unfilled, while
employers are reluctant to hire candidates who have yet to give
concrete evidence that they can do the job. In this six-month project,
we use contemporary artificial intelligence (AI) methods to help make
technical topics more accessible, both to professionals working across
the various branches of the mathematical sciences and to students.

The NNexus project centres on building computationally salient models
of mathematical documents with an approach based on named entity
recognition (NER). We will develop a method for reliably identifying
the key concepts (‘named entities’) in mathematics text. By using the
structure of documents written in everyday mathematical language, we
can then surface the way in which these concepts relate to each
other. This will provide the foundation for the creation of
recommender systems and other software tools that can support learning
and research.

Our approach to NER exploits contemporary neural methods, which we will
adapt to our use case and train on a large corpus of technical
texts. Specifically, we will build on the ELECTRA language model from
Google Research and the GENRE approach to entity retrieval developed
by Facebook, both published in 2020. Unlike classical approaches based
on simple term spotting, neural NER allows us to capture context. This
means that the tools will be able to distinguish different senses of a
word, such as "group" in "Let G be a group" versus "group the numbers
in rows". Context awareness will also help our tools unwind the content
of complex symbolic mathematical expressions, and link symbols to
their definitions.

The project’s main objectives contribute to the core goal of using
contemporary AI technologies to support learning and research in
mathematics. Surfacing named entities and the connections between them
will provide researchers and students with a map, and a way to
document their progress. We will evaluate the system with regard to
precision and recall, and assess its usefulness in a study with
authors of mathematical preprints. Lastly, we will co-design a roadmap
for further research together with key stakeholders in the domain.

As science grows and develops further, publishers, universities, and
EdTech providers will need to rapidly adapt to a changing
landscape. NNexus will be able to turn documents written in familiar
language into graph structures, which will open up a range of new
approaches to data analysis and service provision. By applying
cutting-edge mathematical, computational, and data analysis techniques to the
language of the mathematical sciences, the project will also unlock
new approaches to doing science. Specifically, we expect that the
knowledge graphs we extract will provide a basis for training more
advanced AI systems in the mathematics domain.