<?xml version="1.0" encoding="UTF-8"?>
<!--Xholon Workbook http://www.primordion.com/Xholon/gwt/ MIT License, Copyright (C) Ken Webb, Fri Sep 14 2018 07:18:12 GMT-0400 (Eastern Daylight Time)-->
<XholonWorkbook>
<Notes><![CDATA[
Xholon
------
Title: Graphs, Rules - Machine Learning
Description:
Url: http://www.primordion.com/Xholon/gwt/
InternalName: 1384981d4264e9282ec0e6eacaf24d2c
Keywords:
My Notes
--------
September 6, 2018
see notes from Sept 5
see my Documents/GraphsRulesML folder
Be able to predict the right-hand side (RHS) from the left-hand side (LHS) in a set of rules, especially where the LHS and RHS are graphs.
Explore what approaches are offered by Machine Learning (ML) and Data Science (DS).
Typically the LHS will be more or less abstract, while the RHS will be more or less concrete.
One Xholon scenario (Island app):
- capture data from multiple human players, where each player is represented in the game by an Avatar
- the data will consist of the Avatar's "before" state (LHS), an action taken, and the Avatar's "after" state (RHS)
- the data will be saved to the in-browser IndexedDB database
- subsequently, I want non-human players to be able to use the captured data to make reasonably intelligent moves in the game space
- they could choose randomly from a list of available rules, or
- they could choose matching rules based on probabilities (which RHS was most often chosen by human players), or
- they could apply ML/DS algorithms
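The frequency-weighted option above can be sketched in plain JavaScript. This is only an illustrative shape, not the Island app's actual data model: records are assumed to be (lhs, action, rhs) triples with serialized states, and the names RuleTable, record, and choose are hypothetical.

```javascript
// Sketch: tally human (lhs, action, rhs) observations, then let a non-human
// player pick a matching rule weighted by how often humans chose it.
function RuleTable() {
  this.counts = {}; // lhs -> { "action|rhs" : count }
}
RuleTable.prototype.record = function(lhs, action, rhs) {
  var byLhs = this.counts[lhs] || (this.counts[lhs] = {});
  var key = action + "|" + rhs;
  byLhs[key] = (byLhs[key] || 0) + 1;
};
// Choose an action+rhs for a given lhs; rnd in [0,1) may be passed for testing.
RuleTable.prototype.choose = function(lhs, rnd) {
  var byLhs = this.counts[lhs];
  if (!byLhs) { return null; } // no human data for this "before" state
  var total = 0, key;
  for (key in byLhs) { total += byLhs[key]; }
  var r = (rnd === undefined ? Math.random() : rnd) * total;
  for (key in byLhs) {
    r -= byLhs[key];
    if (r < 0) {
      var parts = key.split("|");
      return { action: parts[0], rhs: parts[1] };
    }
  }
  return null;
};
```

In the real app the tallies would be built from the IndexedDB records rather than held in memory; the selection logic would be the same.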
References
----------
(1) search: "machine learning" rules
(2) search: "machine learning" graph
(3) https://en.wikipedia.org/wiki/Rule-based_machine_learning
Rule-based machine learning (RBML) is a term in computer science intended to encompass any machine learning method
that identifies, learns, or evolves 'rules' to store, manipulate or apply.
The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules
that collectively represent the knowledge captured by the system.
This is in contrast to other machine learners that commonly identify a singular model that can be universally applied
to any instance in order to make a prediction.
Rule-based machine learning approaches include learning classifier systems, association rule learning, artificial immune systems,
and any other method that relies on a set of rules, each covering contextual knowledge.
While rule-based machine learning is conceptually a type of rule-based system, it is distinct from traditional rule-based systems,
which are often hand-crafted, and other rule-based decision makers.
This is because rule-based machine learning applies some form of learning algorithm to automatically identify useful rules,
rather than a human needing to apply prior domain knowledge to manually construct rules and curate a rule set.
KSW: this is probably the inverse of what I am looking for
(4) http://www.plantcell.org/content/23/9/3101
    http://www.plantcell.org/content/plantcell/23/9/3101.full.pdf (full-text PDF)
Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets
George W. Bassel, Enrico Glaab, Julietta Marquez, Michael J. Holdsworth, Jaume Bacardit
Published September 2011.
Rule-based learning methods ... where a model is composed of an arbitrary set of decision rules, where each rule specifies a subset of the input space.
An additional benefit of rule-based learning methods is that they produce human-readable rules (Figure 1A),
in contrast with other methods such as nonlinear SVM and artificial neural networks.
Here, we propose a novel approach in the construction of functional networks based on gene expression data, through the use of rule-based ML.
The premise of this approach is that genes present within the same rule that predicts a developmental outcome will have an increased likelihood
of being functionally related in the developmental process in question given their collective ability to act together in generating the prediction.
We term this associative measure “coprediction.”
The use of coprediction enables functional gene associations to be inferred that cannot be detected using coexpression analysis, for example.
Coprediction is not restricted by similarities in expression pattern that are the only measure used by coexpression.
Another difference between these two methods stems from rule-based methods focusing on identifying interactions within subsets of samples,
such as those belonging to the discrete developmental states or transitions between them,
rather than correlations across all samples, as is the case with coexpression.
In this way, state-dependent data can be considered independently from one another and novel knowledge extracted from the data.
KSW perhaps break Xholon subgraphs into fragments, that are at least spatio-temporally coordinated with each other, for example inventory and siblings
- use these in a way similar to what is described above
In this study we aimed to
(1) investigate whether rule-based ML methods can be used to predict the developmental output of a biological system,
(2) identify novel regulators using these predictions, and
(3) uncover functional associations between genes controlling a developmental phase transition using this approach.
We used the large number of publicly available transcriptomic data sets representing each of the dormant and germinating states of
Arabidopsis seeds (Bassel et al., 2008) to generate a functional network using rule-based ML.
Four different ML methods representing different types of prediction techniques were compared to determine the accuracy with which they can predict developmental fate in seeds.
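The coprediction measure described above reduces to a simple pair count: for every learned rule, each pair of genes appearing together in that rule gets one co-occurrence credit. A minimal sketch, assuming an illustrative rule shape {genes, predicts} that is not the paper's actual data format:

```javascript
// Sketch of coprediction scoring: genes that co-occur in the same predictive
// rule are counted as functionally associated; higher counts = stronger link.
function copredictionScores(rules) {
  var scores = {}; // "geneA|geneB" (sorted) -> number of shared rules
  rules.forEach(function(rule) {
    var g = rule.genes.slice().sort(); // sort so each pair has one canonical key
    for (var i = 0; i < g.length; i++) {
      for (var j = i + 1; j < g.length; j++) {
        var key = g[i] + "|" + g[j];
        scores[key] = (scores[key] || 0) + 1;
      }
    }
  });
  return scores;
}
```

The KSW idea above maps onto this directly: Xholon subgraph fragments that co-occur in the same rule could be scored the same way.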
(5) http://vis.cs.ucdavis.edu/papers/kwon_infovis17.pdf
What Would a Graph Look Like in This Layout?
A Machine Learning Approach to Large Graph Visualization
Oh-Hyun Kwon, Student Member, IEEE, Tarik Crnovrsanin, and Kwan-Liu Ma, Fellow, IEEE
In the field of machine learning, several methods have been used to predict
the properties of graphs, such as the classes of graphs. One prominent
approach to predicting such properties is to use a graph kernel. Graph
kernel methods enable us to apply various kernelized machine learning
techniques, such as the Support Vector Machine (SVM) [18], on graphs.
Based on our assumption, we need to measure the topological similarities
between graphs. Depending on the discipline, this problem is called
graph matching, graph comparison, or network alignment. In the last
five decades, numerous approaches have been proposed to this problem
[22, 29]. Each approach measures the similarity between graphs
based on different aspects, such as isomorphism relation [48, 79, 92],
graph edit distance [11], or graph measures [4, 19]. However, many
of these traditional approaches are either computationally expensive,
not expressive enough to capture the topological features, or difficult to
adapt to different problems [64].
Graph kernels have been recently introduced for measuring pairwise
similarities between graphs in the field of machine learning. They allow
us to apply many different kernelized machine learning techniques, e.g.
SVM [18], on graph problems, including graph classification problems
found in bioinformatics and chemoinformatics.
A graph kernel can be considered to be an instance of an R-convolution
kernel [42]. It measures the similarity between two graphs
based on the recursively decomposed substructures of said graphs.
The measure of similarity varies with each graph kernel based on different
types of substructures in graphs. These substructures include
walks [35, 50, 85], shortest paths [8], subtrees [72, 74, 75], cycles [44],
and graphlets [76]. Selecting a graph kernel is challenging as many
kernels are available. To exacerbate the problem, there is no theoretical
justification as to why one graph kernel works better than another for a
given problem [76].
Many graph kernels often have similar limitations as previously
mentioned approaches. They do not scale well to large graphs ...
or do not work well for unlabeled
graphs. To overcome this problem, a graph kernel based on sampling
a fixed number of graphlets has been introduced to be accurate and
efficient on large graphs [64, 76].
Graphlets are small, induced, and non-isomorphic subgraph patterns
in a graph [67] (Fig. 2). Graphlet frequencies (Fig. 3) have been used
to characterize biological networks [66, 67], identify disease genes [59],
and analyze social network structures [82]. Depending on the definition,
the relative frequencies are called graphlet frequency distribution,
graphlet degree distribution, or graphlet concentrations. While these
works often have different definitions, the fundamental idea is to count
the individual graphlets and compare their relative frequencies of occurrence
between graphs. A graph kernel based on graphlet frequencies,
called a graphlet kernel, was first proposed by Shervashidze et al. [76].
The main idea is to use a graphlet frequency vector as the feature vector
of a graph, then compute the similarity between graphs by defining the
inner product of the feature vectors.
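The graphlet-kernel idea above can be sketched for the smallest interesting case: the two connected 3-node undirected graphlets (path and triangle). This is a toy version, assuming an adjacency-list representation { nodeId: [neighborIds] }; real graphlet kernels count larger graphlets and use sampling for scale.

```javascript
// Sketch of a graphlet kernel: count induced 3-node connected subgraphs,
// normalize to relative frequencies, compare graphs by inner product.
function graphletVector(adj) {
  var nodes = Object.keys(adj);
  var counts = [0, 0]; // [paths (2 edges), triangles (3 edges)]
  function connected(a, b) { return adj[a].indexOf(b) >= 0; }
  for (var i = 0; i < nodes.length; i++) {
    for (var j = i + 1; j < nodes.length; j++) {
      for (var k = j + 1; k < nodes.length; k++) {
        var e = (connected(nodes[i], nodes[j]) ? 1 : 0)
              + (connected(nodes[j], nodes[k]) ? 1 : 0)
              + (connected(nodes[i], nodes[k]) ? 1 : 0);
        if (e === 2) { counts[0]++; }      // induced path
        else if (e === 3) { counts[1]++; } // induced triangle
      }
    }
  }
  var total = counts[0] + counts[1] || 1;
  return [counts[0] / total, counts[1] / total]; // graphlet frequency vector
}
// Kernel value = inner product of the two frequency vectors.
function graphletKernel(adjA, adjB) {
  var u = graphletVector(adjA), v = graphletVector(adjB);
  return u[0] * v[0] + u[1] * v[1];
}
```

Two triangles give kernel 1 (identical frequency vectors); a triangle against a 3-node path gives 0 (orthogonal vectors).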
(6) http://graphvis.net/wgl
(7) https://en.wikipedia.org/wiki/Graphlets
Graphlets are small connected non-isomorphic induced subgraphs of a large network.
Graphlets differ from network motifs, since they must be induced subgraphs, whereas motifs are partial subgraphs.
An induced subgraph must contain all edges between its nodes that are present in the large network, while a partial subgraph may contain only some of these edges.
Moreover, graphlets do not need to be over-represented in the data when compared with randomized networks, while motifs do.
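The induced-vs-partial distinction above is easy to make concrete in code. A minimal sketch, assuming the same adjacency-list representation and hypothetical helper names:

```javascript
// Compute the edge set induced by a node subset (all edges of the big graph
// that connect two nodes in the subset).
function edgeSet(adj, nodes) {
  var edges = {};
  for (var i = 0; i < nodes.length; i++) {
    for (var j = i + 1; j < nodes.length; j++) {
      if (adj[nodes[i]].indexOf(nodes[j]) >= 0) {
        edges[nodes[i] + "-" + nodes[j]] = true;
      }
    }
  }
  return edges;
}
// A partial subgraph only needs its edges to be present; an induced subgraph
// must also have NO extra edges among the chosen nodes.
function matches(adj, nodes, patternEdges, induced) {
  var present = edgeSet(adj, nodes);
  var key;
  for (key in patternEdges) {
    if (!present[key]) { return false; } // pattern edge missing
  }
  if (!induced) { return true; }         // partial match: done
  for (key in present) {
    if (!patternEdges[key]) { return false; } // extra edge rules out induced
  }
  return true;
}
```

For example, a 2-edge path pattern matches a triangle as a partial subgraph (motif sense) but not as an induced subgraph (graphlet sense), because the third edge is extra.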
(8) https://en.wikipedia.org/wiki/Network_motif
All networks, including biological networks, social networks, technological networks (e.g., computer networks and electrical circuits) and more,
can be represented as graphs, which include a wide variety of subgraphs.
One important local property of networks is the presence of so-called network motifs, which are defined as recurrent and statistically significant subgraphs or patterns.
Motif Discovery Algorithms
Here, a review on computational aspects of major algorithms is given and their related benefits and drawbacks from an algorithmic perspective are discussed.
Well-Established Motifs and Their Functions
(9) http://robotics.stanford.edu/~quocle/CaeCheLeSmo07.pdf
Learning Graph Matching
Tiberio S. Caetano, Li Cheng, Quoc V. Le and Alex J. Smola
Statistical Machine Learning Program, NICTA and ANU, Canberra ACT 0200, Australia
(10) https://arxiv.org/pdf/1506.05163.pdf
Deep Convolutional Networks on Graph-Structured Data
Mikael Henaff, Joan Bruna, Yann LeCun
(11) https://en.wikipedia.org/wiki/Convolutional_neural_network
(12) https://www.quora.com/How-would-you-apply-deep-learning-to-graph-data-What-algorithm-would-you-use?share=1
How would you apply deep learning to graph data? What algorithm would you use?
Dileep Patchigolla, Specialist Data science at Hike Messenger
DL is used widely for network embeddings of graph data.
Basically, instead of representing each node of a graph with its links and features, there are several DL techniques using which each node can be represented in a low-dimensional latent space.
Once you get these embeddings, they are typically used for some downstream ML applications, i.e., you can use these embeddings as features for your ML models.
CNNs are generally used to learn these node embeddings from its own attributes and attributes from its neighborhood.
Depending on how ‘deep’ you can go, a convolution operator can learn from the features of a node and its (extended) neighborhood.
This github location has a list of several great paper implementations:
chihming/awesome-network-embedding
(13) https://github.com/chihming/awesome-network-embedding
A curated list of network embedding techniques.
Also called network representation learning, graph embedding, knowledge embedding, etc.
The task is to learn the representations of the vertices from a given network.
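The aggregation step at the heart of the neighborhood-based embedding methods listed above can be shown in a few lines. This is a deliberately toy sketch: real methods (DeepWalk, GCN, etc.) add learned weights, nonlinearities, and many iterations; here one step just averages each node's vector with its neighbors'.

```javascript
// One "convolution"-like step: replace each node's feature vector with the
// mean of its own and its neighbors' vectors. adj is an adjacency list,
// features maps nodeId -> numeric vector (all the same length).
function aggregateStep(adj, features) {
  var next = {};
  Object.keys(adj).forEach(function(node) {
    var group = [node].concat(adj[node]); // self plus neighborhood
    var dim = features[node].length;
    var sum = [];
    for (var d = 0; d < dim; d++) { sum.push(0); }
    group.forEach(function(n) {
      for (var d = 0; d < dim; d++) { sum[d] += features[n][d]; }
    });
    next[node] = sum.map(function(s) { return s / group.length; });
  });
  return next;
}
```

Repeating this step lets information flow from a node's extended neighborhood into its vector, which is the "depending on how deep you can go" point from the Quora answer above.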
(14) https://arxiv.org/pdf/1511.05493.pdf
Gated Graph Sequence Neural Networks
Yujia Li, Richard Zemel, Marc Brockschmidt, Daniel Tarlow
ABSTRACT
Graph-structured data appears frequently in domains including chemistry, natural
language semantics, social networks, and knowledge bases. In this work, we study
feature learning techniques for graph-structured inputs. Our starting point is previous
work on Graph Neural Networks (Scarselli et al., 2009), which we modify
to use gated recurrent units and modern optimization techniques and then extend
to output sequences. The result is a flexible and broadly useful class of neural network
models that has favorable inductive biases relative to purely sequence-based
models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the
capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We
then show it achieves state-of-the-art performance on a problem from program
verification, in which subgraphs need to be described as abstract data structures.
(15) https://en.wikipedia.org/wiki/Induced_subgraph
(16) https://github.com/josephmisiti/awesome-machine-learning#javascript
A curated list of awesome Machine Learning frameworks, libraries and software.
(17) http://www.weizmann.ac.il/mcb/UriAlon/homepage
Understanding the protein circuits that perform computations within the cell is a central problem in biology.
network motifs
]]></Notes>
<_-.XholonClass>
<!-- domain objects -->
<PhysicalSystem/>
<Block/>
<Brick/>
<!-- quantities -->
<Height superClass="Quantity"/>
</_-.XholonClass>
<xholonClassDetails>
<Block>
<port name="height" connector="Height"/>
</Block>
</xholonClassDetails>
<PhysicalSystem>
<Block>
<Height>0.1 m</Height>
</Block>
<Brick multiplicity="2"/>
</PhysicalSystem>
<Blockbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var a = 123;
var b = 456;
var c = a * b;
if (console) {
  console.log(c);
}
//# sourceURL=Blockbehavior.js
]]></Blockbehavior>
<Heightbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
var myHeight, testing;
var beh = {
  postConfigure: function() {
    testing = Math.floor(Math.random() * 10);
    myHeight = this.cnode.parent();
  },
  act: function() {
    myHeight.println(this.toString());
  },
  toString: function() {
    return "testing:" + testing;
  }
}
//# sourceURL=Heightbehavior.js
]]></Heightbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
$wnd.xh.Brickbehavior = function Brickbehavior() {}
$wnd.xh.Brickbehavior.prototype.postConfigure = function() {
  this.brick = this.cnode.parent();
  this.iam = " red brick";
};
$wnd.xh.Brickbehavior.prototype.act = function() {
  this.brick.println("I am a" + this.iam);
};
//# sourceURL=Brickbehavior.js
]]></Brickbehavior>
<Brickbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[
console.log("I'm another brick behavior");
]]></Brickbehavior>
<SvgClient><Attribute_String roleName="svgUri"><![CDATA[data:image/svg+xml,
<svg width="100" height="50" xmlns="http://www.w3.org/2000/svg">
<g>
<title>Block</title>
<rect id="PhysicalSystem/Block" fill="#98FB98" height="50" width="50" x="25" y="0"/>
<g>
<title>Height</title>
<rect id="PhysicalSystem/Block/Height" fill="#6AB06A" height="50" width="10" x="80" y="0"/>
</g>
</g>
</svg>
]]></Attribute_String><Attribute_String roleName="setup">${MODELNAME_DEFAULT},${SVGURI_DEFAULT}</Attribute_String></SvgClient>
</XholonWorkbook>