@Datseris
Created November 8, 2018 19:03
Comparing DS with RA
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparison of DynamicalSystems.jl and RecurrenceAnalysis.jl"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"using DynamicalSystems, RecurrenceAnalysis, BenchmarkTools"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following, RA stands for RecurrenceAnalysis and DS stands for DynamicalSystems (including ChaosTools).\n",
"\n",
"# 1. Delay Embedding\n",
"\n",
"## 1.1. Both methods give the same result\n",
"\n",
"First, confirm that both methods give the same result:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"towel = Systems.towel();\n",
"N = 1000\n",
"x = trajectory(towel, N)[:, 1]\n",
"const τ = 2\n",
"D = 1 # for DS this is the number of temporal neighbors (embedding dimension D + 1); symbol subject to change"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"2-dimensional Dataset{Float64} with 999 points\n",
" 0.085 0.76827 \n",
" 0.285813 0.681871\n",
" 0.76827 0.837347\n",
" 0.681871 0.51969 \n",
" 0.837347 0.966676\n",
" 0.51969 0.112748\n",
" 0.966676 0.386547\n",
" 0.112748 0.910741\n",
" 0.386547 0.306095\n",
" 0.910741 0.824263\n",
" 0.306095 0.545332\n",
" 0.824263 0.954994\n",
" 0.545332 0.165792\n",
" ⋮ \n",
" 0.446527 0.221204\n",
" 0.941787 0.649266\n",
" 0.221204 0.877588\n",
" 0.649266 0.400512\n",
" 0.877588 0.924354\n",
" 0.400512 0.269831\n",
" 0.924354 0.764945\n",
" 0.269831 0.681131\n",
" 0.764945 0.841502\n",
" 0.681131 0.499625\n",
" 0.841502 0.962963\n",
" 0.499625 0.137614\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r1 = reconstruct(x, D, τ)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"999×2 Array{Float64,2}:\n",
" 0.085 0.76827 \n",
" 0.285813 0.681871\n",
" 0.76827 0.837347\n",
" 0.681871 0.51969 \n",
" 0.837347 0.966676\n",
" 0.51969 0.112748\n",
" 0.966676 0.386547\n",
" 0.112748 0.910741\n",
" 0.386547 0.306095\n",
" 0.910741 0.824263\n",
" 0.306095 0.545332\n",
" 0.824263 0.954994\n",
" 0.545332 0.165792\n",
" ⋮ \n",
" 0.446527 0.221204\n",
" 0.941787 0.649266\n",
" 0.221204 0.877588\n",
" 0.649266 0.400512\n",
" 0.877588 0.924354\n",
" 0.400512 0.269831\n",
" 0.924354 0.764945\n",
" 0.269831 0.681131\n",
" 0.764945 0.841502\n",
" 0.681131 0.499625\n",
" 0.841502 0.962963\n",
" 0.499625 0.137614"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r2 = RecurrenceAnalysis.embed(x, D+1, τ)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Matrix(r1) == r2"
]
},
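{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, both functions build the same delay vectors: with `D` temporal neighbors and delay `τ`, point `i` is `(x[i], x[i+τ], ..., x[i+D*τ])`, so a timeseries of length `L` yields `L - D*τ` points. Below is a minimal plain-Julia sketch of that construction, assuming only Base Julia. `delay_embed` and `delay_points` are hypothetical names, and this is not the code of either package. Tuples stand in for `SVector`s so that the sketch needs no packages.\n",
"\n",
"```julia\n",
"# Hypothetical helper, not DS or RA code: row i is (x[i], x[i+τ], ..., x[i+D*τ]),\n",
"# i.e. D temporal neighbors and embedding dimension D + 1.\n",
"function delay_embed(x::AbstractVector{T}, D::Integer, τ::Integer) where {T}\n",
"    n = max(length(x) - D * τ, 0)   # number of complete delay vectors\n",
"    M = Matrix{T}(undef, n, D + 1)  # one point per row, the layout RA's `embed` returns\n",
"    for k in 0:D                    # k-th delayed coordinate (column k + 1)\n",
"        for i in 1:n                # inner loop runs down a column (column-major)\n",
"            M[i, k + 1] = x[i + k * τ]\n",
"        end\n",
"    end\n",
"    return M\n",
"end\n",
"\n",
"# The same data seen as a vector of points, closer to what a `Dataset` stores\n",
"delay_points(M::AbstractMatrix) = [Tuple(M[i, :]) for i in 1:size(M, 1)]\n",
"```\n",
"\n",
"For example, `delay_embed(collect(1.0:10.0), 1, 2)` has 8 rows, the first being `[1.0, 3.0]` and the last `[8.0, 10.0]`."
]
},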
{
"cell_type": "markdown",
"metadata": {
"scrolled": true
},
"source": [
"## 1.2. Return type\n",
"`reconstruct` returns a `Dataset`, which wraps a `Vector` of `SVector`s. I have found this to be more performant than a matrix for operations that treat the dataset as a vector of points, for example entropy computation.\n",
"\n",
"RA returns a `Matrix` with one point per row. More comparisons follow later in this notebook.\n",
"\n",
"## 1.3. Benchmarks\n",
"\n",
"Now benchmark both functions for various dimensions and lengths. In each pair of timings, the first line is DS's `reconstruct` and the second is RA's `embed`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"N = 1000, D = 1\n",
" 4.206 μs (6 allocations: 15.91 KiB)\n",
" 24.320 μs (3 allocations: 31.52 KiB)\n",
"N = 1000, D = 10\n",
" 16.213 μs (8 allocations: 84.94 KiB)\n",
" 131.413 μs (5 allocations: 168.80 KiB)\n",
"N = 100000, D = 1\n",
" 436.480 μs (7 allocations: 1.53 MiB)\n",
" 3.331 ms (5 allocations: 3.05 MiB)\n",
"N = 100000, D = 10\n",
" 2.660 ms (8 allocations: 8.39 MiB)\n",
" 18.878 ms (5 allocations: 16.78 MiB)\n"
]
}
],
"source": [
"for N in [1000, 100000]\n",
" x = trajectory(towel, N)[:, 1]\n",
" for D in [1, 10]\n",
" println(\"N = $(N), D = $(D)\")\n",
" @btime reconstruct($x, $D, $τ);\n",
" @btime RecurrenceAnalysis.embed($x, $(D+1), $τ);\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"RA's `embed` is roughly 6 to 8 times slower than DS's `reconstruct`, and it allocates about twice as much memory."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.4. Feature comparison\n",
"\n",
"DS also supports:\n",
"\n",
"- embedding of multiple timeseries\n",
"- embeddings with different delay times (aka delay vectors)\n",
"- an embedding struct that can compute on demand any entry of the \"full embedding\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Distance matrix\n",
"\n",
"## 2.1. Implementation\n",
"\n",
"The implementation in RA uses `pairwise` from `Distances` _and in addition_ requires all inputs to be transposed. This introduces at least two problems:\n",
"1. The transposition itself, which may or may not be costly.\n",
"2. Repeated use of `view`s, which, even though optimized, still have some cost.\n",
"\n",
"In addition, the entire RA package uses a string to select the metric used in the distance computation. This most likely leads to type instability. Although the benchmarks below show that this is not a performance hit, it is still not idiomatic Julia but Pythonic. I believe users should pass `Euclidean()` etc. directly, as per multiple dispatch; selecting behavior through strings is \"bad education\", in my opinion.\n",
"\n",
"---\n",
"\n",
"Let's write an equivalent `pairwise`-like function that takes a vector of vectors as input, a data structure much better suited to this kind of operation."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pairwiseDS (generic function with 1 method)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"VV = Vector{<:SVector}\n",
"using Distances\n",
"\n",
"# main API\n",
"distancematrixDS(x::Dataset, metric::Metric = Chebyshev()) = distancematrixDS(x, x, metric)\n",
"function distancematrixDS(x::Dataset, y::Dataset, metric::Metric = Chebyshev())\n",
" pairwiseDS(x.data, y.data, metric)\n",
"end\n",
"# Core function: pairwise of vectors of svectors\n",
"function pairwiseDS(x::VV, y::VV, metric::Metric)\n",
" d = zeros(eltype(eltype(x)), length(x), length(y))\n",
" for j in 1:length(y)\n",
" for i in 1:length(x)\n",
" @inbounds d[i,j] = evaluate(metric, x[i], y[j])\n",
" end\n",
" end\n",
" return d\n",
"end"
]
},
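{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the point about strings versus multiple dispatch concrete, here is a minimal plain-Julia sketch that needs no packages. `MetricSketch`, `MaxNorm`, `EuclideanNorm`, `metric_distance`, and `METRICS` are hypothetical stand-ins for Distances.jl's `Metric`, `Chebyshev()`, `Euclidean()`, and `evaluate`. The metric is selected by the *type* of an argument, which lets the compiler specialize each method; a string-based API could still be offered on top through a lookup table and a function barrier.\n",
"\n",
"```julia\n",
"# Hypothetical metric types standing in for Distances.jl's Chebyshev() and Euclidean()\n",
"abstract type MetricSketch end\n",
"struct MaxNorm <: MetricSketch end\n",
"struct EuclideanNorm <: MetricSketch end\n",
"\n",
"# One method per metric: multiple dispatch selects it from the argument type\n",
"metric_distance(::MaxNorm, a, b) = maximum(abs(ai - bi) for (ai, bi) in zip(a, b))\n",
"metric_distance(::EuclideanNorm, a, b) = sqrt(sum(abs2(ai - bi) for (ai, bi) in zip(a, b)))\n",
"\n",
"# Optional translation layer, should a string-based API be kept for compatibility\n",
"const METRICS = Dict(\"max\" => MaxNorm(), \"euclidean\" => EuclideanNorm())\n",
"\n",
"# The Dict lookup is type-unstable, but behind this function barrier the call is\n",
"# dispatched once and then runs the specialized method\n",
"metric_distance(name::AbstractString, a, b) = metric_distance(METRICS[name], a, b)\n",
"```\n",
"\n",
"With this design, `metric_distance(\"euclidean\", (0.0, 0.0), (3.0, 4.0))` and `metric_distance(EuclideanNorm(), (0.0, 0.0), (3.0, 4.0))` both return `5.0`."
]
},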
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"999×999 Array{Float64,2}:\n",
" 0.0 0.21861 0.686753 … 0.653867 0.781154 0.754746 \n",
" 0.21861 0.0 0.506891 0.435305 0.622739 0.584749 \n",
" 0.686753 0.506891 0.0 0.348783 0.145404 0.749531 \n",
" 0.646566 0.427978 0.329197 0.0200783 0.471141 0.423315 \n",
" 0.778068 0.620728 0.14662 0.492483 0.00557238 0.895209 \n",
" 0.786553 0.615305 0.766052 … 0.41921 0.909082 0.0319517\n",
" 0.960762 0.742153 0.492529 0.307119 0.58985 0.529249 \n",
" 0.145147 0.286937 0.659619 0.701481 0.730623 0.864523 \n",
" 0.551848 0.389043 0.654172 0.352467 0.799036 0.20291 \n",
" 0.827637 0.640946 0.14307 0.397632 0.155021 0.800315 \n",
" 0.313982 0.138038 0.546698 … 0.377811 0.679026 0.451318 \n",
" 0.76248 0.603759 0.130292 0.477334 0.0189918 0.879488 \n",
" 0.758212 0.577657 0.707592 0.360397 0.85041 0.0536949\n",
" ⋮ ⋱ \n",
" 0.655731 0.487897 0.695091 0.364084 0.840364 0.0990287\n",
" 0.865012 0.656784 0.255895 0.300556 0.329337 0.676237 \n",
" 0.174648 0.206106 0.548545 0.595306 0.626146 0.79062 \n",
" 0.67353 0.459632 0.452754 … 0.104109 0.594395 0.302503 \n",
" 0.807811 0.639528 0.139716 0.467964 0.0528473 0.872821 \n",
" 0.589906 0.427706 0.676254 0.362701 0.821525 0.165242 \n",
" 0.839361 0.643923 0.172059 0.359933 0.214652 0.757588 \n",
" 0.204343 0.0159982 0.522345 0.449569 0.637367 0.590098 \n",
" 0.683877 0.505025 0.00532237 … 0.352001 0.143575 0.752232 \n",
" 0.653867 0.435305 0.348783 0.0 0.490307 0.404965 \n",
" 0.781154 0.622739 0.145404 0.490307 0.0 0.893354 \n",
" 0.754746 0.584749 0.749531 0.404965 0.893354 0.0 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"distancematrix(r2, \"euclidean\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"999×999 Array{Float64,2}:\n",
" 0.0 0.21861 0.686753 … 0.653867 0.781154 0.754746 \n",
" 0.21861 0.0 0.506891 0.435305 0.622739 0.584749 \n",
" 0.686753 0.506891 0.0 0.348783 0.145404 0.749531 \n",
" 0.646566 0.427978 0.329197 0.0200783 0.471141 0.423315 \n",
" 0.778068 0.620728 0.14662 0.492483 0.00557238 0.895209 \n",
" 0.786553 0.615305 0.766052 … 0.41921 0.909082 0.0319517\n",
" 0.960762 0.742153 0.492529 0.307119 0.58985 0.529249 \n",
" 0.145147 0.286937 0.659619 0.701481 0.730623 0.864523 \n",
" 0.551848 0.389043 0.654172 0.352467 0.799036 0.20291 \n",
" 0.827637 0.640946 0.14307 0.397632 0.155021 0.800315 \n",
" 0.313982 0.138038 0.546698 … 0.377811 0.679026 0.451318 \n",
" 0.76248 0.603759 0.130292 0.477334 0.0189918 0.879488 \n",
" 0.758212 0.577657 0.707592 0.360397 0.85041 0.0536949\n",
" ⋮ ⋱ \n",
" 0.655731 0.487897 0.695091 0.364084 0.840364 0.0990287\n",
" 0.865012 0.656784 0.255895 0.300556 0.329337 0.676237 \n",
" 0.174648 0.206106 0.548545 0.595306 0.626146 0.79062 \n",
" 0.67353 0.459632 0.452754 … 0.104109 0.594395 0.302503 \n",
" 0.807811 0.639528 0.139716 0.467964 0.0528473 0.872821 \n",
" 0.589906 0.427706 0.676254 0.362701 0.821525 0.165242 \n",
" 0.839361 0.643923 0.172059 0.359933 0.214652 0.757588 \n",
" 0.204343 0.0159982 0.522345 0.449569 0.637367 0.590098 \n",
" 0.683877 0.505025 0.00532237 … 0.352001 0.143575 0.752232 \n",
" 0.653867 0.435305 0.348783 0.0 0.490307 0.404965 \n",
" 0.781154 0.622739 0.145404 0.490307 0.0 0.893354 \n",
" 0.754746 0.584749 0.749531 0.404965 0.893354 0.0 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"distancematrixDS(r1, Euclidean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2. Confirmation that it works"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"distancematrixDS(r1) == distancematrix(r2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.3. Benchmarks"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"N = 1000, D = 1\n",
" 4.694 ms (2 allocations: 7.61 MiB)\n",
" 6.247 ms (28 allocations: 7.62 MiB)\n",
"N = 1000, D = 5\n",
" 14.265 ms (2 allocations: 7.49 MiB)\n",
" 11.362 ms (28 allocations: 7.49 MiB)\n",
"N = 1000, D = 10\n",
" 34.079 ms (2 allocations: 7.34 MiB)\n",
" 18.724 ms (28 allocations: 7.34 MiB)\n"
]
}
],
"source": [
"for N in [1000]\n",
" x = trajectory(towel, N)[:, 1]\n",
" for D in [1, 5, 10]\n",
" println(\"N = $(N), D = $(D)\")\n",
" r1 = reconstruct(x, D, τ);\n",
" r2 = RecurrenceAnalysis.embed(x, (D+1), τ);\n",
" @btime distancematrixDS($r1)\n",
" @btime RecurrenceAnalysis.distancematrix($r2)\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See also the discussion at https://discourse.julialang.org/t/pairwise-distance-using-matrix-or-vector-svector-the-first-is-faster/17312/4. There may also be some optimization happening with the transpose, though I am not sure.\n",
"\n",
"For very small dimensionalities (`D = 1`) the `SVector` version is faster, but for larger ones the `Matrix` version wins. This is surprising, because a single distance evaluation is faster for `SVector`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"m = Chebyshev()\n",
"v = rand(10)\n",
"sv = SVector{10}(v);"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 30.293 ns (0 allocations: 0 bytes)\n",
" 25.173 ns (0 allocations: 0 bytes)\n"
]
}
],
"source": [
"@btime evaluate($m, $v, $v)\n",
"@btime evaluate($m, $sv, $sv);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The benchmarks look better for the `SVector` version if one uses the Euclidean distance, at least up to `D = 10`:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"N = 1000, D = 1\n",
" 4.498 ms (2 allocations: 7.61 MiB)\n",
" 6.294 ms (2027 allocations: 7.69 MiB)\n",
"N = 1000, D = 10\n",
" 7.055 ms (2 allocations: 7.34 MiB)\n",
" 6.087 ms (1991 allocations: 7.41 MiB)\n",
"N = 1000, D = 100\n",
" 61.214 ms (2 allocations: 4.90 MiB)\n",
" 5.571 ms (1631 allocations: 4.95 MiB)\n"
]
}
],
"source": [
"for N in [1000]\n",
" x = trajectory(towel, N)[:, 1]\n",
" for D in [1, 10, 100]\n",
" println(\"N = $(N), D = $(D)\")\n",
" r1 = reconstruct(x, D, τ);\n",
" r2 = RecurrenceAnalysis.embed(x, (D+1), τ);\n",
" @btime distancematrixDS($r1, Euclidean())\n",
" @btime RecurrenceAnalysis.distancematrix($r2, \"euclidean\")\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, for `D = 100` the static-vector version is _much_ slower (about 11 times). That is expected, as such sizes are at the limit of where static vectors are useful. We could add an `if` clause at the top level: if the dimension is large (near 100), the data is converted to a matrix and `Distances.pairwise` is used instead."
]
},
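{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a top-level switch in plain Julia, assuming no packages: tuples stand in for `SVector`s, and a hand-written column loop stands in for `Distances.pairwise`. The threshold of 10 is a hypothetical value that would have to be tuned with benchmarks like the ones above, and all function names here are hypothetical. Both paths compute the Chebyshev distance.\n",
"\n",
"```julia\n",
"# Hypothetical cutoff; the best value should come from benchmarks\n",
"const MATRIX_THRESHOLD = 10\n",
"\n",
"# Small dimensions: points as a vector of tuples (stand-ins for SVectors)\n",
"function pairwise_points(pts)\n",
"    n = length(pts)\n",
"    d = zeros(n, n)\n",
"    for j in 1:n, i in 1:n\n",
"        d[i, j] = maximum(abs.(pts[i] .- pts[j]))   # Chebyshev distance\n",
"    end\n",
"    return d\n",
"end\n",
"\n",
"# Large dimensions: points as the columns of a matrix\n",
"function pairwise_columns(M::AbstractMatrix)\n",
"    n = size(M, 2)\n",
"    d = zeros(n, n)\n",
"    for j in 1:n, i in 1:n\n",
"        s = 0.0\n",
"        for k in axes(M, 1)\n",
"            s = max(s, abs(M[k, i] - M[k, j]))       # Chebyshev distance\n",
"        end\n",
"        d[i, j] = s\n",
"    end\n",
"    return d\n",
"end\n",
"\n",
"# Top-level switch: pick the representation from the dimension of the points\n",
"function distancematrix_hybrid(pts::AbstractVector{<:Tuple})\n",
"    dim = length(first(pts))\n",
"    if dim > MATRIX_THRESHOLD\n",
"        M = [p[k] for k in 1:dim, p in pts]          # dim × n, one point per column\n",
"        return pairwise_columns(M)\n",
"    else\n",
"        return pairwise_points(pts)\n",
"    end\n",
"end\n",
"```\n",
"\n",
"The switch lives in one place, so the rest of the code never needs to know which representation was used; only the threshold has to be chosen."
]
},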
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Recurrence Matrix\n",
"\n",
"The implementation of the recurrence matrix in RA is as fast as it can get. However, its return value is questionable: it returns a sparse matrix whose stored values are all `true`. Why store them at all, when the mere presence of an entry already tells us that the pair of points is recurrent?\n",
"\n",
"Should it instead return just a vector of index pairs, keeping only those that correspond to `true`? Something like:\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"recDS (generic function with 2 methods)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function recDS(x, radius, metric = \"max\")\n",
" r = RecurrenceAnalysis.distancematrix(x, metric)\n",
" out = Tuple{Int, Int}[]\n",
" s = size(x)[1]\n",
" @inbounds for j in 1:s\n",
" for i in 1:s\n",
" # i == j && continue <- This is worth considering...\n",
" r[i, j] <= radius && push!(out, (i, j))\n",
" end\n",
" end\n",
" return out\n",
"end "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"x = trajectory(towel, 10000)[:, 1]\n",
"data = Matrix(reconstruct(x, 1, 1));"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10000×10000 SparseArrays.SparseMatrixCSC{Bool,Int64} with 33506 stored entries:\n",
" [1 , 1] = true\n",
" [2 , 2] = true\n",
" [3 , 3] = true\n",
" [142 , 3] = true\n",
" [4142 , 3] = true\n",
" [4 , 4] = true\n",
" [1273 , 4] = true\n",
" [4143 , 4] = true\n",
" [7390 , 4] = true\n",
" [9999 , 4] = true\n",
" [5 , 5] = true\n",
" [1956 , 5] = true\n",
" ⋮\n",
" [5328 , 9997] = true\n",
" [5994 , 9997] = true\n",
" [6071 , 9997] = true\n",
" [9997 , 9997] = true\n",
" [2068 , 9998] = true\n",
" [9998 , 9998] = true\n",
" [4 , 9999] = true\n",
" [2069 , 9999] = true\n",
" [7390 , 9999] = true\n",
" [9999 , 9999] = true\n",
" [4725 , 10000] = true\n",
" [10000, 10000] = true"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recurrencematrix(data, 0.001)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33506-element Array{Tuple{Int64,Int64},1}:\n",
" (1, 1) \n",
" (2, 2) \n",
" (3, 3) \n",
" (142, 3) \n",
" (4142, 3) \n",
" (4, 4) \n",
" (1273, 4) \n",
" (4143, 4) \n",
" (7390, 4) \n",
" (9999, 4) \n",
" (5, 5) \n",
" (1956, 5) \n",
" (6, 6) \n",
" ⋮ \n",
" (5328, 9997) \n",
" (5994, 9997) \n",
" (6071, 9997) \n",
" (9997, 9997) \n",
" (2068, 9998) \n",
" (9998, 9998) \n",
" (4, 9999) \n",
" (2069, 9999) \n",
" (7390, 9999) \n",
" (9999, 9999) \n",
" (4725, 10000) \n",
" (10000, 10000)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recDS(data, 0.001)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But once again my method is actually much slower (about five times, with roughly 390 million allocations instead of 75):"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1.956113 seconds (75 allocations: 777.565 MiB, 1.23% gc time)\n",
" 9.819708 seconds (389.78 M allocations: 6.555 GiB, 7.44% gc time)\n"
]
}
],
"source": [
"@time recurrencematrix(data, 0.001);\n",
"@time recDS(data, 0.001);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I guess I am a complete noob in LinearAlgebra optimizations... :(\n",
"\n",
"Anyway, the same discussion as above applies to `crossrecurrencematrix` and `jointrecurrencematrix`."
]
},
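{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the two representations are cheap to convert into each other, so the choice is mostly one of convenience. A minimal sketch using the `SparseArrays` standard library (`findnz` and `sparse`); the names `recurrence_pairs` and `pairs_to_sparse` are hypothetical and not part of RA. Since `findnz` returns stored entries in column-major order, the pairs come out in the same order as in the output of `recDS` above.\n",
"\n",
"```julia\n",
"using SparseArrays\n",
"\n",
"# Sparse Bool recurrence matrix -> vector of (i, j) index pairs\n",
"function recurrence_pairs(R::SparseMatrixCSC{Bool})\n",
"    I, J, V = findnz(R)              # indices and values of all stored entries\n",
"    return collect(zip(I[V], J[V]))  # drop explicitly stored `false` values, if any\n",
"end\n",
"\n",
"# Vector of (i, j) index pairs -> n × n sparse Bool matrix\n",
"function pairs_to_sparse(pairs::AbstractVector{Tuple{Int,Int}}, n::Integer)\n",
"    I = first.(pairs)\n",
"    J = last.(pairs)\n",
"    return sparse(I, J, fill(true, length(pairs)), n, n)\n",
"end\n",
"```\n",
"\n",
"Round-tripping `pairs_to_sparse(recurrence_pairs(R), size(R, 1))` gives back a matrix equal to a square `R`."
]
},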
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Estimating embedding dimension\n",
"## 4.1. Features\n",
"\n",
"DS has only Cao's method, while RA also has the False Nearest Neighbors (FNN) method and the even more recent False First Nearest Neighbors (FFNN) method. Very cool!\n",
"\n",
"## 4.2. Same output & Benchmark\n",
"I'll compare Cao's method, as it's the only one both packages have."
]
},
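{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, these are the quantities of Cao's method as defined in Cao's original 1997 paper. Let $\\mathbf{y}_i(d)$ be the $i$-th delay vector in dimension $d$ with delay $\\tau$, let $n(i,d)$ be the index of its nearest neighbor in that dimension, and let $N$ be the length of the timeseries $x$. The distances use the maximum norm.\n",
"\n",
"$$a(i,d) = \\frac{\\lVert \\mathbf{y}_i(d+1) - \\mathbf{y}_{n(i,d)}(d+1) \\rVert_\\infty}{\\lVert \\mathbf{y}_i(d) - \\mathbf{y}_{n(i,d)}(d) \\rVert_\\infty}, \\qquad E(d) = \\frac{1}{N - d\\tau} \\sum_{i=1}^{N - d\\tau} a(i,d), \\qquad E_1(d) = \\frac{E(d+1)}{E(d)}$$\n",
"\n",
"$$E^*(d) = \\frac{1}{N - d\\tau} \\sum_{i=1}^{N - d\\tau} \\left\\lvert x_{i + d\\tau} - x_{n(i,d) + d\\tau} \\right\\rvert, \\qquad E_2(d) = \\frac{E^*(d+1)}{E^*(d)}$$\n",
"\n",
"$E_1(d)$ stops changing once $d$ exceeds some $d_0$, and $d_0 + 1$ is then taken as the minimum embedding dimension. $E_2(d)$ helps to distinguish deterministic from stochastic signals: for random data $E_2(d) \\approx 1$ for all $d$."
]
},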
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4.403417 seconds (9.36 M allocations: 468.482 MiB, 4.21% gc time)\n"
]
},
{
"data": {
"text/plain": [
"5-element Array{Float64,1}:\n",
" 0.4225089613465132\n",
" 0.8484447761147434\n",
" 0.9113140842398368\n",
" 0.924133227174861 \n",
" 0.9594589837340649"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using ChaosTools, PyPlot\n",
"\n",
"x = trajectory(towel, 10000)[:, 1]\n",
"Ds = 1:5\n",
"\n",
"@time Es = estimate_dimension(x, 1, Ds) # compute E1 for Dimensions Ds .+ 1"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"([0.433097, 0.849603], [1.2272, 1.32898])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ERA = afnn(x, Ds .+ 1, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alright, the return value of the above is not really useful: it contains only two values of E1. We need E1 at many dimensions to estimate where it saturates, and two values are not enough to deduce this. Let's try again:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 54.894057 seconds (593.86 k allocations: 56.627 GiB, 18.10% gc time)\n"
]
},
{
"data": {
"text/plain": [
"4-element Array{Float64,1}:\n",
" 0.43309679708094556\n",
" 0.8496026271703787 \n",
" 0.9199791355071634 \n",
" 0.9204688035749184 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@time ERA = [afnn(x, (Ds[i], Ds[i+1]) .+ 1, 1)[1][1] for i in 1:4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, `afnn` is much, _much_ slower than our implementation: about 12 times slower here (55 s versus 4.4 s), while allocating over 100 times more memory (56.6 GiB versus 468 MiB)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"Figure(PyObject <Figure size 600x400 with 1 Axes>)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rc(\"figure\", figsize = (6,4))\n",
"plot( Ds .+ 1, Es, label = \"DS\");\n",
"plot( (Ds .+ 1)[1:end-1], ERA, label = \"RA\");\n",
"legend();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At least both methods yield practically the same results (the E1 values agree to within about 0.01), which suggests that both implementations are correct.\n",
"\n",
"In DS we have a _dedicated and different_ function for the E2 quantity of Cao. Most of the time you do not need both quantities at the same time, so there is no reason to compute them both..."
]
},
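{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the whole point of computing E1 at many dimensions is to see where it saturates, a simple criterion can automate that reading. The following is a minimal sketch in plain Julia; `saturation_index` and its default relative tolerance of 5% are hypothetical choices, not part of DS or RA. How the returned index maps to an embedding dimension depends on the dimension convention of the function that produced the values.\n",
"\n",
"```julia\n",
"# Hypothetical helper: index of the first E1 value after which E1 changes by at most\n",
"# `tol` (relative to the current value), i.e. where the curve has roughly saturated.\n",
"# Returns `nothing` if no such point exists among the computed values.\n",
"function saturation_index(E1::AbstractVector{<:Real}; tol::Real = 0.05)\n",
"    for i in 1:length(E1) - 1\n",
"        if abs(E1[i + 1] - E1[i]) <= tol * abs(E1[i])\n",
"            return i\n",
"        end\n",
"    end\n",
"    return nothing\n",
"end\n",
"```\n",
"\n",
"With the DS values `Es` computed above, `saturation_index(Es)` returns `3`: the relative change from the third to the fourth value is about 1.4%, while the previous change is about 7.4%."
]
},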
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusions\n",
"\n",
"In this notebook I have only reviewed the common features and the most central function, `recurrencematrix`. I have also read the README and have an overview of all features. I think RA is a great package with a lot of very useful functionality!\n",
"\n",
"These are, I believe, the best steps forward:\n",
"\n",
"0. Move `RecurrenceAnalysis` to the JuliaDynamics org. (When you do, I will give you owner privileges for it.)\n",
"1. Add `DynamicalSystemsBase` to the dependencies of RA.\n",
"2. Use `reconstruct` from DynamicalSystemsBase exclusively. It has many more features and it is much faster! This also means that all `embed` code is discarded.\n",
"3. As far as `distancematrix` is concerned, we could mainly use the method I wrote in this notebook, which uses a `Dataset`. I suggest a higher-level `if` clause: if the dimensionality of the dataset is larger than, let's say, 10, the dataset is converted to a matrix and your method is used. This way both your method and mine are kept, and we choose which one to use!\n",
"4. I think `recurrencematrix` could be made more intuitive. On the other hand, your method is just so darn fast!!! I don't know why, but I suggest that we keep your method for now and see how it goes.\n",
"5. Move all embedding dimension estimation functionality (`estimate_dimension`) to DynamicalSystemsBase. At the moment it lives in `ChaosTools`. (I'll do that.)\n",
"6. Use `estimate_dimension` from DynSysBase exclusively. Contribute your two extra methods, FNN and FFNN, to DynSysBase. I will review the contribution PRs and make sure we also get better performance (the methods should also be reworked to \"just return the values at the given dimensions\" instead of only the first and last).\n",
"7. Compare the mutual information method in ChaosTools (used in `estimate_delay`) with the one in RA, and use the better one exclusively in both ChaosTools and RA. It is already clear that the method in RA **has more features** than the one in ChaosTools; the only thing left to check is the performance.\n",
"8. [Optional/Suggestion]: Rework the handling of the metrics: instead of asking for a string, ask for the `Metric` instance, e.g. `\"euclidean\" => Euclidean()`. This is important for education, as it teaches the Julia way. But this is just my humble opinion.\n",
"9. Add citations to the documentation strings.\n",
"\n",
"Once these are done, and if you agree of course, we make RA an official part of DS by:\n",
"\n",
"1. Adding it to the DS documentation as a set of dedicated pages (much like `ChaosTools`). For this to happen we will also have to write proper documentation, i.e. split and expand the current README into Documenter-compatible files that expand on the docstrings (I will take care of that).\n",
"2. Re-exporting it.\n",
"3. Making sure the source code is clear. I will read the source, and wherever something is unclear I'll either clarify it myself or ask for your help in clarifying it.\n",
"\n",
"You might think number 3 is irrelevant or unnecessary, but the philosophy of JuliaDynamics is that the source code is clear and understandable."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.0.0",
"language": "julia",
"name": "julia-1.0"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}