Instantly share code, notes, and snippets.

# ruoyu0088/segments_fit.ipynb

Last active July 3, 2024 23:21
Show Gist options
• Save ruoyu0088/70effade57483355bbd18b31dc370f2a to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

### cerlymarco commented May 19, 2021

This can be done automatically by Linear Trees... A sklearn compatible implementation is available here

### dankoc commented Jul 3, 2021

Thanks for posting this!

Here are a few suggested edits to pick the number of segments automatically by optimizing AIC and/ or BIC. The implementation is similar to the heuristic strategy presented in this paper: https://discovery.ucl.ac.uk/id/eprint/10070516/1/AIC_BIC_Paper.pdf

``````def segments_fit(X, Y, maxcount):
xmin = X.min()
xmax = X.max()

n = len(X)

AIC_ = float('inf')
BIC_ = float('inf')
r_   = None

for count in range(1, maxcount+1):

seg = np.full(count - 1, (xmax - xmin) / count)

px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.1].mean() for x in px_init])

def func(p):
seg = p[:count - 1]
py = p[count - 1:]
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
return px, py

def err(p): # This is RSS / n
px, py = func(p)
Y2 = np.interp(X, px, py)
return np.mean((Y - Y2)**2)

r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')

# Compute AIC/ BIC.
AIC = n * np.log10(err(r.x)) + 4 * count
BIC = n * np.log10(err(r.x)) + 2 * count * np.log(n)

if((BIC < BIC_) & (AIC < AIC_)): # Continue adding complexity.
r_ = r
AIC_ = AIC
BIC_ = BIC
else: # Stop.
count = count - 1
break

return func(r_.x) ## Return the last (n-1)
``````

### jmox0351 commented Oct 19, 2021

Great stuff, thanks! I used this to model draining out a tank to capture the formation of vorticies.

### mylo19 commented Nov 29, 2021

That's really helpful, thanks. I want to use a part from your implementation in a project I am working on. Is that ok? Could you add a license, or specify whether you want some reference etc?

### dankoc commented Nov 29, 2021

No objections from me - feel free to use anything I've contributed under MIT no attribution license: https://opensource.org/licenses/MIT-0.

Be sure to credit ruoyu0088 for their contributions as they wish!

Hi! Sometimes, depending on the number of segments and the signals I pass, I get None in the return value. I believe it stems from the py_init initializing some indices to NaN. Why could it be? It leads me to a question: why do you multiply by 0.01?
Thank you!

@KateZi the multiplication factor of 0.01 ensures that only a small set of y-axis data (here 1% around the initial x-values) is selected to initialize the y-values corresponding to the initial x-values.

I had a data set which had a relatively larger difference between x-intervals than 1%, wherein I had to increase this parameter value to 0.02 (2%). That ensured that there are some y-values to initialize the optimization parameters.

### jef07 commented Feb 12, 2023

Hello, thanks for the material, really useful! I have a question: is there any such code available to accomodate for 3 dimensions?

Thanks for posting this!

Here are a few suggested edits to pick the number of segments automatically by optimizing AIC and/ or BIC. The implementation is similar to the heuristic strategy presented in this paper: https://discovery.ucl.ac.uk/id/eprint/10070516/1/AIC_BIC_Paper.pdf

``````def segments_fit(X, Y, maxcount):
xmin = X.min()
xmax = X.max()

n = len(X)

AIC_ = float('inf')
BIC_ = float('inf')
r_   = None

for count in range(1, maxcount+1):

seg = np.full(count - 1, (xmax - xmin) / count)

px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.1].mean() for x in px_init])

def func(p):
seg = p[:count - 1]
py = p[count - 1:]
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
return px, py

def err(p): # This is RSS / n
px, py = func(p)
Y2 = np.interp(X, px, py)
return np.mean((Y - Y2)**2)

r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')

# Compute AIC/ BIC.
AIC = n * np.log10(err(r.x)) + 4 * count
BIC = n * np.log10(err(r.x)) + 2 * count * np.log(n)

if((BIC < BIC_) & (AIC < AIC_)): # Continue adding complexity.
r_ = r
AIC_ = AIC
BIC_ = BIC
else: # Stop.
count = count - 1
break

return func(r_.x) ## Return the last (n-1)
``````

Hi dankoc! Thanks for sharing the code which really helps me. One little clarification, shouldn't it be natural log in the AIC/BIC formula?