Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save yenchenlin/34aef13d26cbbe29a610a93aacb83f35 to your computer and use it in GitHub Desktop.
Save yenchenlin/34aef13d26cbbe29a610a93aacb83f35 to your computer and use it in GitHub Desktop.

I've tried to make SequentialDataset support Cython fused types, but it seems really expensive. You can find the modified code in this branch.

tl;dr - seq_dataset.pyx is heavily bound with sag_fast.pyx, sgd_fast.pyx.

After I modified seq_dataset.pyx, this line in sag_fast.pyx requires to change as well since this pointer is passed into SequentialDataset's function. However, my past experience is that one can only declare local floating variable when at least one of the function's argument variable also belongs to floating type. Nonetheless, that's not the case here, unless we make this function's arguments

np.ndarray[double, ndim=2, mode='c'] weights_array
np.ndarray[double, ndim=1, mode='c'] intercept_array

become floating types as well, i.e., changed double to floating.

However, doing so will require a comprehensive changes in three Cython files I mentioned above.

Further, it can become more complicated since ArrayDataset inherits from SequentialDataset and we know that inheritance doen't work well when combing with fused types.

Also it's is a little bit weird to degrade the precision of weights and y just because we need to use fused types in this function.

Considering above reasons, I guess I'll have a look at Neighbors Trees first.

@jnothman
Copy link

In which branch?

@yenchenlin
Copy link
Author

Sorry for the late late reply, this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment