Allan Haldane ahaldane

## structured_array_motivation.md

      
ahaldane / structured_array_motivation.md
Last active November 20, 2018 18:11
Structured array change notes

Vision For Structured Array Cleanup

Structured arrays are a numpy feature for interpreting structured data, that is, data composed of multiple datatypes and organized like "structs" in the C language. While the basic idea and functionality are useful, structured arrays have not received as much attention as other parts of numpy, and as a result some of their behavior is self-contradictory, buggy, or undocumented.
Different users have also used structured arrays for different purposes, which may have led to the self-contradictory behavior: the original intended use appears to be interpreting binary data blobs, but some users want to use structured arrays as a "pandas-lite" for manipulating tabular data. We have tried to discourage the latter use recently.
The purpose of this document is to better specify what we want structured arrays to do within numpy, identify what problems currently exist, and propose how structured arrays should be fixed.
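
As a rough illustration of the "binary data blob" use described above, here is a minimal sketch. The record layout (an `id` field followed by a `value` field) is a made-up example, not something taken from these notes.

```python
import struct

import numpy as np

# Hypothetical record layout: a little-endian int32 followed by a float64,
# packed with no padding (numpy's default for structured dtypes).
record = np.dtype([("id", "<i4"), ("value", "<f8")])

# Build a blob the way a C program might write two packed structs to a file.
blob = struct.pack("<id", 1, 2.5) + struct.pack("<id", 2, -1.0)

# Interpret the raw bytes as an array of records, then pull out each field.
arr = np.frombuffer(blob, dtype=record)
ids = arr["id"]
values = arr["value"]
```

Each field is accessed by name and comes back as an ordinary ndarray view, which is also what makes the "pandas-lite" style of use tempting.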

  
## numpy_coercion.rst

      
ahaldane / numpy_coercion.rst
Last active June 15, 2019 00:00
C-style type coercion

Using C-style type coercion rules in Numpy

This document explores using C-like type coercion for + - * // in numpy.
Motivation: Currently, when two dtypes are involved in a binary operation, numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast. One often-surprising consequence is that "np.uint64 + np.int64" gives an "np.float64", which differs from C-style coercion. The current numpy coercion rules lead to unexpected behaviors like this one, which we often get questions about on GitHub and the mailing list. See the issues collected in numpy/numpy#12525.
Why switch to C-style coercion specifically? Because numpy is written in C and is designed around low-level C types like uint8, uint32, float64, etc., and the C language has already defined coercion rules for these types. C-style coercion has gone through 60 years of trial by fire
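
The current rule described above can be checked directly with `np.result_type`. This is only a small sketch of today's behavior, not of the proposed rules. For comparison, C's usual arithmetic conversions would turn uint64 + int64 into an unsigned 64-bit result rather than a floating-point one.

```python
import numpy as np

# Two different dtypes: numpy looks for an output dtype covering both ranges.
# No integer dtype covers both uint64 and int64, so the result is float64.
mixed64 = np.result_type(np.uint64, np.int64)

# For a smaller mixed pair an integer dtype does exist: int16 covers both
# uint8 and int8.
mixed8 = np.result_type(np.uint8, np.int8)

# A single dtype: there is never any cast, so uint8 arithmetic wraps around.
a = np.array([200], dtype=np.uint8)
wrapped = a + a  # 400 does not fit in uint8; the stored result is 400 % 256
```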


## structuredoc.rst

      
ahaldane / structuredoc.rst
Last active October 12, 2016 16:28
structure docs
          
Contents

Structured Arrays
Introduction


Structured Arrays


Introduction

Numpy allows creation of arrays with a "structured" datatype composed of


## split_classes.mkd

      
ahaldane / split_classes.mkd
Last active February 13, 2016 16:40

This PR defines a new indexing function "split_classes" to accompany the others, which, every once in a while, I've wished existed. It splits up elements from one array based on the 'classification' provided by another array. In its simplest form, it does this:
import numpy as np

def split_classes(c, v):
    return [v[c == u] for u in np.unique(c)]

This implementation has nagged me, though, because of its performance: if c contains n unique values, it loops through the entire c and v arrays n times each and creates n intermediate boolean arrays. For large v, c, and n, this has been a real bottleneck for me.
This PR gives a performance improvement by computing everything in a single pass with no intermediate boolean arrays, and for convenience it also allows a choice of axis.
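
For readers who want to avoid the n boolean masks without the PR, one possible pure-numpy alternative is sketched below. It is not the PR's single-pass implementation: it swaps in a stable sort followed by `np.split`, works on 1-d inputs only, and the name `split_classes_sorted` is made up for this example.

```python
import numpy as np


def split_classes_sorted(c, v):
    """Group the elements of v by the labels in c using one sort (1-d only)."""
    c = np.asarray(c)
    v = np.asarray(v)
    # A stable sort keeps the original order of elements within each class.
    order = np.argsort(c, kind="stable")
    sorted_c = c[order]
    # Positions where the sorted label changes mark the group boundaries.
    boundaries = np.flatnonzero(sorted_c[1:] != sorted_c[:-1]) + 1
    return np.split(v[order], boundaries)


labels = np.array([1, 0, 1, 2, 0])
data = np.array([10, 20, 30, 40, 50])
groups = split_classes_sorted(labels, data)
```

The groups come back in sorted label order, matching the order that `unique(c)` produces in the simple version.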

  
## npy_alignment.mkd

      
ahaldane / npy_alignment.mkd
Last active September 26, 2015 02:05
Numpy 1.10 Alignment Notes

These are notes on how memory alignment currently works in numpy.
Numpy Alignment Goals

There are three use-cases related to memory alignment in numpy I see:

Creating structured datatypes with fields aligned as in a C struct.
Speeding up copy operations by using word/double-word assignment instead of memcpy.
Guaranteeing safe aligned access for ufuncs/setitem/casting code.
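
The first use-case can be seen from Python through the `align` argument of `np.dtype`. The sketch below uses made-up field names and assumes a platform where a 32-bit integer has 4-byte alignment, which is the common case.

```python
import numpy as np

# Hypothetical two-field layout: one byte followed by a 32-bit integer.
fields = [("flag", "u1"), ("count", "i4")]

# Default: fields are packed back to back with no padding.
packed = np.dtype(fields)

# align=True: numpy inserts padding the way a C compiler would for a struct.
aligned = np.dtype(fields, align=True)

# fields[name] is a (dtype, offset) pair; collect the byte offsets.
packed_offsets = [packed.fields[name][1] for name in packed.names]
aligned_offsets = [aligned.fields[name][1] for name in aligned.names]
```

In the packed dtype `count` starts at byte 1, while in the aligned dtype it is pushed to the next multiple of its alignment and the itemsize grows to match.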

Alignment variables


## structures.mkd

      
ahaldane / structures.mkd
Last active August 29, 2015 14:26

Future Improvements for Structured Arrays?

To add some context to PR #6053, here are some other potential improvements to structured arrays we could make. I think that with improvements like these, structured arrays could become much more reliable.
structured assignment speedup

Structure assignment is slow because it goes through the 'wrong' path in mapiter_set. It uses copyswapn when dtype_transfer would be much faster, since copyswapn iterates through the field dict for every element. See #1984. This should be a somewhat straightforward fix.
structure comparison & ufuncs


## objectarrays.mkd

      
ahaldane / objectarrays.mkd
Last active December 2, 2017 17:01
object array docs (future ideas)

Object Arrays

Object arrays are ndarrays with a datatype of np.object_ whose elements are Python objects, enabling use of numpy's vectorized operations and broadcasting rules with arbitrary Python types. Object arrays have certain special rules to resolve ambiguities that arise between Python types and numpy types, described here.
Envisioned uses of object arrays include:

Creating ndarrays whose elements are other ndarrays of varying length
Creating ndarrays containing number-like Python objects, for example mpmath's multiprecision types, or Python's built-in arbitrary precision integers or Decimal type.
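
Both envisioned uses can be sketched in a few lines. The values below are arbitrary examples chosen for illustration.

```python
from decimal import Decimal

import numpy as np

# Use 1: an ndarray whose elements are other ndarrays of varying length.
# Allocating first and assigning element by element avoids any ambiguity
# about whether numpy should try to build a 2-d array instead.
ragged = np.empty(2, dtype=object)
ragged[0] = np.array([1, 2, 3])
ragged[1] = np.array([4, 5])
lengths = [len(item) for item in ragged]

# Use 2: number-like Python objects. Arithmetic is dispatched to each
# element's own operators, so Python ints keep their arbitrary precision.
big = np.array([2**70, 3], dtype=object)
squared = big * big

# Decimal values are added with Decimal arithmetic rather than binary floats.
decimals = np.array([Decimal("0.1"), Decimal("0.2")], dtype=object)
total = decimals.sum()
```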