The following question was asked on the APL Orchard chatroom (https://chat.stackexchange.com/transcript/message/59923809#59923809):
so I was reading https://xpqz.github.io/learnapl/iteration.html a bit ago and was confused by the binary search example, so wanted to test. the numbers I'm seeing for linear search are both nothing like his and don't make a lot of sense. I did randInts ← 100000 ? 100000 and compared execution times of randInts ⍳ 1, 19326, and 46729. the return values of those are 94438, 1001, and 1 respectively. the execution times are 0.13ms, 0.47ms, and 0.12us respectively. can anyone explain what's happening?
Obviously, we can't recreate the randInts exactly, but let's run a few timed iterations:
randInts ← 100000 ? 100000
'cmpx'⎕cy'dfns'
cmpx 'randInts⍳1' 'randInts⍳19326' 'randInts⍳46729'
┌→────────────────────────────────────────────────────────────────────────┐
↓ randInts⍳1 → 6.7E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
│* randInts⍳19326 → 7.0E¯6 | +3% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
│* randInts⍳46729 → 8.8E¯6 | +31% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕│
└─────────────────────────────────────────────────────────────────────────┘
Dyadic ⍳ does a linear scan, so the time grows with how far into the array the sought item sits, although the scan is vectorised and fast enough that the differences remain small. We can do better by binding the array to dyadic ⍳, which optimises the array for doing multiple lookups by creating (and retaining) a hashed index:
find←randInts∘⍳
cmpx 'find 1' 'find 19326' 'find 46729'
┌→───────────────────────────────────────────────────────────────────┐
↓ find 1 → 1.9E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕│
│* find 19326 → 1.9E¯7 | -4% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
│* find 46729 → 1.9E¯7 | -5% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
└────────────────────────────────────────────────────────────────────┘
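As an aside, the bound function isn't limited to scalar lookups: dyadic ⍳ accepts an array of items, so a single call can search for all three values at once, reusing the same hash (a small sketch; the actual indices depend on the particular randInts generated above):

find 1 19326 46729    ⍝ 3-element vector of indices, one per item searched for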
The point I was making in the book is that while, in theory, a binary search should be much faster (provided we sort the array first), in practice this isn't the case:
bsearch ← {⎕IO←0
_bs_ ← { ⍝ Operator: ⍺,⍵ - lower,upper index. ⍺⍺ - item, ⍵⍵ - array
⍺>⍵: ⍬ ⍝ If lower index has moved past upper, item's not present
mid ← ⌈0.5×⍺+⍵ ⍝ New midpoint
⍺⍺=mid⊃⍵⍵: mid ⍝ Check if item is at the new midpoint
⍺⍺<mid⊃⍵⍵: ⍺∇¯1+mid ⍝ Drill into lower half
⍵∇⍨1+mid ⍝ Upper half
}
0 (⍺ _bs_ (,⍵)) ¯1+≢,⍵
}
sortInts←⍳100000
cmpx '1 bsearch sortInts' '19326 bsearch sortInts' '46729 bsearch sortInts'
┌→────────────────────────────────────────────────────────────────────────────────┐
↓ 1 bsearch sortInts → 1.4E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
│* 19326 bsearch sortInts → 1.6E¯5 | +13% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕│
│* 46729 bsearch sortInts → 1.1E¯5 | -22% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
└─────────────────────────────────────────────────────────────────────────────────┘
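For reference, a couple of calls showing what bsearch returns (a quick sketch: the dfn localises ⎕IO←0, so the result is a zero-based position, and an empty vector means the item isn't present):

19326 bsearch sortInts     ⍝ 19325: zero-based position of 19326 in sortInts
100001 bsearch sortInts    ⍝ ⍬ (empty): not present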
What are we seeing here? It's a consequence of the fact that Dyalog's primitive functions, when applied to simple arrays, are heavily optimised and vectorised. Any recursive function you write yourself in APL will be slower, even if your algorithm is, on paper, "better".
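To put rough numbers on that (a back-of-the-envelope sketch, not a measurement): a binary search over 100,000 elements needs at most ⌈2⍟100000 probes, but each probe is a separate interpreted call of _bs_, with its guards and a fresh stack frame, whereas the linear ⍳ is a single pass over a flat integer array in optimised machine code:

⌈2⍟100000      ⍝ 17: the maximum number of recursive _bs_ calls per search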
Dyalog also has dyadic ⍸ -- interval index -- which is, in effect, a binary search over a sorted left argument. In versions 18.0 and 18.1 it's very fast indeed, beating even the hashed index:
cmpx 'sortInts⍸1' 'sortInts⍸19326' 'sortInts⍸46729'
┌→────────────────────────────────────────────────────────────────────────┐
↓ sortInts⍸1 → 1.3E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
│* sortInts⍸19326 → 1.4E¯7 | +13% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕│
│* sortInts⍸46729 → 1.4E¯7 | +7% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ │
└─────────────────────────────────────────────────────────────────────────┘
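One caveat: ⍸ performs an interval lookup, not an exact-match search. It happens to return each item's position here only because sortInts is the gap-free sequence 1 to 100000; for a general sorted array you would still need to check that the element at the returned index really equals the item you searched for. A small illustration, with values picked just for the sketch:

sortInts ⍸ 0 1 46729 100001    ⍝ 0 1 46729 100000: the interval each item falls in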
Sadly, this optimisation has had to be dropped for 18.2.