Well, that's encapsulated in the AR::Result object. Databases usually
return an array for the columns and an array of arrays for each row.
I don't think it's a good idea for database drivers to return results
as hashes in the first place. The reason is that column order is
important.
We could use hashes in 1.9 because hashes in 1.9 are ordered, but
that is not true for 1.8. That's the motivation for the AR::Result
object. It simply holds the column and row information. If someone
uses the object as an array, it will return ordered hashes for each
row. However, if you just want the row information without columns,
just call #rows.
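To make that concrete, here's roughly what the shape looks like (a
sketch, not the exact implementation):

    columns = ["id", "name"]            # ordered, straight from the driver
    rows    = [[1, "foo"], [2, "bar"]]  # array of arrays, same order

    result = ActiveRecord::Result.new(columns, rows)

    result.each do |row|
      # row => {"id" => 1, "name" => "foo"} -- an ordered hash per row
    end

    result.rows  # => [[1, "foo"], [2, "bar"]], no hash work at all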
I know #select_rows would want to pass something down so that
a low-level connection can do the legwork of passing down the correct
query option so the C code can build the object. I am also wondering if it
is pragmatic to initialize an object with everything already partitioned, too.
In Mysql2::Result and TinyTds::Result, these things are a bit lazy. They only
spring into existence when you start iterating over them with the query
parameters. It could be I was up way too late last night (I was, BTW :) but at
first glance it looks like this would be another layer in the way of the
raw_connection doing the legwork. I'd have to see more context and try to
build a subclass and see.
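For reference, this is roughly what that laziness looks like from the
mysql2 side (a sketch from memory; check the gem's docs for the exact
option names):

    require "mysql2"

    client = Mysql2::Client.new(:host => "localhost", :username => "root")

    result = client.query("SELECT id, name FROM users",
                          :as         => :array,  # raw arrays, no hash per row
                          :cache_rows => false)   # don't keep rows after the yield

    result.each do |row|
      # rows spring into existence one at a time, right here
    end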
The other database adapters are "lazy" too. They only create objects
as you iterate over the results in the cursor. The old mysql adapter
and the sqlite3 adapter return lists. PG returns a hash, though, which
is annoying because sometimes you need a list.
The bottom line, though, is that you can compose a hash out of the
column and row info. Sometimes you need just the row info without the
column info, and in that case (if you're using hashes) you must do
work to extract that data. With the AR::Result object, we lazily
create the hash list and cache it.
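As a sketch of that idea (names here are illustrative, not the real
internals):

    # Cheap, lazy, cached: hashes composed from columns + rows on demand.
    def hash_rows
      @hash_rows ||= rows.map { |row| Hash[columns.zip(row)] }  # ordered in 1.9
    end

    # The reverse -- recovering ordered rows from driver-supplied hashes --
    # is the extra work you're stuck with when all you get is hashes:
    plain_rows = hash_rows.map { |hash| columns.map { |c| hash[c] } }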
I think it might be a good optimization. However, it seems that this
would only be a win in the case where the user fetches many rows but
only uses a few of them. I wonder how often that happens in practice.
What do you think?
My gut feeling is this: AR wants a hash or an array of objects to build
other objects. Let's say that's a hash per row for 100 rows; those objects
are intermediary and will almost always be GC'ed right away.
The intermediate hashes are not thrown away. AR::Base holds on to
them and you can see that here:
http://github.com/rails/rails/blob/master/activerecord/lib/active_record/base.rb#L1423-1423
At least I think this is correct. ;-)
Why should the low-level connection be required to build that large object
first just so it can be iterated over and forgotten about? I look at it like
the talk on view flushing. I have no numbers on it; it might be a low gain in
small practical usage, but perhaps it has a bigger gain for bulk copies and
lean ActiveModels that want to do large migrations. There was a reason
#find_each was done in batches.rb. I still think the structure of AR should
accommodate this, and my gut feeling is it would be a good performance gain.
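(As a reminder, #find_each walks the table in fixed-size batches instead
of materializing every record at once; User and the per-record work here
are just placeholders:)

    # Loads and yields 1000 records at a time, never the whole table.
    User.find_each(:batch_size => 1000) do |user|
      # per-record work goes here
    end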
I think you could accomplish this with the AR::Result object. Just as
pseudo code:
    class MyAdapter
      class MyResult < AR::Result
        def initialize(cursor)
          @cursor = cursor
        end

        # Raw row arrays straight off the cursor; no hashes built.
        def rows
          @cursor.walk_over_rows
        end

        # Hashes are built one at a time, only as the caller iterates.
        def each
          @cursor.each_row { |row| yield build_hash(row) }
        end
      end

      def exec(sql, name, binds)
        MyResult.new(@real_connection.execute(sql))
      end
    end
Note that you defer the creation of extra objects until someone asks for them.
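Continuing the pseudo code, the deferral shows up at the call site:

    adapter = MyAdapter.new
    result  = adapter.exec("SELECT * FROM users", "User Load", [])

    result.rows                    # walks the cursor; arrays only
    result.each { |hash| p hash }  # one hash built per iteration, on demand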
BTW, I'm having a hard time keeping up with your changes :) The Arel2 stuff
is going to be a ground-zero rewrite for SQL Server. It took me 6 weeks last
time to pass all the AR tests. Keep it up, but man, you're killing me!!!! :)
Let's keep talking. I really think this is a good idea and I would love to
see more. Right now I gotta figure out why 1.9.2 on AR 2.3.8 is slower than
1.8.6. What a night.
Good luck! That doesn't sound like fun. :'(