Skip to content

Instantly share code, notes, and snippets.

@drincruz
Created August 22, 2014 16:17
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save drincruz/b3904cc338fc60a19f44 to your computer and use it in GitHub Desktop.
Save drincruz/b3904cc338fc60a19f44 to your computer and use it in GitHub Desktop.
Simple example of using a Python generator to fetch a row from a MySQL database
#!/usr/bin/env python
"""
Here is a simple example of how you can
use a generator to fetch a row from a database
"""
import MySQLdb
def subscriber_generator():
"""
Simple generator example to return a row
from a MySQL database
"""
# Simplicity purposes, we're connecting locally
db = MySQLdb.connect(user='root',db='your_db_name')
# Get a cursor
cursor = db.cursor()
# Execute your MySQL query
cursor.execute(
'SELECT id,first_name FROM subscribers_subscribers '
+ 'WHERE id > %s AND id < %s '
+ 'ORDER BY first_name ASC', (1234,1334))
# Iterate through fetchall(), but yield the row
for row in cursor.fetchall():
yield row
def main():
"""
Main
"""
# You can use the next() method to get a row
print(subscriber_generator().next())
# Or you can iterate through all with a for loop
for row in subscriber_generator():
print(row)
if '__main__' == __name__:
main()
@madanaman
Copy link

This is really helpful. Thank you

@drincruz
Copy link
Author

This is really helpful. Thank you

I'm glad you found it useful! 🙌

@odravison
Copy link

Hi @drincruz , very good code, nice contribution to public code community.

I'm learning about generators and its best usage of memory resource and I have a doubt:
About line 32: Doesn't .fetchall() conclude the SQL query and get all the result to a list?
If so, Isn't the same as using a list at all?

@drincruz
Copy link
Author

@odravison thank you! 🙏 it's a bit old, but i'm glad it's useful.

About line 32: Doesn't .fetchall() conclude the SQL query and get all the result to a list?
If so, Isn't the same as using a list at all?

This is a very good question! 💯 I think the important parts for generators are that they get lazy loaded; meaning if we didn't have lines 44:45, they won't get called at all. a more better-versed python expert can probably explain it better than me, but here's a simple example i'll try to explain it with below.

Consider this piece of code, yield_data.py here:

yield_data.py
import pymysql.cursors

from cProfile import Profile

def connect():
    # Connect to the database
    connection = pymysql.connect(
        host='127.0.0.1',
        user='root',
        password='secret',
        port=3306,
        database='mydb'
    )
    return connection

def get_cursor(connection):
    # Create a cursor object
    cursor = connection.cursor()
    return cursor

def yield_all_rows(connection):
    cursor = connection.cursor()
    query = "SELECT * FROM my_table"
    cursor.execute(query)
    for row in cursor.fetchall():
        yield row
    cursor.close()


def commit(connection):
    # Commit the changes
    connection.commit()

def close(cursor, connection):
    # Close the cursor and connection
    cursor.close()
    connection.close()

def main():
    connection = connect()
    p = Profile()
    p.enable()
    yield_all_rows(connection)
    p.disable()
    p.print_stats(sort='ncalls')
    connection.close()

if __name__ == '__main__':
    main()

Since we're using a generator here, we're lazy loading the data (on demand), so we actually don't use much. We are caling yield_all_rows, but we're not doing anything with the data. So, our profiler gives us the following:

python src/yield_data.py
         1 function calls in 0.000 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

But if we update the call to actually print out values, we actually see the query calls go through. Consider this updated code for yield_data.py

def main():
    connection = connect()
    p = Profile()
    p.enable()
    for row in yield_all_rows(connection):
        print(row)
    p.disable()
    p.print_stats(sort='ncalls')
    connection.close()

If we run the program now with the profiler, we'll see the following

yield_data.py profiler results
$ python src/yield_data.py                                                                                                                     [24/228]
         760569 function calls in 0.203 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100046    0.006    0.000    0.006    0.000 {built-in method builtins.len}
    80017    0.008    0.000    0.008    0.000 {method 'append' of 'list' objects}
    60025    0.010    0.000    0.016    0.000 protocol.py:147(read_length_encoded_integer)
    60025    0.006    0.000    0.006    0.000 protocol.py:114(read_uint8)
    60024    0.017    0.000    0.051    0.000 protocol.py:165(read_length_coded_string)
    60024    0.015    0.000    0.018    0.000 protocol.py:62(read)
    60018    0.008    0.000    0.008    0.000 {method 'decode' of 'bytes' objects}
    40017    0.016    0.000    0.016    0.000 {method 'settimeout' of '_socket.socket' objects}
    40016    0.007    0.000    0.019    0.000 {method 'read' of '_io.BufferedReader' objects}
    40016    0.018    0.000    0.055    0.000 connections.py:775(_read_bytes)
    20008    0.003    0.000    0.003    0.000 {built-in method _struct.unpack}
    20008    0.002    0.000    0.002    0.000 protocol.py:55(__init__)
    20008    0.002    0.000    0.002    0.000 protocol.py:208(is_error_packet)
    20008    0.025    0.000    0.087    0.000 connections.py:730(_read_packet)
    20005    0.002    0.000    0.002    0.000 protocol.py:187(is_eof_packet)
    20003    0.003    0.000    0.005    0.000 connections.py:1268(_check_packet_is_eof)
    20003    0.001    0.000    0.203    0.000 yield_data.py:21(yield_all_rows)
    20002    0.031    0.000    0.096    0.000 connections.py:1340(_read_row_from_packet)
       46    0.000    0.000    0.000    0.000 socket.py:732(readable)
       46    0.000    0.000    0.012    0.000 socket.py:693(readinto)
       46    0.012    0.000    0.012    0.000 {method 'recv_into' of '_socket.socket' objects}
       46    0.000    0.000    0.000    0.000 {method '_checkClosed' of '_io._IOBase' objects}
       46    0.000    0.000    0.000    0.000 {method '_checkReadable' of '_io._IOBase' objects}
        6    0.000    0.000    0.000    0.000 protocol.py:271(get_column_length)
        5    0.000    0.000    0.000    0.000 cursors.py:65(_get_db)
        5    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        4    0.000    0.000    0.000    0.000 {method 'unpack_from' of '_struct.Struct' objects}
        4    0.000    0.000    0.000    0.000 protocol.py:177(read_struct)
        3    0.000    0.000    0.000    0.000 protocol.py:234(__init__)
        3    0.000    0.000    0.000    0.000 protocol.py:259(description)
        3    0.000    0.000    0.000    0.000 protocol.py:238(_parse_field_descriptor)
        2    0.000    0.000    0.000    0.000 cursors.py:83(_nextset)
        2    0.000    0.000    0.000    0.000 cursors.py:97(nextset)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.203    0.203 connections.py:810(_read_query_result)
        1    0.000    0.000    0.203    0.203 connections.py:1198(read)
        1    0.000    0.000    0.000    0.000 connections.py:539(cursor)
        1    0.000    0.000    0.203    0.203 connections.py:552(query)
        1    0.000    0.000    0.000    0.000 connections.py:1178(__init__)
        1    0.000    0.000    0.000    0.000 connections.py:800(_write_bytes)
        1    0.013    0.013    0.191    0.191 connections.py:1327(_read_rowdata_packet)
        1    0.000    0.000    0.191    0.191 connections.py:1281(_read_result_packet)
        1    0.000    0.000    0.000    0.000 {built-in method _struct.pack}
        1    0.000    0.000    0.000    0.000 {method 'sendall' of '_socket.socket' objects}
        1    0.000    0.000    0.000    0.000 cursors.py:294(fetchall)
        1    0.000    0.000    0.203    0.203 cursors.py:133(execute)
        1    0.000    0.000    0.000    0.000 cursors.py:336(_do_get_result)
        1    0.000    0.000    0.000    0.000 cursors.py:34(__init__)
        1    0.000    0.000    0.000    0.000 cursors.py:45(close)
        1    0.000    0.000    0.000    0.000 cursors.py:110(mogrify)
        1    0.000    0.000    0.000    0.000 protocol.py:183(is_ok_packet)
        1    0.000    0.000    0.203    0.203 cursors.py:319(_query)
        1    0.000    0.000    0.000    0.000 protocol.py:323(__init__)
        1    0.000    0.000    0.000    0.000 cursors.py:326(_clear_result)
        1    0.000    0.000    0.000    0.000 cursors.py:70(_check_executed)
        1    0.000    0.000    0.000    0.000 protocol.py:205(is_load_local_packet)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'encode' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        1    0.000    0.000    0.000    0.000 connections.py:834(_execute_command)
        1    0.000    0.000    0.000    0.000 connections.py:1359(_get_descriptions)

To get back to your point, this last snippet here, will look like a similar code call as if we were just using a list to store the data. So, I guess it really just depends on your needs for data if you wanted to store them in memory or on-demand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment