Skip to content

Instantly share code, notes, and snippets.

@jamescalam
Created February 22, 2020 20:50
Show Gist options
  • Save jamescalam/9a5008f064516d5fb4177362a03e2594 to your computer and use it in GitHub Desktop.
Save jamescalam/9a5008f064516d5fb4177362a03e2594 to your computer and use it in GitHub Desktop.
import sys
sys.path.insert(0, r'C:\\User\medium\pysqlplus\lib')
import os
from data import Sql
sql = Sql('database123') # initialise the Sql object
directory = r'C:\\User\medium\data\\' # this is where our generic data is stored
file_list = os.listdir(directory) # get a list of all files
for file in file_list: # loop to import files to sql
df = pd.read_csv(directory+file) # read file to dataframe
sql.push_dataframe(df, file[:-4])
# now we convert our file_list names into the table names we have imported to SQL
table_names = [x[:-4] for x in file_list]
sql.union(table_names, 'generic_jan') # union our files into one new table called 'generic_jan'
sql.drop(table_names) # drop our original tables as we now have full table
# get list of categories in colX, eg ['hr', 'finance', 'tech', 'c_suite']
sets = list(sql.manual("SELECT colX AS 'category' FROM generic_jan GROUP BY colX", response=True)['category'])
for category in sets:
sql.manual("SELECT * INTO generic_jan_"+category+" FROM generic_jan WHERE colX = '"+category+"'")
@toconnor9
Copy link

I had a difficult time getting more than one field back from the 'sql.manual' method. I was wondering if anyone could look at my code and give me some feedback. It works, but is there a better way?

from mssql_connection_short import Sql
from datetime import date, datetime

sql = Sql('AdventureWorks2017')

SQLcommand = """SELECT TOP 20 P.ProductID, P.Name, P.Weight, P.ModifiedDate
FROM   AdventureWorks2017.Production.Product p (NOLOCK)
WHERE  P.weight > 0"""

product_data = sql.manual(SQLcommand, response=True).to_records(
    column_dtypes={
        "ProductID": "int64",
        "Name": "object",
        "Weight": "float64",
        "ModifiedDate": "datetime64[ns]"
    })

for row2 in product_data:
    print(
        str(row2.ProductID) + "\t" + row2.Name + "\t" + str(row2.Weight) +
        "\t" + str(row2.ModifiedDate))

@jamescalam
Copy link
Author

jamescalam commented Feb 28, 2020

Hi toconner9, I've adjusted your code to use pandas dataframe, which is the object type that my code exports the data from your query as, maybe this works for you?

I had a difficult time getting more than one field back from the 'sql.manual' method. I was wondering if anyone could look at my code and give me some feedback. It works, but is there a better way?

from mssql_connection_short import Sql
from datetime import date, datetime
import pandas as pd  # here we import the pandas modules so we can work with dataframes

sql = Sql('AdventureWorks2017')

SQLcommand = """SELECT TOP 20 P.ProductID, P.Name, P.Weight, P.ModifiedDate
FROM   AdventureWorks2017.Production.Product p (NOLOCK)
WHERE  P.weight > 0"""

product_data = sql.manual(SQLcommand, response=True).astype(
    dtype={
        "ProductID": "int64",
        "Name": "object",
        "Weight": "float64",
        "ModifiedDate": "datetime64[ns]"
    })  # here we are converting datatypes using pandas dataframe

# to print all columns and rows to the console, must adjust pandas display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

print(product_data)  # then simply print

Printing the dataframe isn't always the easiest, so maybe (depending on what you are using it for) it would be easier to save to Excel or csv?

product_data.to_excel('output.xlsx', index=False) # index = False just prevents an index column from being included in your output
or
product_data.to_csv('output.csv', sep='|') # I usually set the delimiter (sep) to |, but can use any character

Also, there is documentation for using it all here if that helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment