@son0fhobs
Last active May 9, 2018 16:13
Scrape Drug Ingredients
<?php
/*
Don't do unless relevant - all drugs convert.
All drugs https://www.rxlist.com/drugs/alpha_c.htm
- Scrape each page.
- Doesn't tell you if brand or generic.
- Remove half results, FDA and Multum sources. Compare? Save both?
DailyMed autocomplete ajax? Use that instead. Below for brand to generic searching.
- https://dailymed.nlm.nih.gov/dailymed/autocomplete.cfm?key=search&returntype=json&term=m
return val[] = [{"value":"MOBIC"},{"value":"MOBILITY TOPICAL ANALGESIC"}] - modify data with php?
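Sketch for the "modify data" question, done client-side rather than in PHP: build the autocomplete URL and flatten the [{"value": ...}] response into plain names. Only the URL and the response shape come from the note above; both function names are made up for illustration.

```javascript
// Build the DailyMed autocomplete URL for a typed prefix.
// The query string is copied from the note above; nothing else is assumed about the endpoint.
function dailymedAutocompleteUrl(term) {
  return 'https://dailymed.nlm.nih.gov/dailymed/autocomplete.cfm' +
    '?key=search&returntype=json&term=' + encodeURIComponent(term);
}

// Flatten [{value: 'MOBIC'}, ...] into ['MOBIC', ...], dropping blanks and duplicates.
function flattenSuggestions(rows) {
  var seen = new Set();
  var names = [];
  rows.forEach(function (row) {
    var name = (row && row.value ? String(row.value) : '').trim();
    if (name && !seen.has(name)) {
      seen.add(name);
      names.push(name);
    }
  });
  return names;
}
```

The flat array is the simplest shape to hand to an autocomplete widget; whether the chosen widget wants objects instead is still open.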
Consider design - scroll https://baymard.com/blog/autocomplete-design
Auto Complete - http://easyautocomplete.com/guide#sec-data-providers
page - http://www.emedexpert.com/lists/brand-generic.shtml
request - http://www.emedexpert.com/lists/bg.php?modafinil,143_timestamp // make sure doesn't cache request
1k generic/brand? http://www.emedexpert.com/lists/bg.php
// copy results to own page.
Javascript autocomplete
onkeyup="showbrgen(this.value)"
<div id="brtab">Where results show</div>
// when create new post? Consistently load everything so it's cached?
Shortcode for initial page with search.
redirect to relevant page - copy code into random template.
Make relevant page. Redirect there. Code into random template. Or filter into the_content.
Try to make as flexible as possible for different API or scrapers.
Create post for each drug.
Redirect to post.
1. Save the list of saved drugs, the scrape time, and the post id to global settings.
2. Save time scraped for each manufacturer (link, maker, time, ingredients)
3. Each drug goes to a different post.
4. Ajax. Get x number of drugs at a time. append rather than iterate. Search dates scraped to update accordingly.
5. Datatables - csv. Save to CSV File. Then print on page? Don't print, that'll encourage them to focus on it there. Beta.
Datatables
- Allow cross out, filter away all drugs with specific ingredients. Premade - all colorings, artificial, fibers, lactose, etc.
- Allow highlight rows? Why? In case no drug that fits all ideals.
- Manually remove rows, reorder?
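Sketch of the "filter away all drugs with specific ingredients" idea as a plain function, independent of the Datatables plugin. Assumes each row carries its ingredients as an array of strings; the premade groups and their terms are placeholders, not a curated list.

```javascript
// Hypothetical premade groups; the real term lists would need to be curated by hand.
var PREMADE_GROUPS = {
  lactose: ['lactose'],
  colorings: ['fd&c', 'd&c']
};

// Return only the rows whose ingredients contain none of the excluded terms.
// rows: [{maker: '', ingredients: ['Lactose Monohydrate', ...]}, ...]
// Matching is a case-insensitive substring test.
function filterOutIngredients(rows, excludedTerms) {
  var terms = excludedTerms.map(function (term) { return term.toLowerCase(); });
  return rows.filter(function (row) {
    return !(row.ingredients || []).some(function (ingredient) {
      var name = String(ingredient).toLowerCase();
      return terms.some(function (term) { return name.indexOf(term) !== -1; });
    });
  });
}
```

Highlighting instead of removing would reuse the same matching logic and only change what is done with the matching rows.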
Table
Drug - post
manufacturer, ingredient.
each row - drug ID (rxnorm thing), drug, manufacturer, url, date scraped, ingredients
Table - Drug with list of manufacturers and links, row id where data stored. List scrape date.
Processed with dates, unprocessed. drug, link, process date, ingredients
Duplicate, not helpful. Save with post. Stay with full data saved - Save manufacturer list to database, don't need to scrape each time.
Go through glossary, one at a time, update scrape date.
Date scraped - include oldest in all ingredients scraped.
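Sketch for reporting the oldest scrape date across all rows. Assumes scrape_time is a string that Date.parse understands (for example ISO 8601); the notes do not fix a format, so that part is a guess.

```javascript
// Return the oldest scrape_time among the rows, or null if none can be parsed.
function oldestScrapeTime(rows) {
  var oldest = null;
  rows.forEach(function (row) {
    var ms = Date.parse(row.scrape_time);
    if (!isNaN(ms) && (oldest === null || ms < oldest.ms)) {
      oldest = { ms: ms, scrape_time: row.scrape_time };
    }
  });
  return oldest ? oldest.scrape_time : null;
}
```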
javascript functions
** Request brand name and generic?
// php before page load
scrape_initial_directory_page(){
// check if page exists. If not, mention bad spelling, drug doesn't exist.
// get page
// total number results
// total number pages.
// Load html page. Send back ajax requests for full directory.
}
javascript
// immediate ajax for all directory pages.
php - ajax
get_directory_pages(){
// scrape each directory page for manufacturers
// send back list of manufacturers
// $drug_makers = array(array('maker'=>'','url'=>'', 'categories'=>array(),'drug_name'=>'', 'scrape_time'=>''), array());
// return json_encode($drug_makers);
}
javascript
process_drug_directory()
combine_maker_directories() // all makers from each directory page.
process_drug_directory() // combine all drug types, including total number. Make into table. Display.
// make into object
dir_pages_to_drug_maker_list() // save total number
name_to_category_data() // tablet, capsule, delayed release, etc.
format_data_for_table() // checkmarks for tablet, capsule, etc? Because more than one?
display_name_options_table()
user_options_to_ajax() // checked which drugs are relevant.
num_dir_pages:1,
directory_urls:[],
num_manufacturers:1,
drug_makers:[{maker:'',url:'', categories:[],drug_name:'', scrape_time:'', inactive_ingredients:[]},{}] // data for later
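Rough sketch of combine_maker_directories() and name_to_category_data() working on the drug_makers shape above. The keyword list is a guess taken from the "tablet, capsule, delayed release" note, and the camelCase names are illustrative only.

```javascript
// Category keywords guessed from the note above; extend as real drug names show up.
var CATEGORY_KEYWORDS = ['tablet', 'capsule', 'delayed release'];

// Pick out the dosage-form categories mentioned in a drug name.
function nameToCategoryData(drugName) {
  var lower = String(drugName).toLowerCase();
  return CATEGORY_KEYWORDS.filter(function (keyword) {
    return lower.indexOf(keyword) !== -1;
  });
}

// Flatten the per-page maker arrays into one list, skipping repeated urls,
// and fill in categories from each drug name.
function combineMakerDirectories(pages) {
  var seen = new Set();
  var makers = [];
  pages.forEach(function (page) {
    page.forEach(function (maker) {
      if (seen.has(maker.url)) { return; }
      seen.add(maker.url);
      makers.push(Object.assign({}, maker, {
        categories: nameToCategoryData(maker.drug_name)
      }));
    });
  });
  return { num_manufacturers: makers.length, drug_makers: makers };
}
```

Deduplicating on url assumes one listing per url; a hyphenated name such as "Delayed-Release" would not match the keyword as written.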
User Submit Relevant Drug Options
user_filter_drug_makers() - user selection to filter makers array.
ajax_filtered_drug_makers() - pass each maker array to ajax, retrieve and scrape.
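Sketch of the user-selection step. Assumes the options table hands back the urls of the checked rows; the payload shape ({maker, link}) follows the ajax_data note further down, and the camelCase names are illustrative.

```javascript
// Keep only the makers whose url the user checked in the options table.
function userFilterDrugMakers(drugMakers, selectedUrls) {
  var selected = new Set(selectedUrls);
  return drugMakers.filter(function (maker) {
    return selected.has(maker.url);
  });
}

// Build one small payload per maker for the scrape requests (maker and link only).
function makersToAjaxPayloads(drugMakers) {
  return drugMakers.map(function (maker) {
    return { maker: maker.maker, link: maker.url };
  });
}
```

Sending only maker and link keeps each request small; the rest of the row stays in the page and is matched up again by link when the ingredients come back.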
php
scrape_maker_inactive_ingredients() scrape only one drug maker at a time. Return array of just that maker and inactive ingredients.
drug_data()
get_ajax_make_ingredients()
format_ingredients_for_table()
display_ingredients_table()
init_datatables();
// append_ajax_data function below
Datatables - export to csv.
update_database(); // once table display, ajax data to save to post meta tables
// update database on every ajax call? Safer, but longer. Only after return data?
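Sketch of a hand-rolled CSV export, in case the Datatables export is not used. This is plain string building with the usual quoting rules (wrap fields containing commas, quotes, or line breaks; double any embedded quotes); function names and the "; " separator for ingredient lists are assumptions.

```javascript
// Quote a single CSV field when it contains a comma, quote, or line break.
function csvField(value) {
  var text = value === null || value === undefined ? '' : String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Turn row objects into CSV text with a header line; array values are joined with "; ".
function rowsToCsv(rows, columns) {
  var lines = [columns.map(csvField).join(',')];
  rows.forEach(function (row) {
    lines.push(columns.map(function (column) {
      var value = row[column];
      return csvField(Array.isArray(value) ? value.join('; ') : value);
    }).join(','));
  });
  return lines.join('\r\n');
}
```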
Methods different, what about properties?
Makers, urls.
var object = {
sum: function(foo, bar) {
return foo + bar;
},
prop:''
};
---
https://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
All drugs https://www.rxlist.com/drugs/alpha_c.htm
Create custom post type that uses custom template?
single-mycustomposttype.php
single-drugingredients.php
Use in theme for now. Send to custom-functions.php. Include from template file. custom folders for vendor, js, css.
Wrap in plugin boilerplate later? Harder than I expected.
https://wppb.io/
https://github.com/devinvinson/WordPress-Plugin-Boilerplate/
Custom post type - Ingredients
- custom template - https://wordpress.stackexchange.com/questions/3396/create-custom-page-templates-with-plugins
- Shortcode for search. Redirect to custom template on search.
Mean time - drug with post. Save metadata.
Don't use custom tables.
*/
/*
- pass all makers to javascript. Pass them back to ajax. Track how many finished.
- Save to database each time? Unnecessary number of calls. Save energy, keep passing to javascript. Once all done, pass to ajax and save.
-
- first scrape - directory page, URLs for all other directory pages (pagination), then load html to show page. The rest ajax.
- Send only url? Url and maker?
- can I show without all ingredients? Refresh entire table each time? Every 10?
- pass up all ingredients each time?
pass up the link.
Pass back link and ingredients.
https://stackoverflow.com/questions/1060539/parallel-asynchronous-ajax-requests-using-jquery
// what if timeout? rescrape entire thing? Or save time for each and rescrape only relevant ones?
// keep each maker with all data together.
// ajax_data = {maker:'',link:'',ingredients:{}, scrape_time:''}; // send one request at a time. Append to array at end.
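Sketch for the timeout question: keep scrape_time on each maker and rescrape only the ones that are missing or stale. Assumes scrape_time parses with Date.parse and stays empty when a request failed; the function name and the age threshold are made up.

```javascript
// Return the makers that still need scraping: never scraped, failed, or stale.
// maxAgeMs and nowMs are in milliseconds; nowMs is passed in so the check is repeatable.
function makersNeedingScrape(makers, maxAgeMs, nowMs) {
  return makers.filter(function (maker) {
    var scrapedAt = Date.parse(maker.scrape_time);
    if (isNaN(scrapedAt)) { return true; } // never scraped, or the request timed out
    return nowMs - scrapedAt > maxAgeMs;   // scraped, but too long ago
  });
}
```

Called with Date.now() as nowMs after a timeout, this gives the list to send again instead of restarting the whole scrape.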
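function urls_for_glossery_pages($html){
// $html: parsed directory page (assumed to be a Simple HTML DOM object, going by the find()/plaintext calls).
$total_pages = (int) $html->find('total-pages', 0)->plaintext; // number of pages, increment to create urls.
}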
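function append_ingredients_to_maker(drugs_parts_array, ajax_data){
  // Merge scraped ingredients into every row; ajax_data is an array of {link, ingredients, ...}.
  var i = 0;
  var max = drugs_parts_array.length;
  for(i = 0; i < max; i++){
    drugs_parts_array[i] = append_ajax_data(drugs_parts_array[i], ajax_data);
  }
  return drugs_parts_array;
}
function append_ajax_data(drug_parts_row, ajax_data){
  // Copy ingredients from the ajax result whose link matches this row.
  var a = 0;
  var a_max = ajax_data.length; // declared with var so it does not leak as a global
  for(a = 0; a < a_max; a++){
    if(ajax_data[a].link === drug_parts_row.link){
      drug_parts_row.ingredients = ajax_data[a].ingredients;
    }
  }
  return drug_parts_row;
}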
drugs_parts_array = [
{
'maker' :'',
'link' :'',
'ingredients' :{},
'scrape_time' :'',
},
{
}
]
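Alternative sketch for matching ajax results to rows of the shape above: index the results by link once instead of looping over them for every row. It returns new row objects rather than changing the originals; the function name is illustrative, and copying scrape_time along with ingredients is an assumption.

```javascript
// Index the ajax results by link once, then fill in each row's ingredients.
function mergeIngredientsByLink(drugsPartsArray, ajaxResults) {
  var byLink = new Map();
  ajaxResults.forEach(function (result) { byLink.set(result.link, result); });
  return drugsPartsArray.map(function (row) {
    var result = byLink.get(row.link);
    if (!result) { return row; } // nothing scraped for this row yet
    return Object.assign({}, row, {
      ingredients: result.ingredients,
      scrape_time: result.scrape_time
    });
  });
}
```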
*/
// Don't worry about custom table for now. Need different post URL for SEO reasons.
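// Append rows to a CSV file in the plugin directory. Returns false if the file cannot be opened.
function array_to_csv_file($array_csv){
    // plugin_dir_path() already ends with a trailing slash.
    $file_path = plugin_dir_path( __FILE__ ) . 'errors.txt';
    $file = fopen( $file_path, "a" );
    if ( false === $file ) {
        return false;
    }
    foreach ($array_csv as $csv_row) {
        fputcsv($file, $csv_row);
    }
    fclose( $file );
    return true;
}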
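// Stream rows to the browser as a CSV download. Wrapped in a function (name is a placeholder)
// so it no longer runs on every include; call it before any other output is sent.
function output_csv_download($prod){
    header("Content-Type: text/csv");
    header("Content-Disposition: attachment; filename=pressurecsv.csv");
    $output = fopen("php://output",'w') or die("Can't open php://output");
    fputcsv($output, array('id','name','description'));
    foreach($prod as $product) {
        fputcsv($output, $product);
    }
    fclose($output) or die("Can't close php://output");
}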
// select entire row.
register_activation_hook( __FILE__, 'my_plugin_create_db' );
function my_plugin_create_db() {
global $wpdb;
$charset_collate = $wpdb->get_charset_collate();
$table_name = $wpdb->prefix . 'inactive_ingredients';
// dbDelta() is picky: lowercase field types, and two spaces between PRIMARY KEY and its definition.
$sql = "CREATE TABLE $table_name (
id mediumint(9) NOT NULL AUTO_INCREMENT,
time datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
drug varchar(200),
url varchar(512),
maker varchar(200) NOT NULL,
ingredients text NOT NULL,
PRIMARY KEY  (id)
) $charset_collate;";
require_once( ABSPATH . 'wp-admin/includes/upgrade.php' );
dbDelta( $sql );
}
?>