Skip to content

Instantly share code, notes, and snippets.

@arm5077
Created April 23, 2014 18:43
Show Gist options
  • Save arm5077/11227658 to your computer and use it in GitHub Desktop.
Save arm5077/11227658 to your computer and use it in GitHub Desktop.
Medicare doctor charge database query
SELECT medicare.*,
peerAverage.average AS charge_average,
( medicare.average_submitted_chrg_amt - peerAverage.average ) /
peerAverage.average * 100 AS difference_from_average,
totalAverage.total_difference_average
FROM medicare
JOIN (SELECT Avg(average_submitted_chrg_amt) AS average,
hcpcs_code,
place_of_service
FROM medicare
WHERE npi LIKE '%'
AND hcpcs_code LIKE medicare.hcpcs_code
AND nppes_provider_city LIKE 'Latrobe'
AND nppes_provider_state LIKE 'PA'
GROUP BY place_of_service,
hcpcs_code) AS peerAverage
ON medicare.hcpcs_code = peerAverage.hcpcs_code
AND medicare.place_of_service = peerAverage.place_of_service
/* This join basically repeats everything and to get total percentage average by doctor */
JOIN (SELECT npi,
Avg(( medicare.average_submitted_chrg_amt -
peerAverage.average ) /
peerAverage.average * 100) AS
total_difference_average
FROM medicare
JOIN (SELECT Avg(average_submitted_chrg_amt) AS average,
hcpcs_code,
place_of_service
FROM medicare
WHERE npi LIKE '%'
AND hcpcs_code LIKE medicare.hcpcs_code
AND nppes_provider_city LIKE 'Pittsburgh'
AND nppes_provider_state LIKE 'PA'
GROUP BY place_of_service,
hcpcs_code) AS peerAverage
ON medicare.hcpcs_code = peerAverage.hcpcs_code
AND medicare.place_of_service =
peerAverage.place_of_service
WHERE medicare.npi LIKE '%'
AND nppes_provider_last_org_name LIKE '%'
AND nppes_provider_first_name LIKE '%'
AND nppes_provider_city LIKE 'Pittsburgh'
AND nppes_provider_state LIKE 'PA'
AND medicare.hcpcs_code LIKE '%'
GROUP BY npi) AS totalAverage
ON medicare.npi = totalAverage.npi
GROUP BY npi,
place_of_service,
hcpcs_code
ORDER BY hcpcs_code ASC,
difference_from_average DESC
@arm5077
Copy link
Author

arm5077 commented Apr 24, 2014

Thanks to both of you.

Here's a preliminary query I've worked up using your advice. @seejee, I created the new location table like you suggested and join to it in this query. @mhowee0422, I've eliminated all the extra "LIKEs" and converted the ones I have into straight "=".

This doesn't yet deal with the "average of averages," which I'm not quite sure how to tackle -- will have to noodle on that some more. I must be doing something wrong with my joins (all joined fields are indexed) because this query is actually running slower than an older version of my previous query that didn't have the double-nested subquery.

SELECT medicare.*, 
       medicare_averages_by_hcpcs_and_place_of_service.average_charge,
       Avg((medicare.average_submitted_chrg_amt - medicare_averages_by_hcpcs_and_place_of_service.average_charge)/medicare_averages_by_hcpcs_and_place_of_service.average_charge * 100) as percent_difference
FROM   medicare 
       JOIN medicare_averages_by_hcpcs_and_place_of_service 
         ON medicare.hcpcs_code = 
       medicare_averages_by_hcpcs_and_place_of_service.hcpcs_code 
            AND medicare.place_of_service = 
            medicare_averages_by_hcpcs_and_place_of_service.place_of_service 
            AND medicare.nppes_provider_city = 
            medicare_averages_by_hcpcs_and_place_of_service.nppes_provider_city 
            AND medicare.nppes_provider_state = 
            medicare_averages_by_hcpcs_and_place_of_service.nppes_provider_state 
WHERE  medicare.nppes_provider_city = 'PITTSBURGH' 
       AND medicare.nppes_provider_state = 'PA' 
GROUP  BY npi, 
          place_of_service, 
          hcpcs_code 
ORDER  BY hcpcs_code ASC

@arm5077
Copy link
Author

arm5077 commented Apr 24, 2014

For reference, here's my SHOW CREATE TABLEs for both tables.

Main database

CREATE TABLE `medicare` (
  `npi` varchar(10) DEFAULT NULL,
  `nppes_provider_last_org_name` varchar(70) DEFAULT NULL,
  `nppes_provider_first_name` varchar(20) DEFAULT NULL,
  `nppes_provider_mi` tinytext,
  `nppes_credentials` tinytext,
  `nppes_provider_gender` tinytext,
  `nppes_entity_code` tinytext,
  `nppes_provider_street1` tinytext,
  `nppes_provider_street2` tinytext,
  `nppes_provider_city` varchar(40) DEFAULT NULL,
  `nppes_provider_zip` varchar(20) DEFAULT NULL,
  `nppes_provider_state` varchar(2) DEFAULT NULL,
  `nppes_provider_country` tinytext,
  `provider_type` tinytext,
  `medicare_participation_indicator` tinytext,
  `place_of_service` varchar(2) DEFAULT NULL,
  `hcpcs_code` varchar(10) DEFAULT NULL,
  `hcpcs_description` tinytext,
  `line_srvc_cnt` int(11) DEFAULT NULL,
  `bene_unique_cnt` int(11) DEFAULT NULL,
  `bene_day_srvc_cnt` int(11) DEFAULT NULL,
  `average_Medicare_allowed_amt` float DEFAULT NULL,
  `stdev_Medicare_allowed_amt` float DEFAULT NULL,
  `average_submitted_chrg_amt` float DEFAULT NULL,
  `stdev_submitted_chrg_amt` float DEFAULT NULL,
  `average_Medicare_payment_amt` float DEFAULT NULL,
  `stdev_Medicare_payment_amt` float DEFAULT NULL,
  KEY `nppes_provider_first_name` (`nppes_provider_first_name`),
  KEY `nppes_provider_last_org_name` (`nppes_provider_last_org_name`),
  KEY `nppes_provider_city` (`nppes_provider_city`),
  KEY `nppes_provider_zip` (`nppes_provider_zip`),
  KEY `nppes_provider_state` (`nppes_provider_state`),
  KEY `npi` (`npi`),
  KEY `average_submitted_chrg_amt` (`average_submitted_chrg_amt`),
  KEY `place_of_service` (`place_of_service`),
  KEY `hcpcs_code` (`hcpcs_code`),
  KEY `stdev_submitted_chrg_amt` (`stdev_submitted_chrg_amt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='How much doctors charge Medicare/Medicaid for services'

Average costs by facility type, city and state:

CREATE TABLE `medicare_averages_by_hcpcs_and_place_of_service` (
  `hcpcs_code` varchar(10) DEFAULT NULL,
  `place_of_service` varchar(2) DEFAULT NULL,
  `nppes_provider_city` varchar(40) DEFAULT NULL,
  `nppes_provider_state` varchar(2) DEFAULT NULL,
  `average_charge` float DEFAULT NULL,
  KEY `hcpcs_code` (`hcpcs_code`,`place_of_service`),
  KEY `nppes_provider_city` (`nppes_provider_city`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

@seejee
Copy link

seejee commented Apr 24, 2014

@arm5077 Try throwing an index on the nppes_provider_state column of the averages table as well. It's the only thing I can think of that might be slowing it down. How many rows ended up in the averages table?

I'm more familiar with postgres, so I might pull this down into a postgres db and give it a try. I also have access to some Really Honking Powerful Postgres servers, so if i get it working locally, I might give the query a try there as well.

@arm5077
Copy link
Author

arm5077 commented Apr 25, 2014

@seejee Will try the index. The averages table has 2 million rows ... more than I expected, but logical, I guess -- you're dealing with 6,000 procedures across 50 states and 18,000 cities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment