Skip to content

Instantly share code, notes, and snippets.

@magic-lantern
Last active September 18, 2020 03:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save magic-lantern/a8443100219f74371a10c37a562d0f14 to your computer and use it in GitHub Desktop.
Save magic-lantern/a8443100219f74371a10c37a562d0f14 to your computer and use it in GitHub Desktop.
How to get specific rows within a group Google BigQuery

How to get specific rows within a group in generic SQL (SQLite)?

Example scenario: Need to get all encounter rows and summary information about their labs such as min/max, and the value most immediately before and after an event of interest.

The following code will setup an example scenario. Some people have no labs, others only have labs before or only after the event of interest.

SQL Fiddle available at http://sqlfiddle.com/#!5/f80104/1/0

CREATE OR REPLACE TABLE encounter
(
  person_id INT NOT NULL,
  case_id INT NOT NULL,
  encounter_datetime DATETIME NOT NULL,
  other_info VARCHAR
);

INSERT INTO encounter VALUES (1, 1, '2012-12-12 17:00:00', 'blah');
INSERT INTO encounter VALUES (1, 2, '2013-12-12 17:00:00', 'foo');
INSERT INTO encounter VALUES (2, 3, '2014-12-12 17:00:00', 'bar');
INSERT INTO encounter VALUES (3, 4, '2015-12-12 17:00:00', 'baz');

CREATE OR REPLACE TABLE lab
(
  person_id INT NOT NULL,
  case_id INT NOT NULL,
  numeric_result FLOAT,
  lab_datetime DATETIME,
  lab_details VARCHAR
);

INSERT INTO lab VALUES (1, 1, 1.15, '2012-12-09 17:00:00', 'blood');
INSERT INTO lab VALUES (1, 1, 1.20, '2012-12-10 17:00:00', 'blood');
INSERT INTO lab VALUES (1, 1, 1.25, '2012-12-11 17:00:00', 'blood');
INSERT INTO lab VALUES (1, 2, 1.35, '2013-12-13 17:00:00', 'phlegm');
INSERT INTO lab VALUES (2, 3, 1.40, '2014-12-11 17:00:00', 'ybile');
INSERT INTO lab VALUES (2, 3, 1.45, '2014-12-11 20:00:00', 'ybile');
INSERT INTO lab VALUES (2, 3, 1.55, '2014-12-13 17:00:00', 'bbile');
INSERT INTO lab VALUES (3, 4, 1.65, '2015-12-13 17:00:00', 'blood');

Here's a query that gets part of the information:

select
  e.person_id,
  e.case_id,
  max(numeric_result) as max_result,
  min(numeric_result) as min_result,
  '?' as result_before_encounter,       -- how to get this by group
  '?' as result_after_encounter,        -- how to get this by group
  count(distinct lab_datetime) as num_result
from encounter e
  left join lab l
  on e.case_id = l.case_id
group by e.person_id, e.case_id
case_id person_id max_result min_result num_results result_before result_after
1 1 1.25 1.15 3 ? ?
2 1 1.35 1.35 1 ? ?
3 2 1.55 1.4 3 ? ?
4 3 1.65 1.65 1 ? ?
5 4 NULL NULL 0 ? ?

How do I go from the above to what I really want?

case_id person_id max_result min_result num_results result_before result_after
1 1 1.25 1.15 3 1.25 NULL
2 1 1.35 1.35 1 NULL 1.35
3 2 1.55 1.4 3 1.45 1.55
4 3 1.65 1.65 1 NULL 1.65
5 4 NULL NULL 0 NULL NULL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment