Skip to content

Instantly share code, notes, and snippets.

@magic-lantern
Last active September 18, 2020 03:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save magic-lantern/3dae2a3bbb71a02f1eefa4b760b8fb3b to your computer and use it in GitHub Desktop.
Save magic-lantern/3dae2a3bbb71a02f1eefa4b760b8fb3b to your computer and use it in GitHub Desktop.
How to get specific rows within a group Google BigQuery

How to get specific rows within a group in Google BigQuery?

Example scenario: Need to get all encounter rows and summary information about their labs such as min/max, and the value most immediately before and after an event of interest.

The following code will setup an example scenario. Some people have no labs, others only have labs before or only after the event of interest.

CREATE OR REPLACE TABLE curation.test_encounter
(
  person_id INT64 NOT NULL,
  case_id INT64 NOT NULL,
  encounter_datetime DATETIME NOT NULL,
  other_info STRING
);

INSERT INTO curation.test_encounter VALUES (1, 1, '2012-12-12 17:00:00', 'blah');
INSERT INTO curation.test_encounter VALUES (1, 2, '2013-12-12 17:00:00', 'foo');
INSERT INTO curation.test_encounter VALUES (2, 3, '2014-12-12 17:00:00', 'bar');
INSERT INTO curation.test_encounter VALUES (3, 4, '2015-12-12 17:00:00', 'baz');
INSERT INTO curation.test_encounter VALUES (4, 5, '2016-12-12 17:00:00', 'tmp');

CREATE OR REPLACE TABLE curation.test_lab
(
  person_id INT64 NOT NULL,
  case_id INT64 NOT NULL,
  numeric_result FLOAT64,
  lab_datetime DATETIME,
  lab_details STRING
);

INSERT INTO curation.test_lab VALUES (1, 1, 1.15, '2012-12-09 17:00:00', 'blood');
INSERT INTO curation.test_lab VALUES (1, 1, 1.20, '2012-12-10 17:00:00', 'blood');
INSERT INTO curation.test_lab VALUES (1, 1, 1.25, '2012-12-11 17:00:00', 'blood');
INSERT INTO curation.test_lab VALUES (1, 2, 1.35, '2013-12-13 17:00:00', 'phlegm');
INSERT INTO curation.test_lab VALUES (2, 3, 1.40, '2014-12-11 17:00:00', 'ybile');
INSERT INTO curation.test_lab VALUES (2, 3, 1.45, '2014-12-11 20:00:00', 'ybile');
INSERT INTO curation.test_lab VALUES (2, 3, 1.55, '2014-12-13 17:00:00', 'bbile');
INSERT INTO curation.test_lab VALUES (3, 4, 1.65, '2015-12-13 17:00:00', 'blood');

Here's a query that gets part of the information:

SELECT
  e.person_id,
  e.case_id,
  MAX(numeric_result) AS max_result,
  MIN(numeric_result) AS min_result,
  '?' AS result_before_encounter,      -- how to get this by group?
  '?' AS result_after_encounter,       -- how to get this by group?
  COUNT(DISTINCT lab_datetime) AS num_result
FROM curation.test_encounter e
LEFT JOIN curation.test_lab l
ON e.case_id = l.case_id
GROUP BY e.person_id, e.case_id
case_id person_id max_result min_result num_results result_before result_after
1 1 1.25 1.15 3 ? ?
2 1 1.35 1.35 1 ? ?
3 2 1.55 1.4 3 ? ?
4 3 1.65 1.65 1 ? ?
5 4 NULL NULL 0 ? ?

How do I go from the above to what I really want?

case_id person_id max_result min_result num_results result_before result_after
1 1 1.25 1.15 3 1.25 NULL
2 1 1.35 1.35 1 NULL 1.35
3 2 1.55 1.4 3 1.45 1.55
4 3 1.65 1.65 1 NULL 1.65
5 4 NULL NULL 0 NULL NULL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment