sycobuny/a_bit_of_info.md

## a_bit_of_info.md

      
    Raw
  

              a_bit_of_info.md
            
          
    This query is meant to serve as a proof-of-concept for parsing sleep data off
of the fitbit API, which
returns you a start date, and then a series of times. Given that these data are
polled every minute, you can safely assume that, when you cross a date boundary,
the next measurement of time will be "smaller" than the last, and we could just
break it into two days.
However, in an extreme edge case where either the fitbit is reporting wrong, or
someone really is
sleeping for more than 24 hours,
we need to really break these data down a bit more consistently. This query
will do that, and then assign a proper date to the results so that it's a full
TIMESTAMP instead of a simple TIME reference.
The rd value is used to calculate what the result should be, for comparison.
Running this should result in a dataset similar to the following:
postgres@[local]:5432/fitbit-dev =# \i example.sql
┌─────┬─────────────────────┬─────────────────────┐
│ idx │    intended_date    │   calculated_date   │
├─────┼─────────────────────┼─────────────────────┤
│   1 │ 2016-06-21 11:52:00 │ 2016-06-21 11:52:00 │
│   2 │ 2016-06-22 00:52:00 │ 2016-06-22 00:52:00 │
│   3 │ 2016-06-22 10:52:00 │ 2016-06-22 10:52:00 │
│   4 │ 2016-06-22 23:52:00 │ 2016-06-22 23:52:00 │
│   5 │ 2016-06-23 00:52:00 │ 2016-06-23 00:52:00 │
└─────┴─────────────────────┴─────────────────────┘
(5 rows)


## example.sql
-- create a simple table for testing
CREATE TEMPORARY TABLE example (
  idx INTEGER PRIMARY KEY,
  ts TIME NOT NULL,
  rd DATE -- "real" date for later comparison
);

-- populate arbitrary times and intended dates
-- (note, in the real world these timestamps would be more predictable)
-- (and closer together)
INSERT INTO example (ts, idx, rd)
VALUES
  ('11:52:00', 1, '2016-06-21'),
  ('00:52:00', 2, '2016-06-22'),
  ('10:52:00', 3, '2016-06-22'),
  ('11:52:00', 4, '2016-06-22'),
  ('12:52:00', 5, '2016-06-23');

/**
 * This could be done much more cleanly and/or with an eye towards
 * efficiency, but this is merely a POC
 */
WITH
  st(d) AS (VALUES ('2016-06-21'::date)),
  ds AS (
    SELECT
      *,
      (
        lag(ts, 1, NULL) OVER a IS NOT NULL AND
        lag(ts, 1, NULL) OVER a > ts
      )::integer AS new_date
    FROM example
    WINDOW a AS (ORDER BY idx)
  ),
  rs AS (
    SELECT
      idx,
      rd::timestamp + ts::interval AS intended_date,
      (
        (SELECT d FROM st LIMIT 1) +
        ((SUM(new_date) OVER p)::integer::text || ' days')::interval
      ) + ts::interval AS calculated_date
    FROM ds
    WINDOW p AS (ORDER BY idx)
  )

SELECT *
FROM rs;
	-- create a simple table for testing
	CREATE TEMPORARY TABLE example (
	idx INTEGER PRIMARY KEY,
	ts TIME NOT NULL,
	rd DATE -- "real" date for later comparison
	);

	-- populate arbitrary times and intended dates
	-- (note, in the real world these timestamps would be more predictable)
	-- (and closer together)
	INSERT INTO example (ts, idx, rd)
	VALUES
	('11:52:00', 1, '2016-06-21'),
	('00:52:00', 2, '2016-06-22'),
	('10:52:00', 3, '2016-06-22'),
	('11:52:00', 4, '2016-06-22'),
	('12:52:00', 5, '2016-06-23');

	/**
	* This could be done much more cleanly and/or with an eye towards
	* efficiency, but this is merely a POC
	*/
	WITH
	st(d) AS (VALUES ('2016-06-21'::date)),
	ds AS (
	SELECT
	*,
	(
	lag(ts, 1, NULL) OVER a IS NOT NULL AND
	lag(ts, 1, NULL) OVER a > ts
	)::integer AS new_date
	FROM example
	WINDOW a AS (ORDER BY idx)
	),
	rs AS (
	SELECT
	idx,
	rd::timestamp + ts::interval AS intended_date,
	(
	(SELECT d FROM st LIMIT 1) +
	((SUM(new_date) OVER p)::integer::text \|\| ' days')::interval
	) + ts::interval AS calculated_date
	FROM ds
	WINDOW p AS (ORDER BY idx)
	)

	SELECT *
	FROM rs;