Skip to content

Instantly share code, notes, and snippets.

@sahglie
Last active October 4, 2022 16:33
Show Gist options
  • Save sahglie/eaafb8b9e18b435512136c9dc8538198 to your computer and use it in GitHub Desktop.
Save sahglie/eaafb8b9e18b435512136c9dc8538198 to your computer and use it in GitHub Desktop.
DB Optimization

This questions is a db design/optimization question. We're going to start off with some background. At UCB users connect to the campus wifi system via an access point. Connecting to an access point creates a wifi_accounting event. Throughout a user's time on wifi, wifi_accounting events will continue be created every 5-30 seconds. On average, we get about 20 million such events per day.

The following table represents the data we capture for each wifi_accounting event:

wifi_accounting
------------------------------
ip              INET
username        VARCHAR
mac             MACADDR
session_ident   VARCHAR
radius_ts       TIMESTAMPTZ

ip:            IP Address assigned to the user
username:      the username of the user
mac:           the mac address of the user's ethernet card
session_ident: string the identifies the user's wifi session
radius_ts:     timestamp of the accounting event

Our IDS system generates alerts for network connections on the wifi network. These alerts always contain an IP and a TIMESTAMP. If we are given such an alert, we can take the IP and the TIMESTAMP and figure out the user on the network with the following query:

WITH start_records as (
  SELECT DISTINCT ON (session_ident, ip)
         session_ident, ip, username, mac, radius_ts
  FROM wifi_accounting
  WHERE ip = '<ip>'
    AND radius_ts between '<beginning_of_day>' and '<end_of_day>'
  ORDER BY session_ident, ip, radius_ts asc
), stop_records AS (
  SELECT DISTINCT ON (session_ident, ip)
         session_ident, ip, username, mac, radius_ts
  FROM wifi_accounting
  WHERE ip = '<ip.to_s>'
    AND radius_ts between '<beginning_of_day>' and '<end_of_day>'
  ORDER BY session_ident, ip, radius_ts desc
)

SELECT start_records.session_ident
       start_records.ip
       start_records.username,
       start_records.mac
       start_records.radius_ts as start_ts,
       stop_records.radius_ts as stop_ts
FROM start_records
INNER JOIN stop_records ON
  (start_records.session_ident, start_records.ip) =
  (stop_records.session_ident, stop_records.ip)
WHERE '<ts>' between start_ts and stop_ts
AND   'ip' = '<ip>'
ORDER BY start_records.session_ident, stop_records.ip, stop_records.radius_ts desc;

However, as time goes on, the performance of this query starts to degrade from seconds to minutes.

What are some changes you would make to improve performance of looking up a user's wifi session.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment