This questions is a db design/optimization question. We're going to start off with some background. At UCB users connect to the campus wifi system via an access point. Connecting to an access point creates a wifi_accounting event. Throughout a user's time on wifi, wifi_accounting events will continue be created every 5-30 seconds. On average, we get about 20 million such events per day.
The following table represents the data we capture for each wifi_accounting event:
wifi_accounting
------------------------------
ip INET
username VARCHAR
mac MACADDR
session_ident VARCHAR
radius_ts TIMESTAMPTZ
ip: IP Address assigned to the user
username: the username of the user
mac: the mac address of the user's ethernet card
session_ident: string the identifies the user's wifi session
radius_ts: timestamp of the accounting event
Our IDS system generates alerts for network connections on the wifi network. These alerts always contain an IP and a TIMESTAMP. If we are given such an alert, we can take the IP and the TIMESTAMP and figure out the user on the network with the following query:
WITH start_records as (
SELECT DISTINCT ON (session_ident, ip)
session_ident, ip, username, mac, radius_ts
FROM wifi_accounting
WHERE ip = '<ip>'
AND radius_ts between '<beginning_of_day>' and '<end_of_day>'
ORDER BY session_ident, ip, radius_ts asc
), stop_records AS (
SELECT DISTINCT ON (session_ident, ip)
session_ident, ip, username, mac, radius_ts
FROM wifi_accounting
WHERE ip = '<ip.to_s>'
AND radius_ts between '<beginning_of_day>' and '<end_of_day>'
ORDER BY session_ident, ip, radius_ts desc
)
SELECT start_records.session_ident
start_records.ip
start_records.username,
start_records.mac
start_records.radius_ts as start_ts,
stop_records.radius_ts as stop_ts
FROM start_records
INNER JOIN stop_records ON
(start_records.session_ident, start_records.ip) =
(stop_records.session_ident, stop_records.ip)
WHERE '<ts>' between start_ts and stop_ts
AND 'ip' = '<ip>'
ORDER BY start_records.session_ident, stop_records.ip, stop_records.radius_ts desc;
However, as time goes on, the performance of this query starts to degrade from seconds to minutes.
What are some changes you would make to improve performance of looking up a user's wifi session.