An agent observes a simulated user interacting with a web app across multiple sessions. Each user has a hidden behavioral profile — a fixed combination of traits like navigation style, error tendency, help-seeking behavior, and shortcut usage. The agent receives partial interaction logs and predicts the user's next actions. A learning agent should improve over sessions; a stateless baseline stays flat.
Each instance: agent receives a partial session log → predicts next K actions → scored against ground truth.