@903124
Created August 7, 2022 10:10
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to synchronize tracking data with event data, using the Metrica Sports sample data (https://github.com/metrica-sports/sample-data) as a reference."
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:16:21.618607Z",
"start_time": "2022-08-07T07:16:21.606607Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import csv\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:35:48.243495Z",
"start_time": "2022-08-07T07:35:48.228496Z"
}
},
"outputs": [],
"source": [
"def tracking_data_process(DATADIR):\n",
"    # First: read the three header rows so that the player columns get correct names\n",
"    with open(DATADIR, 'r') as csvfile:  # the file is closed automatically once the header is read\n",
"        reader = csv.reader(csvfile)\n",
"        teamnamefull = next(reader)[3].lower()  # team name is the fourth cell of the first row\n",
"        print(\"Reading team: %s\" % teamnamefull)\n",
"        # construct column names\n",
"        jerseys = [x for x in next(reader) if x != '']  # extract player jersey numbers from second row\n",
"        columns = next(reader)\n",
"    for i, j in enumerate(jerseys):  # create x & y position column headers for each player\n",
"        columns[i*2+3] = \"{}_x\".format(j)\n",
"        columns[i*2+4] = \"{}_y\".format(j)\n",
"    columns[-2] = \"ball_x\"  # column headers for the x & y positions of the ball\n",
"    columns[-1] = \"ball_y\"\n",
"    # Second: read in tracking data and place into pandas Dataframe\n",
"    tracking = pd.read_csv(DATADIR, names=columns, index_col='Frame', skiprows=3)\n",
"    return tracking"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T09:56:52.908553Z",
"start_time": "2022-08-07T09:56:52.616917Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reading team: home\n"
]
}
],
"source": [
"event_data = pd.read_csv('data/Sample_Game_1/Sample_Game_1_RawEventsData.csv')\n",
"tracking_data = tracking_data_process('data/Sample_Game_1/Sample_Game_1_RawTrackingData_Home_Team.csv')"
]
},
{
"cell_type": "code",
"execution_count": 285,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T09:56:53.201929Z",
"start_time": "2022-08-07T09:56:53.184366Z"
},
"scrolled": true
},
"outputs": [],
"source": [
"event_data = event_data[event_data['Period'] == 2].reset_index()\n",
"tracking_data = tracking_data[tracking_data['Period'] == 2].reset_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sample data is already synchronized, but imagine it is not: the tracking frames are offset from the event frames by an unknown amount that we have to recover. Here the unknown offset is set to 30000 frames."
]
},
{
"cell_type": "code",
"execution_count": 261,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:35:50.827323Z",
"start_time": "2022-08-07T07:35:50.812323Z"
}
},
"outputs": [],
"source": [
"unknown_frame_offset = 30000\n",
"\n",
"tracking_data['Frame'] += unknown_frame_offset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In real data the tracked ball position will not agree perfectly with the event locations, so some random noise is added to the ball coordinates here."
]
},
{
"cell_type": "code",
"execution_count": 262,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:35:51.262118Z",
"start_time": "2022-08-07T07:35:51.243236Z"
}
},
"outputs": [],
"source": [
"# Draw one random value per frame; a single scalar would shift every frame by the same amount\n",
"tracking_data['ball_x'] += 0.005*np.random.random(len(tracking_data))\n",
"tracking_data['ball_y'] += 0.005*np.random.random(len(tracking_data))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the main part of the script: for every pass in the event data, take the start x-y location of the event, merge the events with the tracking data at a candidate frame offset, and compute the distance between the ball and the event location. Repeat this over a range of offsets and look for the one that minimizes the distance. At an arbitrary offset the distance between event and ball location is essentially random, but at the offset where the two data sets are synchronized it reaches a minimum.\n",
"\n",
"Although event and tracking data can be an unknown number of frames apart, the match typically starts not long after the first tracking frame. The difference between the first tracking frame and the first event frame can therefore be used as an initial guess, instead of trying every possible offset."
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:35:55.479153Z",
"start_time": "2022-08-07T07:35:55.476152Z"
}
},
"outputs": [],
"source": [
"intial_guess = tracking_data['Frame'][0] - event_data['Start Frame'][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"During the iteration, the mean distance between event and ball location is stored for each candidate offset. The minimum and maximum search frames can be adjusted to reduce the run time. A threshold could also be added to stop the search early once the minimum is found (not implemented here)."
]
},
{
"cell_type": "code",
"execution_count": 271,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:37:26.299397Z",
"start_time": "2022-08-07T07:37:03.900371Z"
},
"scrolled": true
},
"outputs": [],
"source": [
"ball_dist = []\n",
"min_search_frame = 0\n",
"max_search_frame = 1000\n",
"# Pass start locations and ball positions do not change between iterations, so select them once\n",
"pass_events = event_data[event_data.Type == 'PASS'][['Start Frame','Start X','Start Y']]\n",
"ball_tracking = tracking_data[['Frame','ball_x','ball_y']]\n",
"for i in range(min_search_frame,max_search_frame):\n",
"    # Shift the tracking frames by the candidate offset so they can be matched to event frames\n",
"    temp_tracking_data = ball_tracking.assign(mergeFrame=ball_tracking['Frame'] - i - intial_guess)\n",
"    merged_data = pd.merge(pass_events,temp_tracking_data[['mergeFrame','ball_x','ball_y']],left_on='Start Frame',right_on='mergeFrame')\n",
"    # Euclidean distance between the ball and each pass start location, averaged over all passes\n",
"    ball_avg_dist = np.sqrt((merged_data['ball_x'] - merged_data['Start X']) ** 2 + (merged_data['ball_y'] - merged_data['Start Y']) ** 2)\n",
"    ball_dist.append(np.mean(ball_avg_dist))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting the mean distance against the candidate offset shows a sharp minimum at one particular frame."
]
},
{
"cell_type": "code",
"execution_count": 272,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:37:32.431695Z",
"start_time": "2022-08-07T07:37:32.335552Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1f915c54c10>]"
]
},
"execution_count": 272,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.title('Distance between ball and event location vs frame')\n",
"plt.plot(np.arange(len(ball_dist)),ball_dist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding the initial guess and the lower bound of the search range back to the position of the minimum gives the frame offset we are looking for."
]
},
{
"cell_type": "code",
"execution_count": 275,
"metadata": {
"ExecuteTime": {
"end_time": "2022-08-07T07:37:40.614310Z",
"start_time": "2022-08-07T07:37:40.604310Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"30000"
]
},
"execution_count": 275,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intial_guess + min_search_frame + np.argmin(ball_dist)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}