Skip to content

Instantly share code, notes, and snippets.

@AJP4
Created April 8, 2018 19:09
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save AJP4/73b769a28fc64bd1afa0726f57175d7e to your computer and use it in GitHub Desktop.
Save AJP4/73b769a28fc64bd1afa0726f57175d7e to your computer and use it in GitHub Desktop.
MS Project to Excel
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Tool for Extracting and Manipulating MS Project Data.\n",
"\n",
"This Jupyter Note book extracts data from an MS Project file into an Python Pandas Object which is then processed as necessary before being converted to an Excel file\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Python Modules used by the Notebook\n",
"The next cell does not need to be changed. It is used to load the module that contains the MS Project to Excel upload and Parse"
]
},
{
"cell_type": "code",
"execution_count": 318,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The autoreload extension is already loaded. To reload it, use:\n",
" %reload_ext autoreload\n"
]
}
],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"import ProjectToExcelClasses\n",
"import formatExcel"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Below are additional modules used in this Notebooks manipulation activity that is not covered in the module above.\n",
"The __Out [#]__ cell just says where the \"ProjectToExcelClasses\" module lives "
]
},
{
"cell_type": "code",
"execution_count": 335,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"import imp\n",
"import pandas as pd\n",
"import win32com.client as win32\n",
"import datetime as dt\n",
"import dateutil\n",
"imp.reload(ProjectToExcelClasses)\n",
"imp.reload(formatExcel);\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## File System Configuration Cell\n",
"The cell below is essentially a configuration file. \n",
"Simply cut-n-paste the path to the MS Project and assign to \"mspFile\" variable. \n",
"\"base_path\" is the folder where output will be written, such as excel files. \n",
"\"xlsx_suffix\" is just a text string variable used by the Notebook. It is just the extension type of any excel files that output."
]
},
{
"cell_type": "code",
"execution_count": 320,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"date=\"2018-03-30\"\n",
"mspFilename = \"MyBlogProject.mpp\"\n",
"\n",
"msp_path=r\"MPP\"\n",
"base_path=r\"Your\\path\\to\\file\\\" + \"\\\\\"+ date + \"_OUT\\\\\"\n",
"\n",
"#mspFile=os.path.join(msp_path, mspFilename) \n",
"mspFile=r\"Your\\path\\to\\file\\MyBlogProject.mpp\"\n",
"\n",
"paths=[base_path, msp_path]\n",
"files=[mspFile]\n",
"\n",
"xlsx_suffix = \"xlsx\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"BELOW: Checks directories, if they don't exist it creates them"
]
},
{
"cell_type": "code",
"execution_count": 339,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from IPython.utils.path import ensure_dir_exists\n",
"for path in paths:\n",
" if os.path.isdir(path):\n",
" pass\n",
" # print(\"{} path already exists\".format(path))\n",
" else:\n",
" #print(\"{} does not exists. It is now being created\".format(path))\n",
" ensure_dir_exists(path)\n",
" # print(\"CHECK: {p} {check} \".format(p=path,check=\"created\" if os.path.isdir(path)== True else \"not created\"))\n",
" \n",
"# remove \"#\" as necessary"
]
},
{
"cell_type": "code",
"execution_count": 340,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"#Utility function to check if file exists and if yes, is it open. If not further processing of cells is stopped\n",
"\n",
"def check_file(file):\n",
" if os.path.isfile(file):\n",
" print(\"{} \\nfile exists\".format(file))\n",
" if not os.access(file, os.W_OK):\n",
" print (\"\\n...but write access not permitted on {}\".format(file))\n",
" else:\n",
" print (\"\\n...and write access is permited\")\n",
" else:\n",
" print(\"{}\\n file does not exist. \\n Further processing of cells will be stopped\".format(file))\n",
" raise KeyboardInterrupt"
]
},
{
"cell_type": "raw",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"check_file(mspFile)"
]
},
{
"cell_type": "code",
"execution_count": 343,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"UniqueIDs_to_Avoid=[7,999]"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Team Members\n",
"Place in the cell below, in quotes, between the square brackets the names used in the MS Project File for the various team members.\n",
"\n",
"__Warning__: If a team member is not named AND/OR a team member is not listed as written in the MS Project File then the Notebook output will not be accurate.\n",
"\n",
"_(code note: the conversion of the Python Lists to Sets is merely to guard against replication of team member names)_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Arnie\n",
"Billy\n",
"Charlie\n",
"Charle\n",
"Arnie\n",
"Billy\n",
"George\n",
"Amy\n",
"Cindy\n",
"Derek\n",
"Eric\n",
"Fred"
]
},
{
"cell_type": "code",
"execution_count": 325,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Arnie', 'Billy', 'Charlie'}\n",
"{'Amy', 'George', 'Fred', 'Hendrick'}\n",
"{'Billy', 'Cindy', 'Charle', 'Eric', 'Amy', 'Fred', 'Derek', 'Arnie', 'Charlie', 'George'}\n",
"{'Billy', 'Cindy', 'Charle', 'Eric', 'Amy', 'Fred', 'Derek', 'Charlie', 'Arnie', 'George', 'Hendrick'}\n"
]
}
],
"source": [
"Team1=[\"Billy\",\n",
" \"Charlie\",\n",
" \"Arnie\"]\n",
" \n",
"Team2=[\"George\",\n",
" \"Amy\",\n",
" \"Fred\",\n",
" \"Fred\",\"Hendrick\" ]\n",
"Team3=[\"Arnie\",\"Billy\",\"Charlie\", \"Charle\", \"Arnie\", \"Billy\", \"George\",\"Amy\", \"Cindy\",\"Derek\",\"Eric\",\"Fred\"]\n",
"\n",
"Team1_set=set(Team1) # Team 1\n",
"Team2_set=set(Team2) # Team 2\n",
"Team3_set=set(Team3)\n",
"All_set=set(Team1+Team2+Team3)\n",
"\n",
"#exclusions\n",
"exclusions_sets=set([\"Eric\"])\n",
"\n",
"print(Team1_set)\n",
"print(Team2_set)\n",
"print(Team3_set)\n",
"print(All_set)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Parse the MS Project File\n",
"The next two cells set up and then parse the assigned MS Project file. \n",
"\n",
"First assign the MS Project headers you wish to parse using their \"Field Name\", which is not always the same as the column \"Title\". E.g. Field Name = \"Text14\", Title = \"Key Deliverable Type\". Header setting is done by updating the list assigned to the variable \"list_of_headers\". \n",
"_note: Summary Tasks, Percent Complete, start, finish are always added, there is no need to add these column headers_\n",
"\n",
"In the second cell does not need to be changed\n"
]
},
{
"cell_type": "code",
"execution_count": 326,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"list_of_headers = [\"Resource Names\",\"Predecessors\",\"Resource Group\", \"Text13\",\"Text14\",\"Text12\",\"Text11\",\"Duration\"]"
]
},
{
"cell_type": "code",
"execution_count": 345,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"df=ProjectToExcelClasses.DataFrameOfMSProject(headers=list_of_headers, ms_project_file=mspFile,\n",
" logging_level=\"DEBUG\", UniqueIDs_to_Ignore=UniqueIDs_to_Avoid); "
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Create a Dictionary of Dataframes based on the Due Date, Duration of Period and Number of periods to report.\n",
"\n",
"\n",
"**dueDate**=\"02/04/2018\" # Add the date that anything before will be flagged as overdue \n",
"**durationOfPeriods** = 7 # number of days in a period (ie how manys in a tab in the worksheet). e.g. 7 days = a week \n",
"**numberOfPeriods**= 6 # the number of periods to be covered. E.g. is 7 days is the duration, 6 would cover next 6 weeks from the dueDate "
]
},
{
"cell_type": "code",
"execution_count": 328,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished processing output_dictionary_of_data_frames_FINISHING\n",
"Finished processing output_dictionary_of_data_frames_WIP\n",
"Finished processing output_dictionary_of_data_frames_WIP\n"
]
}
],
"source": [
"dueDate=\"30/03/2018\" \n",
"durationOfPeriods = 7 \n",
"numberOfPeriods= 10\n",
"\n",
"#All Tasks that are incomplete BUT that referencing finish window only\n",
"df_dictF=df.output_dictionary_of_data_frames_FINISHING(due_date=dueDate,\n",
" duration_of_periods=durationOfPeriods,\n",
" num_of_periods=numberOfPeriods,\n",
" flag_incomplete_only=True)\n",
"\n",
"#All tasks that are incomplete BUT that reference starting, WIP or finishing in period\n",
"df_dictW=df.output_dictionary_of_data_frames_WIP(due_date=dueDate,\n",
" duration_of_periods=durationOfPeriods,\n",
" num_of_periods=numberOfPeriods,\n",
" flag_incomplete_only=True)\n",
"\n",
"\n",
"\n",
"#All tasks over the project as a single window whether complete or not. This is used to produce a deliverable status output\n",
"#that can be used to list completed and yet to be completed deliverables\n",
"dueDateDeliverables=\"01/01/2018\" \n",
"durationOfPeriods = 300 \n",
"numberOfPeriods= 1\n",
"\n",
"df_dict_Deliverables=df.output_dictionary_of_data_frames_WIP(due_date=dueDateDeliverables,\n",
" duration_of_periods=durationOfPeriods,\n",
" num_of_periods=numberOfPeriods,\n",
" flag_incomplete_only=False)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Output Dictionary of Dataframes to excel filtered by selection: One Excel Sheet\n",
"## All Team. Tasks due to finish in Period"
]
},
{
"cell_type": "code",
"execution_count": 329,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# One SpreadhsheetOut\n",
"\n",
"team=All_set\n",
"\n",
"listOfSheetNames=[]\n",
"for key in df_dictF.keys():\n",
" listOfSheetNames.append(key)\n",
"\n",
"# step through a list (which were converted to sets to remove accidental duplicates)\n",
"# of resource_names and produce a excel file for each resources\n",
"# Then format the lis\n",
"\n",
"\n",
"filename = \"All_Incomplete_AllTeam_Resource_Tasks_Finishing_in_Period\"\n",
"colFilterAttrib=\"|\".join(team)\n",
"colExclusionAttrib = \"|\".join(exclusions_sets)\n",
"writer = pd.ExcelWriter(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
"for k in df_dictF:\n",
" df_item=df_dictF[k].copy()\n",
" df_out = df_item[(df_item[\"Resource Names\"].str.contains(colFilterAttrib, na=False) &\n",
" ~(df_item[\"Resource Names\"].str.contains(colExclusionAttrib,na=False)))] #find resource name in column \n",
" df_out.to_excel(writer, k)\n",
"writer.save()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Set Format of Columns in Excel\n",
"The next cell sets the format for the various columns in excel. When an excel spreadsheet is the formatExcelSheet1 function will used to format the columns. Now if you change the number, type or arrangement of the headers then the next cell will have to change. Columns A, B, C, D, E, F are 'UniqueID', 'SummaryTask', 'Name', 'Start', 'Finish', '% Complete' respectively. Depending on what other headers may be added and then you may need to run this cell after changeing the content and re-run the following cell to create the excel outputs.\n",
"\n",
"_Note: going forward I may make the formating function a Class/Methods in the ProjectToExcelClasses module."
]
},
{
"cell_type": "code",
"execution_count": 330,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"imp.reload(formatExcel)\n",
"formatObj=formatExcel.excelFormatColumns()\n",
"\n",
"formatObj.date_cols = [\"D\",\"E\"]\n",
"formatObj.autofit_cols = [\"A\", \"D\", \"E\", \"F\", \"P\"]\n",
"formatObj.wrap_text_cols_ColWidth_small = [\"G\",\"H\"]\n",
"formatObj.wrap_text_cols_ColWidth_medium = [\"B\",\"C\"]\n",
"formatObj.wrap_text_cols_ColWidth_large = [\"Z\"]"
]
},
{
"cell_type": "code",
"execution_count": 331,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"formatObj.excelFile(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
"os.path.join(base_path, filename + \".\" + xlsx_suffix)\n",
"formatObj.formatExcel(listOfSheetNames)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Repeat of the Above, Different Team/People Criteria"
]
},
{
"cell_type": "code",
"execution_count": 332,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# Multiple Spreadsheets out depending on iterator\n",
"iterator=Team1_set\n",
"\n",
"listOfSheetNames=[]\n",
"for key in df_dictF.keys():\n",
" listOfSheetNames.append(key)\n",
"\n",
"colFilterAttrib=\"|\".join(team)\n",
"colExclusionAttrib = \"|\".join(exclusions_sets)\n",
"for resource in iterator:\n",
" filename = \"Tasks for \"+resource\n",
" writer = pd.ExcelWriter(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
" for k in df_dictF:\n",
" df_item=df_dictF[k].copy()\n",
" #df_out = df_item[df_item[\"Resource Names\"].str.contains(colFilterAttrib, na=False)]\n",
" df_temp = df_item[(df_item[\"Resource Names\"].str.contains(colFilterAttrib, na=False) & ~(df_item[\"Resource Names\"].str.contains(colExclusionAttrib,na=False)))] #find resource name in column \n",
" df_out=df_temp[df_temp[\"Resource Names\"]==resource]\n",
" df_out.to_excel(writer, k)\n",
" writer.save()\n",
" formatObj.excelFile(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
" formatObj.formatExcel(listOfSheetNames)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Activities in Period (Starting, WIP, Finishing)"
]
},
{
"cell_type": "code",
"execution_count": 333,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# One SpreadhsheetOut\n",
"\n",
"team=All_set\n",
"\n",
"listOfSheetNames=[]\n",
"for key in df_dictW.keys():\n",
" listOfSheetNames.append(key)\n",
"\n",
"# step through a list (which were converted to sets to remove accidental duplicates)\n",
"# of resource_names and produce a excel file for each resources\n",
"# Then format the lis\n",
"\n",
"\n",
"filename = \"All_Task_In_Period_With_WIP_Status\"\n",
"colFilterAttrib=\"|\".join(team)\n",
"colExclusionAttrib = \"|\".join(exclusions_sets)\n",
"writer = pd.ExcelWriter(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
"for k in df_dictW:\n",
" df_item=df_dictW[k].copy()\n",
" df_out = df_item[(df_item[\"Resource Names\"].str.contains(colFilterAttrib, na=False) & ~(df_item[\"Resource Names\"].str.contains(colExclusionAttrib,na=False)))] #find resource name in column \n",
" df_out.to_excel(writer, k)\n",
"writer.save()\n",
"formatObj.excelFile(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
"formatObj.formatExcel(listOfSheetNames)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## All WIP Task: Output by Team Member"
]
},
{
"cell_type": "code",
"execution_count": 334,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No Task for Eric\n",
"No Task for Hendrick\n"
]
}
],
"source": [
"# Multiple Spreadhsheets Out\n",
"\n",
"team=All_set\n",
"\n",
"iterator = All_set\n",
"\n",
"listOfSheetNames=[]\n",
"numSheets=0\n",
"for key in df_dictW.keys():\n",
" listOfSheetNames.append(key)\n",
" numSheets=numSheets+1\n",
"\n",
"# step through a list (which were converted to sets to remove accidental duplicates)\n",
"# of resource_names and produce a excel file for each resources\n",
"# Then format the lis\n",
"colFilterAttrib=\"|\".join(team)\n",
"colExclusionAttrib = \"|\".join(exclusions_sets)\n",
"\n",
"for resource in iterator:\n",
" filename = \"All_Task_In_Period_With_WIP_Status - \" + resource\n",
" writer = pd.ExcelWriter(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
" df_counter = numSheets\n",
" for k in df_dictW:\n",
" df_item=df_dictW[k].copy()\n",
" df_temp = df_item[(df_item[\"Resource Names\"].str.contains(colFilterAttrib, na=False) & ~(df_item[\"Resource Names\"].str.contains(colExclusionAttrib,na=False)))].copy() #find resource name in column \n",
" df_out=df_temp[df_temp[\"Resource Names\"].str.contains(resource, na=False)]\n",
" if df_out.empty:\n",
" df_counter=df_counter-1\n",
" df_out.to_excel(writer, k)\n",
" writer.save()\n",
" if df_counter !=0:\n",
" formatObj.excelFile(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
" formatObj.formatExcel(listOfSheetNames)\n",
" else:\n",
" os.remove(os.path.join(base_path, filename + \".\" + xlsx_suffix))\n",
" print(\"No Task for \" + resource)\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (jupyterLab)",
"language": "python",
"name": "jupyterlab"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment