Skip to content

Instantly share code, notes, and snippets.

@restrepo
Last active January 29, 2016 06:40
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save restrepo/8d692e994b860f45d9c7 to your computer and use it in GitHub Desktop.
Save restrepo/8d692e994b860f45d9c7 to your computer and use it in GitHub Desktop.
Estudiantes de posgrado admitidos y graduados en el Instituto de Física de la UdeA desde 1995
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Estudiantes de posgrado"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Configure los datos del informe"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"config_inf={'DEBUG':False} #Development mode\n",
"config_inf['Years']=range(1995,2016)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Función de búsqueda"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Para obtener las claves de las hojas de cálculo de Google: `gss_key`, visita la [página de búsqueda de Indicadores](http://gfif.udea.edu.co/gssis/sql_query/search.html): (Seleccione indicador, ponga `*` en la caja de `\"select\"`, pulse el botón de búsqueda y abra el link HTML)."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import re\n",
"def read_google_cvs(gss_url=\"http://spreadsheets.google.com\",\\\n",
" gss_format=\"csv\",\\\n",
" gss_key=\"0AuLa_xuSIEvxdERYSGVQWDBTX1NCN19QMXVpb0lhWXc\",\\\n",
" gss_sheet=0,\\\n",
" gss_query=\"select B,D,E,F,I where (H contains 'GFIF') order by D desc\",\\\n",
" gss_keep_default_na=False\n",
" ):\n",
" import urllib\n",
" \"\"\"\n",
" read a google spreadsheet in cvs format and return a pandas DataFrame object.\n",
" ....\n",
" gss_keep_default_na: (False) Blank values are filled with NaN\n",
" \"\"\"\n",
" issn_url=\"%s/tq?tqx=out:%s&tq=%s&key=%s&gid=%s\" %(gss_url,\\\n",
" gss_format,\\\n",
" gss_query,\\\n",
" gss_key,\\\n",
" str(gss_sheet))\n",
"\n",
" gfile=urllib.urlopen(issn_url)\n",
" return pd.read_csv(gfile,keep_default_na=gss_keep_default_na)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare `pandas` DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estudiantes"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dfd=read_google_cvs(gss_key=\"0AuLa_xuSIEvxdERYSGVQWDBTX1NCN19QMXVpb0lhWXc\",gss_query=\"select *\",gss_sheet=2)\n",
"dfd['PROGRAMA']='Doctorado'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dfm=read_google_cvs(gss_key=\"0AuLa_xuSIEvxdERYSGVQWDBTX1NCN19QMXVpb0lhWXc\",gss_query=\"select *\",gss_sheet=1)\n",
"dfm['PROGRAMA']='Maestría'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Join into a single data frame"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'ASESOR', u'Becado de', u'CODIRECTOR', u'CORREO', u'Cédula', u'Distinción', u'Doble titulación', u'ESTANTE', u'ESTUDIANTE', u'Estado', u'Estado Laboral', u'Fecha de graduación', u'Fecha de ingreso', u'Fecha de sustentación', u'GRUPO', u'INSTITUTO', u'JURADO 1', u'JURADO 2', u'JURADO 3', u'Materias matriculas semestre 2011-1', u'NOMBRE TESIS', u'PROGRAMA', u'Pasantía', u'Proyecto ID', u'SIGNATURA', u'Situación', u'Unnamed: 0'], dtype='object')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dfe=dfd.append(dfm)\n",
"dfe.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Main part: Replace 'Year-semester' by 'Year'"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dfe['Año de ingreso']=dfe['Fecha de ingreso'].replace('','0').replace('null','0').str.replace('-1','').str.replace('-2','').astype('int')\n",
"dfe['Año de graduación']=dfe['Fecha de graduación'].replace('','0').replace('null','0').str.replace('-1','').str.replace('-2','').astype('int')#.values\n",
"#dfe['Año de graduación'].values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Extract configured years"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dfei=dfe[dfe['Año de ingreso']>=config_inf['Years'][0]].reset_index(drop=True)\n",
"dfeg=dfe[dfe['Año de graduación']>=config_inf['Years'][0]].reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estudiantes graduados"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año de graduación\n",
"2014 9\n",
"2013 11\n",
"2012 15\n",
"2011 5\n",
"2010 6\n",
"2009 9\n",
"2008 14\n",
"2007 6\n",
"2006 7\n",
"2005 1\n",
"2004 7\n",
"2003 4\n",
"2002 3\n",
"2001 4\n",
"2000 5\n",
"1999 2\n",
"0 88\n",
"Name: ASESOR, dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=dfei.groupby('Año de graduación').count()['ASESOR'][::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año de graduación\n",
"2014 5\n",
"2013 7\n",
"2012 12\n",
"2011 3\n",
"2010 3\n",
"2009 4\n",
"2008 12\n",
"2007 4\n",
"2006 4\n",
"2004 5\n",
"2003 4\n",
"2002 1\n",
"2001 3\n",
"2000 2\n",
"1999 2\n",
"0 44\n",
"Name: ASESOR, dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=dfei[dfei.PROGRAMA=='Maestría'].groupby('Año de graduación').count()['ASESOR'][::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año de graduación\n",
"2014 4\n",
"2013 4\n",
"2012 3\n",
"2011 2\n",
"2010 3\n",
"2009 5\n",
"2008 2\n",
"2007 2\n",
"2006 3\n",
"2005 1\n",
"2004 2\n",
"2002 2\n",
"2001 1\n",
"2000 3\n",
"0 44\n",
"Name: ASESOR, dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=dfei[dfei.PROGRAMA=='Doctorado'].groupby('Año de graduación').count()['ASESOR'][::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dfei[dfei.PROGRAMA=='Maestría'].groupby('Año de graduación').count()['ASESOR']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estudiantes admitidos y matriculados "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se descartan los desertores"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año de ingreso\n",
"2015 8\n",
"2014 28\n",
"2013 19\n",
"2012 14\n",
"2011 10\n",
"2010 12\n",
"2009 8\n",
"2008 15\n",
"2007 8\n",
"2006 18\n",
"2005 13\n",
"2004 11\n",
"2003 3\n",
"2002 5\n",
"2001 7\n",
"2000 2\n",
"1999 4\n",
"1998 6\n",
"1997 3\n",
"1995 2\n",
"Name: ASESOR, dtype: int64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dfeit=dfei[np.logical_or(dfei['Estado']=='Activo',dfei['Estado']=='Graduado')]\n",
"dfeit[['Año de ingreso','Estado']]\n",
"dfei.groupby('Año de ingreso').count()['ASESOR'][::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Artículos estudiantes"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def plainstr(stra):\n",
" return stra.replace('á','a').replace('é','e').replace('í','i').\\\n",
" replace('ó','o').replace('ú','u').replace('ñ','n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Artículos"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df=read_google_cvs(gss_key='0AjqGPI5Q_Ez6dDA3ajhtYVVDOWdBckVhWm1MSFRET1E',gss_query=\"select *\")\n",
"publicadas=df[df['ISSN']!='0000-0000']\n",
"publicadas=publicadas[publicadas['ISSN']!='']\n",
"publicadas=publicadas[publicadas['Año']!=0]\n",
"publicadas=publicadas.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Artículos con participación de estudiantes"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dfei=dfei[dfei.Estado!='Desertor'].reset_index(drop=True)"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Estudiantes\n",
"student_coauthor=1 #until 2 years after graduation date\n",
"dfd=pd.Series()\n",
"art_est=pd.Series()\n",
"dfd['D']=dfei[dfei.PROGRAMA=='Doctorado'].reset_index(drop=True)\n",
"dfd['M']=dfei[dfei.PROGRAMA=='Maestría'].reset_index(drop=True)\n",
"dfd['P']=dfei\n",
"for k in dfd.keys():\n",
" art_est[k]=pd.DataFrame()\n",
"#phdest=pd.DataFrame()\n",
" dfd[k]['Author']=''\n",
" for i in range(dfd[k].shape[0]):\n",
" est=re.sub('\\s+$','',re.sub('^\\s+','',dfd[k]['ESTUDIANTE'][i])).replace(' ',' ').split(' ')\n",
" if len(est)==4:\n",
" dfd[k]['Author'][i]='%s, %s' %(plainstr(est[2]),est[0][0])\n",
" elif len(est)==3: \n",
" dfd[k]['Author'][i]='%s, %s' %(plainstr(est[1]),est[0][0])\n",
" elif len(est)==2:\n",
" dfd[k]['Author'][i]='%s, %s' %(plainstr(est[1]),est[0][0])\n",
" else:\n",
" dfd[k]['Author'][i]='%s, %s' %(plainstr(est[-1]),est[0][0])\n",
" #special cases\n",
" if dfd[k]['Author'][i]=='Duque, C':\n",
" dfd[k]['Author'][i]='Duque, CM'\n",
" \n",
" ma=publicadas[publicadas['Autor(es)'].str.contains(dfd[k]['Author'][i])]\n",
" if ma.shape[0]>0:\n",
" ma['Estudiante']=dfd[k]['Author'][i]\n",
" ma['Año Graduación']=dfd[k]['Fecha de graduación'][i]\n",
" if dfd[k]['Fecha de graduación'][i]!='':\n",
" ya=eval(re.sub('-[12]$','',dfd[k]['Fecha de graduación'][i]))+student_coauthor\n",
" ma=ma[ma['Año']<=ya]\n",
" \n",
" art_est[k]=art_est[k].append(ma)\n",
" #phdest=phdest.append({'ESTUDIANTE':dfd[k]['ESTUDIANTE'][i],'Autor':dfd['Author'][i],\\\n",
" # 'Fecha de graduación':dfd[k]['Fecha de graduación'][i],\\\n",
" # 'Artículos':ma.shape[0]},ignore_index=True)\n",
"\n",
" #Filter pandas dataframe\n",
" #Fill from phd_students.xlsx (see below) to cross check the data\n",
" art_est[k]=art_est[k].drop_duplicates().reset_index(True).sort('Año',ascending=False).reset_index(drop=True)\n",
" art_est[k]=art_est[k][['Tipo','Artículo','Autor(es)','Año Graduación','Año','Revista','ISSN','Estudiante']]"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Artículo</th>\n",
" <th>Autor(es)</th>\n",
" <th>Año Graduación</th>\n",
" <th>Año</th>\n",
" <th>Revista</th>\n",
" <th>ISSN</th>\n",
" <th>Estudiante</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>219</th>\n",
" <td> B [over→] τμ (X) decays in SUSY models without...</td>\n",
" <td> Guetta, Dafne; Mira, Jesus M; Nardi, Enrico; </td>\n",
" <td> 2000-2</td>\n",
" <td> 1999</td>\n",
" <td> Physical Review D</td>\n",
" <td> 1550-7998</td>\n",
" <td> Mira, J</td>\n",
" </tr>\n",
" <tr>\n",
" <th>220</th>\n",
" <td> A $ SU (3) _ {c} x SU (2) _ {L} x U (1) _ {\\ g...</td>\n",
" <td> Mira, Jesus M; Nardi, Enrico; Restrepo, Diego A; </td>\n",
" <td> 2000-2</td>\n",
" <td> 1999</td>\n",
" <td> Physical Review D</td>\n",
" <td> 1550-7998</td>\n",
" <td> Mira, J</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Artículo \\\n",
"219 B [over→] τμ (X) decays in SUSY models without... \n",
"220 A $ SU (3) _ {c} x SU (2) _ {L} x U (1) _ {\\ g... \n",
"\n",
" Autor(es) Año Graduación Año \\\n",
"219 Guetta, Dafne; Mira, Jesus M; Nardi, Enrico; 2000-2 1999 \n",
"220 Mira, Jesus M; Nardi, Enrico; Restrepo, Diego A; 2000-2 1999 \n",
"\n",
" Revista ISSN Estudiante \n",
"219 Physical Review D 1550-7998 Mira, J \n",
"220 Physical Review D 1550-7998 Mira, J "
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"art_est.D[art_est.D['Año']==1999][['Artículo','Autor(es)','Año Graduación','Año','Revista','ISSN','Estudiante']]"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año\n",
"2016 1\n",
"2015 14\n",
"2014 10\n",
"2013 8\n",
"2012 13\n",
"2011 8\n",
"2010 12\n",
"2009 21\n",
"2008 18\n",
"2007 16\n",
"2006 28\n",
"2005 23\n",
"2004 14\n",
"2003 21\n",
"2002 12\n",
"2001 20\n",
"2000 5\n",
"1999 2\n",
"1996 2\n",
"1993 2\n",
"1992 2\n",
"Name: Tipo, dtype: int64"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=art_est.D.groupby('Año').count().Tipo[::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año\n",
"2015 2\n",
"2014 6\n",
"2013 3\n",
"2012 14\n",
"2011 12\n",
"2010 14\n",
"2009 18\n",
"2008 26\n",
"2007 20\n",
"2006 19\n",
"2005 10\n",
"2004 13\n",
"2003 13\n",
"2002 11\n",
"2001 17\n",
"2000 9\n",
"1999 1\n",
"1998 1\n",
"1997 2\n",
"1996 1\n",
"1992 2\n",
"Name: Tipo, dtype: int64"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=art_est.M.groupby('Año').count().Tipo[::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Año\n",
"2016 1\n",
"2015 15\n",
"2014 16\n",
"2013 11\n",
"2012 27\n",
"2011 20\n",
"2010 26\n",
"2009 39\n",
"2008 44\n",
"2007 36\n",
"2006 47\n",
"2005 33\n",
"2004 27\n",
"2003 34\n",
"2002 23\n",
"2001 37\n",
"2000 14\n",
"1999 3\n",
"1998 1\n",
"1997 2\n",
"1996 3\n",
"1993 2\n",
"1992 4\n",
"Name: Tipo, dtype: int64"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a=art_est.P.groupby('Año').count().Tipo[::-1]\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"14\n",
"10\n",
"8\n",
"13\n",
"8\n",
"12\n",
"21\n",
"18\n",
"16\n",
"28\n",
"23\n",
"14\n",
"21\n",
"12\n",
"20\n",
"5\n",
"2\n",
"2\n",
"2\n",
"2\n"
]
}
],
"source": [
"for i in a.values: print i"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment