
@fperez
Last active July 1, 2021 04:43
Polyglot Data Science with IPython

Polyglot Data Science with IPython & friends

Author: Fernando Pérez.

A demonstration of how to use Python, Julia, Fortran and R cooperatively, within a single process, to analyze data.

This is supported by the IPython kernel and a few extensions that take advantage of IPython's magic system to provide low-level integration between Python and other languages.

See the companion notebook for data preparation and setup.

Used for a lecture at the Berkeley Institute for Data Science. The lecture video has a live demo of this material.

License: CC-BY.

x y
0.0 0.09428703274649862
0.02101399768287487 -0.21682787079387694
0.04202799536574974 0.32998247004803666
0.06304199304862461 0.0036910376779959175
0.08405599073149948 -0.0544094870632954
0.10506998841437436 0.29133411420559346
0.12608398609724922 0.310718736500297
0.1470979837801241 0.03710184794855739
0.16811198146299897 0.19385680449502704
0.18912597914587384 -0.23080372945059743
0.21013997682874871 0.4754598194444005
0.2311539745116236 0.47226348579069954
0.25216797219449844 0.4936609017425528
0.2731819698773733 -0.07143553645414097
0.2941959675602482 0.2965134877139408
0.31520996524312306 0.39495610271631554
0.33622396292599793 0.5075112614675588
0.3572379606088728 0.5168053860493421
0.3782519582917477 0.7564556008394825
0.39926595597462256 0.216741161143698
0.42027995365749743 0.520141808615399
0.4412939513403723 0.46466381308230564
0.4623079490232472 0.6703516636116772
0.48332194670612205 0.7787710470956681
0.5043359443889969 0.9687162726597178
0.5253499420718718 0.6487924421695102
0.5463639397547466 0.9158718710835747
0.5673779374376215 0.4559752998461058
0.5883919351204964 0.821859705782694
0.6094059328033713 1.1098222426034332
0.6304199304862461 0.8584153034108905
0.651433928169121 1.0457914750281228
0.6724479258519959 1.2284594600580687
0.6934619235348707 1.2690396077874566
0.7144759212177456 1.2737166912460058
0.7354899189006204 1.1178291771613955
0.7565039165834954 1.2085523305960715
0.7775179142663702 1.160430948324203
0.7985319119492451 1.4346476634879184
0.8195459096321199 1.7856896638059545
0.8405599073149949 1.363698304252134
0.8615739049978697 1.2758144742466853
0.8825879026807446 1.436564698288517
0.9036019003636194 1.054058214396473
0.9246158980464944 1.5577062715745673
0.9456298957293692 1.3670737307877898
0.9666438934122441 1.5566477333034936
0.9876578910951189 1.6241865931574329
1.0086718887779937 1.8070252562351916
1.0296858864608687 1.7331681488676631
1.0506998841437436 1.891108813605713
1.0717138818266183 1.4650168717924723
1.0927278795094932 1.5033905033420423
1.1137418771923682 1.7913946882131335
1.134755874875243 1.7277602211222844
1.1557698725581178 1.8322012987848455
1.1767838702409927 1.9533801227508665
1.1977978679238677 1.894507174191391
1.2188118656067426 2.0312237287401396
1.2398258632896173 2.2409586168192526
1.2608398609724922 1.74787016968533
1.2818538586553672 1.9365386844626173
1.302867856338242 2.016956985811513
1.3238818540211168 1.915157228352737
1.3448958517039917 2.1617305993411757
1.3659098493868667 1.4695044990462909
1.3869238470697414 2.346755263624674
1.4079378447526163 1.6994786100038657
1.4289518424354912 1.9540142243834207
1.4499658401183662 2.0324245533593643
1.4709798378012409 1.7104125016036007
1.4919938354841158 1.932092715157879
1.5130078331669907 1.9488771044291837
1.5340218308498657 1.8772836619419955
1.5550358285327404 1.5481368188707556
1.5760498262156153 2.092098081216662
1.5970638238984902 1.689508668821579
1.6180778215813651 1.363753243257729
1.6390918192642399 1.6675992701153861
1.6601058169471148 1.4929715328232587
1.6811198146299897 1.518989678972056
1.7021338123128644 0.6518879009250004
1.7231478099957394 1.5650234771609497
1.7441618076786143 1.2655960185446726
1.7651758053614892 1.200656774373136
1.786189803044364 1.0132139672164098
1.8072038007272389 1.1834080615728857
1.8282177984101138 1.1572844700428722
1.8492317960929887 0.9449712689069361
1.8702457937758634 1.1003388751867371
1.8912597914587383 0.770199436323397
1.9122737891416133 0.6082151664471259
1.9332877868244882 0.41885326004336676
1.954301784507363 0.4464939140061885
1.9753157821902378 0.6691358626736834
1.9963297798731128 0.43573967652613405
2.0173437775559875 0.3339828126398665
2.0383577752388624 0.4644473825618527
2.0593717729217373 0.1064277237776948
2.0803857706046123 0.1853789263188848
2.101399768287487 0.3201020224385418
2.122413765970362 0.35629474266753264
2.1434277636532366 0.33225605736934366
2.1644417613361115 0.28491959796611266
2.1854557590189865 0.32910631521731204
2.2064697567018614 0.5176262662892887
2.2274837543847363 0.109839296007394
2.2484977520676113 0.20211465878039475
2.269511749750486 0.5788131607716548
2.290525747433361 0.1242850987814677
2.3115397451162356 0.6104516455738529
2.3325537427991105 0.156020816657225
2.3535677404819855 0.4783926051674913
2.3745817381648604 0.7958505276164087
2.3955957358477353 0.6892085463329622
2.4166097335306103 0.7833572716310165
2.437623731213485 1.0507576396020168
2.4586377288963597 0.6499207994695929
2.4796517265792346 1.1252241650348074
2.5006657242621095 1.2991169459684497
2.5216797219449845 1.275830668753998
2.5426937196278594 1.307195313270599
2.5637077173107343 1.3980001489560312
2.5847217149936093 1.7230235994943333
2.605735712676484 1.3924326262625255
2.6267497103593587 1.903817084872605
2.6477637080422336 1.8147486062636935
2.6687777057251085 1.928601211864948
2.6897917034079835 2.196074272466092
2.7108057010908584 2.251763766706309
2.7318196987737333 2.211637113563625
2.7528336964566082 2.2296439602321607
2.7738476941394827 2.385493920800681
2.7948616918223577 2.6104054328620516
2.8158756895052326 2.3551468077571522
2.8368896871881075 2.01614367546487
2.8579036848709825 1.758547230062079
2.8789176825538574 2.5145466583995195
2.8999316802367323 1.7205136112646957
2.9209456779196072 2.2355028033020905
2.9419596756024817 2.0663062219935857
2.9629736732853567 1.7328187304433138
2.9839876709682316 1.8410605026462201
3.0050016686511065 1.41349631197623
3.0260156663339814 1.6948342491959802
3.0470296640168564 1.225211982275375
3.0680436616997313 1.3375325966658254
3.089057659382606 1.2602107474366946
3.1100716570654807 0.9059763417450576
3.1310856547483557 1.27358315157639
3.1520996524312306 0.7751719459170574
3.1731136501141055 0.7184056838930605
3.1941276477969804 0.35720962825103025
3.2151416454798554 0.3454266842065717
3.2361556431627303 0.5467853336586905
3.257169640845605 0.04810795996137804
3.2781836385284797 -0.2649760702592041
3.2991976362113546 0.06169584132449482
3.3202116338942296 -0.06149482697996339
3.3412256315771045 0.18942219160248575
3.3622396292599794 0.25605182400139126
3.3832536269428544 0.04512111749786685
3.404267624625729 0.19341064591789695
3.425281622308604 0.15502940992435946
3.4462956199914787 0.2628837343218484
3.4673096176743536 0.5903816234688343
3.4883236153572286 0.7928516146939639
3.5093376130401035 0.8352510045541296
3.5303516107229784 1.1253646232783057
3.5513656084058534 1.272364619510003
3.572379606088728 1.199760718896991
3.5933936037716028 1.2400157196887907
3.6144076014544777 1.291281705897311
3.6354215991373526 1.349471635900479
3.6564355968202276 1.7300156091830594
3.6774495945031025 1.7457385690681717
3.6984635921859774 2.284177744394171
3.719477589868852 1.9315785622203046
3.740491587551727 1.6497267465629497
3.7615055852346018 2.2162488056706726
3.7825195829174767 1.4769937904058161
3.8035335806003516 1.586792806741044
3.8245475782832266 1.854161812692901
3.8455615759661015 1.6803133942079087
3.8665755736489764 1.8181215829541293
3.887589571331851 1.135191939572346
3.908603569014726 1.03164339714643
3.9296175666976008 0.9858069759360137
3.9506315643804757 0.8465371945050578
3.9716455620633506 0.46573563542895974
3.9926595597462256 0.6150177958199532
4.0136735574291 0.28227854040030514
4.034687555111975 -0.05593170482163062
4.05570155279485 -0.08820050185328651
4.076715550477725 0.18315078126159096
4.0977295481606 -0.23042439222793287
4.118743545843475 -0.2571793298707865
4.13975754352635 -0.0999819260172232
4.1607715412092245 -0.24340398263360863
4.1817855388920995 -0.50431617492975
4.202799536574974 -0.3207044398109188
4.223813534257849 -0.3145809051772469
4.244827531940724 -0.0660875532158049
4.265841529623599 -0.09482847989385279
4.286855527306473 0.36790855403938516
4.307869524989348 0.15033179677285174
4.328883522672223 0.36601183355486666
4.349897520355098 0.6519416287238422
4.370911518037973 1.184636801781128
4.391925515720848 1.1171617441775716
4.412939513403723 1.2054237787691973
4.433953511086598 1.1172360264040786
4.454967508769473 1.534353101017178
4.475981506452348 1.8171007670521293
4.4969955041352225 1.7249702922535741
4.5180095018180975 1.6385770810646945
4.539023499500972 1.5517279928683787
4.560037497183847 1.1971680468803079
4.581051494866722 1.3161222409555393
4.602065492549596 1.2298207684545746
4.623079490232471 0.8728156511447843
4.644093487915346 1.1264733882736175
4.665107485598221 0.6206552292833607
4.686121483281096 0.5037364158453278
4.707135480963971 0.15576837933202337
4.728149478646846 -0.5149011882433355
4.749163476329721 -0.3733926583803232
4.770177474012596 -0.4895766398570291
4.791191471695471 -0.3963229001074472
4.812205469378346 -0.6890871939218315
4.8332194670612205 -0.7942286766222939
4.8542334647440954 -0.7986849132828343
4.87524746242697 -0.8886178465483747
4.896261460109845 -0.9630569221353825
4.917275457792719 -0.9834146007334708
4.938289455475594 -0.5678948714845939
4.959303453158469 -0.3558829333325443
4.980317450841344 0.249107365558437
5.001331448524219 -0.040608009173703025
5.022345446207094 0.20233845146527824
5.043359443889969 0.1992400765938189
5.064373441572844 0.5275485354884621
5.085387439255719 0.7514618756521231
5.106401436938594 0.49009426874379936
5.127415434621469 0.5067880624234495
5.148429432304344 1.0585654272835328
5.1694434299872185 1.0610109488693784
5.190457427670093 0.8525357584534238
5.211471425352968 0.5012416534838943
5.232485423035842 0.5961987692083116
5.253499420718717 0.7313663909072938
5.274513418401592 -0.1936178203876623
5.295527416084467 -0.3647565438145515
5.316541413767342 -0.3025037023803415
5.337555411450217 -0.5730997131866751
5.358569409133092 -1.0639608138195455
5.379583406815967 -1.1936870927698648
5.400597404498842 -1.1003121867771797
5.421611402181717 -1.4578695062400544
5.442625399864592 -1.6147443297379096
5.463639397547467 -1.2007321839904823
5.484653395230342 -1.4627527499991468
5.5056673929132165 -1.3870180706182835
5.526681390596091 -1.1742923901910507
5.5476953882789655 -1.1442703154821237
5.56870938596184 -0.8124758234383467
5.589723383644715 -0.7567307076700628
5.61073738132759 -0.24219862383124224
5.631751379010465 -0.5125378280491243
5.65276537669334 -0.14817757149732258
5.673779374376215 -0.059543461085104284
5.69479337205909 -0.07404084969062283
5.715807369741965 0.06838207045207122
5.73682136742484 0.00720846773340722
5.757835365107715 0.14904174929020586
5.77884936279059 0.21429843688006844
5.799863360473465 -0.12806012166224962
5.8208773581563396 -0.22006598530250332
5.8418913558392145 -0.49618236638574853
5.8629053535220885 -0.8841193381431568
5.8839193512049635 -1.2012248534668872
5.904933348887838 -1.4493445088282495
5.925947346570713 -2.1075446008438767
5.946961344253588 -1.541582498748978
5.967975341936463 -2.0147229289823425
5.988989339619338 -2.2078045542029896
6.010003337302213 -2.467111374682806
6.031017334985088 -1.8561933450948547
6.052031332667963 -2.222459733873345
6.073045330350838 -2.0535376309771367
6.094059328033713 -1.7887028848220365
6.115073325716588 -1.4994796532579218
6.136087323399463 -1.5785104619899815
6.1571013210823375 -1.0042099198350283
6.178115318765212 -0.9964468106577317
6.1991293164480865 -0.5511365331477565
6.220143314130961 -0.6913880447173069
6.241157311813836 -0.6721482997464577
6.262171309496711 -0.8929737091096578
6.283185307179586 -0.6980172227961545
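The rows above are the contents of `data.csv`, written by the companion notebook from the model y(x) = a x + b x^2 + c sin(x^2) plus Gaussian noise. The model is linear in the coefficients a, b and c, so ordinary least squares should recover them from this data. The following is a minimal plain-NumPy sketch of that check. It is not necessarily how the polyglot notebook fits the model. So that it runs without the file, it regenerates the data using the notebook's recipe (same legacy seed and parameters). The names `A`, `coef` and `a_hat`/`b_hat`/`c_hat` are illustrative only.

```python
import numpy as np

# Regenerate the synthetic data exactly as the companion notebook does
# (legacy global seed), so the values should match data.csv.
npts = 300
eps = 0.2                 # noise standard deviation
a, b, c = 1, -0.2, 1      # true model coefficients
np.random.seed(1234)
x = np.linspace(0, 2*np.pi, npts)
y = a*x + b*x**2 + c*np.sin(x**2) + np.random.normal(scale=eps, size=npts)

# With pandas imported as pd, the same arrays could be loaded from disk instead:
#   data = pd.read_csv('data.csv'); x, y = data['x'].values, data['y'].values

# The model is linear in (a, b, c): build one column per basis function
# and solve min ||A @ coef - y||^2 by ordinary least squares.
A = np.column_stack([x, x**2, np.sin(x**2)])
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
a_hat, b_hat, c_hat = coef
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, c = {c_hat:.3f}")
```

The printed estimates should lie close to the true values a = 1, b = -0.2, c = 1. The exact digits depend on the noise draw, so they are not quoted here.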
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data generation for polyglot data science example\n",
"\n",
"In order to run the full [Polyglot Data Science with IPython notebook](polyglot-ds.ipynb), you will need to install [Julia](https://julialang.org/downloads) and then the following packages (this assumes a conda-based deployment, which will automatically pull in R; otherwise you also need to install R yourself):\n",
"\n",
"```\n",
"conda install jupyter cython pandas matplotlib seaborn\n",
"conda install rpy2\n",
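"pip install julia fortran-magic\n",
"```\n",
"\n",
"You may also need Julia's PyCall package, which the `julia` Python package uses to talk to Julia. If it is missing, install it from within Julia with `Pkg.add(\"PyCall\")` (on Julia 0.7 and later, run `using Pkg` first)."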
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We generate synthetic data according to\n",
"\n",
"$$\n",
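"y(x) = a x + b x^2 + c \\sin(x^2) + \\mathcal{N}(0, \\epsilon^2)\n",
"$$\n",
"\n",
"where $\\epsilon$ is the standard deviation of the Gaussian noise (`eps` in the code below)."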
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"npts = 300\n",
"eps = 0.2 # noise\n",
"a, b, c = 1, -0.2, 1 # model coefficients\n",
"\n",
"np.random.seed(1234)\n",
"x = np.linspace(0, 2*np.pi, npts)\n",
"y = a*x + b*x**2 + c*np.sin(x**2) + np.random.normal(scale=eps, size=npts)\n",
"plt.plot(x, y, 'o');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Write it to a CSV file for convenient retrieval in a \"typical\" workflow; Pandas does the job nicely:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.000000</td>\n",
" <td>0.094287</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.021014</td>\n",
" <td>-0.216828</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.042028</td>\n",
" <td>0.329982</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y\n",
"0 0.000000 0.094287\n",
"1 0.021014 -0.216828\n",
"2 0.042028 0.329982"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.DataFrame({'x':x, 'y':y})\n",
"data.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x,y\n",
"0.0,0.09428703274649862\n",
"0.02101399768287487,-0.21682787079387694\n"
]
}
],
"source": [
"data.to_csv('data.csv', index=False)\n",
"!head -3 data.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sanity check: read the file back and confirm that it matches the original data."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.000000</td>\n",
" <td>0.094287</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.021014</td>\n",
" <td>-0.216828</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.042028</td>\n",
" <td>0.329982</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y\n",
"0 0.000000 0.094287\n",
"1 0.021014 -0.216828\n",
"2 0.042028 0.329982"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2 = pd.read_csv('data.csv')\n",
"data2.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"x 5.894937e-14\n",
"y 1.431537e-14\n",
"dtype: float64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(data2-data).abs().sum()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernel_info": {
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"nteract": {
"version": "0.8.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
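The notebook's sanity check reports column-wise differences of about 1e-14 rather than exactly zero. The written text is probably not the cause. The `head` output shows `to_csv` writing each float with enough digits to round-trip (e.g. `0.09428703274649862`). The loss most likely comes from the default float converter in the pandas C parser, which, depending on the pandas version, is not guaranteed to be correctly rounded.

The sketch below compares the default parse with `float_precision='round_trip'`. It assumes a pandas version whose `read_csv` accepts that argument. It writes to an in-memory `io.StringIO` buffer instead of `data.csv`, so it touches no files.

```python
import io

import numpy as np
import pandas as pd

# Same construction as the notebook: 300 noisy samples of the model.
a, b, c = 1, -0.2, 1
np.random.seed(1234)
x = np.linspace(0, 2*np.pi, 300)
y = a*x + b*x**2 + c*np.sin(x**2) + np.random.normal(scale=0.2, size=300)
data = pd.DataFrame({'x': x, 'y': y})

# Write to an in-memory buffer rather than data.csv.
buf = io.StringIO()
data.to_csv(buf, index=False)

# Default parsing with the C engine's standard float converter.
buf.seek(0)
data_fast = pd.read_csv(buf)

# 'round_trip' parses each field with Python's own float(), which exactly
# inverts the shortest round-trip text that to_csv wrote.
buf.seek(0)
data_exact = pd.read_csv(buf, float_precision='round_trip')

print((data_fast - data).abs().sum())
print((data_exact - data).abs().sum())
```

On recent pandas releases the default converter may already give zero difference. Passing `round_trip` makes exactness explicit, probably at some cost in parsing speed.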
@fperez

fperez commented Apr 22, 2018

@teoliphant

This is really nice work! I look forward to more of this kind of interoperability.

@pnavaro

pnavaro commented Apr 22, 2018

Nice work, thanks !

  • I had to install PyCall in julia with Pkg.add("PyCall")
  • Minor error in cell [25] sin(x**2) instead of sin(xx**2)
