- Sven-Hendrik Haase / Consultant
- Video
- PyO3
- Why?
- Rust is...
- Safe
- Modern
- Fast
- Statically and strongly typed
- Immutable by default
- Private by default
- Amazing tooling
- Great community
- You can use Rust to speed up your Python code
- Annotate your Rust code
- Compile to a `.so` file
- Python supports `import`ing `.so` files
- Optionally add to `setup.py`
- You can use Python to slow down your Rust code 😂 (could still be useful - e.g. Lua in Redis)
- rayon crate for data parallelism
- That Rust Book
- Marco Bonzanini / Consultant
- Video
- Slides
- In the Vatican City there are 5.88 popes per square mile
- Even if the statistics are correct, they can be misleading
- Lurking variable
- ⬆️Ice Cream → ⬆️Drowning❓
- ⬆️Temperature → ⬆️Ice Cream, ⬆️Temperature → ⬆️Drowning
- Correlation between number of firefighters and damage - reduce damage by sending fewer firefighters? 🙃
- Simpson's paradox
- Bias - a systematic error
- Sampling bias - an error in your sampling process
- Data visualisation can give better insights but also mislead
- Significant means the results are reliable but not necessarily important (could be a small change)
- Data dredging / data fishing / p-hacking
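Simpson's paradox is easier to see with numbers. The sketch below uses the kidney-stone treatment figures that are commonly quoted to illustrate it (they are not from the talk), and the function names are made up for this example.

```python
from fractions import Fraction

# (successes, patients) per treatment and stone size.
# Commonly cited kidney-stone figures, used here purely as an illustration.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(groups):
    """Success rate over one or more (successes, patients) pairs."""
    groups = list(groups)
    successes = sum(s for s, _ in groups)
    patients = sum(n for _, n in groups)
    return Fraction(successes, patients)


def per_group(treatment):
    """Success rate of a treatment within each stone-size group."""
    return {size: success_rate([pair]) for size, pair in DATA[treatment].items()}


def overall(treatment):
    """Success rate of a treatment with the groups pooled together."""
    return success_rate(DATA[treatment].values())


if __name__ == "__main__":
    for t in DATA:
        rates = ", ".join(f"{size}: {float(r):.0%}" for size, r in per_group(t).items())
        print(f"Treatment {t}: {rates}, overall: {float(overall(t)):.0%}")
```

Treatment A wins inside each group, yet B wins once the groups are pooled. Stone size is the lurking variable here: A was mostly given to the harder (large-stone) cases.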
- Sarah Diot-Girard / PeopleDoc
- Bag of words vs word embeddings (e.g. Word2Vec)
- Bias in word embeddings: Man → Developer, Woman → Homemaker 💥
- Interpretability can be used to get feedback from experts and make sure reasoning is unbiased
- With GDPR you have to be able to explain decisions
- Local interpretable model-agnostic explanations (LIME) 🍋
- Apophenia - seeing patterns that don't exist
- Illusory causation - confusing correlation and causation
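As a rough illustration of the bag-of-words side of that comparison: each document becomes a vector of word counts over a shared vocabulary, and word order is discarded. This is a minimal standard-library sketch with a naive whitespace tokeniser, not the approach of any particular library; `bag_of_words` is a name chosen for this example.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    # Shared, sorted vocabulary across all documents.
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for words a document does not contain.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors
```

Two sentences with the same words in a different order get identical vectors, and unrelated words are as far apart as synonyms. Word embeddings address the second limitation by placing similar words close together.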
- Paul Keating / Consultant
- Think about audience - interactive, batch, API
- Sometimes two audiences - users calling the library and users of the application
- Did you mean? error 👍
- Something went wrong error 👎
- Use `logger.exception`
- Avoid identical messages for different error paths
- Need to test the error path
- Understandable, explicit, unambiguous, point in the right direction
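One possible shape for a "Did you mean?" error combined with `logger.exception` is sketched here. The command list, `UnknownCommandError`, `resolve_command` and `run` are all hypothetical names invented for the example; only `difflib.get_close_matches` and `logging` come from the standard library.

```python
import difflib
import logging

logger = logging.getLogger(__name__)

COMMANDS = ["start", "stop", "status", "restart"]  # hypothetical command set


class UnknownCommandError(ValueError):
    """Raised when a command name is not recognised."""


def resolve_command(name, commands=COMMANDS):
    """Return `name` if valid, otherwise raise an error that suggests a fix."""
    if name in commands:
        return name
    suggestions = difflib.get_close_matches(name, commands, n=1)
    hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
    raise UnknownCommandError(
        f"Unknown command {name!r}.{hint} Valid commands: {', '.join(commands)}."
    )


def run(name):
    try:
        return resolve_command(name)
    except UnknownCommandError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Could not resolve command %r", name)
        raise
```

The message says what was wrong, what was expected, and points at the likely fix. Because the error path is an ordinary function, it can be tested like any other.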
- Dan Taylor / Microsoft
- Snippets and tasks
- ⌘P then ⌥⏎ to open in split tab
- ⌘ click to open definition
- 💡 Learn how to use watch in the debugger
- Right click and run to cursor (~temporary breakpoint)
- IntelliCode 🤯
- Language server is opt-in for now but will be the default soon
- VS Live Share 🤯
- Victor Stinner / Red Hat
- Inheritance from `object` 🤢
- `long` vs `int` 🤢
- Unicode 🤢
- 2008 - Python 3 was released
- All dependencies must be Python 3 compatible
- Some projects were forked to add Python 3 support
- Python 3 wall of shame
- `ensurepip`
- Remove Python 2 support
- Add Python 3 support
- `u"unicode"` re-added in Python 3.3 - doesn't do anything but helps port Python 2 code
- Python 2 backports
- 95% of top 200 packages support Python 3
- Instagram on Python 3
- ⬇️12% CPU
- ⬇️30% Memory
- `time.monotonic()` added in Python 3.3
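`time.monotonic()` returns a clock that never goes backwards, which makes it suitable for measuring elapsed time even if the system clock is adjusted. A minimal sketch, where `timed` is a hypothetical helper:

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed
```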
- Ines Montani / Explosion AI
- Video
- Slides
- Explosion AI
- spaCy
- Bootstrapped / self-funded / did consulting
- prodigy
- 🔴 Misconception #1 - you need to run at a loss
- Reasons to run at a loss: network effects, scale operations, predatory pricing, enterprise sales
- Upfront costs
- Bigger is not necessarily better
- Most businesses aren't "winner takes all"
- Optimise for median outcome
- 🔴 Misconception #2 - you need to hire lots of people
- 🚌 test
- Excellence requires authorship / ownership (not design by committee)
- Building the right thing
- Specialists - people don't understand what others are working on, more meetings...
- Generalists
- Complementary - mix of skills ✨
- Tree-shaped skills 🌳
- 🔴 Misconception #3 - you can't make good decisions without testing all of your assumptions
- Inverse of survivorship bias - We didn't do X and we failed, therefore X would have saved us.
- http://autopsy.io
- You can't replace logic with data
- Build things you think are good
- 🔴 Misconception #4 - the true value lies in your user data
- Sell products, not promises
- Focus on what you can really charge people money for right now
- Ship value, charge money
- Profit is the best KPI
- Konstantin Ignatov / Qrator
- `bytearray`
- Succinct data structures use knowledge about the data
- PySDSL
- SDSL
- `x.bit_compress()` picks the most appropriate data type (e.g. `int16`) for the data
- Compression but still supports same operations
- Suffix arrays 🎉
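The idea of picking the narrowest type that fits the data can be imitated with the standard-library `array` module. This is only an analogy and not the PySDSL API: `compress_ints` is a hypothetical helper, and real succinct structures pack values at bit granularity instead of whole bytes.

```python
from array import array

# Signed typecodes from narrowest to widest.
# Item sizes are platform dependent, so they are read from the array module.
_SIGNED_TYPECODES = ("b", "h", "i", "l", "q")


def compress_ints(values):
    """Store ints in the narrowest signed array type that fits every value."""
    values = list(values)
    lo, hi = min(values, default=0), max(values, default=0)
    for code in _SIGNED_TYPECODES:
        bits = array(code).itemsize * 8
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return array(code, values)
    raise OverflowError("values do not fit in a 64-bit signed integer")
```

The result still supports indexing, slicing and iteration, so the data is smaller while the same operations keep working.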
- Sarah Bird / Mozilla
- Bokeh
- Cookie syncing - correlating cookies from multiple websites
- Zombie cookies - re-creating a cookie using local storage etc.
- fingerprintjs2
- dask ~distributed version of `pandas`
- Mario Corchero / Bloomberg
- `side_effect` can be an iterable (e.g. a list)
- `patch(..., autospec=True)`
- `seal(mock)` - stops undefined attributes being mocked
- `Mock(wraps=func)` - a spy
- `sentinel`
- You can name your mocks - `Mock(name="bob")`
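Several of these `unittest.mock` features fit in one short sketch. `double`, `fetch` and `client` are hypothetical names, and `seal` needs Python 3.7 or later.

```python
from unittest.mock import Mock, seal, sentinel


def double(x):
    return x * 2


# side_effect as a list: each call returns the next item; exceptions are raised.
fetch = Mock(name="fetch", side_effect=[1, 2, ValueError("no more data")])
first, second = fetch(), fetch()

# wraps: calls go through to the real function but are still recorded (a spy).
spy = Mock(wraps=double)
spied_result = spy(21)

# seal: accessing attributes that were not configured raises AttributeError.
client = Mock(name="client")
client.get.return_value = sentinel.response
seal(client)
```

After `seal(client)`, `client.get()` still returns `sentinel.response`, while a typo such as `client.gte` fails immediately instead of silently producing a new mock.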
- Lynn Root / Spotify
- Used in Spotify's chaos monkey service
- Event driven hostname generation for DNS
- SLI - Service Level Indicators
- Be careful to avoid accidentally swallowing exceptions
- `asyncio.create_task`
- Might want to handle cancelled tasks' exceptions
- `asyncio.gather` swallows exceptions by default
- http://rogue.ly/aio
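One way to make sure no failure goes unnoticed is to pass `return_exceptions=True` to `asyncio.gather` and inspect every result. This is a sketch under assumptions, not the code from the talk: `work` and `run_all` are hypothetical, and `asyncio.run` needs Python 3.7 or later.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def work(n):
    """Hypothetical unit of work; odd inputs fail."""
    await asyncio.sleep(0)
    if n % 2:
        raise ValueError(f"cannot process {n}")
    return n * 10


async def run_all(items):
    # return_exceptions=True puts raised exceptions in the result list
    # instead of propagating only the first one to the caller.
    results = await asyncio.gather(*(work(n) for n in items), return_exceptions=True)
    succeeded, failed = [], []
    for item, result in zip(items, results):
        # BaseException also covers asyncio.CancelledError on newer Pythons.
        if isinstance(result, BaseException):
            logger.error("work(%r) failed", item, exc_info=result)
            failed.append(item)
        else:
            succeeded.append(result)
    return succeeded, failed
```

Every failure is logged with its traceback, and the caller gets an explicit list of what did not complete.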
- Bernat Gabor / Bloomberg
- tox
- detox - run environments in parallel
- Testing with different versions of Python, Django etc.
- Hynek Schlawack / Variomedia
- Use environment variables
- environ_config library
- Don't put secrets in environment variables 🤔
- `/-/` prefix for internal endpoints (e.g. `/-/version`)
- Liveness vs health endpoint
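Reading configuration from environment variables can be as small as the sketch below. It uses only `os.environ`, not the environ_config API; the variable names (`APP_DEBUG`, `APP_PORT`, `APP_DB_HOST`) and `load_settings` are hypothetical.

```python
import os


def load_settings(env=None):
    """Read settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    return {
        "debug": env.get("APP_DEBUG", "false").lower() in ("1", "true", "yes"),
        "port": int(env.get("APP_PORT", "8000")),
        "db_host": env["APP_DB_HOST"],  # required: raises KeyError if missing
    }
```

Accepting the mapping as a parameter keeps the function easy to test without touching the real process environment.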
- Nicole Harris / PeopleDoc
- Pronounced Py-P-I
- Costs ~$1M/year to run
- Almar Klein / Consultant
- Video
- Slides
- Low-level representation of code
- Doesn't make too many assumptions about the architecture it will be run on
- Browsers (or other platforms) turn the WASM into native instructions
- WASM is a binary format but it has a human readable version
- It's safe - primarily designed to run in the browser
- ppci (pure python compiler infrastructure)
- Can compile a subset of Python to WASM
- Can compile WASM to native code
- Therefore Python to native code 🎉
- Can import WASM modules written in other languages
- Wow!
- Isabel Lopez / Smarkets
- Luigi for building the pipeline
- Apache Parquet column-oriented data store on Hadoop
- Spark (Amazon EMR)
- Columnar (colum-nar)
- RDD - Resilient Distributed Dataset
- 8M events (overkill?)
- Ed Singleton / Consultant
- Video
- Repetition, social awkwardness, over-stimulation, stubbornness, meltdowns, "fizzy mind"
- Over-stimulation - flood of information coming in
- Stubbornness - unreasonableness
- Correlation: insomnia, clumsiness, alcoholism, ADD / ADHD, low muscle tone, eating disorders
- Neurological differences: larger brains, more neurons
- Benefits: systemising, repetition / obsession, radical honesty, originality of thinking, more spare time, attention to detail
- Asperger's archetype: obsessive, blunt, intelligent, original thinker, not suited to physical work...
- "It seems that for success in science or art a dash of autism is essential." - Hans Asperger
- Social problems: resting "whatever" face, don't force socialising, try to tolerate meltdowns, less likely to be mentored
- Work patterns: the unknown is scary, Agile / Kanban, prefigure changes
- Meetings: don't expect people to speak up, actively manage turn-taking
- Emmanuel Leblond / Scille
- https://github.com/python-trio/trio
- Trio aims to be a more user-friendly alternative to `asyncio`
- `asyncio` - difficult to debug
- Coroutine tree
- Simpler than `asyncio`, `twisted`, etc.
- Happy eyeballs
- Cancels coroutines correctly
- Owen Campbell / Consultant
- ICI
- Leadership can be learnt
- Must be practiced
- Bounce
- Find opportunities outside of work to practice
- Schmooze 'em, bruise 'em or lose 'em. eek 🤔
- Priorities: customers, investors, other teams, processes, resources, deliverables. Must ensure good communication with these.
- Leader should work on lower-priority tasks in case they need to be dropped
- Leadership styles: dictatorial, paternalistic, consensual, democratic, hands off
- Dictator → Observer
- Rookie → Expert
- Dictator + Rookie = Git
- Craig Kerstiens / Citus
- `psqlrc` (`\x auto`, `\timing`, history)
- pgtune
- Cache hit rate should be `>=99%`
- Index hit rate should be `>=95%`
- `EXPLAIN` and `EXPLAIN ANALYSE`
- `pg_stat_statements` (extension)
- Use `GIN` index when a column has multiple values (e.g. an array)
- Use `GIST` for shapes (geo-spatial) and full text search
- `SP-GIST` and `BRIN` for large tables (e.g. timeseries)
- Composite, conditional, and functional indexes
- Safe migrations:
- Allow nulls but set default value
- Backfill
- Add constraint
- `CREATE INDEX CONCURRENTLY` - doesn't lock the table
- Connection pooling - at application layer or daemon (pgBouncer)
- Replication
- `wal-e` / `wal-g` or `barman`
- Sharding with Citus (appears as a single database) - see Instagram talk
- Logical backup (`pg_dump`) vs physical backup
- Use physical backups for larger databases
- Less load on system 👍
- Not portable 👎
- `pg_dump` won't work for databases `>~50GB`
- Yury Selivanov / EdgeDB
- `asyncio.run`
- `serve_forever`
- `@coroutine` will be deprecated soon
- Try to avoid using event loop
- Context variables
- Trio
- `create_supervisor`
- `TaskGroup`
- `gather` doesn't cancel the other tasks when one fails 💥
- Tokio - asyncio event loop in Rust
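A small sketch of `asyncio.run` together with context variables, assuming Python 3.7 or later. `request_id`, `handle` and `main` are hypothetical names. Each task created by `gather` runs in its own copy of the context, so values set in one task do not leak into another, and no explicit event loop handling is needed.

```python
import asyncio
import contextvars

# Hypothetical per-request value; each task gets its own copy of the context.
request_id = contextvars.ContextVar("request_id", default=None)


async def handle(rid):
    request_id.set(rid)
    await asyncio.sleep(0)  # let other tasks run in between
    return request_id.get()


async def main():
    return await asyncio.gather(*(handle(rid) for rid in ("a", "b", "c")))


if __name__ == "__main__":
    print(asyncio.run(main()))
```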
- David Beazley
- @dabeaz