Google Summer of Code - Django - Ichlasul Affan - Universitas Indonesia
Table of Contents
- 1 Abstract
- 2 Implementation
- 3 Schedules and Milestones
- 4 About Me
As it stands right now, parallel testing in Django Test uses the multiprocessing library, which spawns multiple child processes from one parent process. Django Test currently only supports spawning child processes with the fork start method. The fork start method clones all resources and file descriptors of the parent, so each child process starts running from the same state the parent process has.
Meanwhile, Microsoft Windows only supports the spawn start method, and on recent versions of Apple macOS spawn is the default (since Python 3.8, because fork is considered unsafe there). The spawn start method starts the child process from scratch in a fresh Python interpreter; apart from the resources needed to run the process, its state comes only from the initialization function passed via the initializer parameter of multiprocessing.Pool. This also means the child is almost completely separated from the parent process, which makes it harder to synchronize previously initialized components (the App registry, the test database configuration, and the unittest library).
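Because the default start method differs per platform, one practical way to reproduce the Windows/macOS behaviour while developing on Linux is to request the spawn start method explicitly through a multiprocessing context. The helper below is a minimal sketch; `get_test_context` is a hypothetical name, not part of Django, and it only relies on the standard library's `multiprocessing.get_all_start_methods()` and `multiprocessing.get_context()`.

```python
import multiprocessing


def get_test_context(start_method=None):
    """Return a multiprocessing context for test workers (hypothetical helper).

    Asking for "spawn" explicitly makes workers start from a fresh
    interpreter on every platform, mimicking Windows and recent macOS.
    """
    if start_method is None:
        # Use whatever the platform/interpreter default is.
        return multiprocessing.get_context()
    available = multiprocessing.get_all_start_methods()
    if start_method not in available:
        raise ValueError(
            f"start method {start_method!r} is not available; "
            f"choose one of {available}"
        )
    return multiprocessing.get_context(start_method)
```

A pool created with `get_test_context("spawn").Pool(processes=..., initializer=..., initargs=...)` would then start its workers the same way a Windows pool does, even on Linux.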
The goals of this project are:
- Properly initialize or link the App registry and the initial state of each process spawned using the spawn start method.
- Ensure that database cloning, operations, and cleanup work with the spawn start method for all of Django's officially supported DBMSs.
- (Optional goal) I have seen some issues in coverage analysis when using parallel testing. The optional goal is to make coverage analysis work properly by also running it in each child process.
The main benefit of this project is to enable reliable parallel testing support on Windows and recent versions of macOS.
Before writing this proposal, I first tried to understand the workflow of Django's parallel test runner. From my first observation, Django testing initializes the App registry using django.setup() when the management subpackage is first invoked, and then initializes the test databases, cloning them if parallel testing is enabled. Test suites are discovered and passed to ParallelTestSuite. ParallelTestSuite groups the test suites into subsuites to be queued for the worker processes, then initializes every process via an init function (assigning each process a counter value and a cloned database) before running each subsuite using the run_subsuite() method.
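The grouping step described above can be illustrated with plain unittest. This is a simplified sketch loosely modelled on Django's `partition_suite_by_case()` in `django.test.runner`, not its actual code; the names `iter_tests` and `partition_by_case` are hypothetical. One effect of grouping by TestCase class is that class-level fixtures such as `setUpClass()` run once per subsuite, inside the worker that executes it.

```python
import itertools
import unittest


def iter_tests(suite):
    """Flatten a (possibly nested) unittest suite into individual tests."""
    for test in suite:
        if isinstance(test, unittest.TestSuite):
            yield from iter_tests(test)
        else:
            yield test


def partition_by_case(suite):
    """Group consecutive tests from the same TestCase class into subsuites,
    so each subsuite can be handed to a worker process as one unit."""
    return [
        # TestSuite(group) consumes the group before groupby advances.
        unittest.TestSuite(group)
        for _, group in itertools.groupby(iter_tests(suite), key=type)
    ]
```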
Aside from that, I have tested several databases in the meantime: SQLite, MySQL, and PostgreSQL. They all seem to work fine with the current database cloning method. I think the problem might lie in SQLite with an in-memory database. In my test app, parallel testing creates new database files on disk, not in memory. Since the spawn method only shares the minimum resources needed to run the init function (Python will not clone an SQLite in-memory database either), we might need to force the SQLite test database to be file-based regardless of user settings.
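For the in-memory SQLite problem, forcing a file-based database could look roughly like the sketch below: the parent copies its in-memory database into a per-worker file that a spawned worker can open by path. It is an illustration only, assuming the standard library's `sqlite3.Connection.backup()` (Python 3.7+); `materialize_memory_db` and the file name `test_db_1.sqlite3` are hypothetical, and this is not how Django currently clones SQLite databases.

```python
import os
import sqlite3
import tempfile


def materialize_memory_db(source, path):
    """Copy an SQLite database (e.g. ':memory:') into a file at ``path``.

    A forked worker inherits the parent's in-memory database as part of the
    copied process memory, but a spawned worker starts with nothing, so it
    needs a file it can open by name.
    """
    target = sqlite3.connect(path)
    try:
        # Connection.backup() copies every page of the source database.
        source.backup(target)
    finally:
        target.close()
    return path
```

Each spawned worker would then point its `NAME` setting at its own file, which is essentially the "force a file-based database" option.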
My pull request with the initial work for this proposal is PR #12607: django/django#12607
As the Python multiprocessing documentation says, a child process started with spawn only inherits the resources necessary to run the process object's run() method. Currently, the init function only increments the counter and assigns a database to that process. So, the first thing to do is to modify the init function, specifically for the spawn method, so that it runs django.setup() and setup_test_environment(). Secondly, after doing the setup, we need to make sure that the process uses the existing test environment and database.
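A spawn-aware init function along these lines might look like the following sketch. It is hypothetical: `new_worker_counter`, `claim_worker_id`, and `init_spawned_worker` are illustrative names, and passing the parent's test database settings through `initargs` is an assumption about how "use the existing test environment and database" could be done. The Django calls it uses (`django.setup()`, `django.test.utils.setup_test_environment()`, and `connection.creation.get_test_db_clone_settings()`) are existing APIs; they are imported inside the function so the module can be imported without Django.

```python
import multiprocessing
import os

_worker_id = None


def new_worker_counter():
    """Shared counter the parent passes to every worker via ``initargs``."""
    return multiprocessing.Value("i", 0)


def claim_worker_id(counter):
    """Atomically take the next worker id from the shared counter."""
    with counter.get_lock():
        counter.value += 1
        return counter.value


def init_spawned_worker(counter, settings_module, parent_db_settings):
    """Hypothetical pool initializer for workers started with spawn."""
    global _worker_id
    import django
    from django.db import connections
    from django.test.utils import setup_test_environment

    _worker_id = claim_worker_id(counter)

    # Environment variables are inherited, so this is usually already set.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    django.setup()  # populate the App registry in this fresh interpreter
    setup_test_environment()  # e.g. locmem email backend, 'testserver' host

    for alias, settings_dict in parent_db_settings.items():
        connection = connections[alias]
        # The child re-read settings.py, so first restore the parent's
        # *test* database settings ...
        connection.settings_dict.update(settings_dict)
        # ... then switch to this worker's clone, as the fork path does.
        clone = connection.creation.get_test_db_clone_settings(str(_worker_id))
        connection.settings_dict.update(clone)
        connection.close()
```

The parent would pass it as `initializer=init_spawned_worker, initargs=(counter, settings_module, parent_db_settings)`, where `parent_db_settings` maps each alias to a copy of its connection's `settings_dict` taken after the test databases were created.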
After altering the init function, we need to pay attention to database support. As described in "Initial Observation Result", we might need to adjust some of the existing default settings for the supported test databases. For SQLite, there are two possibilities:
- Force the use of a file-based database.
- Take the deferred SQL from the parent database's migration results, and use it to initialize each new database.
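The second possibility replaces file copying with replaying SQL. As a rough stand-in for "deferred SQL from the parent's migrations", the sketch below uses the standard library's `sqlite3.Connection.iterdump()`, which dumps the whole database (schema and data) rather than only migration SQL; `dump_sql` and `load_sql` are hypothetical names. Because the dump is a plain string, it can be pickled and sent to a spawned worker, which then builds its own in-memory copy.

```python
import sqlite3


def dump_sql(source):
    """Serialize an SQLite database into one SQL script (a picklable str)."""
    # iterdump() yields CREATE/INSERT statements wrapped in BEGIN/COMMIT.
    return "\n".join(source.iterdump())


def load_sql(script, name=":memory:"):
    """Build a fresh database (in-memory by default) by replaying a script."""
    target = sqlite3.connect(name)
    target.executescript(script)
    return target
```

Every worker holding its own replayed copy is also where the extra memory cost of this option comes from.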
Lastly, I see some problems when using coverage analysis alongside parallel testing. With the existing coverage.py usage, a line is only reported as covered when it is run by the parent process. The rough idea for a solution is to run coverage measurement in all test processes and then merge the results. In my opinion, merging the coverage results is fairly simple: a line is covered if any process covered it, so the results can be combined with an OR operation. This topic will need extensive discussion, and if approved, it will be my optional task if the mandatory tasks are finished earlier than expected.
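The OR-merge idea amounts to a per-file set union of executed line numbers: a line counts as covered if any process executed it. The sketch below shows only that merge step on plain dictionaries; `merge_line_coverage` is a hypothetical helper. In practice, coverage.py already supports a similar workflow: with `parallel = True` in its `[run]` configuration each process writes its own data file and `coverage combine` merges them, so the project might mainly need to make sure measurement is started in each spawned worker.

```python
from typing import Dict, Iterable, Set

# Maps a source file name to the set of line numbers executed in it.
LineData = Dict[str, Set[int]]


def merge_line_coverage(per_process: Iterable[LineData]) -> LineData:
    """Union the covered lines reported by each process, file by file."""
    merged: LineData = {}
    for data in per_process:
        for filename, lines in data.items():
            merged.setdefault(filename, set()).update(lines)
    return merged
```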
The implementation will be as simple as possible, with minimal changes to the existing Django parallel test runner code.
If we choose the latter possibility for SQLite (taking the deferred SQL from the parent database's migration results and reusing it), it might use more RAM than the first possibility. This is because the SQL has to be dumped and replayed for each cloned database, whereas the first option achieves the same effect by simply copying the file. However, the latter possibility gives users more flexibility regarding performance.
These are my planned schedule and milestones, based on what seems possible at the time of writing this proposal. I will document my progress (at least monthly) on Medium (https://medium.com/ichlaffterlalu).
I will try to communicate as much as possible about this proposal, even after it is submitted, to make sure I have the knowledge needed to kick-start this project.
During the community bonding period, I will try to:
- Do more research on how to make Django Test multiprocessing work with the spawn start method.
- Contribute more patches, especially to Django Test.
- Follow the Django developers' IRC channel and Google Groups more often to familiarize myself with the Django community.
The goal is to familiarize myself with Django's style of work and to finalize my ideas about the implementation of this project.
The Ied Al-Fitr holiday will be on May 25-27, 2020. I will stay at home, as the COVID-19 situation here seems to get worse each day, which will increase my availability for this project.
I will implement a better version of the rough proof of concept (POC) explained in the "Implementation: Overview" section. Based on my first experiment, there is a small performance difference, mainly in the initial start-up of each Process. There is almost no difference in performance when running the test cases themselves.
I will further examine other Django components that might need to be adjusted.
I will consult more with the community about how to design the database integration changes.
I will allocate 1 week to testing on MySQL/MariaDB. Based on my initial observation, my rough proof of concept has no problems with MySQL/MariaDB so far. I will look into further issues after submitting this proposal.
In this subphase, I will document my work in a draft of the official Django documentation and on my Medium blog. For the evaluation, this phase's target is to be able to run parallel tests for a few arbitrary projects on Windows using MySQL.
During this phase, I might have my final exams for this semester on June 8-19, 2020, if the COVID-19 emergency status is not lifted earlier. The final exams will most likely be take-home tasks, so I might work at half to three-quarters of my usual pace. I will try to make up for this after the final exam period ends.
In this phase, I will start fixing database issues in Django Test multiprocessing with the spawn start method. Phase 2 is allocated specifically to troubleshooting the database integration for each officially supported DBMS:
- PostgreSQL (1 week)
- SQLite (1 week)
- Oracle (1.5 weeks)
In this subphase, I will document my work in a draft of the official Django documentation and on my Medium blog. For the evaluation, this phase's target is to demonstrate that Django can run parallel tests on Windows with all currently supported DBMSs.
In this phase, I will do further bug fixing and testing for this project. Here is the specific timeline for this phase:
- Ensure compatibility with third-party database backends (1 week). I will try to use django-mssql as my test bed.
- macOS testing and user acceptance testing (1 week)
- Bug fixing and minor improvements (1.5 weeks)
If bug fixing finishes early, I will start on another branch to work on the optional goal: coverage analysis integration.
In this subphase, I will document my work in a draft of the official Django documentation and on my Medium blog. For the evaluation, this phase's target is to demonstrate that Django can run parallel tests on Windows and macOS with all currently supported DBMSs.
The Ied Al-Adha holiday will be on July 31-August 1, 2020. I plan to take a short trip within the city, but if the COVID-19 emergency status in Greater Jakarta has not been lifted by then, I will stay at home, which will increase my availability for this project.
Unfortunately, I don't have a MacBook. I will try my best to compensate for this legally, perhaps by borrowing a MacBook from my neighbour to create official ISO files for a virtual machine.
If the optional goal (better coverage analysis integration) is also accepted, I will work on it. My university semester starts in early September, so I will still have 2 weeks for it.
My name is Ichlasul Affan (please call me Affan). I am a first-year Computer Science master's student at Universitas Indonesia. I live in Gunung Sindur, Bogor Regency, West Java, Indonesia. My current time zone is UTC+7 (Asia/Jakarta). I've been coding in Python for the past 4 years. I also code in Java, JavaScript, TypeScript, and a little bit of C and Erlang.
I have been involved in many web projects that use Django as the main framework, as my faculty mainly uses Python for teaching and learning. Here are some of my projects built with Django:
- EXAMICS: This website is built with Django REST Framework and Vue.js. Its main purpose is to arrange exam schedules based on several constraints: room availability, room capacity, student enrollment in each course, and time slots.
- Tot.bio: This is an Internet of Things project that analyzes diseases based on human waste using a purpose-built Raspberry Pi device. This project is part of the Microsoft Imagine Cup 2020 Asia Semifinals. The web app is built with Django REST Framework and Vue.js. The source code for the web app is at https://gitlab.com/potay/company-profile (front end) and https://gitlab.com/potay/webservice (back end).
I started contributing to Django in March this year. So far, I have worked on an easy-pickings ticket, #31351, and also on #31451. I am also looking for further contributions during and after the submission of this proposal.
You can contact me via email at ichlaffterlalu (at) gmail.com (primary) or ichlasul.affan (at) ui.ac.id (secondary, but with a faster response). You can find me on IRC, the Django forum, and Google Groups as ichlaffterlalu or Ichlasul Affan.
Currently, my working hours are 10.00-00.00 (UTC+7) every day, including weekends. I will most likely not be going anywhere because of the COVID-19 emergency status in Greater Jakarta at the time of writing this proposal. I also take breaks at 12.00, 16.00, 18.30, and 20.00 for my daily prayers.