Make ParallelTestSuite work on Windows and the latest macOS

Django Parallel Test Runner for Windows and MacOS

Google Summer of Code - Django - Ichlasul Affan - Universitas Indonesia

Abstract

Current Support for Parallel Testing in Django

As it stands, parallel test support in Django uses the multiprocessing library, which creates multiple child processes from one parent process. The Django test runner currently only supports creating child processes with the fork start method. The fork start method clones all of the parent's resources and file descriptors, so the child process starts running from the same state as the parent process.

Meanwhile, Microsoft Windows only supports the spawn start method, and since Python 3.8 spawn is also the default start method on macOS. The spawn start method starts the child process from scratch, running only the initialization function passed via the initializer parameter of multiprocessing.Pool. This also means the child is almost completely separated from the parent process, which makes it harder to synchronize previously initialized components (App registry, test database configuration, unittest library).
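The difference can be sketched with the standard library alone. This is a minimal illustration, not Django code: the names `_state`, `init_worker`, `run_task`, and `run_with_spawn` are hypothetical, and only `multiprocessing.get_context`, `Pool`, and its `initializer`/`initargs` parameters are the real API.

```python
import multiprocessing

# Module-level state. Under "fork" a child inherits whatever the parent has
# stored here; under "spawn" the module is imported again, so it starts empty.
_state = {}


def init_worker(label):
    """Runs once in every worker process before it receives any task."""
    _state["label"] = label


def run_task(number):
    """Return the task number together with the state set by init_worker."""
    return number, _state.get("label")


def run_with_spawn(numbers):
    """Run the tasks in a pool whose workers are created with "spawn"."""
    context = multiprocessing.get_context("spawn")
    with context.Pool(
        processes=2, initializer=init_worker, initargs=("worker",)
    ) as pool:
        return pool.map(run_task, numbers)
```

Because a spawned child imports the main module again, `run_with_spawn()` should be called from under an `if __name__ == "__main__":` guard. Anything a worker needs beyond what `init_worker` sets up has to be rebuilt there, which is the core of this proposal.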

Goals

  1. Properly initialize or link the App registry and initial state for each process created with the spawn start method.
  2. Ensure database cloning, operations, and cleanup work with the spawn method for all of Django's officially supported DBMSs.
  3. (Optional Goal) I see there are some issues with coverage analysis when using parallel testing. The optional goal is to make coverage analysis work properly by also running the analysis in each child process.

Benefits

The main benefit of this project is to enable reliable parallel testing support on Windows and recent versions of macOS.

Implementation

Initial Observation Result

Before writing this proposal, I first tried to understand the workflow of Django's parallel test runner. From this initial observation, the current workflow is as follows. Django first initializes the App registry using django.setup() when it first touches the management subpackage. It then initializes the test databases and clones them if parallel testing is enabled. After that, test suites are discovered and passed to ParallelTestSuite. ParallelTestSuite groups the test suites to be queued for the worker processes, initializes every process via an init function (which assigns each process a counter value and a cloned database), and then runs each test suite using the run_subsuite() method.
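The grouping step can be pictured with plain unittest. This is a simplified stand-in that assumes subsuites are formed per test case class; it is not Django's actual code, and `iter_tests` and `partition_by_case` are hypothetical names.

```python
import itertools
import unittest


def iter_tests(suite):
    """Yield individual test cases from a possibly nested TestSuite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def partition_by_case(suite):
    """Split a suite into subsuites, one per run of tests of the same class."""
    groups = []
    for _case_class, tests in itertools.groupby(iter_tests(suite), key=type):
        # TestSuite consumes the group iterator immediately, which is
        # required because groupby invalidates it on the next step.
        groups.append(unittest.TestSuite(tests))
    return groups
```

Each resulting subsuite is a unit of work that can be queued for whichever worker process is free.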

In the meantime, I also tested several databases: SQLite, MySQL, and PostgreSQL. They all seem to work fine with the current database cloning method. I think the problem might lie in SQLite with an in-memory database. In my test app, parallel testing creates new database files on disk, not in memory. As the spawn method shares only the minimum resources needed to run the init function (and Python will not clone an SQLite in-memory database either), we might need to force the SQLite test database to be file-based regardless of user settings.
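The in-memory concern can be reproduced without Django. The following sketch uses only the standard sqlite3 module (`in_memory_is_private` is a hypothetical name) and shows that two ":memory:" connections do not see each other's tables, even within one process.

```python
import sqlite3


def in_memory_is_private():
    """Return True if a second ":memory:" connection sees none of the
    tables created through the first one."""
    first = sqlite3.connect(":memory:")
    second = sqlite3.connect(":memory:")
    try:
        first.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
        first.commit()
        tables = second.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return tables == []
    finally:
        first.close()
        second.close()
```

A process started with spawn is in the same position as the second connection here: it cannot reach the database that the parent built in memory.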

Overview

This is my pull request with my initial work for this proposal (PR #12607): django/django#12607

ParallelTestSuite Process Initialization

As the Python multiprocessing documentation says, spawn only shares the resources necessary to run the init function. Currently, the init function only increments the counter and assigns a database to the process. So, the first thing to do is to modify the init function, specifically for the spawn method, to run django.setup() and setup_test_environment(). Secondly, after doing the setup, we need to make sure that the process uses the existing test environment and database.
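A rough sketch of such an init function is shown here. It is an assumption-laden outline, not the final design: `claim_worker_id` and `init_spawned_worker` are hypothetical names, the code assumes DJANGO_SETTINGS_MODULE is set in the environment the child inherits, and it relies on the database creation class's `get_test_db_clone_settings()` helper.

```python
import multiprocessing

_worker_id = 0


def claim_worker_id(counter):
    """Increment the shared counter and remember this process's number."""
    global _worker_id
    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value
    return _worker_id


def init_spawned_worker(counter):
    """Hypothetical initializer for workers started with "spawn"."""
    # Imported lazily so that this module can be imported without Django.
    import django
    from django.db import connections
    from django.test.utils import setup_test_environment

    worker_id = claim_worker_id(counter)

    # A spawned process starts without the parent's state, so the App
    # registry and the test environment have to be built again here.
    django.setup()
    setup_test_environment()

    # Point every connection at the clone prepared for this worker.
    # Open question: in a spawned child, settings_dict still holds the
    # original database name rather than the test database name, so the
    # parent's test settings would have to be passed in as well.
    for alias in connections:
        connection = connections[alias]
        clone_settings = connection.creation.get_test_db_clone_settings(
            str(worker_id)
        )
        connection.settings_dict.update(clone_settings)
        connection.close()
```

The counter would be created in the parent, for example with `multiprocessing.Value("i", 0)`, and passed to the pool through `initargs`.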

Database Integration and Further Testing

After altering the init function, we need to pay attention to database support. As described in "Initial Observation Result", we might need to adjust some of the existing default settings for supported test databases. There are two possibilities for SQLite, though:

  1. Force the use of a file-based database.
  2. Get the deferred SQL from the parent DB's migration result, and use that to initialize the new database.
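The two possibilities can be compared with the standard sqlite3 module. This is only an illustration under stated assumptions: `clone_by_backup` and `clone_by_sql_dump` are hypothetical names, and a full SQL dump via `iterdump()` stands in for the deferred SQL that Django's migrations would produce.

```python
import sqlite3


def clone_by_backup(source, target_path):
    """Option 1: copy the whole database into a (file-based) clone."""
    target = sqlite3.connect(target_path)
    source.backup(target)
    return target


def clone_by_sql_dump(source):
    """Option 2: replay the parent's schema and data as SQL statements."""
    script = "\n".join(source.iterdump())
    target = sqlite3.connect(":memory:")
    target.executescript(script)
    return target
```

With the first option, each worker would open its own file such as a per-worker clone path. With the second, the dumped SQL would have to be handed to every worker, which then rebuilds the database itself.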

Coverage Analysis for Multiprocessed Tests

Lastly, I see some problems when using coverage analysis alongside parallel testing. The existing coverage.py usage only reports a line as covered when it is run by the parent process. The rough idea for a solution is to run the coverage analysis in every test process and then merge the results. In my opinion, merging the coverage results is fairly simple: a line is covered if any process covered it (a logical OR). This topic will need intense discussion and, if approved, it will be my optional task if the mandatory tasks are finished earlier than expected.
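The OR-merge idea can be sketched as a set union per file. The data layout (a dict mapping file names to sets of executed line numbers) and the name `merge_line_coverage` are assumptions made for illustration; coverage.py has its own data format and a `coverage combine` command that would need to be evaluated first.

```python
def merge_line_coverage(per_process):
    """Merge per-process coverage: a line counts as covered if any
    process executed it."""
    merged = {}
    for process_data in per_process:
        for filename, lines in process_data.items():
            merged.setdefault(filename, set()).update(lines)
    return merged
```

For example, merging the parent's data with the data of each child process yields one report in which lines run only by a child are no longer shown as missed.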

Advantages

The implementation will be as simple as possible, with minimal changes to the existing Django parallel test runner code.

Disadvantages

If we choose the latter possibility for SQLite (get the deferred SQL from the parent DB's migration result and reuse it), it might use more RAM than the first possibility. That is because the SQL has to be dumped and replayed for each cloned database, whereas the first option achieves the same effect by just copying the file. However, the latter possibility gives users a more flexible option regarding performance.

Schedules and Milestones

These are my planned schedule and milestones, based on the possibilities known at the time I write this proposal. I will document my progress (at least monthly) via Medium (https://medium.com/ichlaffterlalu).

Pre-GSOC

I will try to communicate as much as possible about this proposal even after it is submitted, to make sure I have the knowledge needed to kickstart this project.

Phase 0: Community Bonding (May 5 - June 2, 2020)

During the community bonding period, I will try to:

  1. Do more research on how I can make Django's test multiprocessing work with the spawn start method.
  2. Contribute more patches, especially ones related to Django's test framework.
  3. Follow the Django developers' IRC channel and Google Groups more often to familiarize myself with the Django community.

The goal is to familiarize myself with Django's style of work and to finalize my idea about implementations of this project.

The Eid al-Fitr holiday will be on May 25-27, 2020. I will stay at home, as the COVID-19 situation here seems to get worse each day, which will increase my availability for this project.

Phase 1: Initial Implementation (June 2 - July 4, 2020)

App Registry Initialization (1 week)

I will implement a better version of the rough POC that I explained in the "Implementation: Overview" section. Based on my first experiment, there is a small difference in performance, mainly in the initial startup of each process, but almost no difference in the performance of running test cases in general.

Look for Other Testing Issues (0.5 weeks)

I will further examine other Django components that might need to be adjusted.

Initialize Database Integration POC (1 week)

I will consult more with the community about how I should design database integration changes.

MySQL/MariaDB Integration (1 week)

I will allocate 1 week to testing on MySQL/MariaDB. Based on my initial observation, my rough Proof of Concept has no problems on MySQL/MariaDB yet. I will look into further issues after the submission of this proposal.

Evaluation and Documentation on Phase 1 (0.5 weeks)

In this subphase, I will document my work in a draft for the official Django documentation and on my Medium blog. For the evaluation session, this phase's target is that, using MySQL, I can run parallel tests on some random projects on Windows.

Additional Notes on Phase 1

During this phase, I might have final exams for this semester on June 8 - June 19, 2020, if the COVID-19 emergency status isn't lifted earlier. The final exams will most likely be take-home tasks, so I might reduce my work speed to half or 3/4. I will try to compensate for that after the final exam period ends.

Phase 2: Database Integration (July 4 - August 1, 2020)

In this phase, I will start fixing database issues in Django's test multiprocessing for the spawn start method. I will allocate phase 2 specifically to troubleshooting database integration for each officially supported DBMS.

  1. PostgreSQL (1 week)
  2. SQLite (1 week)
  3. Oracle (1.5 weeks)

Evaluation and Documentation on Phase 2 (0.5 weeks)

In this subphase, I will document my work in a draft for the official Django documentation and on my Medium blog. For the evaluation session, this phase's target is to show that Django is able to run parallel tests on Windows using all currently supported DBMSs.

Phase 3: Further Testing (August 1 - September 1, 2020)

In this phase, I will do further bugfixing and testing for this project. Here is the specific timeline for this phase:

  1. Ensure compatibility with 3rd party DB backends (1 week). I will try to use django-mssql for my experiments.
  2. macOS Testing, User Acceptance Test (1 week)
  3. Bugfixing and Minor Improvements (1.5 weeks)

If the bugfixing phase is done earlier, I will start on another branch to work on the optional goal: coverage analysis integration.

Evaluation and Documentation on Phase 3 (1 week)

In this subphase, I will document my work in a draft for the official Django documentation and on my Medium blog. For the evaluation session, this phase's target is to show that Django is able to run parallel tests on Windows and macOS using all currently supported DBMSs.

Additional Notes on Phase 3

The Eid al-Adha holiday will be on July 31 - August 1, 2020. I plan to take a short trip within the city, but if the COVID-19 emergency status in Greater Jakarta has not been lifted by then, I will stay at home, which will increase my availability for this project.

Limitations

Unfortunately, I don't have a MacBook. I will try my best to compensate for that legally, perhaps by borrowing a MacBook from my neighbour to create official ISO files for a virtual machine.

Post-GSOC

If the optional goal (better coverage analysis integration) is also accepted, I will get to work on it. My university semester starts in early September, so I still have 2 weeks to do the job.

About Me

Background

My name is Ichlasul Affan (please call me Affan). I am a first-year Computer Science master's student at Universitas Indonesia. I live in Gunung Sindur, Bogor Regency, West Java, Indonesia. My current time zone is UTC+7 (Asia/Jakarta). I've been coding in Python for the past 4 years. I also code in Java, JavaScript, TypeScript, and a little bit of C and Erlang.

My Latest Projects

I have been involved in many web projects using Django as the main framework, as my faculty mainly uses Python for teaching and learning purposes. Here are some of my projects made using Django.

  1. EXAMICS: This website is made using Django REST Framework and Vue.js. The main purpose of this website is to arrange exam schedules based on several constraints: room availability, room capacity, student enrollments for each course, and time slots.
  2. Tot.bio: This is an Internet of Things project to analyze diseases based on human waste using a specifically built Raspberry Pi. This project is part of the Microsoft Imagine Cup 2020 Asia Semifinals. The web app is made using Django REST Framework and Vue.js. The source code for the web app is at https://gitlab.com/potay/company-profile (front end) and https://gitlab.com/potay/webservice (back end).

Current Contributions to Django

I started contributing to Django in March this year. So far, I have worked on an easy-pickings ticket, #31351. I also worked on #31451. I am also looking for further contributions during and after the submission of this proposal.

My Contacts and Availability

You can contact me via email at ichlaffterlalu (at) gmail.com (primary) or ichlasul.affan (at) ui.ac.id (secondary but faster response). You can find me on IRC, the Django forum, and Google Groups as ichlaffterlalu or Ichlasul Affan.

Currently, my work hours are from 10.00 to 00.00 (UTC+7) every day (including weekends). I will most likely not be going out because of the COVID-19 emergency status in Greater Jakarta at the time I write this proposal. I also take breaks at 12.00, 16.00, 18.30, and 20.00 for my daily prayers.
