@tomdyson
Last active February 6, 2024 03:08
Create 35k Wagtail pages of Wikipedia film plots

Create Wagtail pages programmatically

This short recipe demonstrates how to create Wagtail pages programmatically. It may also be useful for testing Wagtail development against a reasonable volume of page data (about 35,000 film plots, from English Wikipedia).

Instructions

In a virtualenv:

pip install wagtail
wagtail start wagtailfilms
cd wagtailfilms
./manage.py migrate
./manage.py createsuperuser
./manage.py startapp films

Add films to INSTALLED_APPS in wagtailfilms/settings/base.py.
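As a rough illustration of that settings change, the excerpt below shows where the new entry could go. It is a fragment rather than the full list: it assumes the "home" and "search" apps that `wagtail start` normally generates, and every other generated entry should stay as it is.

```python
# wagtailfilms/settings/base.py (excerpt only, not the complete list)
INSTALLED_APPS = [
    "home",
    "search",
    "films",  # the app created by `./manage.py startapp films`
    # ... keep every Wagtail and Django entry that `wagtail start` generated ...
]
```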

Edit films/models.py:

from django.db import models

from wagtail.models import Page
from wagtail.fields import RichTextField
from wagtail.admin.panels import FieldPanel


class FilmIndexPage(Page):
    pass


class FilmPage(Page):
    parent_page_types = ["films.FilmIndexPage"]

    release_year = models.IntegerField("Release year")
    director = models.CharField(max_length=250, blank=True)
    wiki_page = models.URLField()
    plot = RichTextField(blank=True)

    content_panels = Page.content_panels + [
        FieldPanel("release_year"),
        FieldPanel("director"),
        FieldPanel("wiki_page"),
        FieldPanel("plot", classname="full"),
    ]
    
    search_auto_update = False

Migrate the changes, then create the file for the import command:

./manage.py makemigrations
./manage.py migrate
mkdir -p films/management/commands
touch films/management/commands/importfilms.py

Edit films/management/commands/importfilms.py:

import csv
from django.core.management.base import BaseCommand
from wagtail.models import Page
from films.models import FilmIndexPage, FilmPage


class Command(BaseCommand):
    help = "Imports 35k film plots from Wikipedia"

    def handle(self, *args, **options):
        # delete existing film index pages and film pages
        FilmPage.objects.all().delete()
        FilmIndexPage.objects.all().delete()
        # create a film index page
        # id=3 is the home page in a fresh `wagtail start` project; adjust if yours differs
        home = Page.objects.get(id=3)
        films_index_page = FilmIndexPage(title="Films")
        home.add_child(instance=films_index_page)
        films_index_page.save_revision().publish()
        # import film pages; the CSV is opened relative to the current directory
        # (assumes the file is UTF-8 encoded)
        with open("wiki_movie_plots_deduped.csv", newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                film_page = FilmPage(
                    title=row["Title"],
                    release_year=row["Release Year"],
                    director=row["Director"],
                    wiki_page=row["Wiki Page"],
                    plot=row["Plot"],
                )
                films_index_page.add_child(instance=film_page)
                film_page.save_revision().publish()
                self.stdout.write("published film page " + row["Title"])
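The command above stores each plot as it appears in the CSV. A RichTextField holds HTML, so if the plots turn out to be plain text with line breaks (an assumption about the dataset, not something checked here), a small helper could wrap them in paragraphs first. The name plot_to_html is hypothetical and is not part of Wagtail or of the original recipe:

```python
import html


def plot_to_html(plot):
    """Convert a plain-text plot into escaped HTML paragraphs."""
    # One paragraph per non-empty line; blank lines are dropped
    paragraphs = [line.strip() for line in plot.splitlines() if line.strip()]
    # Escape &, < and > so the text cannot be read as markup
    return "".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)


print(plot_to_html("A & B meet.\n\nThey part."))
# prints: <p>A &amp; B meet.</p><p>They part.</p>
```

In the import loop this would be used as plot=plot_to_html(row["Plot"]).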

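Download and unzip wiki_movie_plots_deduped.csv from https://www.kaggle.com/jrobischon/wikipedia-movie-plots, and place it in the project directory next to manage.py, because the command opens it by relative path.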
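Before running a long import, it can help to confirm that the download has the five columns the import command reads. The following is a minimal sketch using only the standard library; missing_columns and REQUIRED_COLUMNS are hypothetical names, and the demo runs on an in-memory sample rather than the real file:

```python
import csv
import io

# Columns read by the import command above
REQUIRED_COLUMNS = {"Title", "Release Year", "Director", "Wiki Page", "Plot"}


def missing_columns(lines):
    """Return the required columns absent from the CSV header.

    `lines` is any iterable of text lines, such as an open file.
    """
    header = csv.DictReader(lines).fieldnames or []
    return REQUIRED_COLUMNS - set(header)


# Demo on an in-memory CSV that lacks the "Plot" column
sample = io.StringIO(
    "Title,Release Year,Director,Wiki Page\n"
    "Alien,1979,Ridley Scott,https://en.wikipedia.org/wiki/Alien_(film)\n"
)
print(missing_columns(sample))  # prints: {'Plot'}
```

For the real file, pass an open file object, for example open("wiki_movie_plots_deduped.csv", newline="", encoding="utf-8"); an empty set means all expected columns are present.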

Create the film index page and import the films:

./manage.py importfilms

To load the same data more quickly in another environment, first export it with Django's dumpdata command:

./manage.py dumpdata --natural-foreign --natural-primary --indent 2 \
    -e contenttypes -e auth.permission \
    -e wagtailcore.groupcollectionpermission \
    -e wagtailcore.grouppagepermission -e wagtailimages.rendition \
    -e wagtailsearch.sqliteftsindexentry \
    -e sessions > films.json

Then, in the other environment, run ./manage.py loaddata films.json.
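To sanity-check an export before loading it elsewhere, one option is to count the objects per model in the fixture. Django's JSON fixtures are a list of objects that each carry a "model" key, so a short sketch is enough; count_models is a hypothetical helper, and the sample data below is made up for the demo:

```python
import json
from collections import Counter


def count_models(fixture_objects):
    """Count fixture objects per model label, e.g. 'films.filmpage'."""
    return Counter(obj["model"] for obj in fixture_objects)


# Demo on a tiny made-up fixture, parsed the same way films.json would be
sample = json.loads(
    '[{"model": "films.filmindexpage", "fields": {}},'
    ' {"model": "films.filmpage", "fields": {}},'
    ' {"model": "films.filmpage", "fields": {}}]'
)
for model, count in count_models(sample).most_common():
    print(model, count)
```

For the real export, load the list with json.load on the opened films.json and pass it to count_models.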
