sequenceDiagram
cmsCtrl->>dbCtrl: requests latest db update
dbCtrl-->>cmsCtrl: REST response
loop
cmsCtrl->>wordpress: request next latest update
wordpress-->>cmsCtrl: REST response
cmsCtrl->cmsCtrl: compares with latest update sent from dbCtrl
cmsCtrl->>dbCtrl: sends wordpress update if more recent than latest db update
dbCtrl->mongo: updates database (via nodejs driver)
dbCtrl->api: updates API instances (via websocket)
dbCtrl-->>cmsCtrl: REST response
end
dbCtrl->algolia: requests most recent update
loop
dbCtrl->dbCtrl: compares with latest update in db
dbCtrl->>algolia: deletes all records associated with updated article
dbCtrl->>dbCtrl: creates record(s) for the updated article
dbCtrl->>algolia: sends records for the updated article
end
webapp->>api: requests articles
api-->>webapp: REST response
webapp->algolia: search (via websocket)
Right now our system has three components:
- Wordpress, with various plugins
- API
- webapp
The API is tightly coupled with Wordpress. On top of that, the API pulls data from Wordpress, modifies it, stores this modified Wordpress data in-memory, makes it available to webapp instances via REST, and then pulls updates from Wordpress via periodic REST calls. The API app does everything on the backend as far as the website is concerned.
Wordpress is not much more than a read-only database for our website's purposes. And it's not very good at that, since we have to constantly poll Wordpress to find out if any new post has been made or if any old post has been updated. Wordpress also scores badly when it comes to the post data it makes available. It mixes the textual content (words, sentences, paragraphs, etc.) with the visualization data for that content. It's all one big string of rendered text, which includes all sorts of HTML tags and attributes that aren't needed (and get in the way) when our webapp decides how best to display this content. The last problematic "feature" of Wordpress is the constant need to ensure everything is up-to-date. Wordpress and its numerous available plugins have had many security issues in the past (and still do today), so keeping the Wordpress instance up-to-date is important. The main benefit of Wordpress is that people can easily create new content through the admin interface. Another important feature (though not one being handled in an ideal way) is image hosting.
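To make the rendered-content problem concrete, here is a rough sketch of pulling plain paragraphs out of a rendered string like the one the Wordpress REST API returns in `content.rendered`. The `toParagraphs` helper and the sample markup are made up for illustration, and the regex approach is deliberately naive; real post content would call for a proper HTML parser.

```typescript
// Splits Wordpress-style rendered HTML into plain-text paragraphs.
// Naive sketch: it splits on closing </p> tags, drops every remaining tag,
// and collapses whitespace. It does not decode entities or handle nesting.
function toParagraphs(renderedHtml: string): string[] {
  return renderedHtml
    .split(/<\/p>/i)
    .map((chunk) => chunk.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim())
    .filter((text) => text.length > 0);
}

// Made-up sample of what a rendered post body can look like.
const renderedBody =
  '<p class="intro">Hello <strong>world</strong>.</p>\n' +
  '<p style="color:red">Second   paragraph.</p>';

const plainParagraphs = toParagraphs(renderedBody);
console.log(plainParagraphs); // [ 'Hello world.', 'Second paragraph.' ]
```

With the text separated like this, the webapp would be free to decide on markup and styling itself.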
Our web application is a React/Next/Mobx app running on Zeit's Now hosting platform. It has server-side rendering and dynamic routing, and currently pulls all data from our single API instance (also hosted on Now).
I’ve been thinking about how our system could be best designed in order to handle what we want it to do both now and in the future. I’m trying to take into account everything, including:
- what we currently need:
  - a website that displays our wordpress content
- the features and capabilities we would like to have in the future:
  - internet radio
  - live streaming
  - better image handling (responsive images, progressively loaded images, image CDN integration)
  - membership communication management through mailing list and possibly SMS
  - scheduling
  - editable content
  - and other features
- the fact that, for now, we have a very small number of people who are able to spend time improving or maintaining the site
- cost
So I’m trying to outline a system that allows for all future functionality without making the system so complicated that it becomes hard for people to jump in and modify things or implement new functionality. The way to do this is to make a modular system where every component handles one or two simple tasks. Then each component has an interface that remains unchanged. Implementation details of one component are hidden from the others.
The components I've come up with are:
- CMS controller
- DB controller
- Wordpress
- mongo database
- API
- Webapp
These components would handle tasks and work together in the following ways:
- `cmsCtrl`: CMS controller, interface into `wordpress`
  - handles communication between Wordpress and `dbCtrl`
    - gets latest update info from `dbCtrl`
    - polls `wordpress` for latest update and compares it to `dbCtrl`'s latest update
    - if these updates don't match, then it sends that update to `dbCtrl` and requests the next latest update from `wordpress`
    - repeats this process until there is a match
- `dbCtrl`: database controller, interface into `mongo`
  - stores articles in a form that is ready for `webapp` instances to use
  - starts a WebSocket channel for communicating with the `api`
  - each time an update comes in from `cmsCtrl`, an update is made to `mongo` and sent to the `api`
  - `dbCtrl` also checks what the latest update is in `algolia`, and compares to the latest updates in `mongo`
  - if there are any updates not found in `algolia`, then the affected `algolia` records are deleted, new ones are created, and then pushed to `algolia`
  - `dbCtrl` is the most complicated component and we may want to split algolia-specific functionality into a `searchCtrl` component
- `wordpress`: for creating and modifying content, hosting images
- `mongo`: for storing articles in a format that is ready for `webapp` consumption
- `api`: source of all information for the `webapp`
  - keeps an in-memory copy of all posts to send on `webapp` requests
  - receives updates to posts from `dbCtrl`
  - communicates with `webapp` instances via REST (for now)
- `webapp`: react/mobx visualization of CMS content (posts, categories, etc.)
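The `cmsCtrl` catch-up loop described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the `PostUpdate` shape, the `syncUpdates` name, and the use of plain arrays and callbacks in place of the REST calls to `wordpress` and `dbCtrl` (which would be asynchronous in a real controller). "Match" is interpreted as "not more recent than the latest update `dbCtrl` already has".

```typescript
// Hypothetical shape of an update; the field names are assumptions.
interface PostUpdate {
  id: number;
  modified: string; // ISO 8601 timestamp of the last modification
}

// Walks Wordpress updates from newest to oldest and forwards each one to
// dbCtrl until it reaches an update that dbCtrl already has.
// Returns the number of updates forwarded.
function syncUpdates(
  dbLatest: PostUpdate | undefined,
  wordpressNewestFirst: PostUpdate[],
  sendToDbCtrl: (update: PostUpdate) => void,
): number {
  let sent = 0;
  for (const update of wordpressNewestFirst) {
    const alreadyStored =
      dbLatest !== undefined &&
      Date.parse(update.modified) <= Date.parse(dbLatest.modified);
    if (alreadyStored) break; // match found: everything older is in the db
    sendToDbCtrl(update);
    sent += 1;
  }
  return sent;
}

// Made-up sample data standing in for the two REST responses.
const wordpressUpdates: PostUpdate[] = [
  { id: 7, modified: "2018-03-03T10:00:00Z" },
  { id: 5, modified: "2018-03-02T09:00:00Z" },
  { id: 6, modified: "2018-03-01T08:00:00Z" },
  { id: 4, modified: "2018-02-20T12:00:00Z" },
];
const latestInDb: PostUpdate = { id: 6, modified: "2018-03-01T08:00:00Z" };

const forwarded: number[] = [];
const sentCount = syncUpdates(latestInDb, wordpressUpdates, (update) => {
  forwarded.push(update.id);
});
console.log(sentCount, forwarded); // 2 [ 7, 5 ]
```

If `dbCtrl` has nothing stored yet (`dbLatest` is `undefined`), the loop never finds a match and forwards every update, which is what a first-time import would need.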
While REST could be used in all cases, WebSockets makes it possible to have real-time, subscription-based communication.
Apps communicating via WebSockets bring up the connection at startup, and then subscribe to channels where information is sent to them from either the server or other peers over time, when the update happens.
It would be easy for `dbCtrl` to send changes via REST if we know we'll always have only a single `api` instance.
But what if we want to have multiple `api` instances in the future for scalability?
Then `dbCtrl` would have to somehow manage a list of all `api` instances in order to be able to send updates.
This complexity is handled by WebSockets: `dbCtrl` sets up an update channel and then any `api` instance subscribes to that channel at startup.
`dbCtrl` then just sends the update message across the update channel and all `api` instances get the broadcasted message.
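The shape of that channel can be sketched without any network code. The `UpdateChannel` and `ApiInstance` classes below are hypothetical in-memory stand-ins, not a WebSocket implementation: in a real setup the channel would sit on top of a WebSocket library and messages would cross the network. The point is only that `dbCtrl` publishes once and never has to know how many `api` instances are listening.

```typescript
// Hypothetical update message; the field names are assumptions.
interface ArticleUpdate {
  id: number;
  title: string;
}

type UpdateHandler = (update: ArticleUpdate) => void;

// In-memory stand-in for the update channel owned by dbCtrl.
class UpdateChannel {
  private handlers: UpdateHandler[] = [];

  // Registers a handler and returns a function that removes it again.
  subscribe(handler: UpdateHandler): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  // Delivers one update to every current subscriber.
  publish(update: ArticleUpdate): void {
    this.handlers.forEach((handler) => handler(update));
  }
}

// Stand-in for an api instance: keeps an in-memory copy of all articles
// and subscribes to the update channel at startup.
class ApiInstance {
  readonly articles: Record<number, ArticleUpdate> = {};

  constructor(updates: UpdateChannel) {
    updates.subscribe((update) => {
      this.articles[update.id] = update;
    });
  }
}

const updateChannel = new UpdateChannel();
const apiA = new ApiInstance(updateChannel);
const apiB = new ApiInstance(updateChannel);

// dbCtrl sends a single message; both instances receive it.
updateChannel.publish({ id: 1, title: "First post" });
console.log(apiA.articles[1]?.title, apiB.articles[1]?.title); // First post First post
```

One thing this sketch leaves out: an instance that subscribes late misses earlier messages, so an `api` instance would still need to load the current articles from `dbCtrl` once at startup.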
This same bidirectional, real-time communication could eventually be used between our API instances and our webapp instances.
This would enable any instance of our site to be updated in real-time as new articles (or updates to articles) are being made.
A few awesome possibilities:
- readers could see how many readers are on the current page
- real-time social media features (too many to list)
- live audio/video streaming notifications
So there would be one database (and an associated controller), one CMS (and an associated controller), and our API could then focus on making the latest version of our articles available to anyone who pulls up our site. The cool thing about this is that it would be fairly easy to spin up multiple API instances in the future if traffic called for it; there would still be only a single database and a single CMS. Also, the details of how to connect to a specific database (mongo in this case) or CMS (wordpress in this case) are all separated into the associated DB and CMS controllers. In the future, if we find a much better CMS (which I’m convinced wouldn’t be that hard to do), then it would be easy to switch over.
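One way to picture the "easy to switch over" part is a small interface that the CMS controller codes against. The `CmsAdapter` interface, both adapter classes, and the `OtherCmsEntry` format are invented for this sketch; only the `title.rendered` and `content.rendered` fields mirror what the Wordpress REST API returns for posts. The adapters read from arrays here, where real ones would make REST calls.

```typescript
// Common article shape handed on to dbCtrl; the field names are assumptions.
interface Article {
  id: string;
  title: string;
  body: string;
}

// The only contract the rest of the system would depend on.
interface CmsAdapter {
  latestArticles(limit: number): Article[];
}

// Subset of a Wordpress REST API post.
interface WordpressPost {
  id: number;
  title: { rendered: string };
  content: { rendered: string };
}

class WordpressAdapter implements CmsAdapter {
  private posts: WordpressPost[];

  constructor(posts: WordpressPost[]) {
    this.posts = posts;
  }

  latestArticles(limit: number): Article[] {
    return this.posts.slice(0, limit).map((post) => ({
      id: String(post.id),
      title: post.title.rendered,
      body: post.content.rendered,
    }));
  }
}

// Made-up native format of some other CMS.
interface OtherCmsEntry {
  slug: string;
  heading: string;
  text: string;
}

class OtherCmsAdapter implements CmsAdapter {
  private entries: OtherCmsEntry[];

  constructor(entries: OtherCmsEntry[]) {
    this.entries = entries;
  }

  latestArticles(limit: number): Article[] {
    return this.entries.slice(0, limit).map((entry) => ({
      id: entry.slug,
      title: entry.heading,
      body: entry.text,
    }));
  }
}

// Controller logic written only against the interface.
function latestTitles(cms: CmsAdapter): string[] {
  return cms.latestArticles(10).map((article) => article.title);
}

const fromWordpress = latestTitles(
  new WordpressAdapter([
    { id: 12, title: { rendered: "Spring schedule" }, content: { rendered: "<p>Shows start in April.</p>" } },
  ]),
);
const fromOtherCms = latestTitles(
  new OtherCmsAdapter([{ slug: "spring-schedule", heading: "Spring schedule", text: "Shows start in April." }]),
);
console.log(fromWordpress, fromOtherCms); // [ 'Spring schedule' ] [ 'Spring schedule' ]
```

Swapping the CMS would then mean writing one new adapter, while `latestTitles` and anything downstream of it stays as it is.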
This would also get us much closer to having editable content on the website (being able to fix typos or even create new posts directly from the site, for example). Right now we are strictly read-only from wordpress. All changes happen in wordpress and then we only ever pull those changes into the API. The API makes changes to that content, but those changes are not persisted and have to be redone each time the API restarts. Once we persist this modified post data, then our API can always assume the latest updates are available from the DB controller. We still wouldn’t be able to easily edit the data in wordpress, but there are other CMS solutions that would make this part much easier. If we find one that is awesome, then we would just create a CMS controller that works with that CMS, and then everything else in our system (database, API, website, search) would keep working as normal.
This design has more components than our system as it currently stands, but the benefit is that each component is very simple in comparison to what is going on now. Right now the API handles all of this functionality itself, in a single instance. This means that if anything needs to be modified, we have to look at all the functionality the API is handling and make sense of it before being able to change or add something.