@Swader
Last active April 7, 2018 15:21

Having looked at ArticleBundle, I have the following DX improvements in mind:

  1. Add automatic bundle registration into post-install scripts, including the merging of config.yml values. Additionally, make it interactive, so users are asked about keeping the default type, route structure, etc. Everything should be interactive for ease of use unless -n is specified during installation. This interactive/automatic portion should also include the creation of default example templates, running post-install commands like index and translation generation, etc.

  2. It should be clarified (especially for new users) what the "default_type" option in sulu_core config does.

  3. Docs should contain instructions about configuring the current user role to be able to use the new bundle's functionality.

  4. ElasticSearch 5.x is notorious for shutting down randomly when it runs out of RAM (it uses 1.5GB in idle mode), and this can break the Article UI in the Admin UI. The Admin UI only gets a generic 500 error, whereas visiting the link that produced the 500 error directly shows the specific reason. I recommend this error message be parsed and presented to the user in the Admin UI instead of the infinite loader of doom that is shown right now.

I've been working with Sulu for a few weeks, getting my bearings and just trying to really learn it, and there's this one maddening rabbit hole that just keeps getting deeper. This is going to be a long post, so strap yourself in and grab a cuppa if you'd like to join the discussion.

There's the Sulu CMS. It's structured like this:

Image from here.

Okay, ignoring the features of Sulu itself in the top part, the "logic" of Sulu is based on Symfony and Symfony CMF, while the data is stored in ... all of that at the bottom? We'll get to that.

Why point out both Symfony and Symfony CMF, if Symfony CMF is just a collection of bundles that "lets users easily add CMS functionality to their Symfony apps"? Eh.

And of the three Data blocks, which one is used for what? Why are all of those listed? Let's look into each separately.

Data

Okay, so Sulu uses PHPCR presumably because Symfony CMF uses PHPCR. So far so good. So what is PHPCR?

PHPCR is a PHPized JCR specification. What? Okay, let's re-check that.

The PHP Content Repository is an adaption of the Java Content Repository (JCR) standard, an open API specification defined in JSR-283. The API defines how to handle hierarchical semi-structured data in a consistent way.

What does this even mean? Okay, let's look into JCR's JSR-283. It turns out it's a 9-year-old specification from Java. Which now raises the question: how do you PHPize a specification? A specification is a specification, and PHP implementations could just as well be respecting that, no? But okay, we have two specifications now, with PHPCR taking the upper hand because we're in PHP land now, so let's move on.

If it's a spec, then it works, right? Then it delivers what it was specced out to do, and no implementation can claim it implements it without actually implementing everything, otherwise it's not an implementation, right? You can't implement an interface in PHP unless you literally implement the required methods, right?

Yeah, about that... And indeed, if we check the feature table, none of the implementations have all of the spec-prescribed features.

So... WordPress has been powering a large share of the web for well over a decade and has had post revisions for much of that time, all in MySQL with almost no insurmountable scaling issues, and a "new" implementation of a "new" specification of an "old" specification still doesn't? Even 4 whole years after development on it started?

but for unstructured data in a tree structure with optional schema its already much better than anything you will quickly whip up yourself

— Lukas Kahwe Smith (@lsmith) 6 June 2017

I think our definition of "quickly" differs. It would be trivially easy to slap content versions into an RDBMS WP-style, and it would scale just fine if indexed properly. "But this couples you to MySQL" some would say. Sure... and using one of the two different and incomplete implementations of a specification doesn't? Keep in mind that when using PHPCR, you're basically forced to use one of those two right now. Let's focus on that next.

Jackalope Doctrine DBAL

From the website:

Uses a conventional RDBMS (e.g. MySQL) to store the content repository.

So... went through 2 layers of specification and ended up with a partial implementation in order to, what, do the same thing WordPress has been doing, but worse? Because this implementation doesn't support versioning.

To summarize, to properly use a broken implementation of an outdated spec written for an obsolete language to store my data, I need to know not only the whole mess of things listed above, but also be familiar with Doctrine, which carries its own learning curve with it.

So what's my choice if I want to use versioning with Sulu, or any Symfony CMF-based CMS?

Jackalope Jackrabbit

Jackrabbit, it appears, is some kind of Java storage engine from Apache. Jackalope Jackrabbit is an implementation of PHPCR (not really, as we saw in the feature table).

Here's what's interesting about Jackrabbit. It can only support up to 10000 children per node. Since CMS pages are often tree-based, that's fine. A page will have a few children and nothing more to it. But consider, for example, an existing enterprise online magazine that's been online for a while. Consider SitePoint's 50k current posts, still getting hits every day. Consider wanting to migrate. In that case Jackrabbit becomes useless, and a Sulu installation needs something like this - this bundle will auto-shard content into smaller fragments (e.g. by month), so the 10k limit isn't reached. For all intents and purposes, a hack, but one that works. But a hack that's specific to Sulu, and that isn't even in that JCR "solution to all problems in the world" specification.
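
To make the sharding idea concrete, here is a minimal sketch in Python of bucketing articles under year/month parent nodes so no single node collects too many children. It is not the bundle's actual code: the function names, the `/articles` root, and the path layout are all assumptions made for illustration, and the 10000 figure is simply the limit quoted above.

```python
from datetime import date

MAX_CHILDREN = 10_000  # per-node limit quoted in the post, not verified here


def shard_path(root: str, published: date, slug: str) -> str:
    """Build a node path that buckets an article under year/month nodes.

    Hypothetical layout: <root>/<YYYY>/<MM>/<slug>
    """
    return f"{root}/{published.year:04d}/{published.month:02d}/{slug}"


def children_per_parent(paths):
    """Count how many direct children each parent path ends up with."""
    counts = {}
    for path in paths:
        parent = path.rsplit("/", 1)[0]
        counts[parent] = counts.get(parent, 0) + 1
    return counts


# Made-up data: 50,000 posts spread over 10 years of monthly buckets.
posts = [
    (date(2008 + i % 10, 1 + (i // 10) % 12, 1), f"post-{i}")
    for i in range(50_000)
]
paths = [shard_path("/articles", day, slug) for day, slug in posts]
counts = children_per_parent(paths)

# Every month bucket stays far below the limit, while a flat layout
# would have put all 50,000 posts under a single parent node.
print(len(counts), "buckets; largest bucket within limit:",
      max(counts.values()) <= MAX_CHILDREN)
```

The trade-off is that the publication date becomes part of the path, so anything that looks up an article needs to know (or index) its date.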

This is all after you install yet another piece of software (Jackrabbit), bog down your server's resources by an extra 30% (Java) and learn to use it.

Edit: just realized that to use the ArticleBundle, you actually need ElasticSearch (why, if PHPCR provides search?) which is a nightmare to install especially considering everything else we had to install so far. I'm now wondering about the target audience for Sulu. Surely a company which just needs a CMS for pages could easily get those up and running with a static page generator? And a company which wants to make an online magazine (like SitePoint) would do better using WP which has versioning and doesn't need even 20% of the software running on the server that Sulu does. So I'm kind of confused as to who the intended end user of Sulu apps is. Can someone list some companies and their areas of work that use Sulu so I can learn from examples?

Alternatives

After all this, after 2 specifications and 5 levels of tools spread across all of that, what is a good alternative for a CMS to store content? Here are some thoughts:

  • RDBMS like it's been done so far. It works. It's dirty, but it's no more dirty than incomplete specifications that don't work with each other. Add a new row per version and be done with it. It scales just fine, trust us.
  • MySQL and PostgreSQL both now support JSON fields. These can easily store and search unstructured data.
  • Store content as files. Built in version control right there. Keep permissions, relations, etc, in RDBMS.
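
As a rough illustration of the first two ideas combined (a new row per version, with the unstructured fields kept as a JSON document), here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; in MySQL or PostgreSQL the `data` column would be a native JSON column. The table, column, and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE content_version (
        content_id INTEGER NOT NULL,
        version    INTEGER NOT NULL,
        data       TEXT    NOT NULL,  -- JSON document holding the template fields
        PRIMARY KEY (content_id, version)
    )
    """
)


def save_version(conn, content_id, fields):
    """Insert a new row on every save instead of updating in place."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM content_version WHERE content_id = ?",
        (content_id,),
    ).fetchone()
    next_version = row[0] + 1
    conn.execute(
        "INSERT INTO content_version (content_id, version, data) VALUES (?, ?, ?)",
        (content_id, next_version, json.dumps(fields)),
    )
    return next_version


def load_version(conn, content_id, version=None):
    """Load one specific version, or the latest one when version is None."""
    if version is None:
        row = conn.execute(
            "SELECT data FROM content_version WHERE content_id = ? "
            "ORDER BY version DESC LIMIT 1",
            (content_id,),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT data FROM content_version WHERE content_id = ? AND version = ?",
            (content_id, version),
        ).fetchone()
    return json.loads(row[0]) if row else None


# Two saves of the same page with differently shaped fields.
save_version(conn, 1, {"title": "Hello"})
save_version(conn, 1, {"title": "Hello, world", "teaser": "Now with a teaser"})
print(load_version(conn, 1))             # latest version
print(load_version(conn, 1, version=1))  # the original version
```

This says nothing about how it behaves at scale; it only shows that the data model itself is small.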

Help

With all this said, I'm desperately looking for human-friendly explanations on why PHPCR exists, why Symfony CMF exists, why it's all so complex and incomplete, and why it's in any way, shape or form a better solution than any of the alternatives listed above. I want to learn this, please help me understand, but try to think from a new user perspective, not someone who used or developed this and thus must love it.

I spent a good bit of time getting Vagrant to play along nicely with Sulu because I approached it from a new user perspective, several times, until it clicked and worked completely. I'd like to feel that click in terms of Sulu, PHPCR, and SymfonyCMF in general.

Patrik says that admittedly, the PHPCR implementations need more love and contributors, and sure, yeah, I can see that. But in my opinion, a project (PHPCR) so dramatically incomplete and lacking in features should never end up in a project as serious as Sulu that's now rapidly approaching version 2.0 and having people (and companies) depend on it. As a company owner looking to adopt a new CMS to peddle instead of WP, this concerns me greatly.

@danrot

danrot commented Jun 7, 2017

One of our previous CMSs also stored data in an RDBMS, and that didn't work out well. I think that's because there is a huge and crucial difference between Sulu and WordPress: we support the definition of templates, which means our content is quite unstructured and its shape can be changed by metadata. In WordPress you have more or less only a title and a long text field. I know there are plugins that let you add more fields, but I don't think the resulting DB schema is very efficient (I have no experience here, this is just a guess).

Our old CMS created a separate database schema for every template, and I think we don't have to discuss that this solution is far from optimal either. Whenever you change a template, the database has to be updated, which can also cause a lot of issues. Other systems still use that approach. Storing content as files is probably even more painful: I don't think it is a lot of fun to maintain references and other things this way.

Regarding the different implementations in Jackalope (Jackrabbit and Doctrine DBAL): I totally agree with you in that point. It is really annoying, also for us during development, that they only cover parts of the specification. That's probably caused by the not too wide adoption of PHPCR.

We still decided to use PHPCR, but we made that decision 4 years ago. I can also agree that getting started with PHPCR is quite hard, because its concepts are not easy to grasp. We went for it because it seemed like a good fit: it structured its data in a tree and supported unstructured data, which we liked a lot, because we were having these schema issues with RDBMS. Sure, we could have had a look at how that was implemented in jackalope-doctrine-dbal, found out they were storing the unstructured data as XML, and built that on our own without using PHPCR. But we were not experienced and confident enough to do so, so we decided to use a solution which was built by a few clever people. We still don't regret that decision, because we have learned a lot this way.

At that time there were also some people working on PHPCR, so we were looking forward to a bright future. Sadly, the adoption didn't grow as we hoped, and so the project is not in the great state we hoped for back then. In addition to that, I was a little bit disappointed when implementing versioning and publishing, because the built-in PHPCR solution didn't fit our requirements, and I had to build a few hacks in order to get it to work. So would I choose PHPCR again when starting all over? I am not so sure anymore.

The other thing is that we would have other options today: an RDBMS might be a better fit now, since PostgreSQL and MySQL (starting with version 5.7) support JSON fields, which would also be a great fit for the unstructured part of our data. Add a nested set data structure and a versions table to that, and we might have a better solution than we have now. (Read that part carefully: I haven't seriously tested this yet, and I can't be sure my statements would hold after serious tests.)
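
For readers unfamiliar with the term, a nested set stores a tree by giving every node a left and a right number, so that all descendants of a node fall strictly between its two numbers. The following is only a minimal sketch in Python of that idea, not anything taken from Sulu; the function names and the example page tree are made up for illustration.

```python
def build_nested_set(tree, root):
    """Assign (lft, rgt) bounds to every node with a depth-first walk.

    `tree` maps a node name to the list of its children.
    """
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (lft, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """All nodes whose bounds lie strictly inside the bounds of `node`."""
    lft, rgt = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if lft < l and r < rgt)


# Hypothetical page tree.
tree = {"home": ["blog", "about"], "blog": ["post-1", "post-2"]}
bounds = build_nested_set(tree, "home")

print(bounds["home"])               # (1, 10)
print(descendants(bounds, "blog"))  # ['post-1', 'post-2']
```

In a relational table, the `descendants` check becomes a single range condition on two integer columns, which is why reading a subtree is cheap; the cost is that inserting or moving a node means renumbering part of the tree.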

However, these options were not available when we decided to use PHPCR in the first place. We are seriously considering removing PHPCR and implementing things on our own, but that won't happen in Sulu 2.0, because that would delay the release too much. So we are probably going to reevaluate our PHPCR decision for the Sulu 3.0 release, and we are not sure about the outcome of that decision yet. But if we really decide to drop PHPCR, there will of course be some data migration scripts.

@chirimoya

My two cents. Storing hierarchical and unstructured data with drafting, versioning, performance and ease of use in mind isn't that easy to accomplish. If you have a look at all the different solutions they fail in one way or another. So what were the options 4 years ago? Reinventing the wheel once again or using a promising, independent movement like PHPCR. You know the answer ;-)

We learned a lot while using PHPCR in the last years and don't regret the decision. Sure, we hit some walls and had some issues, but to be honest, I don't think we would have had fewer headaches with another solution.

At the end of the day, software evolves: RDBMSs nowadays support unstructured data sets. Maybe PHPCR adapts and evolves, or Sulu needs to move forward without it.
