@guiwoda
Last active February 25, 2023 21:11
AR (Eloquent) vs DM (Doctrine) gist

Common pitfalls found in AR / Eloquent

Data-driven modeling

AR focuses on data. Moreover, Eloquent makes this data public. Objects designed around an Eloquent model assume public access to those pieces of data, so encapsulation is harder and cohesion is blurred. Tracking where properties are accessed or modified is harder, even with advanced IDEs, because of magic properties and weak type hinting.

Validation and invariants

In the context of a Laravel project, data gets validated as arrays (most likely user input extracted from the Request) and is later added to an Eloquent model through generic methods.

According to its author, "Laravel has no opinion on where or how you do validation." By default, all your models will inherit generic methods to be constructed or updated:

Model::create(array $attributes);
$model->update(array $attributes);
$model->fill(array $attributes);
new Model(array $attributes);

This behavior moves the responsibility of validation and protecting invariants outside of the model. Depending on the architecture of the project, this may be a service layer, a command or an http layer (controller, form request). Having this responsibility outside of the object makes for weak objects, whose state can't be trusted to be valid.

Intercepting this behavior is very difficult, as it involves overriding multiple methods of the Model class, some of which the Model itself assumes are safe to call (empty constructors, for example).
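
To make the difference concrete, here is a minimal sketch in plain PHP. Both classes are hypothetical stand-ins written for this illustration: `ArrayFilledPost` only mimics the shape of a generic `fill()` method and is not Eloquent's `Model`, and `GuardedPost` is an assumed entity that checks one invariant (a non-empty title) in its constructor.

```php
<?php
// Hypothetical stand-in for an array-filled model; it is NOT Eloquent's Model.
final class ArrayFilledPost
{
    /** @var array<string, mixed> */
    public array $attributes = [];

    public function fill(array $attributes): self
    {
        // Accepts whatever the caller passes; validation has to happen elsewhere.
        $this->attributes = array_merge($this->attributes, $attributes);

        return $this;
    }
}

// Hypothetical entity that refuses to exist in an invalid state.
final class GuardedPost
{
    private string $title;

    public function __construct(string $title)
    {
        if (trim($title) === '') {
            throw new InvalidArgumentException('A post needs a non-empty title.');
        }
        $this->title = $title;
    }

    public function title(): string
    {
        return $this->title;
    }
}

// The array-filled object happily holds an empty title.
$weak = (new ArrayFilledPost())->fill(['title' => '']);

// The guarded entity rejects the same input at construction time.
$rejected = false;
try {
    new GuardedPost('');
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

With the first class, every consumer has to remember to validate before calling `fill()`. With the second, an instance that exists is known to satisfy the rule.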

Performance optimizations

While it is entirely possible to do performance optimizations in the AR pattern, Eloquent lacks work in this area. It has no IdentityMap to prevent hitting the database for the same record, does not hydrate join queries into joined models (with possible column collisions if done wrong!), and has shipped without its small cache implementation since version 5.0.

The only methods that allow for query optimization are the with / load relation eager-loading methods.

Implementing any sort of cache means intercepting internal ORM calls, which, given the level of coupling between Model, Query\Builder and Relations, would need to be done at the ORM level. Overriding methods at the Model level cannot accomplish any of this.
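
For readers unfamiliar with the IdentityMap mentioned above, the following is a minimal sketch of the idea in plain PHP. The `IdentityMap` class and the loader callable are hypothetical names chosen for this example; neither Eloquent nor Doctrine exposes this exact API. The loader stands in for a database query and counts how often it runs.

```php
<?php
// Minimal identity map: one in-memory instance per (class, id) pair.
final class IdentityMap
{
    /** @var array<string, object> */
    private array $objects = [];

    /** @var callable */
    private $loader;

    public function __construct(callable $loader)
    {
        // The loader plays the role of "go to the database".
        $this->loader = $loader;
    }

    public function find(string $class, int $id): object
    {
        $key = $class . '#' . $id;

        if (!isset($this->objects[$key])) {
            // First request for this record: load it and remember the instance.
            $this->objects[$key] = ($this->loader)($class, $id);
        }

        return $this->objects[$key];
    }
}

$queries = 0;
$map = new IdentityMap(function (string $class, int $id) use (&$queries): object {
    $queries++; // Counts simulated database hits.

    return (object) ['class' => $class, 'id' => $id];
});

$first  = $map->find('Post', 1);
$second = $map->find('Post', 1);
```

Both lookups return the same instance and the loader runs once. For this to help with relations, the map has to sit inside the ORM, where relation loading can consult it.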

Generic API

Eloquent models inherit a large, generic API on the assumption that every model will need it. This can be a problem when working in large teams, because knowledge of how things should be done is passed on through convention instead of being enforced by code. For example, a Model::all() call could be a self-destruct button on a rather large table. Preventing calls to an already existing public method is more difficult than preventing the creation of a yet-to-be-added method.

While this is true for Doctrine as well (generic repositories also have self-destruct findAll() methods), Eloquent's use of inheritance makes this worse: it puts all of these methods closer to the consumer, which is a negative point in the case of unwanted API, and it statically couples the consumer to them, so you can't hide them behind an injected dependency.
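
As an illustration of what "hiding behind an injected dependency" can look like, here is a small plain-PHP sketch. The `LatestPosts` interface, the in-memory implementation and the `FrontPage` consumer are all hypothetical names invented for this example; a real implementation would wrap whichever ORM is in use.

```php
<?php
// Hypothetical narrow contract: the only query the front page may run.
interface LatestPosts
{
    /** @return string[] Titles, newest first. */
    public function latestTitles(int $limit): array;
}

// In-memory implementation for illustration; a real one would wrap the ORM.
final class InMemoryLatestPosts implements LatestPosts
{
    /** @var string[] */
    private array $titles;

    public function __construct(array $titlesNewestFirst)
    {
        $this->titles = $titlesNewestFirst;
    }

    public function latestTitles(int $limit): array
    {
        return array_slice($this->titles, 0, $limit);
    }
}

// The consumer only sees the narrow interface, so no all() / findAll() is reachable here.
final class FrontPage
{
    private LatestPosts $posts;

    public function __construct(LatestPosts $posts)
    {
        $this->posts = $posts;
    }

    public function headlines(): array
    {
        return $this->posts->latestTitles(2);
    }
}

$page = new FrontPage(new InMemoryLatestPosts(['Third', 'Second', 'First']));
$headlines = $page->headlines();
```

The generic methods still exist somewhere behind the interface, but the consumer can only reach what the contract declares.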

Common pitfalls found in DM / Doctrine

Object - Relational impedance mismatch

DataMapper assumes that the object's state can be modeled in a relational manner, but most of the time we end up adapting our modeling decisions to this restriction. This topic is older than Doctrine itself, and while DMs have evolved through the years, it's still a very important constraint.

Wikipedia on O-R impedance

Complexity

While database access and usage is no simple task, the mapping layer adds an extra level of complexity to it, one that ActiveRecord explicitly avoids. Reconstitution of objects from the database is handled by the ORM, which assumes an internal structure of the mapped objects and therefore also limits design. Hooking into those processes is possible, but it is considerably more complex than just overriding a method.

Anemic domain modeling

Anemic Domain Modeling is modeling objects with public setters and getters and no real behavior beyond transporting data around. This incurs the cost of domain modeling without the benefit of actually adding behavior related to the domain, as described by Fowler in his bliki.

While this is not intrinsic to the DM pattern, it does have something to do with it. In an empty AR model, behavior is always present: AR gives you database access for all your models. But if you design an Entity that does not know about the database and does not have any relevant behavior, then you are arguably worse off than with AR.
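
The contrast can be sketched in a few lines of plain PHP. Both classes below are hypothetical and exist only for this illustration; the rule "a post can be published only once" is an assumed business rule, not something taken from either ORM.

```php
<?php
// Anemic: state is exposed through a setter, so any caller can set any status.
final class AnemicPost
{
    private string $status = 'draft';

    public function setStatus(string $status): void
    {
        $this->status = $status;
    }

    public function getStatus(): string
    {
        return $this->status;
    }
}

// Behavioral: the (assumed) rule "a post is published only once" lives in the object.
final class PublishablePost
{
    private string $status = 'draft';

    public function publish(): void
    {
        if ($this->status !== 'draft') {
            throw new LogicException('Only drafts can be published.');
        }
        $this->status = 'published';
    }

    public function isPublished(): bool
    {
        return $this->status === 'published';
    }
}

$anemic = new AnemicPost();
$anemic->setStatus('banana'); // Nothing stops a meaningless state.

$post = new PublishablePost();
$post->publish();

$secondPublishRejected = false;
try {
    $post->publish();
} catch (LogicException $e) {
    $secondPublishRejected = true;
}
```

The anemic version pays for a separate class but still leaves every rule to its callers, which is the situation described above.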

<?php
namespace App\ActiveRecord;

use App\ActiveRecord\Models\Post;
use App\ActiveRecord\Models\User;

class PostRepository
{
    private $cache;

    public function __construct(Cache $cache)
    {
        // Any set() / get() cache implementation.
        $this->cache = $cache;
    }

    public function find($id)
    {
        $result = $this->cache->get("posts:$id");
        if (! $result) {
            $result = Post::find($id);
            $this->cache->set("posts:$id", $result);
        }
        return $result;
    }

    public function findFromAuthor(User $author)
    {
        $results = $this->cache->get("posts:author:" . $author->id);
        if (! $results) {
            // Assumes a fromAuthor query scope on the Post model (not shown in this gist).
            $results = Post::fromAuthor($author)->get();
            $this->cache->set("posts:author:" . $author->id, $results);
        }
        return $results;
    }
}
/**
 * Something as simple as this already has problems:
 *
 * 1. Cache done in this repository doesn't affect in any way the relationship between User <-> Post.
 *    This means that $user->posts will still call the database, so architecture has to force the relation
 *    to be loaded from a Repository, breaking the ActiveRecord pattern.
 *
 * 2. Invalidation of a single Post has to invalidate all queries, otherwise stale data will still be found
 *    through the cached query results. This results in very poor cache scenarios and possible cache slams
 *    that make for a very fragile performance optimization.
 *
 * 3. Even if posts get cached by id, any other model will still hit the database through its relations,
 *    for example a Comment's post belongsTo relation. This makes caching harder because it's not on the ORM level.
 */
<?php
namespace App\ActiveRecord\Models;

use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    protected $guarded = [];

    public function user()
    {
        return $this->belongsTo(User::class);
    }

    public function comments()
    {
        return $this->hasMany(Comment::class);
    }
}
// Q: What does this do?
// A: It models a post record. It has public access to the post data (read and write) and to its relations.
// Q: How am I supposed to use it?
// A: All its API is inherited and used through magic public properties. Database access is modeled through public
//    methods such as save(), update(), create(), static find() and by using the query builder. Magic methods called
//    scopes can be added to model specific data access or add global restrictions.
// Q: What data does it have?
// A: Check the database table.
// Q: Are there any pre-conditions that I should care about?
// A: All invariants and pre-conditions are delegated to consumers, at least by default. You have to add a magical
//    set[prop]Attribute method if you want to protect invariants in this class. If you add plain setter methods
//    instead, they can be bypassed through the magic properties, so you'd also have to throw exceptions or
//    override the __set method if you go that way.
<?php
namespace App\ActiveRecord;

use App\ActiveRecord\Models\User;

class PostConsumer
{
    public function publish(User $user, $title)
    {
        return Models\Post::create([
            'user_id' => $user->id,
            'title'   => $title,
        ]);
    }

    public function find($id)
    {
        return Models\Post::find($id);
    }

    public function complexListing()
    {
        return Models\Post::where('a_database_field', 'a_value')
            ->where('db_field_2', 'another_value')
            ->orderBy('db_field_3')
            ->get();
    }
}
// Pros:
//   Very easy to use.
// Cons:
//   Data structure leaked out.
//   Static access makes PostConsumer hard to unit test without database calls.
// Common:
//   Both have flexible data access implementations.
//   Both strategies are easily tested through integration tests with a real database.
<?php
namespace App\DataMapper\Entities;

use Doctrine\Common\Collections\ArrayCollection;

class Post
{
    private $id;
    private $title;
    private $author;
    private $comments;

    public function __construct(User $author, $title)
    {
        $this->title    = $title;
        $this->author   = $author;
        $this->comments = new ArrayCollection();
    }

    public function getTitle()
    {
        return $this->title;
    }

    public function getAuthor()
    {
        return $this->author;
    }

    public function addComment(Comment $comment)
    {
        $this->comments[] = $comment;
    }

    public function getComments()
    {
        return $this->comments->getValues();
    }
}
// Q: What does this do?
// A: It models a post. It has private access to its data and its relations, and it exposes some of it
//    through public methods.
// Q: How am I supposed to use it?
// A: All its API is explicit in the object. It has no inherited behavior.
// Q: What data does it have?
// A: Its data is explicit in its private properties and some of it may be exposed through its public API.
// Q: Are there any pre-conditions that I should care about?
// A: Constructors and mutators are modeled on a per-case basis, so each mutation will be able to enforce its
//    invariants. It has no defaults, as it has no inherited code.
<?php
namespace App\DataMapper;

use App\DataMapper\Entities\User;
use Doctrine\ORM\EntityManagerInterface;

class PostConsumer
{
    private $entityManager;

    public function __construct(EntityManagerInterface $entityManager)
    {
        $this->entityManager = $entityManager;
    }

    public function publish(User $user, $title)
    {
        $post = new Entities\Post($user, $title);
        $this->entityManager->persist($post);
        $this->entityManager->flush();
    }

    public function find($id)
    {
        return $this->entityManager->find(Entities\Post::class, $id);
    }

    public function complexListing()
    {
        $repo = $this->entityManager->getRepository(Entities\Post::class);
        // findBy() takes its ordering as an array of field => direction.
        return $repo->findBy([
            'aPostObjectField'       => 'a_value',
            'anotherPostObjectField' => 'another_value',
        ], ['orderableField' => 'ASC']);
    }
}
// Pros:
//   Easy to unit test without database calls (dependency on EM and Repository can be mocked).
//   No database structure leaked.
// Cons:
//   More complex.
//   String references to private field names suggest leaks as well.
// Common:
//   Both have flexible data access implementations.
//   Both strategies are easily tested through integration tests with a real database.
<?php
namespace App\DataMapper;

use App\DataMapper\Entities\User;
// In older Doctrine releases this interface lives in Doctrine\Common\Persistence.
use Doctrine\Persistence\ObjectRepository;

class PostRepository
{
    private $posts;

    public function __construct(ObjectRepository $posts)
    {
        $this->posts = $posts;
    }

    public function find($id)
    {
        return $this->posts->find($id);
    }

    public function findFromAuthor(User $author)
    {
        return $this->posts->findBy([
            'author' => $author,
        ]);
    }
}
/**
 * I leave this here because Doctrine has, since 2.5, a second level cache implementation that would take
 * care of both scenarios using the default Repository implementation.
 */
@uxweb

uxweb commented Mar 3, 2016

IMHO AR API feels more natural in this example, that is really valuable for developer happiness :)

@taylorotwell

Entirely wrong on multiple points.

1 - Laravel has no opinion on where or how you do validation.
2 - Laravel doesn't require you to "intercept" Eloquent calls to do caching. Use a repository.
3 - The idea that Doctrine is more testable because you don't need to hit the database when testing your repository is true if your project is just ->find() everywhere. But it won't be. And you will have to hit the database.

@guiwoda
Author

guiwoda commented Mar 3, 2016

Hi @taylorotwell, glad you point all of that out!
Let me respond to each point:

  1. FormRequest objects suggest validation on the Http layer. Beyond that, validation on construction is impossible without breaking other features, because Model calls new static at multiple points.
  2. Using a repository would effectively intercept a database request and cache a result. But if relations get accessed after the repository call, those relations will generate queries. And their relations will, and so on. That's why I pointed out the coupling between Model, Query\Builder and Relation objects.
  3. You are right, you will have to hit the database. But Eloquent methods may / can call the database at any time (that's their intended behavior), so unit testing them without hitting a database is harder. Actually, we could rephrase that to "it's easier to write code that not only represents business behavior but also calls on the database". Those pieces of code are harder to unit test in isolation.

@davzie

davzie commented Mar 3, 2016

I'm a big fan of both, but DM does prevent me hitting the database. I can make a load of test object scenarios if I wanted and have assertions run on them, even asserting relational data and behaviour and it doesn't have to touch the database because it's all just PoPos. I'm not on about the 5-0, I mean plain old PHP objects.

AR is really nice though for getting stuff done and getting laid. I do believe we just need to pick whatever one makes sense for the project and its potential future roadmap.

Great write up though mate, very informative.

@davzie

davzie commented Mar 3, 2016

Getting paid... Not getting laid. I can't even edit that comment on mobile either. I do not use AR to get laid.

@guiwoda
Author

guiwoda commented Mar 3, 2016

Hahahaha I can edit it, @davzie, want me to? Or is it better if we leave it there for now? ;-)

@j0an

j0an commented Mar 3, 2016

💣

@davzie

davzie commented Mar 3, 2016

Let's just leave it and see if anyone kicks up.

@guiwoda
Author

guiwoda commented Mar 3, 2016

I've updated the gist with an AR PostRepository with a simple cache implementation, and comments on the problems it brings.

Please @taylorotwell, help me fix this repository so that Eloquent supports caching.

@taylorotwell

  1. Wrong and I don't have time to elaborate. But you're still simply wrong on that point entirely and completely, so I would suggest removing it.

@guiwoda
Author

guiwoda commented Mar 3, 2016

@taylorotwell Although I disagree with you, and I still want us to discuss your point whenever you have time, I have rephrased the validation argument to state your opinion. Even then, the point on validation in Eloquent (not Laravel) is still valid.

@valorin

valorin commented Mar 3, 2016

Validation in Eloquent is easy, take a look at: https://laravel.com/docs/5.2/eloquent-mutators#accessors-and-mutators

My favourite method is to define a mutator (set*Attribute()) that either typehints the parameter, or throws an exception if it's invalid.

@nilportugues

Caching
The caching strategy here is terrible. Your model should not do caching; it should have a single responsibility. Make a wrapper class that does the caching and delegates to the model on a cache miss.

Testing
Hitting the database? Meh. Testing? Both are testable using an in-memory database such as sqlite, or a real database (mysql, for example) for the test environment.

Performance
If we're actually tackling performance, Doctrine uses more memory, and has an issue... cyclic references. Go serialize some large piece of data with something like JMS Serializer or Sf2 (until it got patched this past winter) and enjoy a fatal error or hitting the memory limit.

Leakage
Eloquent is not perfect either; it leaks the domain. But PoPos can be used, and no one is actually forcing you to pass Eloquent Models to the final controller anyway... it's just your choice as a developer to do so or not. Passing around Doctrine collections doesn't sound great either, even if they are available as a separate library. It's a leakage to me.

My personal overall opinion
I might be wrong, you may not share my opinion, but here it goes.

Different approaches, for the same output. Go use the repository pattern, use eloquent or doctrine in their sql or mongodb versions. Who cares. Community should cut the drama.

Your business logic should not rely on eloquent or doctrine, but on the usage you do of the repository's DATA, not its structure or representation.

In the end, if your application drags, it's most likely an issue of a set of bad decisions and a lack of abstractions that would allow you to pivot from one to the other if required, or even to use both.

@codermarcel

@nilportugues

Can you show us a caching example for laravel? I'm interested how you solve the relationship problem.

@acasar

acasar commented Mar 4, 2016

@guiwoda Eloquent only calls new static when hydrating models. If you wish to ensure that every new Post is constructed with $author and $title then you need to use a named constructor:

class Post extends Eloquent
{
    private function __construct($attributes = [])
    {
        parent::__construct($attributes);
    }

    public static function createAs(User $author, $title)
    {
        return new static(compact('author', 'title'));
    }
}

By making the constructor private you are no longer able to call new Post. Instead you are forced to call Post::createAs($user, $title); every time you need a new instance of Post.

@guiwoda
Author

guiwoda commented Mar 4, 2016

@acasar that doesn't block Post::create($attributes), any findOr*($attributes) method, or the newInstance($attributes) method.
Your Post can still be created without your invariants being protected.

@guiwoda
Author

guiwoda commented Mar 4, 2016

@valorin that's not all there is to validation. As @acasar intended, you have to ensure that an object in memory is always in a valid state, and without preventing invalid construction, you just can't do that. Setter interception is a nice feature to have, but I prefer no setters at all, at least until I find the use case for that entity to be modified.

@guiwoda
Author

guiwoda commented Mar 4, 2016

@nilportugues
Caching: Not a single model here is doing any caching. The caching implementation shown here is a Repository, and I didn't add a level of indirection to a "database repository" just to keep the example simple.

Testing: I understand that some people are more extreme than others on unit vs integration testing. I will compromise with both: given a very, very fast DB engine, if you can keep your complete test suite running in a manageable time, then I wouldn't mind hitting the database once in a few tests.

Now, the definition of manageable may vary: For me, a test suite should be fully executed faster than what it takes my brain to lose focus on the task at hand. That is probably somewhere below 30 seconds, ideally below 5 seconds. Suites that take more than 10 seconds will make me run only the current test while testing, then the whole suite when finished. Suites that take longer than that will probably be skipped for "small changes", checked only on CI instead.

Performance: Do you have a reproducible way to verify Doctrine vs Eloquent memory usage? I'd love to see that.

Serialization has nothing to do with performance. If your serializer of choice doesn't deal with cyclic references, PR the fix.

Leakage: I've tried my best to limit this to the Model / Entity and its direct consumer. In different architectures the consumer will take a different role, and that's ok, it's up to the application to understand how much indirection it needs. In both Eloquent and Doctrine you can either use from Controllers or use from a Command handler, App service layer, repository, etc.

If you need to model both a Model class and a Model PoPo, then you're going to a lot of trouble to deal with the tool. I'd rather choose a tool that doesn't force me to duplicate all my models when I don't want that sort of leakage. Which is why I firmly agree with what was said in the Laravel Podcast: if you're using Eloquent, embrace it. Don't try to go around it with those kinds of practices; you'll end up doing the same method call, but with layers and layers of indirection.

And you were the first one here to bring "the drama" up. This gist is not drama, I'm trying to share something useful to everyone, either Laravel adopters that are thinking about using Doctrine, or Doctrine users that are starting to work in Laravel. Don't make this a drama because we have different opinions.

@nilportugues

@guiwoda "the drama" is just because it's a recurring topic. Not to be rude... not my intention at all. I said "drama" because it will just pop up every 3 months here and there. As developers we should look at the big picture, and abstract as much as possible. Technologies are replaceable and will change. Business rules should remain unaffected. Know the trade-offs, embrace both.

About Doctrine, well, just look at Doctrine blog posts or the overall architecture, if having 3 (if I recall correctly: proxies, registry and the mapping itself) caching layers and proxies isn't an overkill for a PHP application that works as a "share nothing" ...

Just try running it without cache and run a heavy query. And yeah, you'll get improvements if you do not hydrate the data, but still. They even reinvented SQL doing DQL. Doctrine is great, and does a lot of abstractions, implements eager and lazy loading. Don't get me wrong.

My main concern in the ongoing AR vs DM topic is that all the abstractions are decided by a library, not by the developer. This is usually bad, and ends up leading most people to work around the ORM. And this goes for both Doctrine and Eloquent in this case.

@guiwoda
Author

guiwoda commented Mar 12, 2016

@nilportugues You should have led with that! That I can agree with. Except for the Doctrine part, which I think should be measured. I've had first hand experience with Doctrine's cache layers being very useful for some PHP applications, where you want to make each apache worker / fpm thread as small and cpu light as possible. Again, all of this is better with data.

@ehongyu

ehongyu commented Aug 4, 2016

Thanks for the good code samples, which speak louder than just words. The simple code of the Doctrine caching layer indeed looks more elegant.

@hallboav

Eloquent < Doctrine
