
Collection Filters - Readme Driven Development

README.md

Filter Language for Collections

Why?

You often need subsets of objects in a collection and want to access them efficiently in your domain model. But you certainly don't want to access the EntityManager or any other object manager there to craft a query. Filter expressions for collections let you go back to the database and query for all objects matching the crafted expression. Additionally, they work exactly the same against an in-memory ArrayCollection. This way you don't have to think about the context (except for SQL performance, when it haunts you ;)) and can focus on your domain logic.

In Doctrine ORM this will be done by building DQL under the hood; in memory it will be done using Collection#filter(Closure $closure).
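A minimal sketch of the in-memory half, assuming a hypothetical `FieldComparison` class (not part of Doctrine) that turns a "Field op value" expression into the closure `filter()` expects. `array_filter()` stands in for `ArrayCollection#filter()` so the example runs without Doctrine:

```php
<?php
// Hypothetical sketch: FieldComparison is NOT a Doctrine class. It turns a
// "Field op value" expression into the closure an in-memory filter() takes.
final class FieldComparison
{
    private $field;
    private $op;
    private $value;

    public function __construct(string $field, string $op, $value)
    {
        $this->field = $field;
        $this->op = $op;
        $this->value = $value;
    }

    public function toClosure(): Closure
    {
        // Assumes the "get" + Field getter convention from the requirements.
        $getter = 'get' . ucfirst($this->field);
        $op = $this->op;
        $value = $this->value;

        return function ($object) use ($getter, $op, $value): bool {
            $actual = $object->$getter();
            switch ($op) {
                case '=':
                    return $actual == $value;
                case '>':
                    return $actual > $value;
                case '<':
                    return $actual < $value;
                default:
                    throw new InvalidArgumentException("Unsupported operator $op");
            }
        };
    }
}

// Minimal stand-in entity with a getter for the "created" field.
final class DemoComment
{
    private $created;

    public function __construct(DateTime $created)
    {
        $this->created = $created;
    }

    public function getCreated(): DateTime
    {
        return $this->created;
    }
}

$comments = [
    new DemoComment(new DateTime('-1 day')),
    new DemoComment(new DateTime('-30 days')),
];

$recent = new FieldComparison('created', '>', new DateTime('-7 days'));
// array_filter() plays the role of ArrayCollection#filter(Closure) here.
$matches = array_values(array_filter($comments, $recent->toClosure()));
```

The same expression object could, in principle, be handed to a persistent backend that reads its field, operator and value instead of calling the closure.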

Technical Requirements:

  1. Should allow filtering depending on the persistence backend, i.e. in memory for ArrayCollection and using SQL for PersistentCollection.
  2. Should be very simple, so that it can be adopted by many persistence providers.
  3. Expressions always accept either "Expr op Expr" or "Field op value", never both. A new expression language is needed for that; the ORM one cannot be reused.
  4. Assumes that for "Field" a getter "getField" exists on the target object and that the field is mapped in each corresponding persistence provider.
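Requirements 3 and 4 can be illustrated with a small expression tree. This is a sketch under assumptions: `Expr`, `Comparison` and `Composite` are hypothetical names, the operator set is deliberately tiny, and status `2` merely plays the role of a SPAM constant. The constructor type hints are what enforce the "Expr op Expr" xor "Field op value" rule:

```php
<?php
// Hypothetical sketch of requirement 3: a node is either a leaf
// ("Field op value") or a composite ("Expr op Expr"), never a mix.
interface Expr
{
    public function evaluate($object): bool;
}

final class Comparison implements Expr
{
    private $field;
    private $op;
    private $value;

    public function __construct(string $field, string $op, $value)
    {
        $this->field = $field;
        $this->op = $op;
        $this->value = $value;
    }

    public function evaluate($object): bool
    {
        // Requirement 4: assume a public getter "get" + Field exists.
        $actual = $object->{'get' . ucfirst($this->field)}();
        switch ($this->op) {
            case '=':
                return $actual == $this->value;
            case '>':
                return $actual > $this->value;
            default:
                throw new InvalidArgumentException('Unsupported operator ' . $this->op);
        }
    }
}

final class Composite implements Expr
{
    private $op;
    private $left;
    private $right;

    // Only Expr instances are accepted, so raw values cannot sneak in here.
    public function __construct(string $op, Expr $left, Expr $right)
    {
        $this->op = $op;
        $this->left = $left;
        $this->right = $right;
    }

    public function evaluate($object): bool
    {
        if ($this->op === 'AND') {
            return $this->left->evaluate($object) && $this->right->evaluate($object);
        }
        if ($this->op === 'OR') {
            return $this->left->evaluate($object) || $this->right->evaluate($object);
        }
        throw new InvalidArgumentException('Unsupported operator ' . $this->op);
    }
}

// Stand-in entity; status 2 plays the role of a hypothetical SPAM constant.
final class StatusComment
{
    private $status;
    private $created;

    public function __construct(int $status, DateTime $created)
    {
        $this->status = $status;
        $this->created = $created;
    }

    public function getStatus(): int
    {
        return $this->status;
    }

    public function getCreated(): DateTime
    {
        return $this->created;
    }
}

$recentSpam = new Composite(
    'AND',
    new Comparison('status', '=', 2),
    new Comparison('created', '>', new DateTime('-7 days'))
);
```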
example.php
<?php

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;

// ExpressionBuilder, Expression and select() are the API proposed in this
// gist; they are not part of Doctrine.

class Post
{
    /**
     * @OneToMany(targetEntity="Comment", mappedBy="post", fetch="EXTRA_LAZY")
     * @var Collection
     */
    private $comments;

    public function __construct()
    {
        $this->comments = new ArrayCollection();
    }

    public function getRecentComments()
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->gt("created", new \DateTime("-7 days"))
        );
    }

    public function getCommentsByAuthor($author)
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->equals("author", $author)
        );
    }

    public function getAllRecentSpamComments()
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->and(
                $expr->equals("status", Comment::SPAM),
                $expr->gt("created", new \DateTime("-7 days"))
            )
        );
    }
}

class Comment
{
    // Assumed example values for the status field.
    const PUBLISHED = 1;
    const SPAM = 2;

    /**
     * @ManyToOne(targetEntity="Post", inversedBy="comments")
     * @var Post
     */
    private $post;

    /**
     * @ManyToOne(targetEntity="User")
     */
    private $author;

    /**
     * @Column(type="datetime")
     * @var \DateTime
     */
    private $created;

    /**
     * @Column(type="integer")
     * @var integer
     */
    private $status = self::PUBLISHED;

    // Getters assumed by requirement 4 ("getField") for in-memory filtering.
    public function getAuthor()
    {
        return $this->author;
    }

    public function getCreated()
    {
        return $this->created;
    }

    public function getStatus()
    {
        return $this->status;
    }
}

// Both ArrayCollection and PersistentCollection will implement this.
interface FilteredCollection extends Collection
{
    /**
     * Match all objects against the given expression and return a NEW collection.
     *
     * @return Collection
     */
    public function select(Expression $expr);
}
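To make the "return a NEW collection" contract concrete, here is a sketch of an in-memory implementation. `Expression`, `SelectableCollection` and `InMemoryCollection` are hypothetical local stand-ins, not Doctrine types, and the `matches()` method is an assumed shape for an expression:

```php
<?php
// Hypothetical stand-ins: Expression and SelectableCollection mirror the
// proposal above but are defined locally so the sketch runs without Doctrine.
interface Expression
{
    public function matches($element): bool;
}

interface SelectableCollection extends Countable, IteratorAggregate
{
    // Match all elements against $expr and return a NEW collection.
    public function select(Expression $expr): SelectableCollection;
}

final class InMemoryCollection implements SelectableCollection
{
    private $elements;

    public function __construct(array $elements = [])
    {
        $this->elements = $elements;
    }

    public function select(Expression $expr): SelectableCollection
    {
        // $this is never modified; matches go into a fresh instance.
        $matches = array_filter($this->elements, function ($element) use ($expr) {
            return $expr->matches($element);
        });

        return new self(array_values($matches));
    }

    public function count(): int
    {
        return count($this->elements);
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->elements);
    }
}

$isEven = new class implements Expression {
    public function matches($element): bool
    {
        return $element % 2 === 0;
    }
};

$all = new InMemoryCollection([1, 2, 3, 4]);
$even = $all->select($isEven);
```

Returning a new instance keeps the original collection intact, so filtered and unfiltered views can coexist in the same request.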

Shouldn't select be filter? I agree that expr doesn't belong here.

@bschussek yes it should be ->filter() but that method is already taken on the interface :-(

I see, on top of that, both methods basically do the same. What a shame. filterByExpr?

@bschussek well - The "old" one restricts usage to closures. But maybe you're onto something.

/**
 * If $fn is an expression object (implementing __invoke()), it may use efficient query means.
 *
 * @param callable $fn
 */
public function filter($fn);
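A sketch of that idea, assuming a hypothetical invokable `FieldEquals` expression and free functions standing in for the relaxed `filter()` signature. In memory, an invokable object is just another callable; a persistent implementation could test for the expression type and translate it into a query instead of iterating:

```php
<?php
// Hypothetical sketch: an expression object that is callable through
// __invoke(), so it fits wherever a closure was accepted before.
final class FieldEquals
{
    private $field;
    private $value;

    public function __construct(string $field, $value)
    {
        $this->field = $field;
        $this->value = $value;
    }

    public function __invoke($element): bool
    {
        return $element->{'get' . ucfirst($this->field)}() == $this->value;
    }
}

// filter() with the Closure type hint relaxed to callable.
function filterElements(array $elements, callable $fn): array
{
    return array_values(array_filter($elements, $fn));
}

// A persistent implementation could check this before loading anything
// and translate the expression into a query instead of iterating.
function canTranslateToQuery(callable $fn): bool
{
    return $fn instanceof FieldEquals;
}

final class Note
{
    private $author;

    public function __construct(string $author)
    {
        $this->author = $author;
    }

    public function getAuthor(): string
    {
        return $this->author;
    }
}

$notes = [new Note('alice'), new Note('bob')];
$byAlice = filterElements($notes, new FieldEquals('author', 'alice'));
$notAlice = filterElements($notes, function (Note $note) {
    return $note->getAuthor() !== 'alice';
});
```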

That way we could also extend this to ALL methods on the interface.

Yes, maybe this. I think that you should enable the expression behaviour for most of Collection's methods accepting a closure: exists, filter, forAll and partition. map doesn't make sense with expressions.

I guess this is what you were saying anyway :)
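The same invokable-object trick carries over to the other predicate-taking methods. The helpers below are hypothetical free-function stand-ins for `exists`, `forAll` and `partition`, and they assume element-only callbacks for simplicity; `LikesAbove` and `getLikes()` are illustrative names:

```php
<?php
// Hypothetical helpers mirroring exists(), forAll() and partition(), each
// accepting any callable; callbacks here receive only the element.
final class LikesAbove
{
    private $min;

    public function __construct(int $min)
    {
        $this->min = $min;
    }

    public function __invoke($reply): bool
    {
        return $reply->getLikes() > $this->min;
    }
}

function existsMatch(array $elements, callable $p): bool
{
    foreach ($elements as $element) {
        if ($p($element)) {
            return true;
        }
    }
    return false;
}

function forAllMatch(array $elements, callable $p): bool
{
    foreach ($elements as $element) {
        if (!$p($element)) {
            return false;
        }
    }
    return true;
}

// Returns [matching, nonMatching].
function partitionBy(array $elements, callable $p): array
{
    $matching = [];
    $rest = [];
    foreach ($elements as $element) {
        if ($p($element)) {
            $matching[] = $element;
        } else {
            $rest[] = $element;
        }
    }
    return [$matching, $rest];
}

final class Reply
{
    private $likes;

    public function __construct(int $likes)
    {
        $this->likes = $likes;
    }

    public function getLikes(): int
    {
        return $this->likes;
    }
}

$replies = [new Reply(0), new Reply(5), new Reply(2)];
$popular = new LikesAbove(1);
list($liked, $ignored) = partitionBy($replies, $popular);
```

`map` is left out on purpose: it transforms elements rather than testing them, so a boolean expression has nothing to contribute there.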

Yes, in that case we need to change the interface, but only to remove the Closure typehint. It will still be a BC break.

Hmm what's worse about the BC break is that it's impossible to write code that supports both the old and the new interface

@beberlei: Wouldn't that ruin the purity of the Collection interface? This expression filtering certainly has no business being supported by ArrayCollection.

I think something like filterByExpr() is more explicit than relaxing the \Closure type-hint from Collection::filter().

@jmikola No, this EXPLICITLY has to be supported by ArrayCollection. It has to work exactly the same in memory or against the database.

I like this idea A LOT :+1: And could actually really really use it now :)

I just re-read DDC-1637 and realized that purpose. Brilliant! I didn't expect that expressions would be useful outside of a DB-querying context :)

Great idea! Must say that I'm also +1 on removing the Closure typehint.

Would filtering just iterate over passed in expressions as if it were assertions?
Anyway, this looks really cool and makes usage of the ORM much much simpler!

@Ocramius filtering today just iterates over the elements, and that would stay the same for ArrayCollection (I assume), but for PersistentCollection it would build DQL based on the Expression objects and issue that against the database.
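How a PersistentCollection might turn expression objects into DQL can be sketched as a tree walk that emits a WHERE fragment and collects bound parameters. `Cmp`, `AndX` and `DqlWalker` are hypothetical names; the code only builds strings and never calls Doctrine. The resulting fragment could be appended to something like `SELECT c FROM Comment c WHERE c.post = :post AND ...`:

```php
<?php
// Hypothetical sketch: walk an expression tree and produce a DQL WHERE
// fragment plus its parameters. It builds strings only; no Doctrine calls.
final class Cmp
{
    public $field;
    public $op;
    public $value;

    public function __construct(string $field, string $op, $value)
    {
        $this->field = $field;
        $this->op = $op;
        $this->value = $value;
    }
}

final class AndX
{
    public $parts;

    public function __construct(...$parts)
    {
        $this->parts = $parts;
    }
}

final class DqlWalker
{
    private $alias;
    private $params = [];

    public function __construct(string $alias)
    {
        $this->alias = $alias;
    }

    public function walk($expr): string
    {
        if ($expr instanceof Cmp) {
            // Values are bound as named parameters, never inlined.
            $name = 'p' . count($this->params);
            $this->params[$name] = $expr->value;
            return sprintf('%s.%s %s :%s', $this->alias, $expr->field, $expr->op, $name);
        }
        if ($expr instanceof AndX) {
            $parts = array_map([$this, 'walk'], $expr->parts);
            return '(' . implode(' AND ', $parts) . ')';
        }
        throw new InvalidArgumentException('Unknown expression node');
    }

    public function getParameters(): array
    {
        return $this->params;
    }
}

$walker = new DqlWalker('c');
$where = $walker->walk(new AndX(
    new Cmp('status', '=', 2),
    new Cmp('created', '>', '2012-01-01')
));
```

Field names are inlined into the fragment, so a real implementation would have to check them against the mapping metadata first.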

@henrikbjorn: yeah, that was clear, I'm just wondering how this would fit any other non-ORM based project... Collections become more and more interesting :)
Also, would the check use reflection somehow?

The idea was that filtering uses "get" + $field or ArrayAccess to reach fields and throws an exception otherwise. The same goes for the ORM: it would check whether a persistent field or association exists and throw an exception if not. So in your code you have to take care to filter only on fields that exist both as a getter and as a persistent field.
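The in-memory lookup rule just described might look like this. `readField()` is a hypothetical helper; it uses `is_callable()` rather than `method_exists()` so that private getters are not picked up from outside the class:

```php
<?php
// Hypothetical helper for the lookup rule described above: use a public
// "get" + Field getter, fall back to ArrayAccess, otherwise throw.
function readField($target, string $field)
{
    $getter = 'get' . ucfirst($field);
    if (is_object($target) && is_callable([$target, $getter])) {
        return $target->$getter();
    }
    if ($target instanceof ArrayAccess && $target->offsetExists($field)) {
        return $target[$field];
    }
    throw new RuntimeException(sprintf('Cannot read field "%s"', $field));
}

final class Author
{
    public function getName(): string
    {
        return 'alice';
    }
}
```

The ORM side would perform the matching check against the mapping metadata instead; that part is not shown here.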

Hmm, not convinced by it because that's not really what DQL does... But I understand this is not so strictly related with the ORM.
What about sorting? A second parameter? Expressions allowed (if it makes sense)?

Well, I personally think this is a playground only, because I almost never fetch collections lazily. One HTTP request, one SQL query is the best that can be expected. In that case the DQL looks something like:

SELECT p, c FROM Entity\Post p
LEFT JOIN p.comments c
WHERE c.createdAt > :timestamp 

Ordering is also usually done in PHP.
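Ordering an already-loaded collection in PHP, as suggested here, is a one-liner with `usort()`. This is a sketch with a hypothetical `DatedComment` stand-in and a `getCreatedAt()` getter matching the `createdAt` field in the DQL above:

```php
<?php
// Hypothetical sketch: order already-loaded comments in PHP, newest first,
// instead of adding ORDER BY to the query.
final class DatedComment
{
    private $createdAt;

    public function __construct(DateTime $createdAt)
    {
        $this->createdAt = $createdAt;
    }

    public function getCreatedAt(): DateTime
    {
        return $this->createdAt;
    }
}

function sortNewestFirst(array $comments): array
{
    // usort() works on the local copy, so the caller's array is untouched.
    usort($comments, function (DatedComment $a, DatedComment $b): int {
        return $b->getCreatedAt() <=> $a->getCreatedAt();
    });
    return $comments;
}

$sorted = sortNewestFirst([
    new DatedComment(new DateTime('2012-01-02')),
    new DatedComment(new DateTime('2012-01-05')),
    new DatedComment(new DateTime('2012-01-01')),
]);
```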

What if such collection definitions decorated the Post proxy query to join the comments and order or filter them in the defined way?

I just read the title of the Jira issue, which mentions LINQ-like filters. So why not look into LINQ for inspiration on function naming and other functionality?

I am proficient in C#, and one of the things I really miss in PHP is something like LINQ. So I am very thrilled about this proposal.

So why not name this filter function where? After all, the implied operation is a where.

FYI, the base interface used in linq is here : http://msdn.microsoft.com/en-us/library/system.linq.enumerable.aspx .

Also, I don't really see why this could not work with the current QueryBuilder system. EntityFramework actually uses the same interface for in-memory collections and DB collections, which makes it so powerful.

@l3pp4rd if you fetch your entities this way, fine, but you should still use this API: it ensures you can use the collection both filtered and unfiltered in the same request without running into trouble over assumptions about what is actually in your collection.

@michelsalib Yes, I know this interface; however, I don't want to implement LINQ fully. First, it's implemented at the language level, so it allows many more features than a PHP-based approach at the library level. Second, LINQ took ages to implement with a huge team. I want this to be a good mix of powerful and implementable in a reasonable time frame. It should also allow us to support many data providers, so the actual language has to find a least common denominator.

@beberlei, I see your point and agree, except for the naming part. Why not name this function where?

@michelsalib: One big reason against where, IMO, is that there is already a method filter in the parent interface that serves the same purpose.

@bschussek, well seems legit.

I completely agree with @l3pp4rd as far as ensuring your backend queries are highly optimized, but this feature would be amazing to have in a lot of cases:

For example, in @l3pp4rd's example, you would have the query:

SELECT p, c, l FROM Entity\Post p
LEFT JOIN p.comments c
LEFT JOIN c.likes l
WHERE c.createdAt > :timestamp 

and in your controller, do:

<?php

    // ...

    $topComments = $this->comments->select(
        $expr->gt('c.likes', 1)
    );

    $comments = $this->comments->select(
        $expr->isNull('c.likes')
    );

Of course, for speed, it makes sense to do the filtering in a single pass. I'm not sure how hard it would be to make it iterate through the collection only once and evaluate all the expressions, but it would be pretty rad if it worked this way too. For example:

<?php

    // ...

    // iterate through all the comments in one go-around
    list($topComments, $comments) = $this->comments->select(array(
        $expr->gte('c.likes', 1),  // get comments with likes
        $expr->isNull('c.likes')   // get comments with no likes
    ));
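The single-pass idea can be sketched independently of any expression API. `selectAll()` is a hypothetical helper, plain closures stand in for expressions, and like counts stand in for comments:

```php
<?php
// Hypothetical sketch of the single-pass idea: loop over the elements once,
// test each one against every predicate, and fill one bucket per predicate.
function selectAll(array $elements, array $predicates): array
{
    $predicates = array_values($predicates);
    $buckets = array_fill(0, count($predicates), []);

    foreach ($elements as $element) {
        foreach ($predicates as $i => $predicate) {
            // An element matching several predicates lands in several buckets.
            if ($predicate($element)) {
                $buckets[$i][] = $element;
            }
        }
    }

    return $buckets;
}

// Like counts stand in for comments; null means "no likes recorded".
list($top, $none) = selectAll([0, 3, null, 1], [
    function ($likes) { return $likes !== null && $likes >= 1; },
    function ($likes) { return $likes === null; },
]);
```

Note that an element matching none of the predicates (the `0` here) simply lands in no bucket.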

Also, having limits, orderBys and notIns would be even more powerful; then you could do:

<?php

    // ...

    // get the top 5 comments
    $topFiveExpr = $expr->gte('c.likes', 1)->limit(5)->orderBy('c.likes DESC');

    // get the rest of the comments excluding the top 5
    $commentsExpr = $expr->notIn($topFiveExpr);

    list($topFiveComments, $comments) = $this->comments->select(array(
        $topFiveExpr,
        $commentsExpr
    ));
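In memory, the orderBy + limit + notIn combination reduces to sort, slice and key difference. This sketch assumes a hypothetical `splitTopN()` helper and represents comments as `id => like count` pairs for brevity:

```php
<?php
// Hypothetical in-memory equivalent of orderBy + limit + notIn. Comments are
// reduced to "comment id => like count" pairs to keep the sketch short.
function splitTopN(array $likesById, int $n): array
{
    arsort($likesById);                           // most likes first, ids kept as keys
    $top = array_slice($likesById, 0, $n, true);  // limit($n), preserving ids
    $rest = array_diff_key($likesById, $top);     // notIn(top $n), by id
    return [$top, $rest];
}

list($topTwo, $others) = splitTopN([10 => 4, 11 => 9, 12 => 1, 13 => 7], 2);
```

With equal like counts, the order among ties is only guaranteed to be stable from PHP 8.0 on, when PHP's sort functions became stable.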

I see a pretty awesome twig extension coming out of this too ;P

