Collection Filters - Readme Driven Development

Filter Language for Collections

Why?

You often need subsets of the objects in a collection and want to access them efficiently from your domain model. But you certainly don't want to access the EntityManager or any other object manager there to craft a query. Filter expressions for collections allow you to go back to the database and query for all objects matching the crafted expression. They also work exactly the same way against an in-memory ArrayCollection. This way you don't have to think about the context (except for SQL performance, when it haunts you ;)) and can focus on your domain logic.

In Doctrine ORM this will be done by building DQL under the hood; in memory it will be done using Collection#filter(Closure $closure).

Technical Requirements:

  1. Should allow filtering depending on the "persistence" backend, i.e. in memory for ArrayCollection and using SQL for PersistentCollection
  2. Should be simple enough to be adopted by many persistence providers
  3. Expressions always accept either "Expr op Expr" xor "Field op value". A new expression language is needed for that; the ORM one cannot be reused.
  4. Assumes that for "Field" a getter "getField" exists on the target object and that the field is mapped in the corresponding persistence provider.
<?php
class Post
{
    /**
     * @OneToMany(targetEntity="Comment", mappedBy="post", fetch="EXTRA_LAZY")
     * @var ArrayCollection
     */
    private $comments;

    public function __construct()
    {
        $this->comments = new ArrayCollection();
    }

    public function getRecentComments()
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->gt("created", new \DateTime("-7 days"))
        );
    }

    public function getCommentsByAuthor($author)
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->equals("author", $author)
        );
    }

    public function getAllRecentSpamComments()
    {
        $expr = new ExpressionBuilder();
        return $this->comments->select(
            $expr->and(
                $expr->equals("status", Comment::SPAM),
                $expr->gt("created", new \DateTime("-7 days"))
            )
        );
    }
}
class Comment
{
    /**
     * @ManyToOne(targetEntity="Post", inversedBy="comments")
     * @var Post
     */
    private $post;

    /**
     * @ManyToOne(targetEntity="User")
     */
    private $author;

    /**
     * @Column(type="datetime")
     * @var DateTime
     */
    private $created;

    /**
     * @Column(type="integer")
     * @var integer
     */
    private $status = self::PUBLISHED;
}
// Both ArrayCollection and PersistentCollection will implement this.
interface FilteredCollection extends Collection
{
    /**
     * Match all objects against the given expression and return a NEW collection.
     *
     * @return Collection
     */
    public function select(Expression $expr);
}
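
For the in-memory side, a minimal sketch of how select() could evaluate an expression might look like the following. It assumes PHP 8, and the names Comparison, AndExpression and InMemoryCollection (as well as the matches() method) are illustrative only; they are not part of Doctrine and not necessarily what the proposal would end up with.

```php
<?php
// Hypothetical sketch: none of these classes exist in Doctrine as written.

/** An expression decides whether a single object matches. */
interface Expression
{
    public function matches(object $object): bool;
}

/** "Field op value": compares the result of get<Field>() with a fixed value. */
final class Comparison implements Expression
{
    public function __construct(
        private string $field,
        private string $op,
        private mixed $value
    ) {}

    public function matches(object $object): bool
    {
        // Requirement 4 of the README: a getter "get<Field>" must exist.
        $actual = $object->{'get' . ucfirst($this->field)}();

        return match ($this->op) {
            '=' => $actual == $this->value,
            '>' => $actual > $this->value,
            '<' => $actual < $this->value,
            default => throw new InvalidArgumentException("Unknown operator {$this->op}"),
        };
    }
}

/** "Expr op Expr" for the AND case: every part has to match. */
final class AndExpression implements Expression
{
    /** @var Expression[] */
    private array $parts;

    public function __construct(Expression ...$parts)
    {
        $this->parts = $parts;
    }

    public function matches(object $object): bool
    {
        foreach ($this->parts as $part) {
            if (!$part->matches($object)) {
                return false;
            }
        }
        return true;
    }
}

/** Array-backed stand-in for ArrayCollection. */
final class InMemoryCollection
{
    public function __construct(private array $elements = []) {}

    /** Returns a NEW collection holding only the matching elements. */
    public function select(Expression $expr): self
    {
        return new self(array_values(array_filter(
            $this->elements,
            fn (object $element): bool => $expr->matches($element)
        )));
    }

    public function toArray(): array
    {
        return $this->elements;
    }
}
```

A persistent collection would accept the same Expression objects but translate them into a query instead of calling matches() on every element.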

webmozart Feb 8, 2012

Shouldn't select be filter? I agree that expr doesn't belong here.

Owner

beberlei commented Feb 8, 2012

@bschussek yes it should be ->filter() but that method is already taken on the interface :-(

webmozart Feb 8, 2012

I see, on top of that, both methods basically do the same. What a shame. filterByExpr?

beberlei Feb 8, 2012

@bschussek well - The "old" one restricts usage to closures. But maybe you're onto something.

/**
 * If this is an Expression object (implementing __invoke()), it may use efficient query means.
 *
 * @param callable $fn
 */
public function filter($fn)

That way we could also extend this to ALL methods on the interface.
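
A rough sketch of that dispatch, assuming PHP 8: the collection accepts any callable and checks whether it is an expression object it can inspect. InvokableExpression, FieldEquals and DispatchingCollection are made-up names for illustration, and the elements are plain arrays to keep the example short.

```php
<?php
// Hypothetical sketch; names are illustrative and not Doctrine's API.

/** Marker for invokable expression objects that a backend could also translate. */
interface InvokableExpression
{
    public function __invoke(mixed $element): bool;
}

final class FieldEquals implements InvokableExpression
{
    public function __construct(public string $field, public mixed $value) {}

    public function __invoke(mixed $element): bool
    {
        return $element[$this->field] == $this->value;
    }
}

final class DispatchingCollection
{
    /** Records which path the last filter() call took, for demonstration only. */
    public string $lastPath = '';

    public function __construct(private array $elements) {}

    public function filter(callable $fn): array
    {
        // An expression object can be inspected, so a persistent backend could
        // turn it into a query at this point. A plain closure is opaque and
        // has to be run against every element.
        $this->lastPath = $fn instanceof InvokableExpression ? 'expression' : 'closure';

        return array_values(array_filter($this->elements, $fn));
    }
}
```

Because an object with __invoke() satisfies the callable type, existing closure-based callers keep working; only the Closure typehint on the interface has to go.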

webmozart Feb 8, 2012

Yes, maybe this. I think that you should enable the expression behaviour for most of Collection's methods accepting a closure: exists, filter, forAll and partition. map doesn't make sense with expressions.

webmozart Feb 8, 2012

I guess this is what you were saying anyway :)

beberlei Feb 8, 2012

Yes, in that case we need to change the interface, but only to remove the Closure typehint. It will still be a BC break.

webmozart Feb 8, 2012

Hmm what's worse about the BC break is that it's impossible to write code that supports both the old and the new interface

jmikola Feb 8, 2012

@beberlei: Wouldn't that ruin the purity of the Collection interface? This expression filtering certainly has no business being supported by ArrayCollection.

I think something like filterByExpr() is more explicit than relaxing the \Closure type-hint from Collection::filter().

beberlei Feb 8, 2012

@jmikola no the thing has EXPLICITLY the requirement to be supported by ArrayCollection. It has to work exactly the same in memory or against the database.

henrikbjorn Feb 8, 2012

I like this idea A LOT 👍 And could actually really really use it now :)

jmikola Feb 8, 2012

I just re-read DDC-1637 and realized its purpose. Brilliant! I didn't expect that expressions would be useful outside of a DB-querying context :)

loicfrering Feb 8, 2012

Great idea! Must say that I'm also +1 on removing the Closure typehint.

Ocramius Feb 8, 2012

Would filtering just iterate over the passed-in expressions as if they were assertions?
Anyway, this looks really cool and makes usage of the ORM much much simpler!

henrikbjorn Feb 8, 2012

@Ocramius filtering today just iterates over the elements, and that would be the same for ArrayCollection (I assume), but for PersistentCollection it would build DQL based on the Expression objects and issue that against the database.
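
To make the persistent side more concrete, here is a hedged sketch, assuming PHP 8, of how simple "Field op value" comparisons could be rendered as DQL-like text plus bound parameters. FieldComparison and renderDql() are hypothetical names; the code only builds a string and a parameter map and does not call any real Doctrine API.

```php
<?php
// Hypothetical sketch; nothing here is part of Doctrine.

final class FieldComparison
{
    public function __construct(
        public string $field,
        public string $operator,
        public mixed $value
    ) {}
}

/**
 * Renders a list of comparisons, joined with AND, as a DQL-like query.
 *
 * @param FieldComparison[] $comparisons
 * @return array{0: string, 1: array<string, mixed>} query text and parameters
 */
function renderDql(string $entity, string $alias, array $comparisons): array
{
    $allowed = ['=', '<>', '<', '<=', '>', '>='];
    $conditions = [];
    $parameters = [];

    foreach (array_values($comparisons) as $i => $comparison) {
        if (!in_array($comparison->operator, $allowed, true)) {
            throw new InvalidArgumentException("Unsupported operator: {$comparison->operator}");
        }
        // Values travel as bound parameters and are never inlined into the text.
        $name = "p{$i}";
        $conditions[] = "{$alias}.{$comparison->field} {$comparison->operator} :{$name}";
        $parameters[$name] = $comparison->value;
    }

    $dql = "SELECT {$alias} FROM {$entity} {$alias}";
    if ($conditions !== []) {
        $dql .= ' WHERE ' . implode(' AND ', $conditions);
    }

    return [$dql, $parameters];
}
```

A real implementation would additionally restrict the query to the owning entity of the collection (for example the Post whose comments are being filtered) and validate field names against the mapping.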

Ocramius Feb 8, 2012

@henrikbjorn: yeah, that was clear, I'm just wondering how this would fit any other non-ORM based project... Collections become more and more interesting :)
Also, would the check use reflection somehow?

beberlei Feb 8, 2012

The idea was that the filtering uses "get" + $field or ArrayAccess to get to fields, and throws an exception if neither works. The same goes for the ORM: it would check if a persistent field or association exists, and throw an exception if not. So in your code you have to take care to filter only on fields that exist both with a getter and as a persistent field.
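
The lookup rule described here could be sketched roughly as follows, assuming PHP 8. The helper name readField() and the choice of InvalidArgumentException are assumptions made for illustration.

```php
<?php
// Hypothetical helper; not part of Doctrine.

/**
 * Reads $field from $element via "get" + field first, then via ArrayAccess.
 * Throws if neither access path is available.
 */
function readField(mixed $element, string $field): mixed
{
    if (is_object($element)) {
        $getter = 'get' . ucfirst($field);
        // is_callable() also rejects private or protected getters.
        if (is_callable([$element, $getter])) {
            return $element->$getter();
        }
    }

    if ($element instanceof ArrayAccess && $element->offsetExists($field)) {
        return $element[$field];
    }

    throw new InvalidArgumentException(
        "Cannot read field '{$field}': no public getter and no ArrayAccess entry."
    );
}
```

Failing loudly on an unknown field keeps the in-memory behaviour aligned with the ORM side, where an unmapped field would also be rejected.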

Ocramius Feb 8, 2012

Hmm, not convinced by it because that's not really what DQL does... But I understand this is not so strictly related with the ORM.
What about sorting? Second parameter? Expressions allowed (if makes sense)?

l3pp4rd Feb 8, 2012

Well, I personally think this is a playground only, because most of the time I never fetch collections lazily. One HTTP request, one SQL query is the best that can be expected. With that in mind, the DQL looks something like:

SELECT p, c FROM Entity\Post p
LEFT JOIN p.comments c
WHERE c.createdAt > :timestamp 

Ordering is usually also done in PHP.

What if such collection definitions decorated the Post proxy query to join the comments and order or filter them in the defined way?

michelsalib Feb 8, 2012

I just read the title of the Jira issue, which talks about LINQ-like filters. So why not look to LINQ for inspiration on function naming and other functionality?

I am proficient in C#, and one of the things I really miss in PHP is something like LINQ. So I am very thrilled about this proposal.

So why not name this filter function where? After all, the implied operation is a where.

FYI, the base interface used in LINQ is here: http://msdn.microsoft.com/en-us/library/system.linq.enumerable.aspx

Also, I don't really see why this could not work with the current QueryBuilder system. Entity Framework actually uses the same interface for in-memory collections and DB collections, which is what makes it so powerful.

beberlei Feb 9, 2012

@l3pp4rd if you fetch your entities this way, fine, but you should still use this API, because then you can use the collection both filtered and unfiltered in the same request without running into trouble over assumptions about what is actually in your collection.

@michelsalib Yes, I know this interface; however, I don't want to implement LINQ fully. First, it's implemented at the language level, so it allows many more features than a PHP-based approach at the library level. Second, LINQ took ages to implement with a huge team. I want this to be a good mix of powerful and implementable in a reasonable time frame. It should also allow us to support many data providers, so the actual language has to find a least common denominator.

michelsalib Feb 9, 2012

@beberlei, I see your point and agree. Except for the naming part. Why not name this function where?

webmozart Feb 9, 2012

@michelsalib: One big reason against where IMO is that there is already a method filter in the parent interface that has the same purpose.

michelsalib Feb 9, 2012

@bschussek, well seems legit.

j Mar 5, 2012

I completely agree with @l3pp4rd as far as ensuring your backend queries are highly optimized, but this feature would be amazing to have in a lot of cases:

For example, in @l3pp4rd's example, you would have the query:

SELECT p, c, l FROM Entity\Post p
LEFT JOIN p.comments c
LEFT JOIN c.likes l
WHERE c.createdAt > :timestamp 

and in your controller, do:

<?php

    // ...

    $topComments = $this->comments->select(
        $expr->gt('c.likes', 1)
    );

    $comments = $this->comments->select(
        $expr->isNull('c.likes')
    );

Of course, for speed, it makes sense to do the sorting in one go-around... I'm not sure how hard it would be to make it so that it just iterates through the collection only one time and does all the expression matches, but that would be pretty rad if it works this way too.. for example:

<?php

    // ...

    // iterate through all the comments in one go-around
    list($topComments, $comments) = $this->comments->select(array(
        $expr->gte('c.likes', 1),  // get comments with likes
        $expr->isNull('c.likes')  // get comments with no likes
    ));
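
A minimal sketch of such a single pass, assuming PHP 7.4 or later and using plain callables in place of expression objects. The function name selectMany() is made up for this example.

```php
<?php
// Hypothetical sketch; selectMany() is an illustrative name.

/**
 * Walks $elements once and returns one result list per predicate, in the same
 * order as $predicates. An element may land in several lists or in none.
 *
 * @param iterable   $elements
 * @param callable[] $predicates
 * @return array[]
 */
function selectMany(iterable $elements, array $predicates): array
{
    $predicates = array_values($predicates);
    $results = array_fill(0, count($predicates), []);

    foreach ($elements as $element) {
        foreach ($predicates as $i => $predicate) {
            if ($predicate($element)) {
                $results[$i][] = $element;
            }
        }
    }

    return $results;
}
```

This only saves work in memory; against a database, each expression would probably still become its own query unless the backend can combine them.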

Also, having limits, orderBys, and notIns would be even more powerful, giving you the ability to do:

<?php

    // ...

    // get the top 5 comments (most likes first)
    $topFiveExpr = $expr->gte('c.likes', 1)->limit(5)->orderBy('c.likes DESC');

    // get the rest of the comments excluding the top 5
    $commentsExpr = $expr->notIn($topFiveExpr);

    list($topFiveComments, $comments) = $this->comments->select(array(
        $topFiveExpr,
        $commentsExpr
    ));

I see a pretty awesome twig extension coming out of this too ;P
