public
Created

YUI Attribute Value Filters proposal

  • Download Gist
gistfile1.md
Markdown

Attribute Value Filters

With more and more application logic moving to the client, and with YUI becoming more popular on the server, it's increasingly important to design APIs that handle user input safely. Currently, YUI modules that store user input in attributes must do one of two things: either escape user strings before setting an attribute, or escape them manually before using them.

Escaping automatically before storing the value is safest, but also inconvenient if you sometimes need the unescaped value, since you must then store two versions (probably in two different attributes). This can lead to API clutter and confusion. Escaping manually before use avoids API clutter but increases the likelihood of mistakes, and also clutters up the codebase in general. It significantly increases the chances that another developer who is unaware of the need to escape the value will inadvertently introduce a security vulnerability.

Proposal

Attribute should provide a consistent, pluggable API for retrieving a filtered string value. Internally, string attributes would be stored as raw strings, and would be filtered on demand using the specified filter when the attribute is retrieved. This avoids the need for module developers to write custom getter functions or store both filtered and unfiltered values, and allows for flexible and safe usage in a variety of scenarios.

Filters should be pluggable via a static API on Y.Attribute. This will allow the escape module to provide a set of default filters, and custom modules can provide their own filters to meet custom needs.

Getting and Setting Values

Setting a value continues to work the same as it does today:

klass.set('username', '<b>joe</b>'); // stores "<b>joe</b>" as the raw attr value

Getting a value also works the same:

klass.get('username'); // => "<b>joe</b>"

To get a filtered version of a value, specify the name of the desired filter as the second argument to get():

klass.get('username', 'html'); // => "&lt;b&gt;joe&lt;/b&gt;"
klass.get('username', 'url');  // => "%3Cb%3Ejoe%3C%2Fb%3E"

Adding a Filter

Filters are registered statically on Y.Attribute. Once registered, a filter is available for use on any class instance that uses Attribute, even if it was instantiated before the filter was registered.

To register a filter:

// Registers a new "html" filter unless one already exists.
Y.Attribute.addFilter('html', Y.Escape.html);

// Arbitrary filter.
Y.Attribute.addFilter('disemvowel', function (value) {
    return value.replace(/[aeiou]+/g, '');
});

Internally, Attribute should store the filter function in a static object hash, with the name as the key.

If addFilter() is called with the name of a filter that already exists, it should log an error and refuse to overwrite the existing filter.

Removing a Filter

To remove a previously added filter:

// Removes the "html" filter if it exists.
Y.Attribute.removeFilter('html');

When removeFilter() is called with the name of a filter that doesn't exist, it should simply do nothing.

Default Filters

A new attribute config property named filter would allow module developers to specify a default filter to be used for an attribute. For example, I could define an attribute that should always be filtered as HTML by default:

// ...
ATTRS: {
    username: {
        filter: 'html'
    }
}
// ...

This would cause get('username') to run the "html" filter. I could still specify another filter if desired, or get('username', 'raw') to get the raw, unfiltered value.

Caching

To improve performance, Attribute could cache filtered values internally, clearing the cached value whenever an attribute's raw value is updated. There may be dragons here.

get() Logic

Internally, get() or its underyling implementation should take the following steps:

  1. Let attrName be the value of the first argument to get().

  2. Let filterName be the value of the second argument to get().

  3. If attrName refers to a nonexistent attribute, return undefined.

  4. Let rawValue be the raw value of the attribute or sub-attribute named by attrName, after passing through the attribute's getter function if one is set.

  5. If filterName is undefined or null, then

    1. If a default filter has been configured for the attribute, then

      1. Let filterName be the name of the attribute's default filter.

      2. If filterName refers to a nonexistant filter, return undefined.

      3. Execute the default filter function, passing rawValue as the only argument.

      4. Return the filter function's return value.

    2. Otherwise, return rawValue.

  6. Otherwise:

    1. If filterName equals "raw", return rawValue.

    2. Otherwise, if filterName refers to a nonexistent filter, return undefined.

    3. Execute the filter function, passing rawValue as the only argument.

    4. Return the filter function's return value.

My initial impression is that this is somewhat similar to getter in ATTRS, but with the option of having the raw value, which is good :)
I don't have any problem with the functionality listed. It seems like a straightforward way to abstract away repeated uses of getters which should be generic anyway.

The get() implementation mentions (1.f) that the value would come from calling .toString(). I don't think that this is a fair assumption, even though it would cover most cases.
The filter idea could be more flexible using the raw value regardless of its type. Eg. I could abuse your filter to retrieve date parts if the filter received a date object from the model, maybe by constructing a date part filter to grab parts of a date object. I could call klass.get('date:day'); or something similar and receive only the day part.

The other question that comes from comparing filter to a genericised version of a getter, is whether this design fits into a setter equivalent (yeah, this is going out of scope now :)). The model has the ability to parse itself, but not at the individual value level. genericised versions of value parsers could also be useful (not at model.load, but for individual .sets, due to too much overhead generated by filtering a whole modellist). Again, a date example: klass.set('date:iso9660', d) for parsing or klass.set('date:day', 2) for an example of modifying part of a value. This kind of filter could take logic out of the application and into a basic reusable component.

@mosen: Thanks for the feedback!

My initial impression is that this is somewhat similar to getter in ATTRS, but with the option of having the raw value, which is good :)

They're similar in that getters and filters do a similar thing, but the difference is that an attribute can only have a single getter, and it must be predefined. A filter, once registered, can be applied to any attribute of any class at the time of use, even if that attribute already has a getter. What's more, you can apply different filters to the same attribute in different circumstances -- for example, in one case you may be inserting the value into HTML, and in another you may be using it in a URL, which require different kinds of escaping.

The filter idea could be more flexible using the raw value regardless of its type. Eg. I could abuse your filter to retrieve date parts if the filter received a date object from the model, maybe by constructing a date part filter to grab parts of a date object. I could call klass.get('date:day'); or something similar and receive only the day part.

That's true; the explicit toString() conversion does prevent uses like this, but it also serves to prevent unintentional (and possibly unsafe? maybe that's being too paranoid) side effects of applying a filter that expects a string to a value that isn't a string. I'd be open to removing the explicit toString() call if other people agree that allowing filters to accept non-string values is desirable.

The other question that comes from comparing filter to a genericised version of a getter, is whether this design fits into a setter equivalent (yeah, this is going out of scope now :)).

Definitely out of scope for this proposal, but that's not a bad idea. Once we have a solid design for filtering retrieved values, maybe we can look into how to adapt that to provide nifty functionality for setting values as well.

An initial thought (which as it turns out is inline with mosen's original reaction):

Can we get 80% of the value without most of the engineering and new attribute concepts?

It seems like if we added a getRawVal() method to attribute (which is exists already, but is just private), it seems like you can cover the 80% use case with this:

{
    a: {
        value : '<b>Foo</b>', 
        getter : Y.Escape.html
    }
}

I understand that this won't give us the chained value of filters on top of getters, or the ability to dynamically filter output (this part is just sugar), but what I'm saying is, would the above, or maybe:

a: {
    value: ...,
    type : 'html' // implies a default getter impl.
}

Serve the 80% use case, without adding as much complexity or adding new attribute concepts (raw, filters, setters, getters) into the attribute retrieval flow.

If we feel like we really need the remaining 20% (assuming we agree it's 20%) the proposal looks good in general to kick things off.

@sdesai:

If we feel like we really need the remaining 20% (assuming we agree it's 20%) the proposal looks good in general to kick things off.

I feel like those numbers should be reversed. A getter that always escapes as HTML and a getRawVal() method that always returns the raw value give us 20% of what we need. But the remaining 80% includes the most valuable feature: the ability to easily choose an appropriate filter based on context, without needing to do explicit escaping or filtering in implementation code.

Having a value automatically escaped as HTML in all cases only makes that value safe in the context of HTML. But the reality is that attribute values are used in a variety of contexts. Sometimes they're used in HTML. Sometimes they're used as plain text. Sometimes they're used in URLs. Sometimes they're used to build regular expressions. Each of these contexts requires different escaping or filtering logic in order to be safe, and we should provide an easy and extensible way to do that.

The design of the filtering API is also intended to improve safety by encouraging safe programming patterns. Are you inserting a value into innerHTML? Use get('foo:html'). Are you appending a value to a URL? Use get('foo:url'). If you see the filter suffix, you know the value is being used safely. If you don't see the suffix, that should indicate potentially unsafe code. Over time, this pattern will become ingrained and YUI developers will find it easy and natural to be safe by default, and will hopefully also find it easy to spot mistakes that make code unsafe.

Fair enough. I agree with the value of ingraining patterns over time.

SIDE NOTE: "|" (pipe) would be a reasonably geeky replacement for ":", but it has event connotations [ we use it for category prefixes ].

I had the same thought as @mosen about non-string attributes, but I think for the initial cut of this feature it would be fine to stick with this restriction since it would be the most common use case.

@rgrove I agree with this:

To improve performance, Attribute could cache filtered values internally, clearing the cached value whenever an attribute's raw value is updated. There may be dragons here.

The, filters-only-work-with-strings restriction would help to alleviate stale objects in the cache passed around by reference.

I think I like @sdesai's idea of using the pipe char '|' as the separator. This looks nice as long as people don't think of it as a Bitwise OR operation :)

Y.one('.name').set('text', user.get('name|html'));

@sdesai, @ericf: Yeah, I avoided the pipe character because of the potential for confusion with bitwise OR and event categories as you mentioned. It also looks similar to an uppercase letter "I" in some fonts.

I prefer : because it's visually distinct and doesn't carry much baggage with it, so it stands out more as being specific to this purpose. I don't feel strongly about it, but I do worry about the potential for | to confuse people.

I haven't read the full proposal (that's scheduled for tomorrow afternoon), but my initial thoughts, which could be in line with previous comments, are:

  • we should avoid more string parsing if possible
  • if string parsing, having the filter be after the name is reasonable, but the opposite to event categories and event prefixes.
  • How about thing.getAs('attr', 'html'), reserving get(attr) for existing functionality, or even routing through getAs with a default filter that does the current core behavior.

Now's the part where I reserve the right to completely reneg on this after I've actually read it all and have formed (hopefully) educated opinions.

Ok, I've read through the proposal and the comments.

What it seems to me you are describing is similar to what exists today in DataTable as column formatters. In fact, it's probably more accurate to s/filter/formatter/g because "filter" suggests the removal of some part of the thing rather than reformatting it.

Registered formatters were in YUI 2, and are one of the todo items for 3.6.0, allowing

var table = new Y.DataTable({
    columns: [{ key: 'price', formatter: 'currency' }, ... ]
    ...
});

I think the idea is good, though it might complicate DT's use of formatters.

That said, I am very uncomfortable with the string parsing approach. Attribute is upstream of a lot of code, including most customer class implementations. Adding more work to get() will slow the library and implementation code down across the board. And, more recently relevant to DataTable, the new delimiter will decrease the usable characters in attribute names. This might break existing code. From 3.4.1 to 3.5.0, the move from Record to Model breaks implementations that use periods in their column keys because the Model instances think they are trying to retrieve subattributes. I don't want this to happen again.

Related aside: the implementation steps for get() are absent consideration of subattributes.

This also seems related to the notion of attribute types. This has come up a number of times in the past, and I am certainly in favor of it. The premise being that attributes can be configured with a type (sibling of value, setter, getter, etc), and other config settings would be applied based on that type.

The parallel for mosen's suggested setter filter is DataSchema's parsers, or the type config hinted above. Those are both static configurations as opposed to per-operation configurations which is the focus of this proposal.

So, I'm sticking with my original suggestion for getAs(name, formatter), though I think get(name, formatter) will be sufficient, since AttributeCore's implementation of get() only passes through one argument to _getAttr() anyway.

To be clear, I recognize that the proposal is per-operation functionality, and that differs from column configuration and the ability to configure attribute types. The similarity is in registered name => common formatting function. I like that pattern, so I like the proposal, save the string parsing.

@lsmith:

What it seems to me you are describing is similar to what exists today in DataTable as column formatters. In fact, it's probably more accurate to s/filter/formatter/g because "filter" suggests the removal of some part of the thing rather than reformatting it.

You're right that filters and formatters are similar, but I'm wary of diluting the intent of this feature by targeting too broad a use case or referring to it as a "formatter". Filtering -- specifically, filtering for purposes of security -- is the primary use case, and I think the name is accurate: "filter this value through this function". The function may remove something, it may escape something, or it may just make something pretty, but the point is it acts as a filter between the raw value and the value you get back. It's a bonus that it can be used as a general-purpose attribute formatting mechanism, but that's not the focus, and general formatting isn't something that I think warrants support in Attribute's core. Filtering is.

I understand your concerns with the string parsing, and I'm open to changing that. What I like about the get('foo:html') or get('foo|html') API is that it makes the filter conceptually explicit and seems (to me anyway) to stand out more than get('foo', 'html'). This is definitely subjective though, so if you say this API doesn't do it for you, I believe you. Given that the string parsing here would consist of a simple regex match I don't think it would have a measurable perf impact compared to all the other work that goes on when retrieving an attr value, but if string parsing is a blocker, I'm willing to concede.

I don't like getAs(), though. As you may have noticed, I'm pretty intent on keeping the conceptual overhead of this API as low as possible. I don't want people to have to use a new method, or even a method signature that's too far off from what they're used to. I think filters need to be baked into get() and they need to become so ubiquitous that a get() call without a filter will look weird to people. I'd be okay with get(name, filterName).

Using get() rather than a new method is also pretty important for the case where an attribute is defined with a default filter.

I think the idea is good, though it might complicate DT's use of formatters.

My goals are pretty different from DataTable's goals. I want to avoid stepping on DT's toes for sure, but as I said above, formatting is not the focus here.

This also seems related to the notion of attribute types. This has come up a number of times in the past, and I am certainly in favor of it. The premise being that attributes can be configured with a type (sibling of value, setter, getter, etc), and other config settings would be applied based on that type.

Not quite. Context is very important for filtering. A static type won't suffice here, because regardless of what that type is, its value will need different filtering depending on the context in which it's used. HTML escaping isn't sufficient for URLs, URL escaping isn't sufficient for HTML, etc. Types are certainly useful for formatters of the sort that DataTable uses, but not for context-sensitive user input filtering.

I've specified that it should be possible to define a default filter for an attribute, but that's just a convenience, and doesn't obviate the need to filter appropriately for the context in which a value is being used.

In the get() logic section 5.1.2 I would have thought returning undefined when the default filter is non-existent, rather than rawValue, would have unexpected consequences for developers. Otherwise this all looks good.

@jinsley: Returning the rawValue there when it's expected to be filtered could have even worse consequences, which is why we return undefined if the specified filter isn't found. It might also make sense to log an error in this case.

@rgrove: I think a non-silent failure would be good - logging an error whilst returning undefined would do that.

Hey guys,
I like the avoidance of string parsing for the filter for a couple of reasons, one of which Luke mentioned, which is adding even more overhead into everything that uses Attribute, but the other is related to a request:

What about having the filter param be either a function or a string name. If it's registered as part of addFilter, you can pass it's name. If it's a function, just use that to filter the value.

The upside of this I see is that there are many cases where I may have an existing filtering method, but I don't want to register it as a filter, or I have one that's slightly different from a registered one.

@natecavanaugh:

What about having the filter param be either a function or a string name. If it's registered as part of addFilter, you can pass it's name. If it's a function, just use that to filter the value.

Seems reasonable.

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.