grkvlt/entityresourcemanager.md

## entityresourcemanager.md

      
Brooklyn Proposal

Managing Entity Resources

Pull request #528 - Make copyResource and copyTemplate usable by all entity drivers - allows all entities to take advantage of features that were previously present in only the TomcatServer and QpidBroker drivers. It deprecates the copyFile(...) methods and adds copyResource(String, String), copyResources(Map) and other convenience resource handling methods. This makes resources specified as URI strings the preferred mechanism for describing file content that is required by entities at install- or run-time.
Entities can now:

Pull HTTP resources at the remote location, rather than downloading them locally and transferring across with sftp
Transfer a collection of local resources at runtime to the entity, at different names and paths remotely
Filter resources before transfer, using FreeMarker, including a collection as above
Treat resource name strings as URIs with support for file: and classpath: protocols
Convert File arguments to file:/// URI resource strings when dealing with resources

However, AbstractSoftwareProcessSshDriver (the parent class for most entity drivers) is in danger of becoming a dumping ground for many confusingly similar methods and their overloads. It is not the best place for these methods, so this proposal describes a new Brooklyn component that will manage resources and handle their processing within entities, while also providing some new and useful features.
EntityResourceManager

Manages resources for all Brooklyn entities. Each entity can obtain a reference to the manager from the global management context.
The managed resources can be global or restricted to a single entity or type of entity. They will consist of static source URIs, mapped to paths and filenames on remote entity instances, and dynamic URIs, resolved at runtime by substituting configuration key and attribute values for placeholders in the source and destination URI strings. These would be configured as a well-known ConfigKey of type Map<String, String> (from source URI to destination path) on all entities.
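The placeholder substitution described above could look roughly like the following minimal sketch. The class name `ResourcePlaceholders`, the `{key}` matching rule and the flat `Map<String, String>` of values are assumptions made for illustration, not part of the proposal or of any existing Brooklyn API. Placeholders without a known value are left untouched here; a real implementation might prefer to fail instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expands {key} placeholders in resource mappings. */
final class ResourcePlaceholders {
    // Matches {some.key}; nested braces are not supported in this sketch.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    private ResourcePlaceholders() {}

    /** Replaces each {key} in the template with its value; unknown keys are left as-is. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in values being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Expands placeholders in both the source URI and the destination path of every mapping. */
    static Map<String, String> expandAll(Map<String, String> mappings, Map<String, String> values) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            resolved.put(expand(e.getKey(), values), expand(e.getValue(), values));
        }
        return resolved;
    }
}
```

In this sketch the values map would be filled from the entity's configuration keys and attributes before the mappings are expanded.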
It will also be possible for entities to specify particular transformations that should be applied to the name mappings. When the resources are static and unfiltered, the resolution from URI can be done remotely, via wget or curl access to an HTTP server, AWS S3 or similar. It will also be possible to resolve the resource locally, filter it and publish it to a cloud blob storage system like S3 [3] for remote retrieval by entities in the appropriate locations, which should be faster and possibly cheaper. Caching of content would also be performed by the manager, minimising network traffic and speeding up entity deployment. This will make use of a repository hierarchy, where local disk is queried first (the .brooklyn/repository directory), then a known mirror site or sites, and finally the canonical source.
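The repository hierarchy could be modelled as an ordered list of candidate locations, as in the sketch below. The class `RepositoryHierarchy`, its method names and the availability check passed in as a `Predicate` are all hypothetical; how availability would really be tested (file existence, an HTTP HEAD request, a cache lookup) is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch of the lookup order: local repository, then mirrors, then canonical source. */
final class RepositoryHierarchy {
    private final String localRepository;   // e.g. a file: URI for the .brooklyn/repository directory
    private final List<String> mirrors;     // base URLs of known mirror sites, in preference order

    RepositoryHierarchy(String localRepository, List<String> mirrors) {
        this.localRepository = stripTrailingSlash(localRepository);
        this.mirrors = new ArrayList<>(mirrors);
    }

    /** Returns every place the resource might be found, in the order they should be tried. */
    List<String> candidates(String relativeName, String canonicalUrl) {
        List<String> result = new ArrayList<>();
        result.add(localRepository + "/" + relativeName);
        for (String mirror : mirrors) {
            result.add(stripTrailingSlash(mirror) + "/" + relativeName);
        }
        result.add(canonicalUrl);
        return result;
    }

    /** Picks the first candidate accepted by the supplied availability check, if any. */
    Optional<String> firstAvailable(String relativeName, String canonicalUrl,
                                    Predicate<String> isAvailable) {
        return candidates(relativeName, canonicalUrl).stream().filter(isAvailable).findFirst();
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Keeping the ordering separate from the availability check means the same list could be used both for local caching decisions and for generating a remote download command.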
All entities should have the ability to be configured with additional resources to be copied at runtime, without having to add code specifically to any existing or new entities. Additionally, where files and artifacts are being transferred to remote locations during entity creation, this should always be mediated through the resource manager. When entities have specific requirements, they would load a custom strategy or transformation for their particular context, to enable those features. For example, web application servers would have a particular filename transform for their .war files, which will be the same for all members of a cluster, but the configuration XML would need to be filtered each time for a particular entity instance and location, before copying.
Example

This shows a JSON representation of two mappings for a Cassandra linked data application. The first set of files contains static files that are copied as-is, while the second group contains a single file that will be processed using FreeMarker before being copied.
{
  "runtime-resources" : {
      "classpath:resources/{location.name}.properties" : "conf/cassandra-rack.properties",
      "file:///home/grkvlt/apps/linked-data/config/passwd.md5" : "conf/passwd",
      "mvn:com.grkvlt/linked-data-bindings/0.1.0-SNAPSHOT" : "lib/custom-bindings.jar"
  },
  "filtered-resources" : {
      "classpath:resources/cassandra.yaml.template" : "conf/cassandra.yaml"
  }
}

The files will be managed as follows:

A rack definition property file, named after the current location and found on the current classpath. This would copy aws-ec2:us-west-1.properties to every Cassandra node in that EC2 region, as conf/cassandra-rack.properties; localhost.properties would be used on localhost, and so on.
A file named passwd.md5 on the filesystem is copied to every node as conf/passwd.
A JAR file with artifactId linked-data-bindings, groupId of com.grkvlt and with version 0.1.0-SNAPSHOT will be resolved using PAX-URL from the most appropriate Maven repository, and deployed to lib/custom-bindings.jar.
The cassandra.yaml.template FreeMarker template will be loaded from the classpath, processed and copied to conf/cassandra.yaml.

As the mappings all give relative paths, the default behaviour will be to place the remote files in the appropriate runDir directory for each entity. If a different naming transformation function was supplied, this could be changed easily.
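The default naming transformation could be expressed as a plain function from the configured destination to the final remote path. This is only a sketch: `NameTransformers` and `runDirPrefix` are hypothetical names, the run directory used in the usage note is made up, and leaving absolute destinations unchanged is an assumption rather than something the proposal specifies.

```java
import java.util.function.Function;

/** Hypothetical factory for destination-name transformations. */
final class NameTransformers {
    private NameTransformers() {}

    /**
     * Default behaviour in this sketch: relative destinations are placed under
     * the entity's run directory; absolute destinations are returned unchanged.
     */
    static Function<String, String> runDirPrefix(String runDir) {
        final String base = runDir.endsWith("/")
                ? runDir.substring(0, runDir.length() - 1)
                : runDir;
        return destination ->
                destination.startsWith("/") ? destination : base + "/" + destination;
    }
}
```

With a run directory of `/tmp/brooklyn/apps/example/run`, the mapping target `conf/cassandra.yaml` would become `/tmp/brooklyn/apps/example/run/conf/cassandra.yaml`. Because the transformation is an ordinary `Function`, an entity could swap in or chain a different one with `andThen`.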
Main Features


PAX-URL [1] will be used to resolve source URIs
Filtering will be performed by FreeMarker [4] or other suitable plugins
Mapping of names via entity provided transformer functions
Caching of contents at the management plane, before transfer
Remote resolution and transfer of static resources where possible
Cloud aware, using jclouds BlobStore [2]

Classes and Interfaces

This is simply an idea of the names of some of the Java classes that will be written.

EntityResourceManager
ResourceResolver
  ClasspathResolver
  BrooklynRepositoryResolver
  PaxUrlResolver
ResourceCache
ResourceFilter
  FreeMarkerFilter
  RegularExpressionFilter
  SedCommandFilter
ResourceNameTransformer
  RunDirPrefix
  EntityNameAndVersionPrefix
  UniqueSuffix


Download Manager

i like the direction of this
DownloadPropertiesResolver javadoc needs clarification to better describe:
...all.url property is a prefix, with /EntityType/version/filename.tgz appended (the same format and filename(s) as used when the file is downloaded to /tmp/brooklyn/installs/...)
...entity.Xxx.url property in contrast is an absolute path (?) [how useful is this? what happens when the files vary quite a lot; or are we maybe better off using a custom resolver class already at this point?]
fallback.url -- again feels like might be YAGNI -- perhaps handle by custom resolver class?
(You may have good use cases for the latter items above; I'm just giving my 2p here.)
Should also say how local repos get populated. E.g. describe that items are copied to /tmp/brooklyn/installs on any target machine; this can be rsync'd (one direction) to any mirror. And running InstallRepositoryFiles.main will populate a local /tmp/brooklyn/installs for exactly that purpose.
Finally I think the defaults should be explained. Are you thinking:

look in file:///~/.brooklyn/repository on the target machine
look at http://developers.cloudsoftcorp.com/brooklyn/repository/ (cloudfronted URL)
look in original download location (e.g. mysql.net)

all.url is not a prefix; it's the actual URL. You'd do something like:
brooklyn.downloads.all.url=http://repo.acme.com/${simpletype}/${version}/${simpletype}-${version}.${driver.fileSuffix!.tgz}
If we did say that only a single URL / file layout was supported, it would certainly be simpler. An enterprise could always set up such a server.
I like the way the above is not enforcing our file-system structure on an enterprise's existing repos. The file layout on cloudsoftcorp.com and the rsync approach is a good default, but we could support other file layouts by letting people specify their own URLs.
I also like how one could override the download URL of a single entity to point at somewhere different. For example, to download a patched FooServer from a different public URL rather than having the two extremes of either the default or having to host it yourself in the required file layout.
But I need to think about file names more. Currently it works well if there is a single download artifact for an entity. The driver doesn't get to enforce what the name should be - the properties file gets to override it. But if there is more than one file (e.g. for nginx's sticky module or pcre download) then we need something more.
entity.Xxx.url - not sure I follow your comment about absolute path. If you want to override the URL to use for a single entity type, then you use this.
fallback.url I'm imagining we could use this mechanism for pointing at http://downloads.cloudsoftcorp.com/brooklyn/repository/${simpleversion}/${version}/.... We try the public URL first (rather than hammering cloudsoftcorp) and fall back to this if MySql or whoever withdrew or deleted their artifact.
But I need to think more about how a "fallback" URL is really applied. If it really is fallback, then DownloadsRegistry should go onto the next resolver to get other URL(s). Just now, it relies on the default properties-resolver also delegating to Attributes.DOWNLOAD_URL with various overrides/additions (which is duplicating what the next resolver does).
Defaults: I was imagining something slightly different: cloudsoftcorp and mysql.net are switched round, to do mysql.net first.
For file:///~/.brooklyn/repository, it would be nice to remove that from being hard coded in CommonCommands.downloadUrlAs. The "resolvers" could add that by default.
For more sophisticated (i.e. more complicated) ability of resolvers to contribute download options, perhaps it needs to be a strongly typed thing rather than a List. It would contribute primary URLs and fallback URLs, and say whether to fall-through to let other resolvers contribute other URLs.
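A strongly typed contribution of that kind might look something like this sketch. `DownloadTargets` and its members are hypothetical names, not existing Brooklyn classes, and the merge policy (try every primary URL before any fallback URL, stop collecting at the first resolver that does not fall through) is just one possible reading of the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value type: what a single resolver contributes for one download. */
final class DownloadTargets {
    private final List<String> primary;
    private final List<String> fallback;
    private final boolean fallThrough;

    DownloadTargets(List<String> primary, List<String> fallback, boolean fallThrough) {
        this.primary = Collections.unmodifiableList(new ArrayList<>(primary));
        this.fallback = Collections.unmodifiableList(new ArrayList<>(fallback));
        this.fallThrough = fallThrough;
    }

    List<String> getPrimary() { return primary; }
    List<String> getFallback() { return fallback; }
    boolean canFallThrough() { return fallThrough; }

    /**
     * Combines contributions in resolver order. Collection stops after the first
     * resolver that does not fall through. All primary URLs come before all
     * fallback URLs in the result (an assumed policy).
     */
    static List<String> merge(List<DownloadTargets> contributions) {
        List<String> primaries = new ArrayList<>();
        List<String> fallbacks = new ArrayList<>();
        for (DownloadTargets targets : contributions) {
            primaries.addAll(targets.primary);
            fallbacks.addAll(targets.fallback);
            if (!targets.fallThrough) {
                break;
            }
        }
        List<String> ordered = new ArrayList<>(primaries);
        ordered.addAll(fallbacks);
        return ordered;
    }
}
```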
By fallback.url mechanism for http://downloads.cloudsoftcorp.com/brooklyn/repository/${simpleversion}/${version}/...., I'm still mulling it over...
I'm thinking that URL would be hard-coded, but could be overridden in the properties file. In the resolver code, it would be using that fallback mechanism
of course, using the template structure for all.url makes sense -- i was being silly there -- and we can use a template var for the filename also. this makes entity-specific URLs useful also. this supports all the use cases except where a driver needs multiple downloads and we need to define custom filenames for each, and that's fringe enough.
it is more efficient to have cloudsoftcorp.com be the primary since it is cloudfronted but that can be a tweak users do themselves in their brooklyn.properties.
so +1
WDYT about having an easy way to download the binaries / populate the download repo?
+1 for easy way to download / populate local repo. Giving an rsync command sounds good. Would we need to run an rsync daemon, given that most people won't have ssh access to the downloads.cloudsoftcorp.com box? Or is there a way for rsync to point at a URL directory?!
Comments

Issue #541 has been created, against Milestone 0.6.0. Development of this feature is planned to start after Brooklyn 0.5.0-M2 is released.
The design is not finalized yet, so any suggestions or architecture ideas are welcome, as are any other capabilities or features that would be useful or appropriate, or any other comments on the proposal.

References

[1] http://team.ops4j.org/wiki/display/paxurl/Pax+URL
[2] http://www.jclouds.org/documentation/userguide/blobstore-guide/
[3] http://aws.amazon.com/s3/
[4] http://freemarker.sourceforge.net/