@maxov
Last active August 29, 2015 14:22

Data Merging

Data merging is the process by which new data set by a plugin is put back somewhere it carries more meaning, such as an entity or a block.

The current method

The Sponge API has had DataPriority for a while. It dictates how data from the source and the target of a copy should be merged, specifically when filling data into a manipulator or offering it back to the data holder that contains it.

It has four modes of operation:

  • DATA_HOLDER: All values in the data holder should be retained.
  • DATA_MANIPULATOR: All values in the data manipulator should be retained.
  • PRE_MERGE: Values from the manipulator should be inserted before values in the holder.
  • POST_MERGE: Values from the manipulator should be inserted after values in the holder.

While the DataPriority enum is useful for choosing more precisely how data should be treated during merges, it suffers from a few problems:

  • The terms "before" and "after" are illustrated by example in the DataPriority javadocs, which gives some idea of what each mode does, but they are not strictly defined for all collection-like data. The developer does not know, and cannot control, how collection data behaves in a merge beyond "append after" or "append before".
  • DataPriority exposes several useful modes, but nothing beyond them. The developer cannot give more granular behavior to specific data values short of writing their own DataManipulator implementations. Even then, the values in those implementations are, by definition, distinct from the ones in the API's existing manipulators, because they are defined as getter/setter methods on different types.
  • We can only assume that manipulators and holders contain data in some form; we cannot say anything more without narrowing their types. This is restrictive for plugin developers, and it also means there is no generic way to handle data merging, which leads to a lot of extra implementation work. That work can be organized through object-based polymorphism (as it currently is) or external if/else chains, but it is unavoidable given how manipulators are defined.
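To make the first problem concrete, here is one plausible reading of the four modes for a single list-valued field. The Priority enum and ListMergeSketch class are hypothetical stand-ins written for illustration, not Sponge's DataPriority, and the interpretation of each mode is an assumption based on the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the four DataPriority modes (not the Sponge enum).
enum Priority { DATA_HOLDER, DATA_MANIPULATOR, PRE_MERGE, POST_MERGE }

final class ListMergeSketch {
    // One plausible interpretation of each mode for list data; the javadocs
    // only illustrate "before" and "after" by example.
    static <T> List<T> merge(List<T> holder, List<T> manipulator, Priority priority) {
        List<T> result = new ArrayList<>();
        switch (priority) {
            case DATA_HOLDER:
                result.addAll(holder);        // keep only the holder's values
                break;
            case DATA_MANIPULATOR:
                result.addAll(manipulator);   // keep only the manipulator's values
                break;
            case PRE_MERGE:
                result.addAll(manipulator);   // manipulator's values come first
                result.addAll(holder);
                break;
            case POST_MERGE:
                result.addAll(holder);        // manipulator's values come last
                result.addAll(manipulator);
                break;
        }
        return result;
    }
}
```

With a holder list [a, b] and a manipulator list [c], PRE_MERGE yields [c, a, b] and POST_MERGE yields [a, b, c]. For a Set, or a Map with conflicting keys, there is no comparably obvious reading of "before" or "after", which is the gap the first bullet describes.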

These problems are symptoms of the fact that aggregations of data within manipulators or holders are not well defined. The notion that "these objects store data" exists, but no one can know what that data is unless the concrete type of the manipulator is known. Even then, data merges must be implemented case by case instead of through one generic method that works for every possible manipulator type.

By introducing the notion of Values into the API, we are able to solve these problems and actually gain more control during data merges. Let's take a look at how that happens.

A quick review of the current Value API

The Value API solves the problems above, and a few others in the current Data API, by exposing the data in a container as, essentially, a key-value map from "values" (the keys) to the settings actually stored for them. This concept provides the genericity that data merges and other, more complicated ways of handling data require. As a nice side effect, most ways of working with data actually become simpler, since the developer manipulates the data itself rather than aggregations of it.

At the base level, we can strip out manipulators in their entirety and replace them with a set of methods on data holders (the following list is simplified for the sake of example):

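public interface MyContainerOfData {
  // Sets the given value in this container.
  <V> void set(Value<V> value, V valueToSet);
  // Returns what the given value is currently set to.
  <V> V get(Value<V> value);
  // Whether the value makes sense for this container at all.
  boolean supports(Value<?> value);
  // Whether the value is supported and actually set.
  boolean isSet(Value<?> value);
}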

The get and set methods should be self-explanatory. The supports method checks whether a given value is conceptually supported by the container. For example, it makes no sense to set health on a block, but it does make sense to set health on a living entity. The isSet method checks whether a given value is not only supported but actually set within the container; a container can support a value without having any particular setting for it.
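A minimal in-memory sketch of the container interface helps show how supports and isSet differ. The Value class (a name plus a Class token) and the MapContainer implementation are assumptions made for this example, not part of the proposed API; supports and isSet take Value<?> so they compile without a method-level type parameter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical key type: a named value plus a Class token for type-checked reads.
// Identity equality is sufficient because values are meant to be shared constants.
final class Value<V> {
    private final String name;
    private final Class<V> valueClass;

    Value(String name, Class<V> valueClass) {
        this.name = name;
        this.valueClass = valueClass;
    }

    Class<V> getValueClass() { return valueClass; }

    @Override public String toString() { return name; }
}

interface MyContainerOfData {
    <V> void set(Value<V> value, V valueToSet);
    <V> V get(Value<V> value);
    boolean supports(Value<?> value);
    boolean isSet(Value<?> value);
}

// Hypothetical map-backed container whose supported values are fixed at construction.
final class MapContainer implements MyContainerOfData {
    private final Set<Value<?>> supported;
    private final Map<Value<?>, Object> data = new HashMap<>();

    MapContainer(Value<?>... supported) {
        this.supported = new HashSet<>(Arrays.asList(supported));
    }

    @Override public <V> void set(Value<V> value, V valueToSet) {
        if (!supports(value)) {
            throw new IllegalArgumentException("Unsupported value: " + value);
        }
        data.put(value, valueToSet);
    }

    @Override public <V> V get(Value<V> value) {
        // Class.cast checks the stored object against the declared type (null passes through).
        return value.getValueClass().cast(data.get(value));
    }

    @Override public boolean supports(Value<?> value) { return supported.contains(value); }

    // Supported and explicitly set; a supported but unset value reads as null.
    @Override public boolean isSet(Value<?> value) { return data.containsKey(value); }
}
```

In this sketch a block is simply a container constructed without HEALTH, so supports(HEALTH) is false and set rejects it, while a living entity's container supports HEALTH but reports it as set only after a value has been stored.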

The simplified MyContainerOfData interface shown here is just an example of a container of data. Similar concepts have gone by several names, including DataObject and DataHolder.

The only thing missing is the definition of the values themselves, which we provide in a separate catalog class:

public class Values {
  // ...
  public static final Value<Integer> HEALTH = /* ... */;
  public static final Value<Integer> MAX_HEALTH = /* ... */;
  // ...
}

In this example, HEALTH and MAX_HEALTH are two values related to an entity's health. They represent different things and have different effects when set.

However, one thing conspicuously missing from this API is a way to aggregate related values and store them in a container separate from the one they will eventually be "merged" into. The first iteration of the Value API used generics to retain the concept of aggregated data, but this was dropped in favor of the easier-to-use "DataProjection" format.

We define a HealthData projection in order to aggregate these two values:

public class HealthData extends DataProjection {

  public final BoundValue<Integer> health = bind(Values.HEALTH);
  public final BoundValue<Integer> maxHealth = bind(Values.MAX_HEALTH);
 
}

The DataProjection class's name comes from the notion of projection in SQL, which essentially selects certain columns from tables of data. In the same manner, a data projection selects certain values from containers of data.

The signatures and implementations of the health and maxHealth fields are not important for the purposes of this gist. The point of projections is to let us manipulate aggregations of data like so:

Entity myEntity = /* ... */;

// check if all the values in HealthData are supported
if (myEntity.supports(HealthData.class)) {
  // If so, perform the restriction to health data.
  HealthData healthD = myEntity.restrictTo(HealthData.class);
  // Set the health to the maximum it can possibly be
  healthD.health.set(healthD.maxHealth.get());
}
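One way a projection could be wired up, as a sketch: each BoundValue is a live view of one value inside one container, and bind records which values the projection needs. Everything here (Container, the bind signature, isSupported) is hypothetical. In particular, supports(HealthData.class) and restrictTo(HealthData.class) in the example above would need reflection to instantiate the projection; this sketch instead passes the container to the projection's constructor and checks support afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical key type; identity equality suffices for shared constants.
final class Value<V> {
    private final String name;
    Value(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Values {
    static final Value<Integer> HEALTH = new Value<>("health");
    static final Value<Integer> MAX_HEALTH = new Value<>("max_health");
}

// Simplified container; the supported values are fixed at construction.
final class Container {
    private final Set<Value<?>> supported;
    private final Map<Value<?>, Object> data = new HashMap<>();

    Container(Value<?>... supported) { this.supported = new HashSet<>(Arrays.asList(supported)); }

    boolean supports(Value<?> value) { return supported.contains(value); }
    <V> void set(Value<V> value, V valueToSet) { data.put(value, valueToSet); }
    @SuppressWarnings("unchecked")
    <V> V get(Value<V> value) { return (V) data.get(value); }
}

// A live view of one value inside one container: reads and writes go straight through.
final class BoundValue<V> {
    private final Container container;
    private final Value<V> value;

    BoundValue(Container container, Value<V> value) {
        this.container = container;
        this.value = value;
    }

    V get() { return container.get(value); }
    void set(V newValue) { container.set(value, newValue); }
}

abstract class DataProjection {
    private final List<Value<?>> boundValues = new ArrayList<>();
    private final Container container;

    protected DataProjection(Container container) { this.container = container; }

    // Called from subclass field initializers, which run after this constructor.
    protected <V> BoundValue<V> bind(Value<V> value) {
        boundValues.add(value);
        return new BoundValue<>(container, value);
    }

    // True if the container supports every value this projection has bound.
    boolean isSupported() { return boundValues.stream().allMatch(container::supports); }
}

final class HealthData extends DataProjection {
    final BoundValue<Integer> health = bind(Values.HEALTH);
    final BoundValue<Integer> maxHealth = bind(Values.MAX_HEALTH);

    HealthData(Container container) { super(container); }
}
```

The sketch relies on Java's initialization order: the DataProjection constructor finishes before HealthData's field initializers run, so bind can already see the container when health and maxHealth are initialized.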

With these types and contracts, we now have powerful-enough tools to express more complicated setups like data restrictions.

Values and Data Merging

Now that we have primitives expressive enough to describe data merging more precisely, let's do so.

We want to handle merges between two sets of data (now expressed as our containers of values) generically, through a single interface. Note that these merges are not symmetric: every merge is one-directional, with one data container attempting to merge its data into the other. Merges therefore always have an origin container and a destination container. Symmetric merges could also be supported by the API, but they are out of the scope of this gist.

The first realization is that a data merge is simply trying to resolve conflicts between two sets of values from two containers. Following this realization, the interface for generically handling a data merge practically writes itself:

public interface DataMergeStrategy {
  
  boolean mergeValue(Value value, MyContainerOfData origin, MyContainerOfData dest);

}

A DataMergeStrategy therefore acts on one value at a time. It is called for each value in either the origin or the destination container, and mutates the destination accordingly. It returns true if it actually mutated the destination and false otherwise.
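A sketch of the driver this paragraph describes: it collects every value supported by either container, hands each one to the strategy, and counts how many calls reported a change. Reading "each value in either container" as the union of supported values is an interpretation. The supportedValues method is an assumed addition to MyContainerOfData (the interface above has no way to enumerate its values), and Merges and MapContainer are hypothetical names. Using Value<?> instead of the raw Value type needs a small capture helper, copyValue, to move a setting between containers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical key type; identity equality suffices for shared constants.
final class Value<V> {
    private final String name;
    Value(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

interface MyContainerOfData {
    <V> void set(Value<V> value, V valueToSet);
    <V> V get(Value<V> value);
    boolean supports(Value<?> value);
    boolean isSet(Value<?> value);
    // Assumed addition: lets a merge enumerate the values a container supports.
    Set<Value<?>> supportedValues();
}

interface DataMergeStrategy {
    // Returns true only if dest was actually mutated.
    boolean mergeValue(Value<?> value, MyContainerOfData origin, MyContainerOfData dest);
}

// Hypothetical map-backed container whose supported values are fixed at construction.
final class MapContainer implements MyContainerOfData {
    private final Set<Value<?>> supported;
    private final Map<Value<?>, Object> data = new HashMap<>();

    MapContainer(Value<?>... supported) {
        this.supported = new LinkedHashSet<>(Arrays.asList(supported));
    }

    @Override public <V> void set(Value<V> value, V valueToSet) { data.put(value, valueToSet); }

    @SuppressWarnings("unchecked") // entries are only written by set(), keyed by a Value<V> with a V
    @Override public <V> V get(Value<V> value) { return (V) data.get(value); }

    @Override public boolean supports(Value<?> value) { return supported.contains(value); }
    @Override public boolean isSet(Value<?> value) { return data.containsKey(value); }
    @Override public Set<Value<?>> supportedValues() { return Collections.unmodifiableSet(supported); }
}

final class Merges {
    // Calls the strategy once for every value supported by either container
    // and returns how many calls reported a change to dest.
    static int merge(MyContainerOfData origin, MyContainerOfData dest, DataMergeStrategy strategy) {
        Set<Value<?>> all = new LinkedHashSet<>(origin.supportedValues());
        all.addAll(dest.supportedValues());
        int changed = 0;
        for (Value<?> value : all) {
            if (strategy.mergeValue(value, origin, dest)) {
                changed++;
            }
        }
        return changed;
    }

    // Capture helper: copies one setting without resorting to raw types.
    static <V> void copyValue(Value<V> value, MyContainerOfData from, MyContainerOfData to) {
        to.set(value, from.get(value));
    }
}
```

Because DataMergeStrategy has a single abstract method, simple strategies can also be written as lambdas that call Merges.copyValue, which keeps one-off merge rules close to the code that uses them.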

As a sanity check, let's implement each of the merge modes expressed by DataPriority to make sure this interface really can do what we claim:

public class RetainDestination implements DataMergeStrategy {
  
  public boolean mergeValue(Value value, MyContainerOfData origin, MyContainerOfData dest) {
    if (dest.supports(value) && !dest.isSet(value) && origin.isSet(value)) {
      dest.set(value, origin.get(value));
      return true;
    }
    return false;
  }

}

public class RetainOrigin implements DataMergeStrategy {
  
  public boolean mergeValue(Value value, MyContainerOfData origin, MyContainerOfData dest) {
    if (dest.supports(value) && origin.isSet(value)) {
      dest.set(value, origin.get(value));
      return true;
    }
    return false;
  }

}

public class PrependList implements DataMergeStrategy {
  
  public boolean mergeValue(Value value, MyContainerOfData origin, MyContainerOfData dest) {
    // only perform this for lists of data
    if (List.class.isAssignableFrom(value.getValueClass())) {
      if (dest.supports(value) && origin.isSet(value)) {
        ((List) dest.get(value)).addAll(0, (List) origin.get(value));
        return true;
      }
    }
    return false;
  }

}

public class AppendList implements DataMergeStrategy {
  
  public boolean mergeValue(Value value, MyContainerOfData origin, MyContainerOfData dest) {
    // only perform this for lists of data
    if (List.class.isAssignableFrom(value.getValueClass())) {
      if (dest.supports(value) && origin.isSet(value)) {
        ((List) dest.get(value)).addAll((List) origin.get(value));
        return true;
      }
    }
    return false;
  }

}
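PrependList and AppendList as written call dest.get(value) and mutate the result in place, which presumes that the destination already has a list set and that this list is mutable (a list from Arrays.asList, for instance, throws UnsupportedOperationException on addAll). The variant below is a sketch that combines both strategies in one class, builds a fresh list, and treats an unset destination list as empty. Value, MapContainer, ListMerge and its prepend flag are illustrative names, not part of any API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical key type carrying the raw Java class of its value (e.g. List.class).
final class Value<V> {
    private final String name;
    private final Class<?> valueClass;
    Value(String name, Class<?> valueClass) { this.name = name; this.valueClass = valueClass; }
    Class<?> getValueClass() { return valueClass; }
    @Override public String toString() { return name; }
}

interface MyContainerOfData {
    <V> void set(Value<V> value, V valueToSet);
    <V> V get(Value<V> value);
    boolean supports(Value<?> value);
    boolean isSet(Value<?> value);
}

interface DataMergeStrategy {
    boolean mergeValue(Value<?> value, MyContainerOfData origin, MyContainerOfData dest);
}

final class MapContainer implements MyContainerOfData {
    private final Set<Value<?>> supported;
    private final Map<Value<?>, Object> data = new HashMap<>();
    MapContainer(Value<?>... supported) { this.supported = new HashSet<>(Arrays.asList(supported)); }
    @Override public <V> void set(Value<V> value, V valueToSet) { data.put(value, valueToSet); }
    @SuppressWarnings("unchecked")
    @Override public <V> V get(Value<V> value) { return (V) data.get(value); }
    @Override public boolean supports(Value<?> value) { return supported.contains(value); }
    @Override public boolean isSet(Value<?> value) { return data.containsKey(value); }
}

// Prepends (prepend = true) or appends origin's list to dest's list, writing a new list.
final class ListMerge implements DataMergeStrategy {
    private final boolean prepend;

    ListMerge(boolean prepend) { this.prepend = prepend; }

    @Override
    public boolean mergeValue(Value<?> value, MyContainerOfData origin, MyContainerOfData dest) {
        if (!List.class.isAssignableFrom(value.getValueClass())
                || !dest.supports(value) || !origin.isSet(value)) {
            return false;
        }
        List<?> incoming = (List<?>) origin.get(value);
        List<?> existing = dest.isSet(value) ? (List<?>) dest.get(value) : Collections.emptyList();
        List<Object> merged = new ArrayList<>(prepend ? incoming : existing);
        merged.addAll(prepend ? existing : incoming);
        setUnchecked(dest, value, merged);
        return true;
    }

    // The value's declared class is a List, so storing an ArrayList is safe here.
    @SuppressWarnings("unchecked")
    private static <V> void setUnchecked(MyContainerOfData dest, Value<V> value, Object newValue) {
        dest.set(value, (V) newValue);
    }
}
```

Writing a new list also means the origin and destination never end up sharing, or accidentally aliasing, the same list instance after a merge.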

Now that we know DataMergeStrategy has the power to actually merge data, we can use it in our aggregations. We add a copy method to our container interface:

public interface MyContainerOfData {
  // ...
  DataTransactionResult copy(MyContainerOfData that, DataMergeStrategy mergeStrategy);
  // ...
}

The DataTransactionResult is a concept that is already in the API for dealing with transactions of large amounts of data.
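A rough sketch of how copy might drive a strategy and report what happened. We assume that "that" is the origin and "this" the destination (the text does not say which). MergeResult is a hypothetical stand-in for DataTransactionResult that records the values that changed and the previous settings they replaced; the real class is not modeled here. For brevity the container is a concrete class rather than an implementation of MyContainerOfData, and Strategies holds lambda versions of RetainDestination and RetainOrigin.

```java
import java.util.*;

// Hypothetical key type; identity equality suffices for shared constants.
final class Value<V> {
    private final String name;
    Value(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

interface DataMergeStrategy {
    boolean mergeValue(Value<?> value, MapContainer origin, MapContainer dest);
}

// Stand-in for DataTransactionResult (not Sponge's class): which values changed,
// and the previous settings of any that were overwritten.
final class MergeResult {
    final List<Value<?>> changed = new ArrayList<>();
    final Map<Value<?>, Object> replaced = new HashMap<>();
}

final class MapContainer {
    private final Set<Value<?>> supported;
    private final Map<Value<?>, Object> data = new HashMap<>();

    MapContainer(Value<?>... supported) { this.supported = new LinkedHashSet<>(Arrays.asList(supported)); }

    <V> void set(Value<V> value, V valueToSet) { data.put(value, valueToSet); }
    @SuppressWarnings("unchecked")
    <V> V get(Value<V> value) { return (V) data.get(value); }
    boolean supports(Value<?> value) { return supported.contains(value); }
    boolean isSet(Value<?> value) { return data.containsKey(value); }

    // Assumption: `that` is the origin and `this` is the destination.
    MergeResult copy(MapContainer that, DataMergeStrategy mergeStrategy) {
        Set<Value<?>> all = new LinkedHashSet<>(that.supported);
        all.addAll(supported);
        MergeResult result = new MergeResult();
        for (Value<?> value : all) {
            boolean wasSet = isSet(value);
            Object previous = data.get(value);   // snapshot before the strategy runs
            if (mergeStrategy.mergeValue(value, that, this)) {
                result.changed.add(value);
                if (wasSet) {
                    result.replaced.put(value, previous);
                }
            }
        }
        return result;
    }
}

final class Strategies {
    // RetainDestination from the text: only fill values dest has not set.
    static final DataMergeStrategy RETAIN_DESTINATION = (value, origin, dest) -> {
        if (dest.supports(value) && !dest.isSet(value) && origin.isSet(value)) {
            copyValue(value, origin, dest);
            return true;
        }
        return false;
    };

    // RetainOrigin from the text: overwrite whenever origin has a setting.
    static final DataMergeStrategy RETAIN_ORIGIN = (value, origin, dest) -> {
        if (dest.supports(value) && origin.isSet(value)) {
            copyValue(value, origin, dest);
            return true;
        }
        return false;
    };

    // Capture helper so a Value<?> can be copied without raw types.
    private static <V> void copyValue(Value<V> value, MapContainer from, MapContainer to) {
        to.set(value, from.get(value));
    }
}
```

Recording replaced settings is one natural thing for a transaction-style result to report, since it lets a caller inspect, or undo, what a merge overwrote.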

Conclusion

In the end, we can see that the Value API is powerful enough to handle existing concepts in the Data API, such as merges and priorities. Beyond that, it handles them more generically, allowing even more flexibility than the existing Data API provides.

Please note that this document describes how data merging could work in a Value API. Names, interfaces, and method contracts are subject to change, but the basic idea of how these merges work will stay the same. The semantics of how values operate are currently fairly well-defined, but some more complicated features, like optional values and list values, are not. As the Value API matures and becomes ready for a possible merge, these and other potential problems will be resolved.
