Ziobrando's Lair: About Entities, Aggregates and Data Duplication.

Tuesday, June 15, 2010


There's been an interesting discussion about Aggregates on the Italian DDD mailing list. When things become complex, a simple example might just turn out to be too simple. So I came up with this medium-sized one. I hope it won't be too long. OK, let's start from our first User Story:

User Story #1: Placing an order
As a Customer
I Want to place an order
In order to purchase some goods

The simplest implementation of the story is essentially stateless: every time customers want to order something, they need to re-enter their data. From the DDD perspective, the resulting model is based on a single aggregate (we're deliberately ignoring the catalog for now) whose root is the Order class.

The stateless nature of the service makes it really easy to implement: a Customer is just a Value Object, created and dropped as needed. We also took a shortcut: we've chosen to implement Address as a String. Item is a natural Value Object, while LineItem is somewhere in the middle: we can change quantities while the order is in the open state, but we can also implement this with droppable Value Objects, easing the integrity burden on the aggregate root.
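To make the single-aggregate model concrete, here is a minimal Python sketch (the post itself has no code). It assumes hypothetical fields such as `sku` and `price_cents`; Customer and Item are immutable Value Objects, Address is a plain string as in the text, and changing a quantity drops the old LineItem and puts a new one in its place, so Order, the aggregate root, is the only place that enforces the "open state" rule.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Customer:
    # Value Object: no identity, re-entered with every order
    name: str
    address: str  # the shortcut from the text: Address is a plain string

@dataclass(frozen=True)
class Item:
    sku: str          # hypothetical field
    price_cents: int  # hypothetical field

@dataclass(frozen=True)
class LineItem:
    item: Item
    quantity: int

    def subtotal(self) -> int:
        return self.item.price_cents * self.quantity

class Order:
    """Aggregate root: the only entry point for changing line items."""

    def __init__(self, customer: Customer) -> None:
        self.customer = customer
        self.is_open = True
        self._lines: list = []

    def add(self, item: Item, quantity: int) -> None:
        self._require_open()
        self._lines.append(LineItem(item, quantity))

    def change_quantity(self, sku: str, quantity: int) -> None:
        self._require_open()
        # Drop the old LineItem and replace it with a new Value Object
        self._lines = [
            replace(line, quantity=quantity) if line.item.sku == sku else line
            for line in self._lines
        ]

    def place(self) -> None:
        self.is_open = False

    def total(self) -> int:
        return sum(line.subtotal() for line in self._lines)

    def _require_open(self) -> None:
        if not self.is_open:
            raise ValueError("order is no longer open")
```

For example, adding two books at 1500 cents and then changing the quantity to 3 gives a total of 4500; after `place()`, any further change raises `ValueError`.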

Some businesses (like selling train tickets) might just work like this, but our marketing department prefers to manage customers over the long term, so here are two more user stories.

User Story #2: Returning customer
As a Customer
I Want to retrieve my profile
In order to place more orders

Story #2 breaks our assumption about the aggregate boundaries. If we stick to the aggregate rule of thumb, deleting the root deletes everything inside the boundary; but if we delete an Order, we probably don't want to delete the corresponding Customer as well. So we need a separate aggregate for it. What would happen if we decided to delete a Customer? Should we delete all their orders? We don't have enough information to answer that yet, so we'll mark it as an open question for our next meeting with the domain expert. Let's try the model with two aggregates and see how it performs.

We now have a relationship crossing the aggregate boundary. Since we promoted Customer to be the root of the newly created aggregate, the reference points to a root, so there is no potential integrity violation. Still, we have to watch it closely, because this is where problems related to lazy/eager loading will arise. We also added username and password to Customer.
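A minimal sketch of the two-aggregate version, assuming hypothetical field names: Customer is now the root of its own aggregate, and Order keeps a direct reference to it. That reference is the boundary crossing the text warns about; with an ORM it is exactly where lazy or eager loading gets decided. The password is stored as a hash here, which is an assumption of the sketch, not something the post specifies.

```python
from dataclasses import dataclass

class Customer:
    """Root of its own aggregate: it outlives any single Order."""

    def __init__(self, username: str, password_hash: str, name: str, address: str) -> None:
        self.username = username
        self.password_hash = password_hash  # assumption: never keep plain-text passwords
        self.name = name
        self.address = address

@dataclass(frozen=True)
class LineItem:
    sku: str  # hypothetical field
    quantity: int

class Order:
    """Root of the Order aggregate."""

    def __init__(self, customer: Customer) -> None:
        # Reference to another aggregate's root: this is where lazy/eager
        # loading questions show up once persistence is involved.
        self.customer = customer
        self.lines: list = []
```

Deleting an `Order` (say, removing it from a list of orders) leaves the `Customer` object untouched, which is the whole point of splitting the aggregates.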

User Story #3: Different shipping address
As a Customer
I Want to specify a valid shipping address
In order to ship to a different destination

Multiple addresses call for a separate Address type. So far this class has few responsibilities besides validation (which, as Udi Dahan would say, doesn't necessarily belong in the domain layer), but the smell of duplication is probably enough to justify a separate class. We try to keep the model as simple as possible, so we treat Address as a Value Object.
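A sketch of Address as a Value Object, with hypothetical fields: it is immutable, two addresses with the same values are equal, and "changing" an address means replacing it. The emptiness check stands in for validation only as an illustration; as noted above, real validation may live outside the domain layer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    """Value Object: equal by value, immutable, replaced rather than edited."""
    street: str    # hypothetical fields
    city: str
    zip_code: str
    country: str

    def __post_init__(self) -> None:
        # Minimal illustrative validation: no field may be blank
        for name in ("street", "city", "zip_code", "country"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
```

Because the dataclass is frozen, assigning to a field raises `dataclasses.FrozenInstanceError`, a subclass of `AttributeError`.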

[Diagram: domain model after User Story #3]

User Story #4: Editable customer profile
As a Customer
I Want to edit my profile
In order to update it if needed

Story #4 makes explicit what we've been suspecting: Customer needs to be an entity, because it has a nontrivial lifecycle. Nothing revolutionary at the Domain Model level, but this triggers a question: "What happens if a customer has an outstanding order and changes their data before the order is dispatched?" This is the kind of question you don't want to answer as a software developer. So we walk up the stairs to have a talk with the Domain Expert, and we come back with two fresh user stories:
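What changes when Customer becomes an entity is mostly equality: a minimal sketch, assuming a hypothetical string `id` generated with `uuid4`, where two customers are the same customer when they share an identity, regardless of their current profile data, and the profile can be edited over time.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Address:
    street: str  # hypothetical fields
    city: str

class Customer:
    """Entity: identity stays fixed while the profile data changes."""

    def __init__(self, name: str, address: Address, customer_id: Optional[str] = None) -> None:
        self.id = customer_id or str(uuid.uuid4())
        self.name = name
        self.address = address

    def update_profile(self, name: str, address: Address) -> None:
        self.name = name
        self.address = address  # the Value Object is replaced as a whole

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Customer) and other.id == self.id

    def __hash__(self) -> int:
        return hash(self.id)
```

Two customers with identical names and addresses but different ids are different customers, while a customer who edits every field is still the same one.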

User Story #5: Specify Billing and Shipping address
As a Customer
I Want to specify independent billing and shipping addresses
In order to deliver goods to different locations

This one is relatively easy: it just reinforces our design. We'll now have two references from Order to Address, which are managed by the Customer. We just store a default Address in the Customer aggregate (we could do better, but it's OK for now).
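A sketch of the two Order-to-Address references plus the single default Address on Customer. The `place_order` factory and its fallback rule (use the customer's default when no address is given) are hypothetical choices made for illustration; the post doesn't say how the default is applied.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Address:
    street: str  # hypothetical fields
    city: str

class Customer:
    def __init__(self, name: str, default_address: Address) -> None:
        self.name = name
        self.default_address = default_address  # one default, as in the text

class Order:
    def __init__(self, customer: Customer, billing: Address, shipping: Address) -> None:
        self.customer = customer
        self.billing_address = billing
        self.shipping_address = shipping

def place_order(customer: Customer,
                billing: Optional[Address] = None,
                shipping: Optional[Address] = None) -> Order:
    """Hypothetical factory: fall back to the customer's default address."""
    default = customer.default_address
    return Order(customer, billing or default, shipping or default)
```

Calling `place_order(ada, shipping=warehouse)` bills Ada at her default address and ships to the warehouse.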

[Diagram: domain model after User Story #5]

Story #6 is a little trickier: it comes from the legal department and tells us what we probably expected.

User Story #6: Track past orders
As a Legal Department
I Want to track orders
In order to manage litigation

The domain expert states it clearly: once an order is placed, it can't be changed in any of its parts, be it the content or the Customer. In case of litigation, it must behave exactly like printed paper. But Customer doesn't: its lifecycle doesn't match this requirement, so we need a separate class for that. Lacking imagination, we call it CustomerData.
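A sketch of the CustomerData idea, assuming hypothetical field names and a hypothetical `snapshot_of` factory: at placement time the order copies the customer's data into an immutable Value Object, and the placed order itself is frozen, so later edits to the Customer entity can't leak into it.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Address:
    street: str  # hypothetical fields
    city: str

class Customer:
    """Entity with its own lifecycle: the profile may change at any time."""

    def __init__(self, name: str, address: Address) -> None:
        self.name = name
        self.address = address

@dataclass(frozen=True)
class CustomerData:
    """Immutable copy of the customer's data as it was when the order was placed."""
    name: str
    address: Address

    @classmethod
    def snapshot_of(cls, customer: Customer) -> "CustomerData":
        return cls(customer.name, customer.address)

@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int

@dataclass(frozen=True)
class PlacedOrder:
    """Once placed, nothing changes: it behaves like printed paper."""
    customer: CustomerData
    lines: Tuple[LineItem, ...]
```

The two classes hold similar data, but only `Customer` is allowed to change; a `PlacedOrder` keeps showing the name and address that were valid on the day it was placed.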

[Diagram: domain model after User Story #6]

Everything looks a lot different from the beginning. The two aggregates are now largely decoupled: we can change or delete an order without affecting the customer, and we can delete or unregister a customer without losing track of their past orders. On the other hand, we now have explicit duplication: Customer and CustomerData look so similar that we feel guilty. Did we violate the DRY principle? At first glance the data is the same, but if we think about behavior, or class lifecycle, Customer and CustomerData are clearly two different beasts. Yet, more often than not, when the starting point is the data model instead of the domain model, we end up thinking that it's the same data, hence the same class. I bet experienced data modelers don't fall into this pitfall, but I've seen the problem recur quite often.

Once we accept that little bit of data duplication, we have a system that is a lot easier to evolve and maintain, with aggregate roots acting as integrity enforcers within their boundaries and agnostic about the rest.

To add a little spice, don't forget that aggregates are also the building blocks of distributed systems: suppose we need to send orders to a remote system. Sending just the single aggregate and the value objects it references is probably the cleanest way.
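Because the Order aggregate no longer points at the Customer entity, it can travel as a self-contained message. A minimal sketch, assuming JSON as the wire format and hypothetical field names: `dataclasses.asdict` walks the aggregate and its Value Objects recursively, and nothing outside the boundary gets dragged along.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple

@dataclass(frozen=True)
class Address:
    street: str  # hypothetical fields
    city: str

@dataclass(frozen=True)
class CustomerData:
    name: str
    address: Address

@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int

@dataclass(frozen=True)
class Order:
    order_id: str
    customer: CustomerData
    shipping_address: Address
    lines: Tuple[LineItem, ...]

def to_message(order: Order) -> str:
    # The aggregate plus the value objects it references, nothing else:
    # no Customer entity, no repository, no lazy-loading proxies.
    return json.dumps(asdict(order), sort_keys=True)
```

Tuples of line items come out as JSON arrays, so the receiving system gets a plain document it can rebuild without access to our Customer data.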

11 comments:

fabio boldrini said...: On Chrome the images are cut off by the right column. It seems the central column is too narrow, or the images need to be scaled down; 10:36 AM
Unknown said...: Hi Fabio.

You're right, thanks for the feedback :-( Same behavior on Firefox. I used a different editor to write this post and discovered this nice feature with larger images.

The workaround, while I try to fix this, is to open the images in a separate page. Sorry; 11:02 AM
Unknown said...: Should be ok right now.; 11:16 AM
Unknown said...: The example is very clear (we can always bank on this) so first of all thanks for taking the time to condense everything that was written in the list in such an organic way.

That said, let me play the devil's advocate (partly for discussion's sake, partly for a small provocation, but don't take these as negative remarks):

1. I found it curious that all your class diagrams contain properties but not methods... would you please elaborate on this? Or are you just trying to trick us into the common modeling pitfall? ;-)

2. Repositories were originally part of the discussion, but they seem to have vanished into thin air: how do they fit into all this?

...I'm not bad, I'm just drawn that way :-); 12:50 PM
Unknown said...: Hi Andrea,

I basically tried to dig into the problem of "A Customer seems to belong to 2 aggregates", not the whole discussion.

1) You're right about the missing methods. I probably tried to make it too simple, or took something for granted. But I was also trying to make the point that "there's more to behavior than methods", which is why I intentionally used the word lifecycle instead: behavior is even more powerful at showing that concepts are different, but the need for two classes arose even without considering it.

2) Quick answer: Repositories were part of the original discussion, but not of the problem I wanted to show here (the post ended up longer than I expected). But Repositories fit in quite easily... just one per aggregate.; 1:02 PM
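"One repository per aggregate" can be sketched with two in-memory repositories, one for Customer roots and one for Order roots. The dictionary storage and method names are assumptions for illustration; a real repository would sit on top of a database. Because each Order carries its own CustomerData, removing a Customer doesn't touch any Order.

```python
from dataclasses import dataclass
from typing import Dict, Optional

class Customer:
    """Root of the Customer aggregate."""

    def __init__(self, customer_id: str, name: str) -> None:
        self.id = customer_id
        self.name = name

@dataclass(frozen=True)
class CustomerData:
    name: str

@dataclass(frozen=True)
class Order:
    """Root of the Order aggregate, with its own copy of the customer data."""
    order_id: str
    customer: CustomerData

class CustomerRepository:
    """Stores and hands out whole Customer aggregates, by root id."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._by_id[customer.id] = customer

    def get(self, customer_id: str) -> Optional[Customer]:
        return self._by_id.get(customer_id)

    def remove(self, customer_id: str) -> None:
        self._by_id.pop(customer_id, None)

class OrderRepository:
    """Stores and hands out whole Order aggregates, by root id."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Order] = {}

    def add(self, order: Order) -> None:
        self._by_id[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._by_id.get(order_id)
```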
Anonymous said...: Thank you for a great blog post!

When handling the duplication between the aggregates, is there any good practice for keeping them in sync?; 6:27 PM
Unknown said...: The short answer is "Use Domain Events", for a longer answer ... I am working on the next post. :-); 11:23 PM
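The "Use Domain Events" answer can be sketched as follows; the event name, the `pull_events` method, and the `CustomerDirectory` subscriber are all hypothetical. The Customer aggregate records what happened, an in-process dispatcher delivers the event after the change is saved, and each subscriber decides what to update. Placed orders stay as they are; which other models should follow a profile change is a question for the domain expert.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)
class CustomerProfileUpdated:
    """Domain event: a fact about something that happened to a Customer."""
    customer_id: str
    new_name: str

class Customer:
    def __init__(self, customer_id: str, name: str) -> None:
        self.id = customer_id
        self.name = name
        self._pending_events: List[Any] = []

    def rename(self, new_name: str) -> None:
        self.name = new_name
        # Record the fact; the aggregate doesn't know who is interested
        self._pending_events.append(CustomerProfileUpdated(self.id, new_name))

    def pull_events(self) -> List[Any]:
        events, self._pending_events = self._pending_events, []
        return events

class EventBus:
    """Minimal synchronous, in-process dispatcher keyed by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, List[Callable[[Any], None]]] = {}

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: Any) -> None:
        for handler in self._handlers.get(type(event), []):
            handler(event)

class CustomerDirectory:
    """Hypothetical subscriber: a read model mirroring current customer names."""

    def __init__(self) -> None:
        self.names: Dict[str, str] = {}

    def on_profile_updated(self, event: CustomerProfileUpdated) -> None:
        self.names[event.customer_id] = event.new_name
```

Typical use: subscribe `directory.on_profile_updated` for `CustomerProfileUpdated`, call `ada.rename(...)`, then, once the change is saved, publish everything returned by `ada.pull_events()`.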
Anonymous said...: On the last diagram there is no relationship between Order and Customer - is this an oversight?; 3:27 PM
Unknown said...: No, that relation is quite relaxed. If we limit the exploration to only the stories included in the example, that's it!

In a more complex scenario, you might still want to retrieve the Customer from the Order. You could store the Customer id in CustomerData, for example. It really depends on your specific business case. References are not forbidden, but in general you can't expect the Customer to be consistent with what's inside the Order aggregate.

... but beware: this decoupling is hard to achieve and easy to lose.; 6:30 PM
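Storing the Customer id in CustomerData, as suggested in the reply above, might look like this minimal sketch (a plain dictionary stands in for a Customer repository, and `current_customer` is a hypothetical helper). The id lets you navigate back to the Customer, but the result can differ from the snapshot in the Order, or be gone entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class CustomerData:
    customer_id: str  # a plain id, not an object reference
    name: str         # snapshot taken when the order was placed

class Customer:
    def __init__(self, customer_id: str, name: str) -> None:
        self.id = customer_id
        self.name = name

@dataclass(frozen=True)
class Order:
    order_id: str
    customer: CustomerData

def current_customer(order: Order, customers: Dict[str, Customer]) -> Optional[Customer]:
    """Follow the id across the boundary; the result may disagree with the snapshot."""
    return customers.get(order.customer.customer_id)
```

After a rename, `current_customer(order, customers).name` shows the new name while `order.customer.name` keeps the old one; after the Customer is deleted, the lookup returns `None` and the Order is still intact.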
Anonymous said...: Great post, one of the best articles I have ever read. What I like most about it is your style of exposing the concept and the way you promote and describe your idea.

Many thanks; 2:40 PM
Unknown said...: Wow... Thanks Mohamed! :-); 3:03 PM