Time is one of those things we all take for granted. Time marches on and does a dozen other things described by pithy sayings on T-shirts and motivational posters. When it comes to software, however, time is often our worst enemy. In this blog post, I talk about some patterns for dealing with the passage of time in event sourced applications.
Before I get into the core of this post, I want to distinguish between the two kinds of time that we run into when working with event sourced applications:
- Audit Time
- Modeled Time
The first is the easiest to deal with. In the case of audit time, the only thing we're concerned with is the timestamp when events occur. Our business logic doesn't use that information, and we only use it after the fact for queries, compliance, and other things that exist outside the business domain. To get audit time "right", all we need to do is ensure that there are proper timestamps on the events (hopefully in UTC).
So what do we do if our application needs to model the passage of time? How do we handle this kind of thing when we know we can't violate any of these rules of event sourcing:
- Events are immutable
- No event sourced component can ever interact with the "wall clock"
- Event replays must be deterministic
This means that our aggregates, projectors, and process managers are all expressly forbidden from accessing any source of "real" time. If an event sourced component reads real time, that read becomes a side effect, and it guarantees that successive replays can produce different results.
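To make that rule concrete, here's a minimal sketch (in Python, with made-up event shapes) of what deterministic replay looks like when the state transition function is pure:

```python
from functools import reduce

# Hypothetical event stream; plain dicts keep the sketch short.
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
]

def apply(balance: int, event: dict) -> int:
    """Pure state transition: no clock, no I/O, no randomness."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance

def replay(events: list) -> int:
    """Fold the immutable history into the current state."""
    return reduce(apply, events, 0)
```

Because `apply` touches nothing but its arguments, `replay(events)` returns the same balance today, tomorrow, and on every machine that runs it.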
Do We Really Need Time?
Since we know dealing with time in event sourced applications is hard, the first thing we should ask ourselves is whether we really need it. In some cases, our business model only needs to know the relative order of things. We need to know that one event occurred before another event because the order in which the events are processed matters.
Modeling Time Order
Predictable ordering is something that we should be getting from our event sourced framework. I shouldn't need to write special code for this in my aggregate; it just knows that if it receives a withdrawal command that leaves the account in overdrawn status, it emits the `AccountOverdrawn` event. The code we write shouldn't ever have to check high watermark values or enforce global ordering--that's what tools like NATS and Kafka are for. Most importantly, the code we write shouldn't know how events are being stored, streamed, or ordered.
If I don't want to immediately issue the overdrawn event, then I could wait for something like an `EndOfDayClosed` event and check the balance then, giving the customer until the end of the day to put their balance back above zero. The `EndOfDayClosed` event is an example of an event that signals the passage of time.
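A minimal sketch of that deferred check, with hypothetical event shapes:

```python
def on_end_of_day_closed(balance: int) -> list:
    """React to the EndOfDayClosed event. The aggregate never looks
    at a clock -- the arriving event *is* the passage of time."""
    if balance < 0:
        return [{"type": "AccountOverdrawn", "balance": balance}]
    return []
```

The aggregate stays pure: given the same balance at end of day, it always emits the same events, no matter when the replay runs.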
The passage of time is marked by events.
This rule is just as important as the others listed at the beginning of the post. Let's take the previous bank example. In that scenario, the bank needs to perform an end of day clearing, well, at the end of the day. Since none of the event sourced components are allowed to read the wall clock, something else needs to issue the `PerformEndOfDayClearing` command at the appropriate time, which in turn emits the `EndOfDayClosed` event.
Note that these events aren't things like `ItBecameFiveOClock`. Explicitly referring to times like this in your events is a dead giveaway that you're going to have model problems in the future. Our business model just needs to know that the clearing operation happened. Nothing inside the model cares whether the event showed up at 17:30 or at any other wall-clock time. If we replay the event stream, we'll get the exact same result no matter when we run the replay.
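Here's a rough sketch of what that external scheduler might look like. Everything in it (the 17:00 cutoff, the command shape, the function names) is illustrative, not prescriptive:

```python
import datetime

def scheduler_tick(now: datetime.datetime, already_cleared: bool, send_command) -> bool:
    """Lives outside the event sourced model, and is the only place
    the wall clock gets read. Issues the clearing command once the
    (hypothetical) 17:00 cutoff has passed; returns the updated
    already_cleared flag."""
    if now.hour >= 17 and not already_cleared:
        send_command({
            "type": "PerformEndOfDayClearing",
            "business_date": now.date().isoformat(),
        })
        return True
    return already_cleared
```

Passing `now` in as an argument, rather than calling `datetime.datetime.now()` inside, keeps even this edge component easy to test.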
Performing Work over Time
Let's say you're building a simulation (or a game). In this simulation, you need to move an entity from point A to point B with a given velocity. This means that its position needs to change over time. How can you accomplish this without checking against real time?
First, ask yourself what event you really need. Do you care that time passed, or do you care that the entity's position changed? Hopefully it's the latter. In a case like this, you'd likely have a system (in ECS terminology) running outside your event sourced environment. Other terms for a component like this might be an injector or a gateway. This system, which could be something like `physics`, would then be responsible for emitting `PositionChanged` events into your event stream.
This is replay safe because no matter when you run the replay, the immutable fact that an entity's position changed cannot be disputed, and it'll always result in the same state.
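A sketch of that split, with hypothetical names: the `physics` system does the time-dependent math outside the model, and only the result enters the stream as a fact.

```python
def physics_step(position: float, velocity: float, dt: float) -> dict:
    """A hypothetical 'physics' system running outside the event
    sourced model. It integrates motion over dt, then records only
    the resulting position as an immutable fact."""
    return {"type": "PositionChanged", "position": position + velocity * dt}
```

Anything replaying the resulting `PositionChanged` events needs no knowledge of velocity, elapsed time, or the clock; the position simply changed, and that fact never changes.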
If you model what happened as a result of the passage of time rather than the passage of time itself, then you're setting yourself up for a far easier time managing and maintaining that model.
Dilating and Scaling Time
As mentioned earlier, sometimes we don't actually care about specific times, we only care about chronological order. One special case of modeling chronological order is when we take advantage of time scales. Let's say we're still working on the simulation from before. But now I want to be able to replay the simulation on demand, with playback (VCR) style controls. I want to be able to speed up and slow down the playback without altering the event stream or looking at a wall clock.
In this pattern, instead of an entity's position changing at 03:01 UTC and again at 03:03 UTC, the entity's position would change at time index (or slice, or more commonly, `tick`) 1 and time index 3.
If the entire simulation ran in a total of 700 ticks, then we can tell that the relative amount of time it took for that entity to change position was 3/700ths of the max time. With this knowledge in hand, we can play back the simulation in "real" time by using whatever speed is the 1:1 scale. If we want to play it back in slow-mo, we can change the scale to 1:10. If we want to play it back as fast as the computer can possibly run it so we can get to the end result, then we can run the playback with no time scale.
Note that we're not modeling an event like `TickOccurred`. Instead, we're putting a time/tick index on the domain model portion of the event.
In this pattern as well, we have an external source responsible for emitting events into the stream with relative time slice data.
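One way to sketch VCR-style playback over tick-stamped events (all names and shapes here are hypothetical):

```python
import time

def play_back(events, tick_duration_s: float, scale: float = 1.0, sleep=time.sleep):
    """Replay tick-stamped events at an arbitrary speed.
    scale=1.0 is 'real' time, 10.0 is slow-mo, and 0 runs the
    playback as fast as the machine allows."""
    last_tick = 0
    for event in sorted(events, key=lambda e: e["tick"]):
        if scale > 0:
            sleep((event["tick"] - last_tick) * tick_duration_s * scale)
        last_tick = event["tick"]
        yield event
```

The event stream itself is untouched; only the pacing between ticks changes. Making `sleep` injectable also lets tests run the playback instantly.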
Elapsed Time and Timeouts
The next type of scenario that comes up a lot when people are trying to design event sourced systems is processes that can time out. Let's say that you're working on an app that facilitates trades of some kind. The exchange might be initiated with `TradeInitiated`, and the counterparty then agrees to the trade with `TradeAccepted` (I'm simplifying the handshake; usually there are more steps). In this scenario, if the other party doesn't show up in a `TradeAccepted` event within 3 hours, a `TradeAborted` event shows up with a reason indicating the timeout. A process manager can then terminate the process when it sees `TradeAborted`. But remember, a process manager can't read the wall clock, nor can it start a background timer, so how do we know when to abort the trade?
You might've guessed that you can use an external process that can inject that event. If your domain model and problem are fairly complicated and you're doing something like anti-cheat detection or looking for abnormal trading patterns in a stock application, then you might want to use a Complex Event Processing (CEP) tool.
For simpler domains, it's probably enough just to have a monitor process running. It could wait for a `TradeInitiated` event and then start its own decay timer. Or, maybe the process just checks for all expired trades every minute. The implementation is up to you, so long as the core event sourcing rules aren't violated. This means that if a trade timed out live in real time, then it needs to remain timed out during a replay (so you might turn off the timers while regenerating aggregates and projections).
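A bare-bones version of that check-every-so-often monitor might look like this; the names and data shapes are made up for illustration:

```python
TRADE_TIMEOUT_S = 3 * 60 * 60  # the 3-hour window from the example

def check_expired_trades(open_trades: dict, now_s: float) -> list:
    """Periodic sweep run by an external monitor -- the only component
    allowed to read the clock. open_trades maps a trade id to the
    epoch-seconds time its TradeInitiated event was observed."""
    return [
        {"type": "TradeAborted", "trade_id": trade_id, "reason": "timeout"}
        for trade_id, initiated_at_s in open_trades.items()
        if now_s - initiated_at_s >= TRADE_TIMEOUT_S
    ]
```

Once the `TradeAborted` events are in the stream, the timeout is an immutable fact, and replays will see the trade abort at exactly the same point in the history every time.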
I've already covered the idea of recording what happened as a result of the passage of time rather than the passage of time itself. But there are systems where things change as the result of time hundreds or even thousands of times per second.
In realtime systems like this, you probably don't want to spam your event logs with these events. If you can sample some subset of them, then you can definitely keep the load down. If you can perform some higher-level operation on observed events (CEP or your own monitor process), then you can write the higher-level events into the log and keep the high frequency messages ephemeral.
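As an illustration of that second approach, collapsing a burst of ephemeral readings into one durable summary event per fixed-size window might look like this (a simple stand-in for what a CEP tool or monitor process would do):

```python
def summarize_window(readings: list, window: int):
    """Collapse bursts of ephemeral high-frequency readings into one
    durable summary event per fixed-size window. Only these summary
    events are written to the log; the raw readings stay ephemeral."""
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        yield {"type": "WindowSummarized", "count": len(chunk), "max": max(chunk)}
```

The window size (and which aggregates you keep: count, max, mean) is a tuning decision, trading log volume against how much detail a replay can reconstruct.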
Dealing with time in any application is difficult. Dealing with time in an environment that expressly forbids you from accessing a clock is exponentially more so. But, if we keep in mind a couple of the core rules of event sourcing and some common sense advice, we can still create some amazing applications that work with time and not against it.
If you're really curious about this topic and want to learn more about it in depth, then stay tuned for updates on the release of my forthcoming book on Event Sourcing.
I'll emit a `BookPublished` event when you can grab it 😃.